MySQL查询与MATCH和AGAINST挂起

I have multiple CURL scripts that collect data from many internal company sites into a shared database. Each script is called an "instance". Data is added to the database in the form of records. Each record has multiple "fields", which are key->value pair. As the keys for each record are dynamic and can be anything (even on the same instance) they are not hard-coded into the MySQL table.

So there are these tables:

records - contain a list of records, each associated with an instance
record_fields - contains a list of fields associated with a record
record_fields_labels - basically a list of labels. This is stored in the database to save space (i.e. instead of the record_fields have thousands of fields that have the label "Article Date", they will all have the number 8, which is the ID of the record on record_labels that has "Article Date" as its value).

record_fields and record_fields_labels are both MyISAM tables with a FULLTEXT index on "content" (the column in record_fields that contains the actual data) and "label" (the column in record_fields_labels that has the label name).

The database has millions of records - for each multiply the number of record fields... When the instances run, to check whether a record already exists in a database, they do the following SQL query:

SELECT r.id FROM records r INNER JOIN record_fields rf ON rf.record_id=r.id INNER JOIN record_fields_labels as rfl ON rf.label=rfl.id WHERE r.instance IN (120) AND MATCH (rf.content) AGAINST ('"http://xxxx.xxxx/xxx.xxx.xxx"' IN BOOLEAN MODE) AND MATCH (rfl.label) AGAINST ('"Article URL"' IN BOOLEAN MODE) GROUP BY r.id

In this example, http://xxxx.xxxx/xxx.xxx.xxx is the URL of an article a script would check if it already exists in the system.

TL;DR

The problem is this: when the database is huge (i.e. millions of records/record fields) - the above query simply hangs up. The query would run, even for hours, for no apparent reason. This same query is used to search for items in the collected data and that seems to work (or worked until recently).

All I want is to make it show whether such a record exists or not. It doesn't seem to be an indexing issue, but something specifically to do with MATCH AGAINST. I prefer to avoid having an additional index of all the content (in addition to the FULL TEXT index) to save space.

Does anybody know what causes this hang-up issue?

Thanks

Looks like you are using a FULL TEXT index where you don't have to, particularly for your labels. If these are simple and well defined a normal index will be fine. If you need to differentiate between "Article Date" and "Blog Date" for example use one field for content type and one for data type.

When you search for a phrase using MATCH AGAINST ... IN BOOLEAN MODE you are actually searching for the same words in the same order, not the complete string.. see DOCS

Searching for "http://xxxx.yyy/www.zzz.mmm" in your field content will actually match "some content here http, xxxx. yyy www! zzz mmm? yes please, more content" and that's assuming your full text minimum word length is 3 or less. For performance and logic, this is not the correct index to use.

I would seriously consider changing your data structure so that you aren't putting FULL TEXT indexes on urls and labels. This will probably save you plenty more space than avoiding using normal indexes.