高效的搜索查询

I have a table in a database with a structure like this Keywords

  • id int(11)
  • U_id int(11)
  • keywords text
  • create_date int(11)

U_id is a foreign key, id is the primary key

The keywords field is a list of words created by users separated by commas I was wondering if someone could suggest an efficient query to search such a table.

If you using MyISAM, you can create a fulltext index on field keywords. Then search using:

select * from keywords k where match('test') against(k.keywords);

Of course CSV in a database is just about the worst thing you can do. You should put keywords in a separate table. Make sure to use InnoDB for all tables.

Table tags
-------------
id integer auto_increment primary key
keyword_id integer foreign key references keywords(id)
keyword varchar(40)

Now you can select using:

SELECT k.* FROM keywords k
INNER JOIN tags t ON (t.keyword_id = k.id)
WHERE t.keyword LIKE 'test' //case insensitive comparison.

Much much faster than CSV.

You should change your database design so that you have a table called user_keyword and store each keyword in a separate row. You can then index this table and search it easily and efficiently:

WHERE keyword = 'foo'

If you can't modify the database then you can use FIND_IN_SET but it won't be very efficient:

WHERE FIND_IN_SET('foo', keywords) 

You have 2 options :

  1. re-structor your database, create an extra table called Keywords, and that should include a U_id which will be a foreign key mapped to your user table, and that way you can easily insert each keyword into the Keywords table and then search it using something :

SELECT * FROM Keywords WHERE keyword LIKE %KEYWORD%

  1. you can get the keywords field, seperate the keywords and put them into an array using your preferred language and then search the array.

Separate keywords in its own table, "connect" it to the old table via FOREIGN KEY, index it and you'll be able to search for exact keywords of keyword prefixes efficiently.

For example:

id U_id keywords create_date
1  -    A,B,C    -

Becomes:

PARENT_TABLE:
id U_id create_date
1  -    -

CHILD_TABLE:
id keyword
1  A
1  B
1  C

Provided there is an index on keyword, the following query should be efficient:

SELECT * FROM PARENT_TABLE
WHERE id IN (SELECT id FROM CHILD_TABLE WHERE keyword = ...)

---EDIT---

Based on Johan's comments below, it appears that InnoDB uses what is known as "index-organized tables" under Oracle or "clusters" under most other databases. Provided you don't need to query "from parent to child" (i.e. "give me all keywords for given id"), the PRIMARY KEY on CHILD_TABLE should be:

{keyword, id}

Since the keyword is the first field in the composite index, WHERE keyword = ... (or WHERE keyword LIKE 'prefix%') can use this index directly.