自动完成SQL查询建议(Ajax + PHP)

I have a question regarding SQL best practices when formulating a query for use in an Autocomplete form (jquery Axax + PHP).

Let us assume the following:

  • I have a database with the titles of books
  • Some books have titles without a definite article ("The" or "A") such as "Life of Pi"
  • Some books have titles with a definite article ("The" or "A") such as "The Catcher in the Rye"

As a result, users will input the title of the book either using "The" at the beginning or simply omitting the "The" and start the query without any definite article.

Three possible queries seem to exist:

SELECT 'title' FROM 'books' WHERE 'title' LIKE '%$string'

or

SELECT 'title' FROM 'books' WHERE 'title' LIKE '$string%'

or

SELECT 'title' FROM 'books' WHERE 'title' LIKE '%$string%'

When using the first query method (where the % is before the string), it is difficult to get any results, since the wildcard before the string seems to behave erroneously.

When using the second query, it seems to favor exact matches using "The" before a title. Thus, a user searching for "The Catcher in the Rye" will find the book, but a user searching for "Catcher in the Rye" will not.

The last result is the best one, since it has a wildcard before and after the string. However, it also gives the longest auto-complete list. The user will have to type a few letters to narrow down the search result.

Any ideas on implementing a more efficient query? Or is the third option the best one (seeing as it is not feasible to separate the definite article in the title of a book?

Thanks in advance,

$query = mysqi_query("SELECT title FROM books WHERE title REGEXP '$string'");
if($query->num_rows() == 0) {

     //First remove all the stop words like for, the, of, a from the search string.
     $stopWords = array('/\bfor\b/i', '/\bthe\b/i', '/\bto\b/i', '/\bof\b/i','/\ba\b/i');
     $string = preg_replace($stopWords, "", $string);

     //Then, use 
     mysqli_query("SELECT title FROM books WHERE title REGEXP '$string'");
}

If you're worried about the quantity of suggestions, can you modify the change event to only retrieve suggestions after they have typed some minimum number of characters in the field?

I would suggest using the third method with wildcards on either side of the string. If you are worried about the size of the returned result set, perhaps limit the results to a certain number, and as the user types the list will naturally get smaller and more specific.

you may also consider allowing searches for 'Catcher Rye' that should still match.

in this case - you would tokenize each word in the title as well as the words entered by the user and find the best matches.

otherwise only autocomplete after say 4 or more characters have been entered, and use option 3.

You can do a search using Regular Expressions (query result comes quickly) and do not forget to add limitation to your results.

a small example

SELECT title FROM books WHERE title REGEXP '$string' LIMIT 20

or you can use word boundaries

SELECT title FROM books WHERE title REGEXP '[[:<:]]$string[[:>:]]' LIMIT 20

see the documents http://dev.mysql.com/doc/refman/5.5/en/regexp.html