Database/language options for super fast partial text matching [closed]

I am building a project and require a super fast way of supplying an autocomplete feed with results based on a partial text match.

I will be indexing/searching on only one field in the database; each row will carry additional data, but I won't be indexing those fields. I will have approx. 25k rows.

Requirements:

  • Must match anywhere in the field ("Lorem Ipsum Dolor Sit Amet" would be found when starting to type "Lor", "Ipsum", "olor", or "Sit Amet")
  • Needs to be extremely quick at returning results in a JSON feed (though the original source of the data doesn't matter too much)
  • Scalable solution for high traffic

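For reference, the matching rule can be pinned down as a case-insensitive substring test. This naive linear scan (all names are illustrative) is the baseline that any of the indexed options has to beat:

```python
def matches(query: str, field: str) -> bool:
    """Case-insensitive match anywhere in the field."""
    return query.lower() in field.lower()

def autocomplete(query, rows, limit=10):
    """Naive linear scan over every row; fine for testing,
    but O(rows) per keystroke, so too slow under heavy traffic."""
    return [r for r in rows if matches(query, r)][:limit]

rows = ["Lorem Ipsum Dolor Sit Amet", "Consectetur Adipiscing Elit"]
```

At 25k rows a scan like this may actually be tolerable in memory, but it does linear work on every keystroke, which is what the index-based options avoid.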
I have reviewed a few options...

  • MongoDB, using a LIKE-style query (i.e. a regex match)
  • ElasticSearch - not sure if it's overkill for what I need to do, and I haven't seen any examples of matching partial text as above
  • An SQL LIKE query, but I imagine this won't be nearly fast enough?
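For comparison, the SQL LIKE option looks like this with SQLite (table and column names are illustrative; any SQL engine is similar). The catch is that a leading `%` wildcard defeats ordinary B-tree indexes, so the engine scans every row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phrases (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO phrases (title) VALUES (?)",
    [("Lorem Ipsum Dolor Sit Amet",), ("Consectetur Adipiscing Elit",)],
)

def like_search(term, limit=10):
    # '%term%' matches anywhere in the field, but the leading wildcard
    # prevents index use, forcing a full table scan.
    cur = conn.execute(
        "SELECT title FROM phrases WHERE title LIKE ? LIMIT ?",
        (f"%{term}%", limit),
    )
    return [row[0] for row in cur]
```

(SQLite's LIKE is case-insensitive for ASCII by default; other engines vary, e.g. PostgreSQL needs ILIKE.)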

Programming language isn't too much of an issue but Python or PHP would be preferred.

As others have mentioned, a full-text index that performs linguistic and syntactic analysis (tokenizing, stemming, case and accent-normalization, etc) will give you the best results. But this won't come without a certain amount of setup and configuration.

Check out Solr's Suggester component: http://wiki.apache.org/solr/Suggester. There is also a newer one - I believe it's called AnalyzingSuggester or some such - which is available in Lucene only, I think, so if you want an in-memory solution you could use that (Java only, though).
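Once the Suggester is wired up, querying it from Python is just an HTTP GET against the suggest handler. A sketch of building that request URL, assuming typical defaults - the handler path (`/suggest`), dictionary name, and parameter names all depend on how your solrconfig.xml is set up:

```python
from urllib.parse import urlencode

def suggest_url(base, core, prefix, dictionary="mySuggester"):
    # Handler path and parameter names are the common defaults for
    # Solr's Suggester; adjust to match your solrconfig.xml.
    params = urlencode({
        "suggest": "true",
        "suggest.dictionary": dictionary,
        "suggest.q": prefix,
        "wt": "json",  # ask Solr for a JSON response, ready for the feed
    })
    return f"{base}/solr/{core}/suggest?{params}"
```

You would fetch the resulting URL with any HTTP client and relay the JSON body straight to the autocomplete widget.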

This sounds like a typical full-text search problem. Depending on your application and where the data lives, an in-process library like Whoosh might do what you need (it is to Python what Lucene is to Java).

You're right that an SQL LIKE query will perform horribly compared to a real full-text index. MongoDB might not be a very good fit either, though it can be tuned to do roughly what you want.
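To illustrate why the full-text index wins: the matching work moves from query time to index time. Here is a minimal in-memory n-gram inverted index in pure Python - the same idea behind ElasticSearch's ngram analyzer and Whoosh's NGRAM field (this is a sketch, not any library's actual implementation):

```python
from collections import defaultdict

class NGramIndex:
    """Maps every lowercase trigram to the set of row ids containing it,
    so a query only ever touches candidate rows, not all 25k."""

    def __init__(self, n=3):
        self.n = n
        self.grams = defaultdict(set)
        self.rows = {}

    def add(self, row_id, text):
        self.rows[row_id] = text
        t = text.lower()
        for i in range(len(t) - self.n + 1):
            self.grams[t[i:i + self.n]].add(row_id)

    def search(self, query, limit=10):
        q = query.lower()
        if len(q) < self.n:
            # Query shorter than a trigram: fall back to a substring scan.
            ids = [i for i, t in self.rows.items() if q in t.lower()]
        else:
            # Intersect the posting sets of the query's trigrams,
            # then verify the full substring against each candidate.
            candidates = set.intersection(
                *(self.grams.get(q[i:i + self.n], set())
                  for i in range(len(q) - self.n + 1)))
            ids = [i for i in candidates if q in self.rows[i].lower()]
        return [self.rows[i] for i in sorted(ids)][:limit]
```

Solr, ElasticSearch, and Whoosh add analysis (stemming, accent folding), ranking, and on-disk persistence on top of this core structure, which is the setup overhead mentioned above.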