How do I build a "frequent searches" engine?

My first idea was to store every word in the database, ignoring words with 2 or fewer characters, and every time a word is repeated just add one to a counter column (say, importance) so that it appears first in the list of frequent searches. That sounds good until you consider that people normally search for several words, not just one. For example, for big house you may want to store big house as a frequent search, not big and house separately.

I'm a little bit confused about how to do this and do it right. Has anyone done something similar? What do you think about the right way to do this?

Hmm, I would create 2 tables

Searches and SearchFrequency

Searches would contain all searches, and SearchFrequency would be a list of searches that have been repeated, so it would look like this:

------------------------------------------------------
frequency_id     frequency_sid     frequency_counter
------------------------------------------------------
1                3                 33
2                56                66
3                33                128
.....

Then you can do

SELECT * FROM Searches,SearchFrequency WHERE search_id = frequency_sid ORDER BY frequency_counter DESC LIMIT 30

and just update the tables so,

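id = INSERT INTO Searches ....
-- MySQL upsert; assumes a UNIQUE key on frequency_sid
INSERT INTO SearchFrequency (frequency_sid, frequency_counter) VALUES (id, 1)
  ON DUPLICATE KEY UPDATE frequency_counter = frequency_counter + 1;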

This would keep both tables updated, and you can then also track individual searches with IP, related searches, etc.

You can then also set up a SearchKeywordsFrequency table so that you can explode the searches and store the individual words, and then create a many-to-many relationship with SearchFrequency.
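Here is a minimal sketch of that idea in Python with the standard-library sqlite3 module, just to show the counting logic. The table names (phrase_freq, word_freq) and function names are made up for illustration, and it simplifies the many-to-many relationship into two plain counter tables: one for whole phrases and one for individual words of 3 or more characters.

```python
import sqlite3

def make_db():
    """Create an in-memory database with the two hypothetical counter tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE phrase_freq (phrase TEXT PRIMARY KEY, counter INTEGER NOT NULL)")
    conn.execute("CREATE TABLE word_freq (word TEXT PRIMARY KEY, counter INTEGER NOT NULL)")
    return conn

def normalize(query):
    """Lower-case and collapse whitespace so 'Big  House' and 'big house' match."""
    return " ".join(query.lower().split())

def record_search(conn, query):
    """Count the whole phrase once, and each distinct word of 3+ characters once."""
    phrase = normalize(query)
    if not phrase:
        return
    # Insert-if-missing followed by an increment acts as a simple upsert.
    conn.execute("INSERT OR IGNORE INTO phrase_freq (phrase, counter) VALUES (?, 0)", (phrase,))
    conn.execute("UPDATE phrase_freq SET counter = counter + 1 WHERE phrase = ?", (phrase,))
    for word in set(phrase.split()):
        if len(word) > 2:  # the question's rule: skip words of 2 or fewer characters
            conn.execute("INSERT OR IGNORE INTO word_freq (word, counter) VALUES (?, 0)", (word,))
            conn.execute("UPDATE word_freq SET counter = counter + 1 WHERE word = ?", (word,))
    conn.commit()

def top_phrases(conn, limit=30):
    """Return the most frequent phrases as (phrase, counter) rows."""
    return conn.execute(
        "SELECT phrase, counter FROM phrase_freq ORDER BY counter DESC, phrase LIMIT ?",
        (limit,),
    ).fetchall()
```

Calling record_search(conn, "big house") twice and then top_phrases(conn) would list big house as one entry with its own counter, while big and house are still counted individually in word_freq.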

You need to store complete keywords either in an index or in a database (I would recommend an index such as Zend_Lucene or Swish, which have very flexible APIs). Then you have to apply proximity searching, i.e. find searches where two or more keywords are within a certain distance of each other. Zend Lucene and Swish have built-in methods that return results sorted by rank after applying proximity search.

The Zend_Search_Lucene documentation is here: http://framework.zend.com/manual/en/zend.search.lucene.html. Please feel free to ask if you need implementation details.

Swish is available as a separate module that can be run from the command line, and it is also available as a PHP extension.

Also, if you want a custom implementation of a proximity algorithm, see the Wikipedia article for details: http://en.wikipedia.org/wiki/Proximity_search_%28text%29
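To give a rough idea of what a custom proximity check could look like, here is a small Python sketch. It is not how Zend Lucene or Swish do it internally; the function names are invented for the example, and it assumes plain whitespace tokenization with no punctuation handling or stemming.

```python
def within_distance(text, first, second, max_gap):
    """Return True if `first` and `second` occur at most `max_gap` word positions apart."""
    words = text.lower().split()
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    second_pos = [i for i, w in enumerate(words) if w == second.lower()]
    # Compare every occurrence of one keyword with every occurrence of the other.
    return any(abs(i - j) <= max_gap for i in first_pos for j in second_pos)

def related_searches(stored, first, second, max_gap=2):
    """Keep only the stored searches where both keywords appear close together."""
    return [s for s in stored if within_distance(s, first, second, max_gap)]
```

With stored searches such as "big house" and "house with a really big garden", related_searches(stored, "big", "house") keeps only the first one, because in the second the two words are 4 positions apart.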

Edited: if you are going for a database solution, you can create a function that applies your own implementation of a proximity searching algorithm to fetch the best related searches. You should also look at MySQL full-text searching.

My answer does not so much contain algorithmic patterns as behavioural ones you can fish for.

Turn on some search logging for a while (to see what people are searching for).

Record which searches are successful, i.e. those that actually return results.

You could refine this idea by capturing which resources people actually click on when they search for a term.

That gives you: What people search for and what they likely meant.

Keep it going and then refine it with temporal data: "at weekends people search for this"

This will help build a picture of how your search is being used and puts you in a position to "intercept" search terms and interject with "did you mean?" style search helpers, and to show "popular at this time of year" search links on your home page.
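As a rough illustration of a "did you mean?" helper, the sketch below compares an incoming query against a list of popular terms using difflib.get_close_matches from the Python standard library. The function name did_you_mean and the cutoff of 0.8 are arbitrary choices for the example; a real site would probably tune the cutoff against its own logs.

```python
import difflib

def did_you_mean(query, popular_terms, cutoff=0.8):
    """Suggest the closest popular term, or None if the query is already popular
    or nothing is similar enough."""
    q = " ".join(query.lower().split())
    if q in popular_terms:
        return None  # exact hit, nothing to suggest
    matches = difflib.get_close_matches(q, popular_terms, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For example, with popular terms "big house" and "small flat", the query "big hous" would be answered with the suggestion "big house".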

So, initially a search table to capture what is going on:

term | results_cnt | daydate | session

Then later, when there is some data in there, group the phrases, look for patterns, and stub single words. I'd say that to do this properly you need some human input, but it all depends on the size and subject matter of your site.
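Something like the following Python/sqlite3 sketch could serve as a starting point for that log table and the later grouping. The column names follow the layout above; the function names are hypothetical, and it assumes daydate is stored as an ISO date string (YYYY-MM-DD) so SQLite's strftime can work out the weekday.

```python
import sqlite3

def make_log(conn):
    """Create the search log table: term | results_cnt | daydate | session."""
    conn.execute(
        "CREATE TABLE search_log (term TEXT, results_cnt INTEGER, daydate TEXT, session TEXT)"
    )

def log_search(conn, term, results_cnt, daydate, session):
    """Store one search, with the term lower-cased and whitespace collapsed."""
    conn.execute(
        "INSERT INTO search_log VALUES (?, ?, ?, ?)",
        (" ".join(term.lower().split()), results_cnt, daydate, session),
    )

def popular_successful(conn, weekend_only=False, limit=10):
    """Most frequent terms that returned results, optionally only Saturday/Sunday."""
    sql = "SELECT term, COUNT(*) AS n FROM search_log WHERE results_cnt > 0"
    if weekend_only:
        # strftime('%w') gives the weekday as '0' (Sunday) .. '6' (Saturday)
        sql += " AND strftime('%w', daydate) IN ('0', '6')"
    sql += " GROUP BY term ORDER BY n DESC, term LIMIT ?"
    return conn.execute(sql, (limit,)).fetchall()
```

popular_successful(conn) gives the overall list of frequent searches that found something, and popular_successful(conn, weekend_only=True) gives the "at weekends people search for this" view.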