I have a search engine which scans all the words in a given web page and records their occurrences. Pages are then ranked by the number of times the searched word appears in the document. But it doesn't return results for multiple-term queries.
Below is my SQL Query. I would like to be able to have it check all the words inputted and then rank by the amount of times the words appear in the document. It is only working for single term queries at the moment.
$result = mysql_query(" SELECT p.page_url AS url,
COUNT(*) AS occurrences
FROM page p, word w, occurrence o
WHERE p.page_id = o.page_id AND
w.word_id = o.word_id AND
w.word_word = \"$keyword\"
GROUP BY p.page_id
ORDER BY occurrences DESC
LIMIT $results" );
If you want to match all the words, this condition in your WHERE clause prevents it, because it restricts the results to a single word:
w.word_word = \"$keyword\"
Your query can be written as follows:
// Note: the table is named "occurrence", as in the question's query
$sql = "SELECT p.page_url AS url, COUNT(*) AS occurrences "
. "FROM page p "
. "INNER JOIN occurrence o ON p.page_id = o.page_id "
. "INNER JOIN word w ON w.word_id = o.word_id "
. "GROUP BY p.page_id "
. "ORDER BY occurrences DESC "
. "LIMIT {$results}";
$result = mysql_query($sql);
This counts the occurrences of every word in the word table, which (as I understand it) gives you the results you need.
If you are interested in only a few words, you can use the IN operator (as suggested by Dev in the comments) and your query becomes:
$my_keywords = array('apple', 'banana');
// This produces: "apple","banana" and assumes that all of your
// keywords are in lower case. If they are not, convert them to lower
// case first; if you want a case-sensitive match instead, remove the
// LOWER() function from the WHERE clause below.
// Keywords coming from user input must be escaped before this step.
$keywords = '"' . implode('","', $my_keywords) . '"';
$sql = "SELECT p.page_url AS url, COUNT(*) AS occurrences "
. "FROM page p "
. "INNER JOIN occurrence o ON p.page_id = o.page_id "
. "INNER JOIN word w ON w.word_id = o.word_id "
. "WHERE LOWER(w.word_word) IN ({$keywords}) "
. "GROUP BY p.page_id "
. "ORDER BY occurrences DESC "
. "LIMIT {$results}";
$result = mysql_query($sql);
Finally, try using mysqli or PDO instead of mysql. The mysql_* functions were deprecated in PHP 5.5 and removed in PHP 7.0.
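As a rough illustration of the IN approach combined with PDO, here is a minimal sketch that builds the query with one placeholder per keyword, so the keywords are bound rather than concatenated into the SQL. The helper name buildKeywordQuery and the $pdo connection are assumptions for illustration; they do not come from the question. The table and column names are taken from the question's query.

```php
<?php
// Hypothetical helper (not from the question): returns the SQL text
// and the list of values to bind to its "?" placeholders.
function buildKeywordQuery(array $keywords, $limit)
{
    if (count($keywords) === 0) {
        throw new InvalidArgumentException('At least one keyword is required.');
    }

    // Lower-case the keywords so they match LOWER(w.word_word).
    $params = array_map('strtolower', array_values($keywords));

    // One "?" placeholder per keyword, e.g. "?,?" for two keywords.
    $placeholders = implode(',', array_fill(0, count($params), '?'));

    $sql = "SELECT p.page_url AS url, COUNT(*) AS occurrences "
         . "FROM page p "
         . "INNER JOIN occurrence o ON p.page_id = o.page_id "
         . "INNER JOIN word w ON w.word_id = o.word_id "
         . "WHERE LOWER(w.word_word) IN ({$placeholders}) "
         . "GROUP BY p.page_id "
         . "ORDER BY occurrences DESC "
         . "LIMIT " . (int) $limit; // cast to int so it is safe to inline

    return array($sql, $params);
}

list($sql, $params) = buildKeywordQuery(array('Apple', 'banana'), 10);

// With an open PDO connection in $pdo (assumed, not created here):
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
// $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```

The LIMIT value is cast to an integer and inlined rather than bound, since binding LIMIT as a string parameter can fail when PDO emulates prepared statements.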
HTH
I would go with MATCH ... AGAINST, which MySQL optimizes for text search and which suits search-engine-style queries better. Have a look at full-text searching: http://dev.mysql.com/doc/refman/5.5/en//fulltext-search.html
NOTE: the keyword column in your table (here w.word_word) should have a FULLTEXT index. This gives much better search performance.
Example of input keywords:
$keywords = '+Word +Word2 +Word3';
SELECT p.page_url AS url,
COUNT(*) AS occurrences,
MATCH (w.word_word) AGAINST ('$keywords') AS keyword
FROM page p, occurrence o, word w
WHERE MATCH (w.word_word) AGAINST ('$keywords' IN BOOLEAN MODE)
AND p.page_id = o.page_id AND w.word_id = o.word_id
GROUP BY p.page_id
ORDER BY occurrences DESC
LIMIT $results
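To build the keyword string for boolean mode from user input, something like the following sketch could be used. It strips the characters that boolean mode treats as operators and prefixes each remaining term with "+". The function name toBooleanModeQuery is a made-up helper for illustration, and the resulting string should still be escaped or bound as a parameter before it goes into the query.

```php
<?php
// Hypothetical helper (not from the answer): turns a list of user-supplied
// words into a boolean-mode search string such as "+honda +jazz".
function toBooleanModeQuery(array $words)
{
    $terms = array();
    foreach ($words as $word) {
        // Replace boolean-mode operator characters with spaces so that
        // user input cannot change the meaning of the search.
        $clean = preg_replace('/[+\-<>()~*"@]+/', ' ', $word);

        // Split on whitespace and mark every remaining term as required.
        foreach (preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY) as $term) {
            $terms[] = '+' . $term;
        }
    }
    return implode(' ', $terms);
}

echo toBooleanModeQuery(array('honda', 'jazz', 'manual'));
// prints: +honda +jazz +manual
```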
Alternatively, there is a less optimized approach, which risks slowing down the server if your queries are not optimized (too many GROUP BY clauses, WHERE clauses and conditions). You can use a regular expression in MySQL, for example:
REGEXP "(honda)|(jazz)|(manual)"
Regular expressions can perform acceptably on smaller tables (not recommended for a huge database).
Loop over the keywords and build one REGEXP condition per keyword:
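$keywords = "keyword1,keyword2,keyword3";
$expl = explode(",", $keywords);
// Build one REGEXP condition per keyword. [[:<:]] and [[:>:]] are
// word-boundary markers (MySQL before 8.0.4; newer versions use \\b).
$conditions = array();
foreach ($expl as $keyone)
{
$keyone = mysql_real_escape_string(trim($keyone));
$conditions[] = "w.word_word REGEXP '[[:<:]]{$keyone}[[:>:]]'";
}
// Join with OR and wrap in parentheses so the ANDs below still apply
$all = '(' . implode(' OR ', $conditions) . ')';
// Double quotes are required here so that $all and $results are interpolated
$sql = "SELECT p.page_url AS url,
COUNT(*) AS occurrences
FROM page p, word w, occurrence o
WHERE p.page_id = o.page_id AND
w.word_id = o.word_id AND
$all
GROUP BY p.page_id
ORDER BY occurrences DESC
LIMIT $results";
$result_query = mysql_query($sql);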
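A variation on the loop above is to fold all keywords into a single alternation pattern, so the query needs only one REGEXP condition. The sketch below is one possible way to do that; buildWordRegex is a hypothetical helper name, and it assumes keywords should contain only letters, digits and underscores, so every other character is dropped. The [[:<:]] and [[:>:]] markers assume a MySQL version before 8.0.4.

```php
<?php
// Hypothetical helper (not from the answer): builds one pattern such as
// "[[:<:]](honda|jazz)[[:>:]]" from a list of keywords.
// Returns null when no usable keyword remains.
function buildWordRegex(array $keywords)
{
    $clean = array();
    foreach ($keywords as $keyword) {
        // Keep only letters, digits and underscores so that no regex
        // metacharacters from user input end up in the pattern.
        $word = preg_replace('/[^\p{L}\p{N}_]+/u', '', $keyword);
        if ($word !== null && $word !== '') {
            $clean[] = $word;
        }
    }

    if (count($clean) === 0) {
        return null;
    }

    // Word-boundary markers around a single alternation group.
    return '[[:<:]](' . implode('|', $clean) . ')[[:>:]]';
}

$pattern = buildWordRegex(array('honda', 'jazz', 'manual'));
// The pattern can then be bound to a placeholder, for example:
// ... AND w.word_word REGEXP ? ...
```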