I will set up a scenario to best describe what I am trying to accomplish.
There is an autocomplete field. The autocomplete is for TV shows. The user inputs "The Wal" hoping to find "The Walking Dead".
The database:
CREATE TABLE `shows` (
`id` int(10) unsigned NOT NULL,
`name` varchar(250) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `tags` (
`tag` varchar(50) NOT NULL DEFAULT '',
`sid` int(10) unsigned NOT NULL,
KEY `sid` (`sid`),
KEY `alphabetizer` (`tag`),
CONSTRAINT `tags_ibfk_1` FOREIGN KEY (`sid`) REFERENCES `shows` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Where shows
is the table for all of the TV shows and tags
is the table for all of the tags tied to each TV show.
Each word in each show title is inserted as its own lowercase tag in the tags
table.
In the shows
table:
In the tags
table:
Goal: User inputs, "The Wal", user gets: "The Walking Dead". The query should return all results that meet the criteria, not just one. So if "The Walking Alive" were also a show with the corresponding tags, it should appear as well.
My issue: User inputs, "The Wal", user gets both shows. This is due to the OR clauses for the LIKE statement. I don't know how to fix this after trying for 2 days.
My current query:
SELECT name
FROM shows s
JOIN tags t ON s.id = t.sid
WHERE t.tag LIKE "The%" OR t.tag LIKE "Wal%"
One method is to use and
instead of or
. However, you need to use aggregation to get what you want:
SELECT name
FROM shows s JOIN
tags t
ON s.id = t.sid
WHERE t.tag LIKE 'The%' OR t.tag LIKE 'Wal%'
GROUP BY name
HAVING sum(t.tag LIKE 'The%') > 0 AND
sum(t.tag LIKE 'Wal%') > 0;
However, I don't think that solves your problem, because you don't know that all keywords will match. Instead, order by the number of keywords that do match and choose the best matching one:
SELECT name
FROM shows s JOIN
tags t
ON s.id = t.sid
WHERE t.tag LIKE 'The%' or t.tag LIKE 'Wal%'
GROUP BY name
ORDER BY (MAX(t.tag LIKE 'The%') +
MAX(t.tag LIKE 'Wal%')
) DESC
LIMIT 1;
Another way to run this query is to add an EXISTS
statement for each tag. This query can take advantage of a composite index on tag(sid,tag)
SELECT name
FROM shows s
WHERE EXISTS (
SELECT 1 FROM tags t
WHERE t.sid = s.id
AND tag LIKE 'The%'
) AND EXISTS (
SELECT 1 FROM tags t
WHERE t.sid = s.id
AND tag LIKE 'Wal%'
)
I don't think your approach is sound. But here is an explanation of what is happening.
When you join shows
to tags
you get one record back for every word that's been matched so far.
Per your example, with
1. The Walking Dead
2. The Wandering Penguin
And this query
SELECT *
FROM shows s
JOIN tags t ON s.id = t.sid
WHERE t.tag LIKE "the%" OR t.tag LIKE "wa%" /* note lower-case query */
You'll get results
ID NAME TAG
1 The Walking Dead the
1 The Walking Dead walking
2 The Wandering Penguin the
2 The Wandering Penguin wandering
If you had three words in your query that matched, you'd see different results
Query: The Wa Dead
SELECT *
FROM shows s
JOIN tags t ON s.id = t.sid
WHERE t.tag LIKE "the%" OR t.tag LIKE "wa%" OR t.tag LIKE "dead"
would give
ID NAME TAG
1 The Walking Dead the
1 The Walking Dead walking
1 The Walking Dead dead
2 The Wandering Penguin the
2 The Wandering Penguin wandering
You can eliminate the duplicates by using GROUP BY
and score the results by number of words matched using COUNT(*)
SELECT s.name, COUNT(*)
FROM shows s
JOIN tags t ON s.id = t.sid
WHERE t.tag LIKE "the%" OR t.tag LIKE "wa%" OR t.tag LIKE "dead"
GROUP BY NAME
ORDER BY COUNT(*) DESC
gives
NAME COUNT(*)
The Walking Dead 3
The Wandering Penguin 2
I think this indexing approach is probably not going to work well in the long term. Modern databases have built-in functions for this. Here is a link to MySQL's full-text index feature. This is a case where using the DB's native functionality will probably save you a major headache down the road, when your tags
table reaches millions of rows and the JOIN of shows
to tags
becomes unmanageable.
I think the accepted answer is overcomplicated. Just add an input parameter "tags_count" and use it:
SELECT sid
FROM tags t
WHERE t.tag LIKE "The%" OR t.tag LIKE "Wal%"
GROUP BY sid
HAVING count(distinct tag) = 2;
So we query only shows which contain all specified Tags