SQL匹配所有关键字的开头

I will set up a scenario to best describe what I am trying to accomplish.

There is an autocomplete field. The autocomplete is for TV shows. The user inputs "The Wal" hoping to find "The Walking Dead".

The database:

CREATE TABLE `shows` (
  `id` int(10) unsigned NOT NULL,
  `name` varchar(250) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

CREATE TABLE `tags` (
  `tag` varchar(50) NOT NULL DEFAULT '',
  `sid` int(10) unsigned NOT NULL,
  KEY `sid` (`sid`),
  KEY `alphabetizer` (`tag`),
  CONSTRAINT `tags_ibfk_1` FOREIGN KEY (`sid`) REFERENCES `shows` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Where shows is the table for all of the TV shows and tags is the table for all of the tags tied to each TV show.

Each word in each show title is inserted as its own lowercase tag in the tags table.

In the shows table:

  • (id: 1) (name: The Walking Dead)
  • (id: 2) (name: The Wandering Penguin)

In the tags table:

  • (sid: 1) (tag: the)
  • (sid: 1) (tag: walking)
  • (sid: 1) (tag: dead)
  • (sid: 2) (tag: the)
  • (sid: 2) (tag: wandering)
  • (sid: 2) (tag: penguin)

Goal: User inputs, "The Wal", user gets: "The Walking Dead". The query should return all results that meet the criteria, not just one. So if "The Walking Alive" were also a show with the corresponding tags, it should appear as well.

My issue: User inputs, "The Wal", user gets both shows. This is due to the OR clauses for the LIKE statement. I don't know how to fix this after trying for 2 days.

My current query:

SELECT name
    FROM shows s
    JOIN tags t ON s.id = t.sid
    WHERE t.tag LIKE "The%" OR t.tag LIKE "Wal%"

One method is to use and instead of or. However, you need to use aggregation to get what you want:

SELECT name
FROM shows s JOIN
     tags t
     ON s.id = t.sid
WHERE t.tag LIKE 'The%' OR t.tag LIKE 'Wal%'
GROUP BY name
HAVING sum(t.tag LIKE 'The%') > 0 AND
       sum(t.tag LIKE 'Wal%') > 0;

However, I don't think that solves your problem, because you don't know that all keywords will match. Instead, order by the number of keywords that do match and choose the best matching one:

SELECT name
FROM shows s JOIN
     tags t
     ON s.id = t.sid
WHERE t.tag LIKE 'The%' or t.tag LIKE 'Wal%'
GROUP BY name
ORDER BY (MAX(t.tag LIKE 'The%') +
          MAX(t.tag LIKE 'Wal%')
         ) DESC
LIMIT 1;

Another way to run this query is to add an EXISTS statement for each tag. This query can take advantage of a composite index on tag(sid,tag)

SELECT name
FROM shows s 
WHERE EXISTS (
    SELECT 1 FROM tags t
    WHERE t.sid = s.id
    AND tag LIKE 'The%'
) AND EXISTS (
    SELECT 1 FROM tags t
    WHERE t.sid = s.id
    AND tag LIKE 'Wal%'
)

I don't think your approach is sound. But here is an explanation of what is happening.

When you join shows to tags you get one record back for every word that's been matched so far.

Per your example, with

1. The Walking Dead
2. The Wandering Penguin

And this query

SELECT *
FROM shows s
JOIN tags t ON s.id = t.sid
WHERE t.tag LIKE "the%" OR t.tag LIKE "wa%" /* note lower-case query */

You'll get results

ID  NAME                  TAG
 1  The Walking Dead      the
 1  The Walking Dead      walking
 2  The Wandering Penguin the
 2  The Wandering Penguin wandering

If you had three words in your query that matched, you'd see different results

Query: The Wa Dead

SELECT *
FROM shows s
JOIN tags t ON s.id = t.sid
WHERE t.tag LIKE "the%" OR t.tag LIKE "wa%" OR t.tag LIKE "dead"

would give

ID  NAME                  TAG
 1  The Walking Dead      the
 1  The Walking Dead      walking
 1  The Walking Dead      dead
 2  The Wandering Penguin the
 2  The Wandering Penguin wandering

You can eliminate the duplicates by using GROUP BY and score the results by number of words matched using COUNT(*)

SELECT s.name, COUNT(*)
FROM shows s
JOIN tags t ON s.id = t.sid
WHERE t.tag LIKE "the%" OR t.tag LIKE "wa%" OR t.tag LIKE "dead"
GROUP BY NAME
ORDER BY COUNT(*) DESC

gives

 NAME                COUNT(*)
 The Walking Dead      3
 The Wandering Penguin 2

What you should do

I think this indexing approach is probably not going to work well in the long term. Modern databases have built-in functions for this. Here is a link to MySQL's full-text index feature. This is a case where using the DB's native functionality will probably save you a major headache down the road, when your tags table reaches millions of rows and the JOIN of shows to tags becomes unmanageable.

I think the accepted answer is overcomplicated. Just add an input parameter "tags_count" and use it:

SELECT sid
  FROM tags t 
  WHERE t.tag LIKE "The%" OR t.tag LIKE "Wal%"
  GROUP BY sid
  HAVING count(distinct tag) = 2;

So we query only shows which contain all specified Tags