规范化数据库表“标签”

I have a table called tags with columns id, vid_id, name where id is a random id generated with md5(uniqid()); vid_id is the id of the video that tag is associated with, and name is the tag name. If a video has 5 tags, they all get stored in the tags table. I realized lately this is a bad table design because I have many duplicate tags. I created another table tag_map. It has three columns id, vid_id, tag_id. I want to basically implement the 'toxi' solution shown here http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html.

What I want to do is somehow transfer the data over from tags to tag_map, delete the vid_id column in tags and delete any extra entries in tags so each tag_map entry is mapped to only one tag entry. Anyone know an efficient way to do this? I keep thinking:

$sql = 'SELECT * FROM tags';
$stmt3 = $conn->prepare($sql);
$result=$stmt3->execute();
while ($row = $stmt3->fetch(PDO::FETCH_ASSOC)) {

$id=md5(uniqid());

    $sql = 'INSERT INTO tag_map VALUES (?,?,?)';
$insert = $conn->prepare($sql);
$result=$insert->execute(array($id,$row['vid_id'],$row['id']));
}

but then I get confused when I try to think of how I am going to remove the extra tags in table tags and map each tag_map entry to only one tags entry. Any advice would be greatly appreciated.

If I understand your post, then what you're after is a "Junction table" or "join table." (Upon re-reading, yes, this is the "Toxi" solution you're referring to. Not sure where it got the name "Toxi" but whatever.)

http://en.wikipedia.org/wiki/Junction_table

You want to have a table that has exactly one row for a given video. And another table that has exactly one row for each tag. Then you need a third table in between that has one row for every combination of video and tag. That way, you can query the third join table for all of the videos that match a given tag, or all the tags that match a given video.

I would create a new table "tag" and a table "video_tag".

foreach ( video_record in the video table ) {
    existingTags = read in list of tags for video_record from existing_tags_table
    foreach ( tag in existingTags ) {
        newTag = read tag from new tag table
        if ( newTag == null ) {
            save tag in new table
            newTag.name = existing tag name
            newTag.id = id from save procedure above
        }
        save entry in video_tag( video.id, newTag.id )
    }
}

Then delete the old tag table. If you want, do some renaming of the new tag tables.

One thing you could do is set the vid_id and tag_id pair as the primary key in your new tag_map table. That way, when you go through the algorithm you have listed above, duplicate map entries will not be created, but you'll still have one for each tag. Then, you can run a query to delete duplicates in the tags table like this one:

DELETE FROM TAGS WHERE ID NOT IN (SELECT tag_id FROM TAG_MAP)