I am trying to remove duplicate rows from my database, and I am using this query:
DELETE FROM data
WHERE data.ID NOT IN (
SELECT * FROM (
SELECT MIN(ID) FROM data GROUP BY Link
) AS p
)
It works, but my database has over 1 million rows, so the query takes far too long: after 4 to 5 hours it was still running, and I closed the tab. If someone has a faster query, please share it. Thanks in advance.
Table Structure http://s29.postimg.org/bt57k5enb/image.jpg
One solution could be:
1) Create a temp table
2) Store a single record for each Link value in it
3) Truncate the "data" table
4) Alter the "data" table (add a UNIQUE KEY constraint on Link)
5) Reimport the rows into "data" from the temp table, then drop the temp table
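CREATE TABLE tmp AS SELECT * FROM data GROUP BY Link; -- steps 1 & 2; needs ONLY_FULL_GROUP_BY disabled on MySQL 5.7+
TRUNCATE TABLE data; -- step 3; disable foreign key constraints first, if any
ALTER TABLE data ADD UNIQUE KEY data_link_unique (Link); -- step 4
INSERT INTO data SELECT * FROM tmp; -- step 5
DROP TABLE tmp; -- step 5, continued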
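A possible variation on steps 1 and 2, in case the server rejects SELECT * combined with GROUP BY Link (the default behaviour when ONLY_FULL_GROUP_BY is enabled). This is only a sketch: it assumes ID is unique in "data", as the question's own query implies, and the alias names d, k and keep_id are made up for illustration. It keeps the row with the lowest ID for each Link, which is the same rule the original DELETE used.

```sql
-- Steps 1 & 2 without relying on non-aggregated columns in a GROUP BY.
-- Assumption: ID is unique in data (e.g. the primary key).
CREATE TABLE tmp AS
SELECT d.*
FROM data AS d
JOIN (
    SELECT MIN(ID) AS keep_id   -- lowest ID per Link is the row we keep
    FROM data
    GROUP BY Link
) AS k ON d.ID = k.keep_id;

-- Sanity check before running TRUNCATE: the two counts should match.
-- (If Link contains NULLs, the first count can be higher by one,
--  because COUNT(DISTINCT ...) ignores NULL while GROUP BY keeps one NULL group.)
SELECT COUNT(*) FROM tmp;
SELECT COUNT(DISTINCT Link) FROM data;
```

Steps 3 to 5 stay the same. Checking the counts first is worthwhile because TRUNCATE cannot be rolled back.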