I have a table with many duplicate records:
shop
ID tax_id
1 10
1 10
1 11
2 10
2 12
2 10
2 10
I want to delete all duplicate records without creating a temporary table. After the update query, the table should look like:
shop
ID tax_id
1 10
1 11
2 10
2 12
Here's an in-place solution (but not one-liner)
Find out max id:
select max(id) as maxid
from shop;
Remember this value. Let's say it equals to 1000;
Re-insert unique values, with offset:
insert into shop (id, tax_id)
select distinct id + 1000, tax_id
from shop;
Drop old values:
delete from shop
where id <= 1000;
Restore normal ids:
update shop
set id = id - 1000;
PROFIT!
First off, you can prevent this by creating a Unique Index on those two fields, for future reference.
As for the solution, create a new table shopnew
with the same structure in mysql, or just delete every record from the table when the recordList is generated (make sure you have a backup!):
//Get every record from mysql
$sSQL = "Select ID, tax_id from shop";
$oRes = mysql_query($sSQL);
$aRecordList = array();
while($aRow = mysql_fetch_assoc($oRes)){
//If record is a duplicate, it will be 'overwritten'
$aRecordList[$aRow['id'].".".$aRow['tax_id']] =1;
}
//You could delete every record from shop here, if you dont want an additional table
//recordList now only contains unique records
foreach($aRecordList as $sRecord=>$bSet){
$aExpRecord = explode(".",$sRecord);
mysql_query("INSERT INTO shopnew set id=".$aExpRecord[0].", tax_id = ".$aExpRecord[1]
}
Working solution.
//Sql query to find duplicates
SELECT id, tax_id, count(*) - 1 AS cnt
FROM shop
GROUP BY id
HAVING cnt > 1
--- res
+------+--------+-----+
| id | tax_id | cnt |
+------+--------+-----+
| 1 | 10 | 2 |
| 2 | 10 | 3 |
+------+--------+-----+
//Iterate through results with your language of choice
DELETE
FROM shop
WHERE id=<res id>
AND tax_id=<res tax_id>
LIMIT <cnt - 1>
---res (iterated)
+------+--------+
| id | tax_id |
+------+--------+
| 1 | 10 |
| 1 | 11 |
| 2 | 12 |
| 2 | 10 |
+------+--------+
The two queries will require a small piece of php in order to carry out the deletes
$res = mysql_query("SELECT id, tax_id, count(*) - 1 AS cnt
FROM shop
GROUP BY id
HAVING cnt > 1")
while($row = mysql_fetch_assoc($res)){
mysql_query("DELETE
FROM shop
WHERE id=".$row['id']."
AND tax_id=". $row['tax_id']."
LIMIT ".$row['cnt'] -1 . ");
}
Edit: Revisited this recently, for what it's worth, here's an alternative solution using a temporary column, removing the need for a scripting language.
ALTER TABLE shop ADD COLUMN place INT;
SET @i = 1
UPDATE shop SET place = @i:= @i + 1;
DELETE FROM shop WHERE place NOT IN (SELECT place FROM items GROUP BY id, tax_id);
ALTER TABLE shop DROP COLUMN place;
In reality the question with its current limitations is quite a tricky challenge. I thought about the solution the whole evening (understanding that the solution will never be of use). I would not use the solution in wild nature, I just tried to find out if it is possible to do using MySQL only.
The question in my formulation: is it possible to write a series of DELETE statements that will remove duplicate rows from a two-column table without unique constraints?
Problems:
DELETE
that supports ORDER BY
can only have a WHERE
clause and does not support HAVING
. That is the order is applied after a condition is met.Suppose we have a table:
CREATE TABLE `tablename` (
`a_id` int(10) unsigned NOT NULL,
`b_id` int(10) unsigned NOT NULL,
KEY `Index_1` (`a_id`,`b_id`)
) ENGINE=InnoDB COLLATE utf8_bin;
I added a key (not UNIQUE or PRIMARY) to make lookups faster and hoping to use it in groupings.
You can feed the table with some values:
INSERT INTO tablename (a_id, b_id) VALUES (2, 3), (1, 1), (2, 2), (1,4);
INSERT INTO tablename (a_id, b_id) VALUES (2, 3), (1, 1), (2, 2), (1,4);
INSERT INTO tablename (a_id, b_id) VALUES (2, 3), (1, 1), (2, 2), (1,4);
As a side effect, the key became a coverage index and when we make SELECTs from the table the values displayed are sorted, but when we make deletions the values are read in the order we inserted them.
Now, let's look at the following query:
SELECT @c, @a_id as a, @b_id as b, a_id, b_id
FROM tablename, (SELECT @a_id:=0, @b_id:=0, @c:=0) as init
WHERE (@c:=IF(LEAST(@a_id=(@a_id:=a_id), @b_id=(@b_id:=b_id)), @c+1, 1)) >= 1
;
And its result:
@c, a, b, a_id, b_id
1, 1, 1, 1, 1
2, 1, 1, 1, 1
3, 1, 1, 1, 1
1, 1, 4, 1, 4
2, 1, 4, 1, 4
3, 1, 4, 1, 4
1, 2, 2, 2, 2
2, 2, 2, 2, 2
3, 2, 2, 2, 2
1, 2, 3, 2, 3
2, 2, 3, 2, 3
3, 2, 3, 2, 3
The results are automatically sorted using Index_1
, and duplicate pairs (a_id, b_id)
are enumerated in column @c
. That is our task now is to remove all rows where @c > 1
. The only problem we have is to force MySQL use Index_1
on deletion which is rather tricky without applying an additional conditions. But we can do this by using an equality check or multiple equality checks on a_id
:
DELETE FROM t
USING tablename t FORCE INDEX (Index_1)
JOIN (SELECT @a_id:=0, @b_id:=0, @c:=0) as init
WHERE a_id IN (1)
AND (@c:=IF(LEAST(@a_id=(@a_id:=a_id), @b_id=(@b_id:=b_id)), @c+1, 1)) > 1;
DELETE FROM t
USING tablename t FORCE INDEX (Index_1)
JOIN (SELECT @a_id:=0, @b_id:=0, @c:=0) as init
WHERE a_id IN (2)
AND (@c:=IF(LEAST(@a_id=(@a_id:=a_id), @b_id=(@b_id:=b_id)), @c+1, 1)) > 1;
SELECT * FROM tablename t;
a_id, b_id
1, 1
1, 4
2, 2
2, 3
I can not put all possible a_id
in IN()
because MySQL will understand that the index is useless in this case and the query will not remove all duplicates (only adjacent), but having say 10 different a_id
I can remove duplicates in two DELETE statements, each IN will have 5 explicit ids.
Hope, this might be useful to someone =)
Maybe this can help:
$query="SELECT * FROM shop ORDER BY id";
$rez=$dbh->query($query);
$multi=$rez->fetchAll(PDO::FETCH_ASSOC);
foreach ($multi as $key=>$row){
$rest=array_slice($multi,$key+1);
foreach ($rest as $rest){
if(($row['id']==$rest['id']) && ($row['tax_id']==$rest['tax_id'])){
$dbh->query("DELETE FROM shop WHERE id={$rest['id']} and tax_id= {$rest['tax_id']}");
}
}
}
First foreach
iterates every row, and second one doing comparation. I'm using PDO, but of course, you can do it in the procedural way.