I have a table that I use to store some systematically chosen "serial numbers" for each product that is bought...
The problem is, a CSV was uploaded that I believe contained some duplicate "serial numbers", which means that when the application tries to modify a row, it may not be modifying the correct one.
I need to be able to query the database and get every row whose serial_number value appears more than once. The result should look something like this:
ID, serial_number, meta1, meta2, meta3
3, 123456, 0, 2, 4
55, 123456, 0, 0, 0
6, 345678, 0, 1, 2
99, 345678, 0, 1, 2
So as you can see, I need to be able to see both the original row and the duplicate row, along with all of their columns of data. This is so I can compare them and determine what data is now inconsistent.
Some versions of MySQL implement IN with a subquery very inefficiently. A safe alternative is a join:
SELECT t.*
FROM t
JOIN (SELECT serial_number, COUNT(*) AS cnt
      FROM t
      GROUP BY serial_number
     ) tsum
  ON tsum.serial_number = t.serial_number AND tsum.cnt > 1
ORDER BY t.serial_number;
Another alternative is to use an EXISTS clause:
SELECT t.*
FROM t
WHERE EXISTS (SELECT *
              FROM t t2
              WHERE t2.serial_number = t.serial_number AND t2.id <> t.id)
ORDER BY t.serial_number;
Both of these queries (as well as the one proposed by @fthiella) are standard SQL. Both would benefit from an index on (serial_number, id).
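As a rough sketch of that index, assuming the table really is named t as in the queries above (the index name idx_serial_number_id is made up for illustration, so pick whatever fits your naming scheme):

```sql
-- Composite index on the two columns the duplicate checks touch.
-- serial_number comes first so the GROUP BY and the join / EXISTS lookups can seek on it;
-- id comes second so the t2.id <> t.id comparison can be answered from the index itself.
CREATE INDEX idx_serial_number_id ON t (serial_number, id);
```

Running EXPLAIN on either query before and after creating it is a quick way to check whether the optimizer actually uses the index on your data.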
SELECT *
FROM yourtable
WHERE serial_number IN (SELECT serial_number
                        FROM yourtable
                        GROUP BY serial_number
                        HAVING COUNT(*) > 1)
ORDER BY serial_number, id