在mysql数据库中有一张人员信息表,数据量 100万以上,根据姓名、性别、身份证号等基本信息查询出重复的数据。如何能快速查出数据。
求教如何优化。
方案1:
select t.id, t.name, t.ssid from t_case_info t join (
select id, name, ssid from t_case_info where is_del='0' and reportdate >= '2003-01-01' and reportdate <= '2003-01-31' group by name,ssid having count(1) > 1
) r on t.name = r.name and t.ssid = r.ssid
where t.is_del='0' and t.reportdate >= '2003-01-01' and t.reportdate <= '2003-01-01';
方案2:
1、创建临时表,把重复的name和ssid存入临时表
CREATE TEMPORARY TABLE r_info
(
id varchar(32) NOT NULL,
name varchar(255),
ssid varchar(25)
KEY idx(ssid, name)
)ENGINE=MEMORY DEFAULT CHARSET=utf8 ;
2、关联查询
insert into r_info(id, name, ssid) select id, name, ssid from t_case_info where is_del='0' and reportdate >= '2003-01-01' and reportdate <= '2003-01-31' group by name,ssid having count(1) > 1;
select t.id, t.name, t.ssid from t_case_info t join r_info r on t.name = r.name and t.ssid = r.ssid
where t.is_del='0' and t.reportdate >= '2003-01-01' and t.reportdate <= '2003-01-01';
可以考虑加个字段mdv,字段是‘所有要验证重复字段加起来’的MD5值之类的
然后给这个字段建立索引
然后select * from t_case_info a where exists(select 1 from t_case_info b where a.mdv=b.mdv and a.id<>b.id)