----------------------SQL problem with large data volumes

Database: GP (Greenplum)
Problem:
Table A has a column a, with 10 million rows in total;
table B also has a column a, with 5 million rows in total.

I need to find the values of A.a that also appear in B.a.

What is the most efficient way to write this SQL?
select count(distinct a) from A where a in (select B.a from B);

select count(distinct A.a) from A left join B on A.a = B.a where B.a is not null;

select count(distinct A.a) from A, B where B.a = A.a;
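Before worrying about speed, it helps to confirm the candidates return the same answer. The sketch below uses Python's built-in sqlite3 module as a stand-in for Greenplum (an assumption: it checks only what the queries return, and says nothing about performance on GP), with small hypothetical tables containing duplicates and a NULL. It also shows why the null test in the left-join form has to be written IS NOT NULL: B.a <> NULL evaluates to NULL for every row, so the WHERE clause drops everything and the count comes out as 0.

```python
import sqlite3

# In-memory SQLite database standing in for Greenplum: this only checks
# that the queries agree on the result, not how fast they run.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (a INTEGER)")
conn.execute("CREATE TABLE B (a INTEGER)")

# Hypothetical sample rows with duplicates and a NULL in each table.
conn.executemany("INSERT INTO A VALUES (?)", [(1,), (1,), (2,), (3,), (None,)])
conn.executemany("INSERT INTO B VALUES (?)", [(1,), (1,), (3,), (4,), (None,)])

queries = {
    "in_subquery":
        "SELECT count(DISTINCT a) FROM A WHERE a IN (SELECT B.a FROM B)",
    "left_join_is_not_null":
        "SELECT count(DISTINCT A.a) FROM A LEFT JOIN B ON A.a = B.a "
        "WHERE B.a IS NOT NULL",
    "inner_join":
        "SELECT count(DISTINCT A.a) FROM A, B WHERE B.a = A.a",
    # Buggy form: a comparison with NULL via <> is never true.
    "left_join_ne_null":
        "SELECT count(DISTINCT A.a) FROM A LEFT JOIN B ON A.a = B.a "
        "WHERE B.a <> NULL",
}

results = {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}
for name, count in results.items():
    print(f"{name}: {count}")
```

The first three queries report 2 (the values 1 and 3); the <> NULL version reports 0.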
A join query is more efficient than a subquery.

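With two tables this large, you can really only join them, or denormalize: add B's a column to table A so that only table A has to be queried. That gives the best performance.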
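A minimal sketch of the denormalized approach, again with sqlite3 standing in for GP and hypothetical data. The extra column name b_a is an assumption for illustration. The matching B value is copied into A once, after which the count reads only table A. The trade-off is that b_a has to be refreshed whenever B changes, otherwise it goes stale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (a INTEGER)")
conn.execute("CREATE TABLE B (a INTEGER)")
# Hypothetical sample rows with duplicates and a NULL in each table.
conn.executemany("INSERT INTO A VALUES (?)", [(1,), (1,), (2,), (3,), (None,)])
conn.executemany("INSERT INTO B VALUES (?)", [(1,), (1,), (3,), (4,), (None,)])

# Denormalize: add a hypothetical column b_a to A holding B's matching
# a value, or NULL when the value does not occur in B.
conn.execute("ALTER TABLE A ADD COLUMN b_a INTEGER")
conn.execute(
    "UPDATE A SET b_a = (SELECT B.a FROM B WHERE B.a = A.a LIMIT 1)"
)

# From now on the question is answered from table A alone, with no join.
count_from_a = conn.execute(
    "SELECT count(DISTINCT a) FROM A WHERE b_a IS NOT NULL"
).fetchone()[0]
print(count_from_a)
```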

select count(distinct A.a) from A inner join B on A.a = B.a where A.a is not null;
-- Create an index on A.a and a separate index on B.a.
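An inner join returns one row per matching (A, B) pair, so count(1) over the join counts pairs rather than distinct values of A.a; when a value is duplicated on either side the two numbers differ. A quick check with sqlite3 and hypothetical data (an illustration of the SQL semantics, not GP output). The where A.a is not null filter does not change the result, because an equality join condition is never true for NULL, so those rows are already excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (a INTEGER)")
conn.execute("CREATE TABLE B (a INTEGER)")
# Hypothetical data: value 1 appears twice in both tables.
conn.executemany("INSERT INTO A VALUES (?)", [(1,), (1,), (2,), (3,), (None,)])
conn.executemany("INSERT INTO B VALUES (?)", [(1,), (1,), (3,), (4,), (None,)])

join = "FROM A INNER JOIN B ON A.a = B.a WHERE A.a IS NOT NULL"
# One row per matching pair: 2 x 2 pairs for value 1, plus 1 pair for value 3.
pair_count = conn.execute(f"SELECT count(1) {join}").fetchone()[0]
# Each matching value of A.a counted once: 1 and 3.
distinct_count = conn.execute(f"SELECT count(DISTINCT A.a) {join}").fetchone()[0]
print(pair_count, distinct_count)
```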

Build an index on column a in both tables; everything else is minor by comparison.
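Creating the two indexes and checking whether the planner actually uses them can be sketched as below, again in sqlite3 with generated hypothetical data (the index names idx_a_a and idx_b_a are also hypothetical). EXPLAIN QUERY PLAN is SQLite-specific; on Greenplum the corresponding check is EXPLAIN (or EXPLAIN ANALYZE) on the real query. For a join that reads every row of both tables, a planner may prefer a hash join over the indexes, so the plan is worth confirming rather than assuming.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (a INTEGER)")
conn.execute("CREATE TABLE B (a INTEGER)")
# Hypothetical data: A holds values 0..999, B holds values 0..699.
conn.executemany("INSERT INTO A VALUES (?)", [(i % 1000,) for i in range(10_000)])
conn.executemany("INSERT INTO B VALUES (?)", [(i % 700,) for i in range(5_000)])

# One index on the join column of each table (hypothetical names).
conn.execute("CREATE INDEX idx_a_a ON A (a)")
conn.execute("CREATE INDEX idx_b_a ON B (a)")

query = "SELECT count(DISTINCT A.a) FROM A INNER JOIN B ON A.a = B.a"
# Each EXPLAIN QUERY PLAN row has four columns; the last one (detail)
# describes the step, including any index it uses.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)

# Distinct values of A.a that occur in B: 0..699.
matched = conn.execute(query).fetchone()[0]
print(matched)
```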