hql语句的优化,广告方面的计算 hive hql 数据库

这个是根据如下表写的,计算出不同广告主的广告的点击率 展示,点击

CREATE TABLE ODS_GG_QL(
`ID` string COMMENT 'ID',
`USER_ID` string COMMENT '用户账号',
`DEVIICE_ID` string COMMENT '设备id',
`create_time` string COMMENT '时间',
`APP_ID` string COMMENT 'APPID',
`ADSPACE_CODE` string COMMENt '广告位',
`MATE_CODE` string COMMENT '广告素材编码',
`operate_type` string COMMENT '交互',
`source` string COMMENT '来源',
`IP` string COMMENT 'IP',
`ADS_CODE` string COMMENT '所属广告主',
`SHOW_TIME` string COMMENT '展示时间'
);
语句如下(怎么优化?)
```sql
create table ggz_yhxw
as 
select t1.c1,t2.c2,t3.c3
from
(select count(*) c1 from ods_gg_ql where operate_type='show' and create_time ='2022-02-19' group by ADS_CODe) t1,
(select count(*) c2 from ods_gg_ql where operate_type='clic' and create_time ='2022-02-19' group by ADS_CODe) t2,
(select
onefh.a1/oneqq.b1 c3
from
(select count(*)b1 from ods_gg_ql where operate_type='show' and create_time ='2022-02-19' group by ADS_CODe) as oneqq,
(select count(*)a1 from ods_gg_ql where operate_type='clic' and create_time ='2022-02-19' group by ADS_CODe) as onefh) t3;

  1. 首先,你这个代码本身应该存在问题,t1/t2/t3输出的数据不止一行,因为有"group by ADS_CODe",但你并没有写任何关联条件,这样会导致产生笛卡尔积,数据结果记录翻倍。
  2. 然后,要优化这个sql,很明显你这都是查的同一个表,而且聚合的维度一样,那么完全可以只用查一次,把不同的查询条件放到count里用case when 处理
    select ADS_CODe,
    count( case when operate_type='show' then 1 end ) c1,
    count( case when operate_type='clic' then 1 end ) c2,
    count( case when operate_type='clic' then 1 end )/count( case when operate_type='show' then 1 end ) c3
    from ods_gg_ql where  create_time ='2022-02-19' group by ADS_CODe
    

select
*
,a.clic/a.show as clic_rate
from(
select
create_time
,sum(case when operate_type='clic' then 1 else 0 end) as clic
,sum(case when operate_type='show' then 1 else 0 end) as show
from
ods_gg_ql
where create_time='2022-02-19'
group by
create_time)a;

您好,我是有问必答小助手,您的问题已经有小伙伴帮您解答,感谢您对有问必答的支持与关注!
PS:问答VIP年卡 【限时加赠:IT技术图书免费领】,了解详情>>> https://vip.csdn.net/askvip?utm_source=1146287632