hive表中有一个字段存储的是json数组
[{
"name": "max_operate_first_diff",
"rule": [{
"match": "max_operate_first_diff\u003c3972.0",
"priority": 1,
"description": "",
"result": -2,
"hit": "命中",
"value": 504
}, {
"match": "max_operate_first_diff\u003e=3972.0",
"priority": 2,
"description": "",
"result": 64,
"hit": "",
"value": 0
}]
}, {
"name": "up_sms",
"rule": [{
"match": " up_sms\u003c=1.0",
"priority": 1,
"description": "",
"result": 5,
"hit": "命中",
"value": 1
}, {
"match": " up_sms\u003e1.0",
"priority": 2,
"description": "",
"result": -13,
"hit": "",
"value": 0
}]
}]
需要统计每一个match对应的命中率,每一次传回来的rule条数不固定,match名字也不固定
首先,需要将存储在hive表中的json数组解析为结构化数据,可以使用hive内置的json_tuple函数进行解析,然后使用 lateral view explode将数组中的每个元素拆分成一条记录。
假设存储json数组的字段名为json_array,解析后得到的字段名为name, match, priority, description, result, hit和value。
然后,可以使用group by和count函数统计每个match对应的命中次数和总次数,最后计算每个match对应的命中率。
以下是hive查询语句的示例:
SELECT match,
SUM(CASE WHEN hit='命中' THEN 1 ELSE 0 END) AS hit_count,
COUNT(*) AS total_count,
SUM(CASE WHEN hit='命中' THEN 1 ELSE 0 END)/COUNT(*) AS hit_rate
FROM your_table
LATERAL VIEW explode(json_tuple(json_array, 'name', 'rule')) exploded AS name, rules
LATERAL VIEW explode(parse_json(rules)) exploded_rules AS match, priority, description, result, hit, value
GROUP BY match;
其中,parse_json函数用于解析json字符串。
需要注意的是,上述示例只是一种可能的实现方式,实际操作中还要根据具体的需求进行调整。