hive解析json数组,统计通过率

hive表中有一个字段存储的是json数组
[{
"name": "max_operate_first_diff",
"rule": [{
"match": "max_operate_first_diff\u003c3972.0",
"priority": 1,
"description": "",
"result": -2,
"hit": "命中",
"value": 504
}, {
"match": "max_operate_first_diff\u003e=3972.0",
"priority": 2,
"description": "",
"result": 64,
"hit": "",
"value": 0
}]
}, {
"name": "up_sms",
"rule": [{
"match": " up_sms\u003c=1.0",
"priority": 1,
"description": "",
"result": 5,
"hit": "命中",
"value": 1
}, {
"match": " up_sms\u003e1.0",
"priority": 2,
"description": "",
"result": -13,
"hit": "",
"value": 0
}]
}]

需要统计每一个match对应的命中率,每一次传回来的rule条数不固定,match名字也不固定

首先，需要将存储在hive表中的json数组解析为结构化数据，可以使用hive内置的json_tuple函数进行解析，然后使用 lateral view explode将数组中的每个元素拆分成一条记录。

假设存储json数组的字段名为json_array，解析后得到的字段名为name, match, priority, description, result, hit和value。

然后，可以使用group by和count函数统计每个match对应的命中次数和总次数，最后计算每个match对应的命中率。

以下是hive查询语句的示例：

SELECT match, 
       SUM(CASE WHEN hit='命中' THEN 1 ELSE 0 END) AS hit_count, 
       COUNT(*) AS total_count, 
       SUM(CASE WHEN hit='命中' THEN 1 ELSE 0 END)/COUNT(*) AS hit_rate
FROM your_table 
LATERAL VIEW explode(json_tuple(json_array, 'name', 'rule')) exploded AS name, rules
LATERAL VIEW explode(parse_json(rules)) exploded_rules AS match, priority, description, result, hit, value
GROUP BY match;

其中，parse_json函数用于解析json字符串。

需要注意的是，上述示例只是一种可能的实现方式，实际操作中还要根据具体的需求进行调整。