How to implement word frequency counting with Hive

Could someone explain how to do a word frequency count on a text file using Hive on Hadoop? I need concrete steps; I have an on-machine exam tomorrow. The environment is already set up.



1. Create an external table over the directory that contains `input.txt`, with one STRING column per line of text:

```sql
CREATE EXTERNAL TABLE input_text (
  line STRING
)
LOCATION '/path/to/input/directory';
```
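The external table does no parsing beyond line splitting: every line of every file under the directory becomes one row in the `line` column. A minimal Python sketch of that line-per-row view (the file name and contents here are hypothetical stand-ins, not part of the original setup):

```python
from pathlib import Path

# Hypothetical file standing in for a file under /path/to/input/directory
sample = Path("input.txt")
sample.write_text("hello world\nhello hive\n")

# Hive's external table exposes each line of the file as one STRING row.
rows = sample.read_text().splitlines()
print(rows)  # each element corresponds to one value of the `line` column
```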
2. Split each line into words and count how many times each word occurs:

```sql
SELECT word, COUNT(*) AS count
FROM (
  SELECT explode(split(line, ' ')) AS word
  FROM input_text
) t
GROUP BY word;
```
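To see what this query computes: `split(line, ' ')` turns each line into an array of tokens, `explode` flattens those arrays into one row per token, and `GROUP BY` with `COUNT(*)` tallies them. A small Python sketch of the same logic (the sample rows are hypothetical):

```python
from collections import Counter

# Hypothetical rows, as they would come out of the input_text table
rows = ["hello world", "hello hive"]

# split(line, ' ') + explode: one entry per token, flattened across rows
words = [w for line in rows for w in line.split(" ")]

# GROUP BY word + COUNT(*): tally occurrences of each distinct word
counts = Counter(words)
print(counts)
```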
3. Save the result into a new table:

```sql
CREATE TABLE word_count (
  word STRING,
  count BIGINT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

INSERT INTO TABLE word_count
SELECT word, COUNT(*) AS count
FROM (
  SELECT explode(split(line, ' ')) AS word
  FROM input_text
) t
GROUP BY word;
```
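Because `word_count` is declared `ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE`, its data files are plain text with one `word<TAB>count` pair per line. A Python sketch of the end-to-end result under that storage format (sample rows and output file name are hypothetical):

```python
from collections import Counter
from pathlib import Path

# Hypothetical input rows from the input_text table
rows = ["hello world", "hello hive"]
counts = Counter(w for line in rows for w in line.split(" "))

# word_count stores one word<TAB>count pair per line in a text file
out = Path("word_count.txt")
out.write_text("".join(f"{w}\t{c}\n" for w, c in sorted(counts.items())))
print(out.read_text())
```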