I have several txt files of text data. The data looks like this:
from collections import Counter
import pandas as pd

with open('test.txt', 'r') as file:
    # Read every line of the file and store the lines in a list
    lines = file.readlines()
print(lines)

arr = []
for line in lines:
    # split() with no argument splits on any whitespace, so the trailing
    # newline and empty strings do not end up in the counts
    nums = line.split()
    arr.extend(nums)
print(arr)

# Count repeated items with Counter
counter_dict = dict(Counter(arr))
print(counter_dict)

# Transpose counter_dict and set the column names
df = pd.DataFrame([counter_dict]).T.reset_index()
df.columns = ['Item', 'Count']
df.to_excel('output.xlsx', index=False)
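The question mentions several txt files, while the code reads only test.txt. One possible way to cover all of them is to loop over every matching file and feed a single shared Counter. This is only a sketch: the function name count_tokens_in_files is hypothetical, and it assumes the files sit in one folder and are UTF-8 encoded.

```python
from collections import Counter
from pathlib import Path


def count_tokens_in_files(folder, pattern="*.txt"):
    """Count whitespace-separated tokens across all matching files in folder.

    Hypothetical helper; assumes the files are UTF-8 encoded.
    """
    counter = Counter()
    for path in sorted(Path(folder).glob(pattern)):
        with open(path, "r", encoding="utf-8") as file:
            for line in file:
                # split() drops the trailing newline and empty strings
                counter.update(line.split())
    return counter
```

The list returned by the counter's most_common() method can then be passed to pd.DataFrame with columns=['Item', 'Count'] and written out with to_excel, as in the single-file version.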
Just use jieba for word segmentation.
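A minimal sketch of that suggestion, assuming the text is Chinese and the third-party jieba package is installed (pip install jieba). It replaces the whitespace split with a pluggable tokenizer. The function name count_words and its tokenize parameter are hypothetical; jieba.lcut is the jieba call that returns the segmented words as a list.

```python
from collections import Counter


def count_words(lines, tokenize=None):
    """Count tokens in an iterable of text lines with a pluggable tokenizer.

    Hypothetical helper: when tokenize is None it falls back to jieba.lcut.
    """
    if tokenize is None:
        # Imported lazily so the function also works without jieba
        # when another tokenizer is supplied.
        import jieba
        tokenize = jieba.lcut
    counter = Counter()
    for line in lines:
        # Skip whitespace-only tokens such as spaces and newlines
        counter.update(tok for tok in tokenize(line) if tok.strip())
    return counter
```

Passing an open file object (or the list from readlines()) as lines counts jieba-segmented words. Passing tokenize=str.split reproduces the whitespace-based counting from the question. Punctuation is still counted as tokens here; filtering it out would need an extra step.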