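Task requirements:
· Read an English txt file, extract the words in it, and count how often each one occurs
· Sort the words by frequency, from highest to lowest
· Write the words and their frequencies to the file words.csv (Comma-Separated Values)
Reference solution: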
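import re
from collections import Counter
import csv
# Read the file (assumes it is UTF-8 encoded)
with open('filename.txt', 'r', encoding='utf-8') as f:
    text = f.read()
# Drop punctuation and split into words; lowercase first so that
# "The" and "the" are counted as the same word
words = re.findall(r'\b[a-zA-Z]+\b', text.lower())
word_count = Counter(words)
# Sort the words by frequency, highest first
result = sorted(word_count.items(), key=lambda x: x[1], reverse=True)
# Write the result to a CSV file
with open('words.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['word', 'count'])
    for row in result:
        writer.writerow(row)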
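The sorting step can also be written with `Counter.most_common()`, which returns the (word, count) pairs ordered from most to least frequent. A minimal sketch, assuming the text is already in memory; the helper name `count_words` and the sample sentence are made up for illustration and are not part of the original answer:

```python
import re
from collections import Counter


def count_words(text):
    """Return (word, count) pairs sorted by descending count."""
    # Lowercase first so that "The" and "the" are counted together.
    words = re.findall(r"[a-z]+", text.lower())
    # most_common() with no argument returns every pair, highest count first.
    return Counter(words).most_common()


sample = "The cat saw the dog. The dog ran."
pairs = count_words(sample)
print(pairs[0])  # the most frequent word and its count
```

The returned list can be passed straight to `csv.writer(...).writerows(...)`, since each pair is already a two-element row.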
CSV is an internationally common format for storing one- and two-dimensional data, usually with the .csv extension.
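The usual indexing convention is ls[row][column]: row first, then column.
By the same convention, each element of the outer list is one row, so the data is stored row by row.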
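To make the row-first convention concrete, here is a small sketch; the sample table is invented for illustration:

```python
# Each inner list is one row of the table; the first row is the header.
ls = [
    ["word", "count"],
    ["the", "3"],
    ["dog", "2"],
]

# ls[row][column]: the row index comes first, then the column index.
cell = ls[1][0]  # row 1, column 0

# A column is not stored contiguously, so collect it by walking the rows.
second_column = [row[1] for row in ls]

print(cell)
print(second_column)
```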
Reading data from a CSV file:
# fname is the path to the CSV file
ls = []
with open(fname) as fo:
    for line in fo:
        line = line.replace("\n", "")
        ls.append(line.split(","))
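Splitting each line on "," by hand breaks as soon as a field contains a comma inside quotes. The standard library's `csv.reader` handles that case. A sketch under stated assumptions: the helper name `read_csv_rows` and the temporary sample file are hypothetical and exist only to keep the example self-contained:

```python
import csv
import os
import tempfile


def read_csv_rows(path):
    """Read a CSV file into a list of rows (each row is a list of strings)."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


# Build a small sample file whose first data field contains a comma.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.csv")
with open(sample_path, "w", newline="", encoding="utf-8") as f:
    f.write('name,remark\n"Smith, John",ok\n')

rows = read_csv_rows(sample_path)
print(rows)
```

A plain `line.split(",")` would have cut the quoted name into two fields; `csv.reader` keeps it as one.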
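Writing a two-dimensional list to a CSV file:
ls = [[], [], []]  # two-dimensional list
# fname is the path of the output file
with open(fname, 'w') as f:
    for item in ls:
        # join() only accepts strings, so convert each value first
        f.write(','.join(map(str, item)) + '\n')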
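For writing, `csv.writer` is a possible alternative to joining fields by hand: it converts non-string values and quotes any field that contains a comma. The sketch below writes to an in-memory `io.StringIO` buffer instead of a real file so that the result can be inspected directly; the sample rows are invented for illustration:

```python
import csv
import io

# Mixed types and a field containing a comma.
ls = [["word", "count"], ["the", 3], ["a, b", 1]]

buffer = io.StringIO()
# lineterminator="\n" keeps the output easy to read; the default is "\r\n".
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(ls)

output = buffer.getvalue()
print(output)
```

When writing to a real file, open it with `newline=''` as in the word-count code, and the writer will manage line endings itself.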
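Iterating over a two-dimensional list
Use two nested loops: the outer one walks the rows, the inner one walks the values in each row.
ls = [[1,2],[3,4],[5,6]]
for row in ls:
    for column in row:
        print(column)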
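To read an English txt file with Python and count word frequencies, proceed as follows:
# 1. Read the file
with open('xxx.txt', 'r') as f:
    content = f.read()
# 2. Remove punctuation and digits, then lowercase.
#    Newlines are kept so that words on adjacent lines do not merge.
import string
translator = str.maketrans('', '', string.punctuation + string.digits)
content = content.translate(translator).lower()
# 3. Split on any run of whitespace (spaces, tabs, newlines)
word_list = content.split()
# 4. Count how often each word occurs
from collections import Counter
word_count = Counter(word_list)
# 5. Sort by frequency, highest first
sorted_word_count = sorted(word_count.items(), key=lambda x: x[1], reverse=True)
# 6. Write the result to words.csv
import csv
with open('words.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Word', 'Count'])
    writer.writerows(sorted_word_count)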
The complete code:
import string
from collections import Counter
import csv
# Delete punctuation and digits; newlines are kept as word separators
translator = str.maketrans('', '', string.punctuation + string.digits)
with open('xxx.txt', 'r') as f:
    content = f.read()
content = content.translate(translator).lower()
# split() with no argument splits on any whitespace and drops empty strings
word_list = content.split()
word_count = Counter(word_list)
sorted_word_count = sorted(word_count.items(), key=lambda x: x[1], reverse=True)
with open('words.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Word', 'Count'])
    writer.writerows(sorted_word_count)
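Two details of the strip-and-split approach are easy to get wrong: deleting "\n" glues the last word of one line to the first word of the next, and `split(' ')` returns empty strings wherever two spaces sit side by side. The sketch below compares both variants on a made-up two-line sample:

```python
import string

text = "Hello world\nhello  again"

# Variant 1: delete "\n" along with punctuation, then split on a single space.
# "world" and "hello" merge, and the double space yields an empty string.
glued = text.translate(str.maketrans('', '', string.punctuation + '\n')).lower()
naive_tokens = glued.split(' ')

# Variant 2: keep "\n" and call split() with no argument,
# which splits on any run of whitespace and never returns empty strings.
kept = text.translate(str.maketrans('', '', string.punctuation)).lower()
tokens = kept.split()

print(naive_tokens)
print(tokens)
```

With the first variant, `Counter` would report the merged token and the empty string as "words", which distorts the frequencies.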