Here is my problem. It's a word-frequency counting exercise:
def get_txt():
    txt = open('sentence.txt','rt').read()
    txt = txt.lower()
    return txt
s_txt = get_txt()
wordset = s_txt.split()
yalp = ['a','e','i','o','u']
counts = {}
for word in wordset:
    for alp in word:
        if alp in yalp:
            counts[word] = counts.get(word,0) + 1
    items = list(counts.items())
    items.sort(key = lambda x:x[1],reverse = False)
    for i in range(5):
        word ,counts = items[i]
        print('{0:<10}{1:>5}'.format(word ,counts))
Traceback (most recent call last):
word ,counts = items[i]
IndexError: list index out of range
only 1
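Running it raises the error above. I searched online for a while and still couldn't fix it. Could someone take a look?
Also:
1. Is there a better way to write the lines below? (They feel clumsy.)
for word in wordset:
    for alp in word:
        if alp in yalp:
2. I copied this line from somewhere and don't really understand why lambda is used this way:
items.sort(key = lambda x:x[1],reverse = False)
Any help is appreciated!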
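You need to move the part that sorts items out of the loop; otherwise the dictionary isn't finished yet. On the first pass it holds a single entry, so items[1] is already out of range, which is why you see one line of output before the IndexError. Also, the exercise asks for descending order, so reverse in sort should be True.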
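wordset = s_txt.split()
yalp = ['a','e','i','o','u']
counts = {}
for word in wordset:
    for alp in word:
        if alp in yalp:
            counts[word] = counts.get(word,0) + 1
# sort once, after the dictionary is complete
items = list(counts.items())
items.sort(key = lambda x:x[1],reverse = True)
# slicing avoids an IndexError when fewer than 5 words were counted;
# the name count (not counts) keeps the dictionary from being overwritten
for word, count in items[:5]:
    print('{0:<10}{1:>5}'.format(word, count))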
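If you want to try the fix without sentence.txt, here is a minimal sketch that wraps the same logic in a function. The name top_vowel_words and the sample sentence are made up for illustration; they are not part of the exercise.

```python
def top_vowel_words(text, n=5):
    """Return up to n (word, vowel_count) pairs, highest count first."""
    vowels = 'aeiou'
    counts = {}
    for word in text.lower().split():
        for ch in word:
            if ch in vowels:
                counts[word] = counts.get(word, 0) + 1
    # Sort only after the whole dictionary has been built
    items = sorted(counts.items(), key=lambda x: x[1], reverse=True)
    return items[:n]  # a slice never raises IndexError, even if len(items) < n


sample = 'Only education beats a queue'  # hypothetical input
for word, count in top_vowel_words(sample):
    print('{0:<10}{1:>5}'.format(word, count))
```

As in the code above, a word that appears several times in the text accumulates its vowels across all occurrences, and words without any vowel never enter the dictionary.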
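lambda is just a shorthand, nothing special. The key argument expects a function that takes one element and returns the value to sort by, so you could just as well define a named one:
def fun(x):
    return x[1]
items.sort(key = fun,reverse = True)
Comparing the two forms shows how the lambda is written: lambda x: x[1] takes x and returns x[1], which here is the count in each (word, count) pair.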
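To see that these forms really are interchangeable, here is a small sketch on made-up data. The list contents and the function name second are only for illustration; operator.itemgetter is a standard-library alternative that does the same job.

```python
from operator import itemgetter

# Hypothetical (word, count) pairs, like the ones counts.items() produces
items = [('only', 1), ('education', 5), ('queue', 4)]


def second(pair):
    """Named equivalent of lambda x: x[1]."""
    return pair[1]


by_lambda = sorted(items, key=lambda x: x[1], reverse=True)
by_def = sorted(items, key=second, reverse=True)
by_itemgetter = sorted(items, key=itemgetter(1), reverse=True)

print(by_lambda)                             # [('education', 5), ('queue', 4), ('only', 1)]
print(by_lambda == by_def == by_itemgetter)  # True
```

In each case sort calls the key function once per element and orders the elements by the values it returns.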
As for optimizing that part of the code, one option is to translate every vowel to the character '1' and then count the '1's in each word:
yalp = {ord('a'):ord('1'),ord('e'):ord('1'),ord('i'):ord('1'),ord('o'):ord('1'),ord('u'):ord('1')}
counts = {}
for word in wordset:
    a = word.translate(yalp)
    counts[word] = a.count('1')
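A quick way to see what translate does is to run it on a single word. This sketch builds the same kind of table with str.maketrans, which is a shorter spelling of the dictionary above; the sample word is arbitrary.

```python
# str.maketrans('aeiou', '11111') maps each vowel's code point to that of '1',
# the same mapping as the hand-written ord(...) dictionary
table = str.maketrans('aeiou', '11111')

word = 'education'
masked = word.translate(table)

print(masked)             # 1d1c1t11n
print(masked.count('1'))  # 5
```

One caveat: a word that already contains the digit 1 (for example a token like 'mp3-1') would be overcounted, so this trick assumes the text has no such tokens.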
with open('sentence.txt','r') as f:
    data = f.read()  # read the whole text at once
words = data.split()  # all words; split() with no argument splits on any whitespace, so multi-line files work
words = [i.lower() for i in words]  # lowercase; needed if e.g. Only and only should count as the same word
vowel = 'aeiou'
dic = { w:sum(int(i in vowel) for i in w) for w in words }  # dictionary of word: vowel count
dic = dict(sorted(dic.items(), key=lambda x:x[1], reverse=True))  # sort by vowel count, descending
for k,v in dic.items():  # print each word left-aligned in a 17-character field
    if v>2:
        print(f'{k:<17}{v}')