Word frequency counting in Python

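Write a program that counts how many times each word occurs in the English passage below. Additional requirements: (1) ignore case; (2) strip punctuation, so that no punctuation appears inside a word and no punctuation mark is counted; (3) print the results in descending order of frequency, with each word and its count shown in a fixed width and right-aligned, five results per line, so that the output is neat and easy to read. Reference code length: 35 lines.
Text: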
In our world , one creature without any rivals is a lifeless creature. If a man lives without rivals, he is bound to be satisfied with the present and will not strive for the better. He would hold back before all difficulties and decline in inaction and laziness. Adverse environment tends to cultivate successful people. Therefore, your rivals are not your opponents or those you grudge. Instead , they are your good friends! In our lives, we need some rivals to "push us into the river", leaving us striving ahead in all difficulties and competitions. In our work, we need some rivals to be picky about us and supervise our work with rigorous requirements and standards. Due to our rivals, we can bring out our potential to the best; Due to our rivals, we will continuously promote our capabilities when competing with them!
Expected output:

[screenshot]

import re

data = 'In our world , one creature without any rivals is a lifeless creature. If a man lives without rivals, he is bound to be satisfied with the present and will not strive for the better. He would hold back before all difficulties and decline in inaction and laziness. Adverse environment tends to cultivate successful people. Therefore, your rivals are not your opponents or those you grudge. Instead , they are your good friends! In our lives, we need some rivals to "push us into the river", leaving us striving ahead in all difficulties and competitions. In our work, we need some rivals to be picky about us and supervise our work with rigorous requirements and standards. Due to our rivals, we can bring out our potential to the best; Due to our rivals, we will continuously promote our capabilities when competing with them!'
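# Lowercase everything so that case is ignored
data = data.lower()

# Replace every non-letter character with a space
rule = re.compile("[^a-z]")
data = rule.sub(' ', data)
# split() with no argument collapses runs of whitespace, so no empty strings appear
setWords = data.split()
result = {}
for item in setWords:
    result[item] = result.get(item, 0) + 1

# Sort the (word, count) pairs by count, highest first
result = sorted(result.items(), key=lambda x: x[1], reverse=True)

# Print five results per line, each in a fixed-width, right-aligned field
for index, item in enumerate(result, 1):
    print('{:>20}->{:>3}'.format(item[0], item[1]), end='')
    if index % 5 == 0:
        print()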
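The layout requirement (fixed width, right-aligned, five results per line) can also be separated from the counting. The following is a minimal sketch, not part of the answer above; `format_rows` and its parameters `per_row`, `word_width` and `count_width` are names made up for this illustration.

```python
def format_rows(pairs, per_row=5, word_width=15, count_width=3):
    """Return one string per output row, each holding up to per_row cells.

    pairs is a list of (word, count) tuples, already sorted as desired.
    The function name and the width parameters are illustrative assumptions.
    """
    # Each cell is "word:count", both parts right-aligned in a fixed width
    cells = ['{:>{w}}:{:>{c}}'.format(word, count, w=word_width, c=count_width)
             for word, count in pairs]
    # Join the cells in groups of per_row
    return [''.join(cells[i:i + per_row]) for i in range(0, len(cells), per_row)]


if __name__ == '__main__':
    demo = [('our', 8), ('rivals', 6), ('and', 5), ('to', 5), ('in', 4), ('we', 4)]
    for row in format_rows(demo):
        print(row)
```

Building the rows as strings first makes it easy to check the layout, and the last row is printed correctly even when the number of words is not a multiple of five.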



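# Read the whole file; the with-block closes it afterwards
with open('hamlet.txt', 'r') as f:
    txt = f.read()

# Lowercase everything so that case differences do not interfere
txt = txt.lower()

# Turn special characters into spaces so that there is a single delimiter
for ch in ',./?;:\'"<>=+-[]{}!~%@()#':
    txt = txt.replace(ch, ' ')  # str.replace returns a new string

words = txt.split()    # split on whitespace, returns a list
counts = {}            # word -> number of occurrences
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Sort by frequency, highest first
counts = sorted(counts.items(), key=lambda x: x[1], reverse=True)

# Print the ten most frequent words (fewer if the text has fewer distinct words)
for word, count in counts[:10]:
    print('{0:<10}:{1:>5}'.format(word, count))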
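Instead of replacing special characters one at a time, the words can be pulled out directly with a regular expression and counted with `collections.Counter`. This is a sketch of that alternative, not the code of the answer above; `top_words` is a hypothetical helper, and the sample sentence stands in for the contents of `hamlet.txt`.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent words of text as (word, count) pairs.

    A word is taken to be a maximal run of the letters a-z after lowercasing,
    which is an assumption of this sketch.
    """
    words = re.findall(r'[a-z]+', text.lower())
    return Counter(words).most_common(n)


if __name__ == '__main__':
    sample = 'To be, or not to be: that is the question.'
    for word, count in top_words(sample, 3):
        print('{0:<10}:{1:>5}'.format(word, count))
```

`Counter.most_common(n)` already returns the pairs sorted by count, so no separate `sorted` call is needed.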

[For reference only; I hope it helps.]

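1. Approach

  • Walk through the whole text and find every word in it
  • Count how often each word occurs
  • What defines a word: it starts with a letter and ends with a letter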
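The idea that a word is an unbroken run of letters can be sketched compactly with `itertools.groupby`, which groups consecutive characters by whether they are letters. This is only an illustration of the approach; `iter_words` and `count_words` are names invented here and are not the program given in this answer.

```python
from itertools import groupby


def iter_words(text):
    """Yield lowercase words, where a word is a maximal run of letters."""
    for is_letter, run in groupby(text, key=str.isalpha):
        if is_letter:
            yield ''.join(run).lower()


def count_words(text):
    """Return a dict mapping each word to its number of occurrences."""
    counts = {}
    for word in iter_words(text):
        counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == '__main__':
    sample = 'In our work, we need some rivals; our rivals push us.'
    print(count_words(sample))
```

Because every run of non-letters acts as a separator, punctuation never ends up inside a word and is never counted.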


2. Result

[screenshot of the program output]

3. Program

content = '''
In our world , one creature without any rivals is a lifeless creature.
If a man lives without rivals, he is bound to be satisfied with the present
and will not strive for the better. He would hold back before all difficulties
and decline in inaction and laziness. Adverse environment tends to cultivate
successful people. Therefore, your rivals are not your opponents or those you
grudge. Instead , they are your good friends! In our lives, we need some rivals
to "push us into the river", leaving us striving ahead in all difficulties and
competitions. In our work, we need some rivals to be picky about us and
supervise our work with rigorous requirements and standards.
Due to our rivals, we can bring out our potential to the best;
Due to our rivals, we will continuously promote our capabilities
when competing with them!
'''

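def getLowLetter(c):
    """Return c in lowercase if it is a letter, otherwise an empty string."""
    if not c.isalpha():
        return ''
    return c.lower()

words = {}         # word -> number of occurrences
startWord = False  # True while we are inside a word
word = ""          # letters of the word read so far

for c in content:
    lc = getLowLetter(c)
    if lc != '':
        if not startWord:
            startWord = True
        word += lc
    else:
        # A non-letter ends the current word, so record it
        if startWord:
            startWord = False
            words[word] = words.get(word, 0) + 1
            word = ""
# Record the last word if the text ends with a letter
if startWord:
    words[word] = words.get(word, 0) + 1

# Sort by frequency, highest first
result = sorted(words.items(), key=lambda x: x[1], reverse=True)
# Print five results per line in fixed-width columns
for i in range(len(result)):
    print("{:>15s}:{:<3d}".format(result[i][0], result[i][1]), end="")
    if (i + 1) % 5 == 0:
        print()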
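import string
from collections import Counter

def proc_punc(s):
    """Lowercase s, replace punctuation with spaces, and split into words."""
    punc = string.punctuation
    # Translation table that maps every punctuation character to a space
    table = str.maketrans(punc, ' ' * len(punc))
    return s.lower().translate(table).split()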

content ="""In our world , one creature without any rivals is a lifeless creature. If a man lives without rivals, he is bound to be satisfied with the present and will not strive for the better. He would hold back before all difficulties and decline in inaction and laziness. Adverse environment tends to cultivate successful people. Therefore, your rivals are not your opponents or those you grudge. Instead , they are your good friends! In our lives, we need some rivals to "push us into the river", leaving us striving ahead in all difficulties and competitions. In our work, we need some rivals to be picky about us and supervise our work with rigorous requirements and standards. Due to our rivals, we can bring out our potential to the best; Due to our rivals, we will continuously promote our capabilities when competing with them!"""

# Sort by count, then by word, both in descending order
res = sorted(Counter(proc_punc(content)).items(), key=lambda x: (x[1], x[0]), reverse=True)
for i, (k, v) in enumerate(res, 1):
    print(f"{k:>20}-->{v:>2}", end='')
    # Start a new line after every fifth result
    if i % 5 == 0:
        print()
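Sorting on the tuple `(x[1], x[0])` with `reverse=True` reverses both parts, so words with the same count come out in reverse alphabetical order. If ties should instead be listed alphabetically, one common option is to negate the count and drop `reverse=True`. The sketch below assumes that this ordering is wanted; `rank` is a hypothetical helper name.

```python
from collections import Counter


def rank(counter):
    """Return (word, count) pairs: highest count first, ties in alphabetical order."""
    return sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == '__main__':
    counts = Counter(['our', 'rivals', 'our', 'and', 'rivals', 'we'])
    print(rank(counts))
    # → [('our', 2), ('rivals', 2), ('and', 1), ('we', 1)]
```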
