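I need to count how often each word occurs and look words up, but when I run my code the words always have punctuation attached. How can I fix this?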
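That's because the punctuation isn't being handled. With no arguments, split() splits only on whitespace, so punctuation stays attached to the neighboring word. To split English text into words you have to deal with the punctuation in the sentences yourself; splitting on spaces or on any single character isn't enough.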
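To make the problem concrete, here is a minimal sketch using one sentence from the passage in the question (the variable names `sentence` and `words` are just for illustration):

```python
sentence = "Failure, like pain, is alien to my life."

# No argument: split on runs of whitespace only.
words = sentence.split()
print(words)
# ['Failure,', 'like', 'pain,', 'is', 'alien', 'to', 'my', 'life.']
```

"Failure," and "pain," keep their commas and "life." keeps its period, so a counter would treat "pain," and "pain" as two different words.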
Take a look at this:
from collections import Counter
content = r"""Today I shed my old skin which hath, too long, suffered the bruises of failure and the wounds of mediocrity.
Today I am born anew and my birthplace is a vineyard where there is fruit for all.
Today I will pluck grapes of wisdom from the tallest and fullest vines in the vineyard,for these were planted by the wisest of my profession who have come before me,generation upon generation.
Today I will savor the taste of grapes from these vines and verily I will swallow the seed of success buried in each and new life will sprout within me.
The career I have chosen is laden with opportunity yet it is fraught with heartbreak and despair and the bodies of those who have failed, were they piled one atop another, would cast a shadow down upon all the pyramids of the earth.
Yet I will not fail, as the others, for in my hands I now hold the charts which will guide through perilous waters to shores which only yesterday seemed but a dream.
Failure no longer will be my payment for struggle. Just as nature made no provision for my body to tolerate pain neither has it made any provision for my life to suffer failure. Failure, like pain, is alien to my life. In the past I accepted it as I accepted pain. Now I reject it and I am prepared for wisdom and principles which will guide me out of the shadows into the sunlight of wealth, position, and happiness far beyond my most extravagant dreams until even the golden apples in the Garden of Hesperides will seem no more than my just reward."""
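import string

def proc_punc(s):
    """Replace every ASCII punctuation character in s with a space."""
    P = string.punctuation
    stb = str.maketrans(P, " " * len(P))
    return s.translate(stb)

res = proc_punc(content)
# Sort the (word, count) pairs by count, most frequent first.
result = sorted(Counter(res.split()).items(), key=lambda x: x[1], reverse=True)
for k, v in result:
    print(k, v)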
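The question also mentions looking words up. A Counter is a dict subclass, so once the punctuation is gone a lookup is just an index. The following is a rough sketch on a short excerpt, not part of the answer above; lowercasing the text first is my own assumption (so that "Today" and "today" would count as one word), and `sample`, `table` and `counts` are illustrative names.

```python
import string
from collections import Counter

sample = ("Failure, like pain, is alien to my life. "
          "In the past I accepted it as I accepted pain.")

# Same idea as proc_punc: map every punctuation character to a space.
table = str.maketrans(string.punctuation, " " * len(string.punctuation))

# Assumption: lowercase first so the count is case-insensitive.
counts = Counter(sample.lower().translate(table).split())

print(counts["pain"])      # 2
print(counts["accepted"])  # 2
print(counts["success"])   # 0 (a Counter returns 0 for a missing key)
```

If only the top few words are needed, `counts.most_common(n)` returns them already sorted by count, which could replace the sorted(...) call.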
You need to use re.split() to do the splitting.
For its parameters, see https://blog.csdn.net/qq_31672701/article/details/100711585
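As a rough sketch of the re.split() suggestion (the pattern `\W+` is my own choice here, not taken from the linked post): splitting on runs of non-word characters drops spaces and punctuation in one step.

```python
import re
from collections import Counter

text = "Yet I will not fail, as the others, for in my hands I now hold the charts."

# \W+ matches one or more non-word characters (spaces, commas, periods, ...).
# The text ends with ".", so re.split leaves a trailing empty string; drop it.
tokens = [t for t in re.split(r"\W+", text) if t]

counts = Counter(tokens)
print(counts["the"])  # 2
print(counts["I"])    # 2
```

Note that `\W` also splits on apostrophes, so a word like "don't" would come out as "don" and "t"; if that matters, the pattern needs adjusting.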