I really have no idea how to do this. Can anyone help?

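1. Prepare a text document of your own, run a word-frequency count on it, and plot the 5 most frequent words as a line chart, a clustered column chart, and a pie chart, with all three charts placed in a single figure.
2. Pick any web page that contains a table, scrape its data, and save the scraped results to an Excel file.
Note: submit the source code and screenshots of the run results; the screenshots must be clear.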

【Question 1】

import matplotlib.pyplot as plt
from collections import Counter

plt.rcParams['font.sans-serif'] = ['SimHei']  # font for Chinese labels (needed only when labels contain Chinese)
plt.rcParams['axes.unicode_minus'] = False    # render the minus sign correctly with that font

with open('zen.txt', 'r', encoding='utf-8') as f:
    data = f.read()

# Replace punctuation with spaces (not empty strings) so that the last word of
# one line is not glued to the first word of the next; split() handles newlines
for ch in ',.-*!"':
    data = data.replace(ch, ' ')

lst = [word.lower() for word in data.split()]
# Counter counts every word in one pass; most_common(5) returns the top five
dct = Counter(lst).most_common(5)

X, Y = [d[0] for d in dct], [d[1] for d in dct]

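plt.figure('Word frequency', figsize=(12, 5))
plt.subplot(1, 3, 1)
plt.title("Line chart")
plt.ylim(0, max(Y) + 2)  # leave some headroom above the highest count
plt.plot(X, Y, color="red", label="Legend 1")
plt.legend()

plt.subplot(1, 3, 2)
plt.title("Bar chart")
plt.ylim(0, max(Y) + 2)
plt.bar(X, Y, label="Legend 2")
plt.legend()

plt.subplot(1, 3, 3)
plt.title("Pie chart")
exp = (0.02, 0.03, 0.04, 0.05, 0.1)  # offset of each wedge from the center
plt.pie(Y, labels=X, explode=exp, autopct="(%1.1f%%)")
plt.legend(loc="lower left")  # place the legend at the lower left

plt.tight_layout()  # keep the three subplots from overlapping
plt.show()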
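
The counting step of the script above can also be written with a regular expression instead of a chain of replace() calls. This is only a minimal sketch: the helper name top_words and the sample string are made up for illustration, and the pattern assumes English text, where it keeps contractions such as "aren't" together as one word.

```python
import re
from collections import Counter


def top_words(text, n=5):
    """Return the n most frequent words in text as (word, count) pairs."""
    # [a-z']+ matches runs of letters and apostrophes, so punctuation and
    # line breaks act as separators and words are never glued together
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Two lines of sample text, used only to exercise the helper
sample = "Beautiful is better than ugly.\nExplicit is better than implicit."
print(top_words(sample, 3))
```

The returned pairs can be unpacked into the X and Y lists used for plotting, for example with X, Y = zip(*top_words(data)).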


Contents of the test file zen.txt:
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

The Zen of Python is written to a file named "zen.txt".

【Question 2】Scrape a university ranking and write it to an xlsx file.

from bs4 import BeautifulSoup as bs
from requests import get
import pandas as pd

Agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36'
url = 'http://www.gaosan.com/gaokao/311315.html'
data = get(url, headers={'User-Agent': Agent}, timeout=10)
data.raise_for_status()  # stop early if the server returns an HTTP error
data.encoding = 'utf-8'

soup = bs(data.text, 'html.parser')
table = soup.find('table')

# Collect one list per table row instead of assuming exactly 4 cells per row;
# get_text() also copes with empty cells and cells that contain nested tags
row = []
for tr in table.find_all('tr'):
    rol = [td.get_text(strip=True) for td in tr.find_all('td')]
    if rol:
        row.append(rol)

text = pd.DataFrame(row)
# The with block saves and closes the file; ExcelWriter.save() was removed in pandas 2.0
with pd.ExcelWriter('college.xlsx') as xlsx:
    text.to_excel(xlsx, header=False, index=False)
print('done!')
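
To check the table-parsing logic without hitting the website, the same idea can be tried on a small inline HTML string. This is only a sketch: the function name table_rows and the sample table are made up for illustration and are not taken from the page being scraped.

```python
from bs4 import BeautifulSoup


def table_rows(html):
    """Return the first <table> in html as a list of rows (lists of cell text)."""
    table = BeautifulSoup(html, "html.parser").find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Header cells (<th>) and data cells (<td>) are both kept
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


# Made-up sample data, used only to exercise the function offline
sample_html = """
<table>
  <tr><th>Rank</th><th>Name</th></tr>
  <tr><td>1</td><td><a href="#">College A</a></td></tr>
  <tr><td>2</td><td> College B </td></tr>
</table>
"""
rows = table_rows(sample_html)
print(rows)
```

If the first row holds the column names, it can be passed on as pd.DataFrame(rows[1:], columns=rows[0]) before writing the Excel file.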

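Isn't this just a use case for Excel PivotCharts? Select the whole table, click to create a PivotChart, then choose the dimensions, measures, legend, and so on. The chart types available there should cover everything you need.
That reminds me: yesterday someone was telling one of the experts that Excel and PowerPoint are really programming IDEs too 😄.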