Versions: Python 3.11, PyCharm 2022.2.3, Windows 10.
The problem: AttributeError: 'BaiduTiebaSpider' object has no attribute 'url'. The full code and traceback follow.
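I have only just started learning web scraping, and the tutorial I am following is from 2021, so I am not sure whether this is a version issue.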
"""Fetch the requested pages of a given Tieba forum and save them to local files."""
from urllib import request, parse
import time
import random


class BaiduTiebaSpider:
    url: str

    def _init_(self):
        self.url = 'http://tieba.baidu.com/f?kw={}&pn={}'
        self.headers = {'User-Agent': 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 2.0.50727; SLCC2; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; Tablet PC 2.0; .NET4.0E)'}

    def get_html(self, url):
        """Fetch the response body for the given URL."""
        req = request.Request(url=url, headers=self.headers)
        res = request.urlopen(req)
        html = res.read().decode()
        return html

    def parse_html(self):
        """Parse and extract data (not implemented yet)."""
        pass

    def save_html(self, filename, html):
        """Write the fetched HTML to a local file."""
        # encoding is set explicitly because the Windows default may not be UTF-8
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(html)

    def run(self) -> None:
        """Program entry point."""
        name = input('Enter the Tieba name: ')
        start = int(input('Enter the first page: '))
        end = int(input('Enter the last page: '))
        params = parse.quote(name)
        # 1. Build the URL for each page
        for page in range(start, end + 1):
            pn = (page - 1) * 50
            url = self.url.format(params, pn)
            # Send the request, parse, and save
            html = self.get_html(url)
            filename = '{}_page_{}.html'.format(name, page)
            self.save_html(filename, html)
            # Print progress to the terminal
            print('Page %d fetched successfully' % page)


if __name__ == '__main__':
    spider = BaiduTiebaSpider()
    spider.run()
Output and error message
Traceback (most recent call last):
  File "C:\Users\D\Desktop\联系.py", line 54, in <module>
    spider.run()
    ^^^^^^^^^^^^
  File "C:\Users\D\Desktop\联系.py", line 43, in run
    url = self.url.format(params, pn)
          ^^^^^^^^
AttributeError: 'BaiduTiebaSpider' object has no attribute 'url'
Process finished with exit code 1
My approach and what I have tried
The information I found by searching online was scattered and I could not make sense of it. Any help would be appreciated.
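
Going by the traceback, one likely cause is the constructor name. The class defines `_init_` with single underscores, while Python only calls `__init__` (two underscores on each side) automatically when an instance is created. If that is what is happening here, `self.url` is never assigned, and the bare `url: str` annotation in the class body does not create an attribute on its own. Below is a minimal sketch of the difference. `BrokenSpider`, `FixedSpider`, and `has_url` are hypothetical stand-ins for illustration, not part of the original code:

```python
class BrokenSpider:
    url: str  # annotation only; this does not create an attribute

    def _init_(self):  # single underscores: an ordinary method, never called automatically
        self.url = 'http://tieba.baidu.com/f?kw={}&pn={}'


class FixedSpider:
    def __init__(self):  # double underscores: runs when FixedSpider() is called
        self.url = 'http://tieba.baidu.com/f?kw={}&pn={}'


def has_url(spider):
    """Return True if the instance ended up with a `url` attribute."""
    return hasattr(spider, 'url')


broken_has_url = has_url(BrokenSpider())
fixed_has_url = has_url(FixedSpider())
print(broken_has_url, fixed_has_url)  # prints: False True
```

If this is indeed the cause, renaming `_init_` to `__init__` in `BaiduTiebaSpider` should make the AttributeError go away. It would not be a version issue, since older Python 3 releases treat the two names the same way.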