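I'm working on scraping 前程无忧 (51job) with Scrapy, but my own code ran into problems. So I followed my teacher's tutorial article 《Scrapy爬取前程无忧》 ("Scraping 51job with Scrapy") step by step. Running the main file still raised errors every time, and they were exactly the same errors my original code produced.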
```python
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
# useful for handling different item types with a single interface
from itemadapter import ItemAdapter
from dbutil.connection import MysqlConnection
from time import time


class QcwyPipeline:
    def __init__(self):
        self.start_time = time()
        self.conn = MysqlConnection.getConnection()
        self.sql = 'insert into qcwy values(null ,%s, %s, %s, %s, %s);'
        self.count = 0
        # Create the cursor here instead of in process_item. If the spider
        # yields no items, process_item never runs, and close_spider would
        # otherwise fail with AttributeError: no attribute 'cursor'.
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        company = item['company']
        job_name = item['job_name']
        salary = item['salary']
        requirement = item['requirement']
        welfare = item['welfare']
        print('{}: {}'.format(company, job_name))
        self.count += self.cursor.execute(self.sql, (company, job_name, salary, requirement, welfare))
        # Scrapy expects process_item to return the item (or raise DropItem)
        return item

    def close_spider(self, spider):
        if self.cursor:
            self.cursor.close()
        self.conn.commit()
        MysqlConnection.closeConnection(self.conn)
        print('Crawled {} records in total, elapsed: {} s'.format(self.count, time() - self.start_time))
```
```
  File "C:\Users\zoo\qcwyzll\qcwyzll\spiders\qcwyCrawler.py", line 18, in parse
    json_str = response.xpath('/html/body/script[2]/text()').extract_first()[29:]
TypeError: 'NoneType' object is not subscriptable
2022-12-06 15:10:30 [scrapy.core.engine] ERROR: Scraper close failure
Traceback (most recent call last):
  File "D:\Python\lib\site-packages\twisted\internet\defer.py", line 662, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "C:\Users\zoo\qcwyzll\qcwyzll\pipelines.py", line 32, in close_spider
    if self.cursor:
AttributeError: 'QcwyPipeline' object has no attribute 'cursor'
```
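The first error arises because `extract_first()` returns `None` when the XPath matches no node, and slicing `None` with `[29:]` raises `TypeError`. This is a minimal defensive sketch. It assumes the target `<script>` body holds a JSON object after some assignment; the post does not show the real page content or explain the fixed offset 29. The sketch checks for `None` first, then locates the JSON at its first `{` rather than at a hard-coded offset. `extract_embedded_json` is a hypothetical helper name, not part of Scrapy.

```python
import json
from typing import Any, Optional


def extract_embedded_json(script_text: Optional[str]) -> Optional[Any]:
    """Hypothetical helper: parse the first JSON object inside a <script> body.

    Returns None instead of raising when the XPath matched nothing
    (script_text is None) or when no valid JSON object is found.
    """
    if script_text is None:
        # extract_first()/get() returned None: the XPath matched no node
        return None
    start = script_text.find('{')
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value starting at index `start` and
        # ignores whatever follows it (e.g. a trailing ';')
        obj, _end = json.JSONDecoder().raw_decode(script_text, start)
    except json.JSONDecodeError:
        return None
    return obj


# Possible use inside the spider's parse() (not runnable outside Scrapy):
#   script = response.xpath('/html/body/script[2]/text()').get()
#   data = extract_embedded_json(script)
#   if data is None:
#       self.logger.warning('No embedded JSON found on %s', response.url)
#       return
```

With this check, a wrong XPath produces a logged warning instead of a crash in `parse`. It is still worth confirming the XPath against the raw response (for example in `scrapy shell`), because the page Scrapy downloads can differ from what the browser renders.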
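Fix: I moved where the cursor is created, so that `self.cursor` exists before `close_spider` runs. I also re-checked the XPath expression several times, since the `TypeError` means `extract_first()` returned `None`, i.e. the XPath matched nothing.
After that, the project started and crawled successfully.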
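Another way to avoid the missing-cursor error is to use Scrapy's `open_spider`/`close_spider` pipeline hooks. This sketch initializes every attribute in `__init__`, opens the connection and cursor in `open_spider`, and guards every release in `close_spider`. To keep the example self-contained, it swaps the author's `MysqlConnection` helper for the standard-library `sqlite3`. The class name `SqliteQcwyPipeline` and the table schema are hypothetical; only the five item fields come from the original pipeline.

```python
import sqlite3
from time import time


class SqliteQcwyPipeline:
    """Hypothetical variant of QcwyPipeline using sqlite3 instead of MySQL."""

    def __init__(self, db_path=':memory:'):
        self.db_path = db_path
        # Define every attribute up front so close_spider can always check it
        self.conn = None
        self.cursor = None
        self.count = 0
        self.start_time = None

    def open_spider(self, spider):
        self.start_time = time()
        self.conn = sqlite3.connect(self.db_path)
        self.cursor = self.conn.cursor()
        # Hypothetical schema mirroring the five fields QcwyPipeline inserts
        self.cursor.execute(
            'create table if not exists qcwy ('
            'id integer primary key, company text, job_name text, '
            'salary text, requirement text, welfare text)'
        )

    def process_item(self, item, spider):
        self.cursor.execute(
            'insert into qcwy values (null, ?, ?, ?, ?, ?)',
            (item['company'], item['job_name'], item['salary'],
             item['requirement'], item['welfare']),
        )
        self.count += self.cursor.rowcount  # 1 for each successful INSERT
        return item  # pass the item on to later pipelines

    def close_spider(self, spider):
        # Each resource is released only if it was actually created
        if self.cursor is not None:
            self.cursor.close()
        if self.conn is not None:
            self.conn.commit()
            self.conn.close()
        elapsed = time() - self.start_time if self.start_time is not None else 0.0
        print('Crawled {} records in total, elapsed: {:.2f} s'.format(self.count, elapsed))
```

To try it, register the class in `ITEM_PIPELINES` in `settings.py` (e.g. `'qcwyzll.pipelines.SqliteQcwyPipeline': 300`, assuming it lives in the project's `pipelines.py`). Opening resources in `open_spider` rather than `__init__` ties their lifetime to the crawl. The `None` checks mean `close_spider` is safe even if no item was ever processed.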