Scraping product links with Selenium only prints the first page's data
from selenium import webdriver
from selenium.webdriver.common.by import By  # By is used below, so it must be imported
import json
import time, random
import re

driver = webdriver.Chrome()
url = ''
driver.get(url)

def parse_data():
    response = driver.page_source
    # Pull the embedded g_page_config JSON string out of the page source
    json_str = re.findall('g_page_config = (.*);', response)[0]
    print(json_str)
    time.sleep(3)
    # Click the "下一页" (Next page) link
    driver.find_element(By.XPATH, '//*[text()="下一页"]').click()

for page in range(0, 10):
    print(f'--- Printing page {page} ---')
    parse_data()
    time.sleep(1)
The printed response keeps showing the first page's data over and over.
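The question's code imports json but only prints the raw g_page_config string. Below is a minimal sketch, assuming the page embeds `g_page_config = {...};` on a single line (as the question's regex implies), of turning that string into a Python dict so each page's data can be stored and compared rather than just printed. The names extract_page_config and PAGE_CONFIG_RE are hypothetical.

```python
import json
import re

# Matches "g_page_config = {...};" on one line. The greedy (.*) runs to the
# last ";" on that line, and the \s*$ anchor (with re.MULTILINE) requires that
# ";" to end the line, so a ";" inside a JSON string does not cut it short.
PAGE_CONFIG_RE = re.compile(r"g_page_config = (.*);\s*$", re.MULTILINE)


def extract_page_config(page_source):
    """Hypothetical helper: return g_page_config as a dict, or None if absent."""
    match = PAGE_CONFIG_RE.search(page_source)
    if match is None:
        # re.findall(...)[0] would raise IndexError here instead
        return None
    return json.loads(match.group(1))


sample = 'var x = 1;\n    g_page_config = {"page": 2, "title": "a;b"};\nvar y = 2;'
print(extract_page_config(sample))  # {'page': 2, 'title': 'a;b'}
```

Returning None instead of raising makes it easy to notice a page that did not load the expected script at all.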
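Change this line:
driver.find_element(By.XPATH, '//*[text()="下一页"]').click()
to:
driver.find_element(By.XPATH, '//*[contains(text(), "下一页")]').click()
[This locates the element with XPath's contains() function applied to text(). contains() takes the text and the substring as two comma-separated arguments, so it also matches when the link text has whitespace or other characters around 下一页.]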
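The approach itself has a problem: after locating 下一页 (Next page), you never save the current page's data; you just click straight away.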
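One way to act on this point (a sketch, not the answerer's code) is to store each page's source before clicking to the next one, then check whether a "new" page is identical to the last saved one, which is exactly the symptom in the question. The names scrape_pages, read_page, and click_next are hypothetical stand-ins for driver.page_source and the find_element(...).click() call, so the loop can be dry-run without a browser.

```python
def scrape_pages(read_page, click_next, max_pages=10):
    """Hypothetical loop: save each page's source *before* clicking to the next one.

    read_page() stands in for driver.page_source, and click_next() for
    driver.find_element(By.XPATH, '//*[text()="下一页"]').click().
    Stops early when a page is identical to the one saved just before it.
    """
    pages = []
    for page in range(max_pages):
        source = read_page()
        if pages and source == pages[-1]:
            # The click did not bring in new content: the symptom in the question
            print(f"page {page} repeats the previous page; stopping")
            break
        pages.append(source)  # save first...
        if page < max_pages - 1:
            click_next()  # ...then move on
    return pages


# Fake "browser" for a dry run: clicks stop having any effect after page 2
state = {"n": 0}


def fake_read():
    return f"<html>page {min(state['n'], 2)}</html>"


def fake_click():
    state["n"] += 1


result = scrape_pages(fake_read, fake_click)
print(len(result))  # 3
```

With a real driver, read_page would be `lambda: driver.page_source`; keeping the saved pages in a list also means nothing is lost if a later click fails.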
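The code looks OK, but you need to confirm whether the browser actually refreshes when 下一页 (Next page) is clicked, i.e. whether it really requests the next page's URL.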
Here is some code for reference:
from selenium import webdriver
from selenium.webdriver.common.by import By
import json
import time, random
import re

driver = webdriver.Chrome()
url = 'https://www.imooc.com/course/list'
driver.get(url)

def parse_data():
    response = driver.page_source
    # The URL currently loaded; if it never changes, the click is probably not navigating
    print(driver.current_url)
    print(response)
    time.sleep(3)
    driver.find_element(By.XPATH, '//*[text()="下一页"]').click()

for page in range(0, 10):
    print(f'--- Printing page {page} ---')
    parse_data()
    time.sleep(1)
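The reference code prints driver.current_url so you can see whether each click really navigated. A step further is to wait, after clicking, until something has actually changed before reading page_source again, since reading it immediately can return the old page. Below is a minimal sketch of a generic polling helper (the name wait_until_changed is hypothetical); the Selenium calls appear only in comments so the block runs without a browser.

```python
import time


def wait_until_changed(read_value, old_value, timeout=10.0, poll=0.5):
    """Poll read_value() until it differs from old_value or timeout seconds pass.

    Returns the new value, or raises TimeoutError if nothing changed.
    With Selenium, read_value could be `lambda: driver.current_url` (if the
    site changes the URL per page) or `lambda: driver.page_source` (if it
    re-renders the list with JavaScript without changing the URL).
    """
    deadline = time.monotonic() + timeout
    while True:
        value = read_value()
        if value != old_value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("page did not change after clicking 下一页")
        time.sleep(poll)


# Usage with Selenium (not executed here):
#   old = driver.page_source
#   driver.find_element(By.XPATH, '//*[text()="下一页"]').click()
#   new_source = wait_until_changed(lambda: driver.page_source, old)

# Dry run: the value changes on the third read
values = iter(["a", "a", "b"])
print(wait_until_changed(lambda: next(values), "a", timeout=5, poll=0.01))  # b
```

Selenium also ships its own waiting tools for this: WebDriverWait together with expected_conditions.staleness_of(element) waits until an element grabbed before the click has been detached from the DOM.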
Python crawler: scraping Tmall product information with Selenium
https://blog.csdn.net/Python_sn/article/details/108816541