I'm using Selenium to scrape PDF files and want it to click the download button in Chrome automatically, but XPath can't find that button. How can I solve this?
import os
import csv
import time
import random
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
driver = webdriver.Chrome()
driver.get("https://stanford.edu/~dkim04/assets/pdf/hartshorne/001.pdf")
download_button = WebDriverWait(driver, 10).until(
    ec.presence_of_element_located((By.ID, 'download'))
)
download_button.click()
Traceback (most recent call last):
  File "C:/Users/ChenHaoHai/Desktop/scrap2.py", line 17, in <module>
    download_button = WebDriverWait(driver, 10).until(
  File "C:\Users\ChenHaoHai\AppData\Local\Programs\Python\Python39\lib\site-packages\selenium\webdriver\support\wait.py", line 89, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
Backtrace:
Ordinal0 [0x00AD06F3+2492147]
Ordinal0 [0x00A69BD1+2071505]
Ordinal0 [0x00972478+1057912]
Ordinal0 [0x0099C964+1231204]
Ordinal0 [0x009C6B62+1403746]
Ordinal0 [0x009B57FA+1333242]
Ordinal0 [0x009C4F38+1396536]
Ordinal0 [0x009B568B+1332875]
Ordinal0 [0x009921D4+1188308]
Ordinal0 [0x0099302F+1191983]
GetHandleVerifier [0x00C567A6+1545030]
GetHandleVerifier [0x00D0105C+2243580]
GetHandleVerifier [0x00B5BC97+518199]
GetHandleVerifier [0x00B5AD80+514336]
Ordinal0 [0x00A6ED2D+2092333]
Ordinal0 [0x00A72EE8+2109160]
Ordinal0 [0x00A73022+2109474]
Ordinal0 [0x00A7CB71+2149233]
BaseThreadInitThunk [0x7637FA29+25]
RtlGetAppContainerNamedObjectPath [0x77C57A7E+286]
RtlGetAppContainerNamedObjectPath [0x77C57A4E+238]
First locate the download button with a CSS selector, then call click() on it.
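The download button you see when Chrome displays a PDF belongs to the browser's built-in PDF viewer, not to the page's own HTML, so it cannot be located like a normal page element. That is why waiting for By.ID 'download' ends in a TimeoutException.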
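Since the PDF's URL is already known, another option is to skip the viewer entirely and fetch the file over HTTP. The question's script already imports `requests`; the sketch below uses the standard library's `urllib.request` instead so it has no third-party dependency. `filename_from_url` and `download_pdf` are hypothetical helper names, and the sketch assumes the server hands out the PDF without needing cookies or a logged-in session (if it does, the Selenium session's cookies would have to be passed along).

```python
import os
import posixpath
import urllib.parse
import urllib.request


def filename_from_url(url: str, default: str = "download.pdf") -> str:
    """Return the last path segment of a URL (e.g. '001.pdf'), or `default` if empty."""
    path = urllib.parse.urlparse(url).path  # drops the query string and fragment
    name = posixpath.basename(urllib.parse.unquote(path))
    return name or default


def download_pdf(url: str, directory: str) -> str:
    """Hypothetical helper: fetch `url` directly, bypassing Chrome's PDF viewer.

    Saves the response body into `directory` and returns the saved file's path.
    """
    os.makedirs(directory, exist_ok=True)
    target = os.path.join(directory, filename_from_url(url))
    # Some servers reject Python's default User-Agent, so send a browser-like one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response, open(target, "wb") as out:
        out.write(response.read())
    return target
```

For example, `download_pdf("https://stanford.edu/~dkim04/assets/pdf/hartshorne/001.pdf", "pdfs")` should save the file as `pdfs/001.pdf`. With `requests`, the equivalent is writing `requests.get(url).content` to a file.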
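Since what you want is a PDF, you can set Chrome's preferences so that opening the PDF downloads it automatically instead of showing the preview. Pass the options when the browser is initialized: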
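from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument('--headless')  # run without a visible browser window
chrome_options.add_argument('--window-size=1920,1080')
prefs = {
    'profile.default_content_settings.popups': 0,
    'download.default_directory': 'd:\\',        # folder the PDF is saved to
    'download.prompt_for_download': False,       # save without asking for a location
    'plugins.always_open_pdf_externally': True,  # download PDFs instead of opening Chrome's viewer
}
chrome_options.add_experimental_option('prefs', prefs)
browser = webdriver.Chrome(options=chrome_options)
browser.get("https://stanford.edu/~dkim04/assets/pdf/hartshorne/001.pdf")  # URL of the PDF to download
logs = browser.get_log('browser')  # browser console log entries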
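A script like this can exit, or close the driver, before Chrome has finished writing the file. While a download is in progress, Chrome keeps it under a temporary name ending in `.crdownload` and renames it when done, so one option is to poll the download folder until the expected file exists and no `.crdownload` file remains. The sketch below also gathers the download preferences into one function. `download_prefs` and `wait_for_download` are hypothetical helpers, and the timeout and poll interval are arbitrary choices.

```python
import os
import time


def download_prefs(directory: str) -> dict:
    """Chrome preferences that save PDFs to `directory` instead of previewing them."""
    return {
        "download.default_directory": directory,
        "download.prompt_for_download": False,
        # Without this, Chrome opens the PDF in its built-in viewer.
        "plugins.always_open_pdf_externally": True,
    }


def wait_for_download(directory: str, filename: str,
                      timeout: float = 60.0, poll: float = 0.5) -> str:
    """Block until `filename` is fully written to `directory`; return its path.

    Waits until the final file exists and no '*.crdownload' temp file is left.
    Raises TimeoutError if that does not happen within `timeout` seconds.
    """
    target = os.path.join(directory, filename)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # The folder may not exist yet if Chrome has not started the download.
        entries = os.listdir(directory) if os.path.isdir(directory) else []
        pending = [name for name in entries if name.endswith(".crdownload")]
        if os.path.exists(target) and not pending:
            return target
        time.sleep(poll)
    raise TimeoutError(f"{filename} did not finish downloading within {timeout} s")
```

Usage: pass `download_prefs('d:\\')` to `chrome_options.add_experimental_option('prefs', ...)`, and after loading the PDF URL call `wait_for_download('d:\\', '001.pdf')` before quitting the browser.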