[Python | Web Scraping] How to crawl the next page

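My scraper only fetched 59 chapters of the novel; the other table-of-contents pages were never crawled.
The table of contents has 11 pages in total, but only 1 page was crawled.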

```python
import requests
from lxml import etree
 
url='https://www.qb5200.la/book/116524/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'}
 
res=requests.get(url,headers=headers)
html=etree.HTML(res.text)
chapter_name=html.xpath("//*/dl[@class='zjlist']/dd//text()")
href=html.xpath("//*/dl[@class='zjlist']/dd/a/@href")
base_url="https://www.qb5200.la/book/116524/"
for i in range(len(chapter_name)):
    
    data=requests.get(base_url+href[i],headers=headers)
    html=etree.HTML(data.text)
    content=html.xpath("//*/div[@id='content']//text()")
    with open(f'e:/123/{chapter_name[i]}.txt', 'w',encoding="utf-8") as f:
        for d in content:
            f.write(d.replace("\xa0\xa0\xa0\xa0",'\n'))

```

It's simple: just add a loop over the table-of-contents pages.


```python
import requests
from lxml import etree

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'}
base_url = "https://www.qb5200.la/book/116524/"

# The table of contents spans 11 pages: index_1.html ... index_11.html
for n in range(1, 12):
    url = f'{base_url}index_{n}.html'
    res = requests.get(url, headers=headers)
    html = etree.HTML(res.text)
    chapter_name = html.xpath("//*/dl[@class='zjlist']/dd//text()")
    href = html.xpath("//*/dl[@class='zjlist']/dd/a/@href")
    for i in range(len(chapter_name)):
        # Fetch one chapter; use a separate name so the index page's tree is not overwritten
        data = requests.get(base_url + href[i], headers=headers)
        chapter_html = etree.HTML(data.text)
        content = chapter_html.xpath("//*/div[@id='content']//text()")
        # The with block closes the file automatically, so no f.close() is needed
        with open(f'e:/123/{chapter_name[i]}.txt', 'w', encoding="utf-8") as f:
            for d in content:
                f.write(d.replace("\xa0\xa0\xa0\xa0", '\n'))
        print(f'"{chapter_name[i]}" saved')
```
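One thing that may trip up the loop above: chapter titles are used directly as file names, and a title containing a character such as `?` or `:` cannot be saved on Windows. Here is a minimal sketch of a clean-up helper. `safe_filename` is a hypothetical name that does not appear in the original code, and replacing with `_` is an arbitrary choice.

```python
import re

# Characters Windows does not allow in file names, plus control characters.
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(title, replacement="_"):
    """Return `title` with characters that are invalid in Windows file names replaced."""
    cleaned = _INVALID_CHARS.sub(replacement, title).strip()
    # Windows also rejects names that end in a dot or a space.
    cleaned = cleaned.rstrip(". ")
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or "untitled"
```

In the loop you would then open `f'e:/123/{safe_filename(chapter_name[i])}.txt'` instead of using the raw title.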

Work out the pattern in each page's URL, then build the link for the specific page you want to crawl inside the loop.
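As a rough illustration of that idea, the sketch below builds the table-of-contents URLs from the `index_{n}.html` pattern used in this thread and resolves chapter links against the book URL. The helper names `index_page_urls` and `chapter_url` are made up for this example, and it assumes the site really numbers its pages from 1 to 11 in this way.

```python
from urllib.parse import urljoin

BOOK_URL = "https://www.qb5200.la/book/116524/"


def index_page_urls(book_url, last_page):
    """Build the URL of every table-of-contents page, 1..last_page inclusive."""
    return [urljoin(book_url, f"index_{n}.html") for n in range(1, last_page + 1)]


def chapter_url(book_url, href):
    """Resolve a chapter href (relative or absolute path) against the book URL."""
    return urljoin(book_url, href)


# Print the 11 table-of-contents URLs that the outer loop would visit.
for url in index_page_urls(BOOK_URL, 11):
    print(url)
```

Using `urljoin` rather than string concatenation means the chapter link still comes out right if the site returns an absolute path such as `/book/116524/...` instead of a bare file name.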

You could also look at the answers to this related question and see whether they help: https://ask.csdn.net/questions/7756126
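The following answer was generated by ChatGPT as a summary of related blog posts:
The solution is to write a loop so that the crawler fetches the next page automatically. Here is a basic Python example showing how this can be done: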
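```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def get_page_data(url):
    # Send the HTTP request and fetch the page content
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Parse the page content with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Parse the page's data here and do whatever is needed with it
    page_data = soup  # placeholder: replace with the fields you actually extract

    # Find the next-page link; it is None when this is the last page
    next_link = soup.find('a', {'class': 'next-page'})
    has_href = next_link is not None and next_link.get('href')
    next_page_url = urljoin(url, next_link['href']) if has_href else None

    return page_data, next_page_url

def crawl_data(start_url):
    current_page_url = start_url

    while current_page_url:
        # Fetch the current page once, getting both its data and the next-page URL
        page_data, current_page_url = get_page_data(current_page_url)

        # Process the page data here, e.g. store it in a database or a file
```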
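In this example, get_page_data() sends the HTTP request, parses the page with BeautifulSoup, and looks up the next-page link. Inside this function you write the extraction code that matches the page structure and the data you need.
crawl_data() is the main crawl loop: it keeps calling get_page_data() for the current page, processes the result, and stops when there is no next-page link. The example assumes that every page contains a link like <a class="next-page" href="...">下一页</a> ("next page"), and it finds that link to get the next page's URL.
Note that this is only a basic example. You need to adapt and complete it for your actual situation, for instance by handling exceptions, setting request headers, and processing the page data. The example relies on two libraries: requests for sending HTTP requests and BeautifulSoup for parsing HTML.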
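To make the stopping condition concrete, here is a small self-contained sketch of the same "follow the next-page link" loop. It is not the code above: it swaps BeautifulSoup for the standard library's `HTMLParser` so it runs without third-party packages, and it takes the download step as a `fetch` argument. The names `NextLinkFinder`, `find_next_url`, and `crawl`, the `max_pages` limit, and the `next-page` class are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Record the href of the first <a> whose class list contains 'next-page'."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.next_href is not None:
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "next-page" in classes and attr_map.get("href"):
            self.next_href = attr_map["href"]


def find_next_url(page_url, html):
    """Return the absolute URL of the next page, or None on the last page."""
    finder = NextLinkFinder()
    finder.feed(html)
    return urljoin(page_url, finder.next_href) if finder.next_href else None


def crawl(start_url, fetch, max_pages=100):
    """Follow next-page links from start_url; `fetch(url)` must return HTML text."""
    visited = []
    url = start_url
    # Stop on the last page, on a link loop, or after max_pages pages.
    while url and url not in visited and len(visited) < max_pages:
        html = fetch(url)
        visited.append(url)
        url = find_next_url(url, html)
    return visited
```

With requests, `fetch` could be something like `lambda u: requests.get(u, headers=headers, timeout=10).text`. Passing it in as a parameter also makes it easy to try the loop on saved HTML before hitting the real site.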
I hope this answer helps. If you have any questions, feel free to follow up.