Scrape all news titles, details, and timestamps from a news page, including the data on every paginated page.
This example uses the requests library to send HTTP requests and the BeautifulSoup library to parse the page content.
Sample code for reference:
import requests
from bs4 import BeautifulSoup

def get_news(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
    response = requests.get(url, headers=headers)
    response.raise_for_status()  # fail fast on HTTP errors
    soup = BeautifulSoup(response.content, 'html.parser')
    news_list = []
    # Extract each item's title, detail, and time.
    # The selectors below assume the page's structure; adjust the tag
    # and class names to match the actual site you are scraping.
    news_elements = soup.find_all('div', class_='news-item')
    for news_element in news_elements:
        title = news_element.find('h3').text.strip()
        detail = news_element.find('p', class_='detail').text.strip()
        time = news_element.find('span', class_='time').text.strip()
        news = {
            'title': title,
            'detail': detail,
            'time': time
        }
        news_list.append(news)
    return news_list

def crawl_news_pages(base_url, num_pages):
    all_news = []
    for page in range(1, num_pages + 1):
        url = base_url + str(page)
        news = get_news(url)
        all_news.extend(news)
    return all_news

# Example: scrape the first 3 pages of the site's news
base_url = 'https://example.com/news?page='
num_pages = 3
all_news = crawl_news_pages(base_url, num_pages)

# Print the scraped news
for news in all_news:
    print('Title:', news['title'])
    print('Detail:', news['detail'])
    print('Time:', news['time'])
    print('---')
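In practice the total number of pages is often unknown up front. A minimal sketch of one common approach (an assumption, not part of the code above): keep requesting successive pages until one yields no items, with a cap to avoid looping forever on sites that keep echoing the last page. The fetcher is injected as a parameter here so the logic can be demonstrated offline with a fake page source standing in for `get_news`:

```python
def crawl_all_pages(get_page, base_url, max_pages=100):
    """Fetch page 1, 2, ... until a page returns no items or max_pages is hit.

    get_page: a function like get_news(url) that returns a list of dicts.
    """
    all_news = []
    for page in range(1, max_pages + 1):
        items = get_page(base_url + str(page))
        if not items:  # empty page -> assume we ran past the last one
            break
        all_news.extend(items)
    return all_news

# Offline usage example: a fake "site" with two pages of results.
fake_site = {
    'https://example.com/news?page=1': [{'title': 'a'}, {'title': 'b'}],
    'https://example.com/news?page=2': [{'title': 'c'}],
}
result = crawl_all_pages(lambda url: fake_site.get(url, []),
                         'https://example.com/news?page=')
print(len(result))  # 3
```

Swapping the lambda for the real `get_news` crawls the live site with the same stop condition.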
You can also refer to this hands-on write-up on using Python to scrape People's Daily articles:
https://juejin.cn/post/6987339985556865038
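Once the news items are collected, they can be saved with the standard csv module (the field names below match the dicts built in the example above; the sample row is a placeholder):

```python
import csv

# Placeholder data in the same shape as the scraper's output
news_list = [
    {'title': 'Sample headline', 'detail': 'Body text', 'time': '2023-01-01'},
]

# utf-8-sig adds a BOM so Excel opens Chinese text correctly
with open('news.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.DictWriter(f, fieldnames=['title', 'detail', 'time'])
    writer.writeheader()
    writer.writerows(news_list)
```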