Please help: scraping the 彩虹岛 (Latale) homepage with requests + re

I've only been learning Python for a few days and have no idea how to do this.
Could someone post working source code?

import requests
import re
url = 'http://tmall.chd.sdo.com/'
res = requests.get(url)
# The HTML tags around (.*?) were stripped when this was posted; substitute the
# real markup you want to capture, e.g. r'<li>(.*?)</li>' for list items.
lt = re.findall(r'<li>(.*?)</li>', res.text, re.S)
print(lt)

Crawling a page is simple. The main question is what content you want out of it; once you know that, you can extract it with XPath, bs4, or re. A basic fetch looks like this, and an extraction sketch follows the code below.

#-*- coding:utf-8 -*-

import requests

# Send a normal browser User-Agent so the site does not reject the request
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36"}
url = 'http://tmall.chd.sdo.com/'
res = requests.get(url, headers=headers)
print(res.content.decode('utf-8'))
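If you want structured data instead of the raw HTML, here is a minimal bs4 sketch. The selector (every <a> tag) is only an assumption; check the actual markup of tmall.chd.sdo.com and swap in the tags or classes you really need.

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36"}
url = 'http://tmall.chd.sdo.com/'
res = requests.get(url, headers=headers)
res.encoding = 'utf-8'

soup = BeautifulSoup(res.text, 'html.parser')
# Print the text and href of every link on the page; replace find_all('a')
# with a more specific selector once you know which block you want
for a in soup.find_all('a'):
    text = a.get_text(strip=True)
    href = a.get('href')
    if text:
        print(text, href)

The same idea works with re or lxml's XPath; bs4 just saves you from writing the matching pattern by hand.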

I used Java:
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String url = "https://www.baidu.com";
try {
    // Fetch and parse the page in one call
    Document doc = Jsoup.connect(url).get();
    Element content = doc.body();
    // Walk every element in the body and print its text
    Elements links = content.getAllElements();
    for (Element link : links) {
        String linkText = link.text();
        System.out.println(linkText);
    }
} catch (IOException e) {
    e.printStackTrace();
}
The page data you get back can be manipulated much like jQuery; it's very straightforward.