Scraping job listings with BeautifulSoup

1. I'm using requests and BeautifulSoup to scrape job listings from 51job, a recruitment site.

2. As you can see, each job's information sits under the element with class="e", but when I write the code in PyCharm, what I get back is not the job information.

3. When I switch to its parent tag instead, I get [], an empty list.

Can anyone help, please?

I suggest posting your code.

There must be more than one div with class="e".
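If several unrelated divs share class="e", find_all returns all of them, not just the job rows. A minimal sketch using made-up sample HTML (the container id resultList and the markup are hypothetical, not 51job's real page structure) of narrowing the search to one parent container with a CSS selector:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: class="e" appears both inside and outside the job list.
SAMPLE_HTML = """
<div class="e">Site banner, not a job</div>
<div id="resultList">
  <div class="e"><span class="title">Python Developer</span></div>
  <div class="e"><span class="title">Data Engineer</span></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all matches every div whose class list contains "e", wherever it is.
all_matches = soup.find_all("div", class_="e")

# A CSS selector scoped to the (hypothetical) container keeps only the job rows.
job_rows = soup.select("#resultList div.e")
titles = [row.get_text(strip=True) for row in job_rows]

print(len(all_matches), titles)
```

On a real page, the container selector has to be read from the HTML you actually received, not from the browser's rendered DOM.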

Post the URL! Also, is this only visible when you're logged in? Based on your description, there are a few possible causes:

1. The page may require login to view, and you are not logged in.

2. The page is loaded dynamically, so requests only gets the static page.

To troubleshoot, write r.text to a text file, change the extension to .html, and open it in a browser to see what page you actually received.
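The troubleshooting step above can be scripted. A minimal sketch; the helper name save_page_for_inspection and the file name page.html are illustrative choices. It writes the raw bytes (r.content) rather than r.text, so the page's own charset declaration still matches what is on disk and Chinese text should display correctly when the browser opens the file.

```python
from pathlib import Path

import requests


def save_page_for_inspection(raw_bytes: bytes, path: str = "page.html") -> Path:
    """Write the fetched page to an .html file so it can be opened in a browser."""
    out = Path(path)
    # Raw bytes keep the original encoding, matching any <meta charset> in the page.
    out.write_bytes(raw_bytes)
    return out


if __name__ == "__main__":
    url = "https://search.51job.com/list/000000,000000,0000,00,9,99,python,2,1.html"
    r = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    saved = save_page_for_inspection(r.content)
    print(f"Status {r.status_code}, saved to {saved.resolve()}")
```

If the saved file shows a login page or an empty list skeleton, that points to cause 1 or cause 2 respectively.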

import requests
from bs4 import BeautifulSoup  # import the libraries


headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    # 'br' dropped: requests can only decode Brotli responses if the brotli package is installed
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    # 'Cookie': '...',  # session cookie omitted; copy your own from the browser if login is required
    'Host': 'search.51job.com',
    'Referer': 'https://www.51job.com/',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.4098.3 Safari/537.36',
}  # request headers that mimic a browser

url = 'https://search.51job.com/list/000000,000000,0000,00,9,99,python,2,1.html?lang=c&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&ord_field=0&dibiaoid=0&line=&welfare='
res = requests.get(url, headers=headers)  # request the page
html = res.text  # get the HTML text
soup = BeautifulSoup(html, 'html.parser')  # parse the HTML
work_list = soup.find_all('div', class_="e")  # extract the job entries; returns every match as a list
print(work_list)
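Before changing selectors, it can help to check what the raw response actually contains. A minimal diagnostic sketch; the diagnose helper is hypothetical, not part of BeautifulSoup. Zero div matches combined with several script tags suggests the list is rendered by JavaScript (cause 2 in the reply above). Setting res.encoding from apparent_encoding is a precaution in case the server's declared charset is wrong.

```python
import requests
from bs4 import BeautifulSoup


def diagnose(html: str, css_class: str = "e") -> dict:
    """Count the target divs and the <script> tags in the raw HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        # Same lookup as in the post: divs whose class list contains css_class.
        "div_matches": len(soup.find_all("div", class_=css_class)),
        # Many scripts and no matches hints that content is filled in client-side.
        "script_tags": len(soup.find_all("script")),
    }


if __name__ == "__main__":
    url = ("https://search.51job.com/list/000000,000000,0000,00,9,99,python,2,1.html"
           "?lang=c&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99"
           "&companysize=99&ord_field=0&dibiaoid=0&line=&welfare=")
    # Minimal headers here; the full headers dict from the post can be reused instead.
    res = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    res.raise_for_status()
    # Let requests guess the charset from the body in case the header is wrong.
    res.encoding = res.apparent_encoding
    print(diagnose(res.text))
```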


The job list hasn't finished loading at the point where you parse the page.
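When a list is filled in by JavaScript, the data sometimes still arrives in the raw HTML as a JSON object assigned to a variable inside a script tag; otherwise it comes from a separate request that shows up in the browser's Network tab. A minimal sketch of the first case; the helper extract_embedded_json and the marker window.__DATA__ are hypothetical, so search the saved HTML for the real variable name, if there is one.

```python
import json
from typing import Optional

from bs4 import BeautifulSoup


def extract_embedded_json(html: str, marker: str) -> Optional[dict]:
    """Find a <script> containing `marker = {...}` and decode the JSON after it."""
    soup = BeautifulSoup(html, "html.parser")
    decoder = json.JSONDecoder()
    for script in soup.find_all("script"):
        text = script.string or ""
        start = text.find(marker)
        if start == -1:
            continue
        # Jump to the first "{" after the marker (skipping " = ").
        brace = text.find("{", start + len(marker))
        if brace == -1:
            continue
        # raw_decode parses one JSON value and ignores trailing text such as ";".
        obj, _ = decoder.raw_decode(text[brace:])
        return obj
    return None


if __name__ == "__main__":
    sample = '<script>window.__DATA__ = {"jobs": [{"title": "Python"}]};</script>'
    print(extract_embedded_json(sample, "window.__DATA__"))
```

If no script holds the data, the usual alternatives are calling the request seen in the Network tab directly, or driving a real browser.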

How do I get that link???