XPath error when scraping a website

Symptoms and background

The XPath expression on line 26 is invalid.

Relevant code

from lxml import etree

import requests

if __name__ == '__main__':
    url = 'https://m.58.com/bj/ershoufang/?reform=pcfront'
    # Spoof the User-Agent header
    head = {
        'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Mobile Safari/537.36'
    }
    # universal crawler
    page_text = requests.get(url=url, headers=head).text
    # xpath
    parser = etree.HTMLParser(encoding='utf-8')
    tree = etree.HTML(page_text, parser=parser)
    print(tree)
    li_list = tree.xpath('//ul[@class="list"]/li[@class="item-wrap"]')
    print(li_list)
    with open(r'../gotpages/58secondhand_houses.txt', 'w', encoding='utf-8') as stream:
        for li in li_list:
            house_name = li.xpath('./span[@class="content-title"]/text()]')
            #print(house_name)
            stream.write(house_name)
            print(house_name)



Output and error message
F:\pythonfiles\PycharmProjects\CRAWLER\venv\Scripts\python.exe "F:/pythonfiles/PycharmProjects/CRAWLER/focused crawler-Data analysis/crawler_58com realization in xpath.py"
Traceback (most recent call last):
  File "F:\pythonfiles\PycharmProjects\CRAWLER\focused crawler-Data analysis\crawler_58com realization in xpath.py", line 26, in <module>
    house_name = li.xpath('./span[@class="content-title"]/text()]')
  File "src\lxml\etree.pyx", line 1597, in lxml.etree._Element.xpath
  File "src\lxml\xpath.pxi", line 305, in lxml.etree.XPathElementEvaluator.__call__
  File "src\lxml\xpath.pxi", line 225, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Invalid expression

Process finished with exit code 1




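There is an extra right square bracket ] at the end of the expression; delete it. The XPath itself also has a problem: ./span only matches direct children of the li, so the fix uses .//span to search all of its descendants. Also, xpath() returns a list, so take element [0] before passing it to stream.write().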
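To make the two problems easier to see, here is a minimal sketch that reproduces them offline. The HTML in SAMPLE_HTML is made up for illustration and is only an assumption about how the title span is nested; the real 58.com markup may differ.

```python
from lxml import etree

# Hypothetical markup for illustration only; the real page may be structured differently.
SAMPLE_HTML = """
<ul class="list">
  <li class="item-wrap">
    <div class="content"><span class="content-title">Sunny two-bedroom flat</span></div>
  </li>
</ul>
"""

tree = etree.HTML(SAMPLE_HTML)
li = tree.xpath('//ul[@class="list"]/li[@class="item-wrap"]')[0]

# 1. The stray "]" makes the expression invalid, so lxml raises an XPath error
#    (XPathEvalError in the traceback from the question).
try:
    li.xpath('./span[@class="content-title"]/text()]')
    error_name = None
except etree.XPathError as exc:
    error_name = type(exc).__name__
print(error_name)

# 2. "./span" only looks at direct children of <li>; here the span is nested
#    inside a <div>, so nothing matches.
direct_children = li.xpath('./span[@class="content-title"]/text()')
print(direct_children)  # []

# 3. ".//span" searches all descendants of <li> and finds the title.
descendants = li.xpath('.//span[@class="content-title"]/text()')
print(descendants)  # ['Sunny two-bedroom flat']
```

The result of xpath() is always a list here, which is why the fixed code indexes it with [0] before writing.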


Changing the line as follows fixes it: house_name = li.xpath('.//span[@class="content-title"]/text()')[0]


import requests
from lxml import etree
if __name__ == '__main__':
    url = 'https://m.58.com/bj/ershoufang/?reform=pcfront'
    # Spoof the User-Agent header
    head = {
        'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Mobile Safari/537.36'
    }
    # universal crawler
    page_text = requests.get(url=url, headers=head).text
    # xpath
    parser = etree.HTMLParser(encoding='utf-8')
    tree = etree.HTML(page_text, parser=parser)
    print(tree)
    li_list = tree.xpath('//ul[@class="list"]/li[@class="item-wrap"]')
    print(li_list)
    with open(r'gotpages/58secondhand_houses.txt', 'w', encoding='utf-8') as stream:
        for li in li_list:
            # xpath() returns a list; take the first text node
            house_name = li.xpath('.//span[@class="content-title"]/text()')[0]
            # write one title per line
            stream.write(house_name + '\n')
            print(house_name)
 
 




There is a browser plugin called XPath Helper that is worth a look; it is quite handy.
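If you would rather check an expression from Python itself, one option is to compile it with etree.XPath before using it. This is only a sketch; is_valid_xpath is a hypothetical helper name, not part of lxml.

```python
from lxml import etree


def is_valid_xpath(expression):
    """Return True if lxml can compile the expression, False otherwise.

    is_valid_xpath is a hypothetical helper written for this example.
    """
    try:
        # etree.XPath compiles the expression right away and raises
        # an XPath error (XPathSyntaxError) when it is malformed.
        etree.XPath(expression)
        return True
    except etree.XPathError:
        return False


bad = './span[@class="content-title"]/text()]'    # stray "]" at the end
good = './/span[@class="content-title"]/text()'

print(is_valid_xpath(bad))   # False
print(is_valid_xpath(good))  # True
```

This only catches syntax errors. An expression that is valid but matches nothing, such as ./span when the span is not a direct child, still has to be checked against the actual page.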

Change the XPath; the two expressions can be merged into one:

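    # select the title spans directly with a single expression
    span_list = tree.xpath('//ul[@class="list"]/li[@class="item-wrap"]//span[@class="content-title"]')
    print(span_list)
    with open(r'./58secondhand_houses.txt', 'w', encoding='utf-8') as stream:
        for span in span_list:
            # .text is None when the span has no leading text
            title = span.text or ''
            stream.write(title + '\n')
            print(title)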
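As a quick check that the merged expression selects the same titles as the two-step version, here is a small sketch run against made-up HTML. SAMPLE_HTML and its contents are assumptions for illustration, not the real page.

```python
from lxml import etree

# Hypothetical markup for illustration only.
SAMPLE_HTML = """
<ul class="list">
  <li class="item-wrap"><div><span class="content-title">Flat A</span></div></li>
  <li class="item-wrap"><div><span class="content-title">Flat B</span></div></li>
</ul>
"""

tree = etree.HTML(SAMPLE_HTML)

# Two steps: select each <li>, then look up the title text inside it.
two_step = []
for li in tree.xpath('//ul[@class="list"]/li[@class="item-wrap"]'):
    two_step.extend(li.xpath('.//span[@class="content-title"]/text()'))

# One step: select the title spans directly and read their .text.
spans = tree.xpath('//ul[@class="list"]/li[@class="item-wrap"]//span[@class="content-title"]')
one_step = [span.text for span in spans]

print(two_step)              # ['Flat A', 'Flat B']
print(one_step == two_step)  # True
```

The one-step form is shorter, but the two-step form is easier to extend if you later need other fields (price, area and so on) from the same li.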