Python: how to use XPath to extract text that isn't wrapped in a tag

While scraping listing data from Beike (ke.com), I ran into a couple of problems:

Problem 1: the area, layout, and orientation are not wrapped in their own nodes. Here is a screenshot of the page source:

img

I tried using `following-sibling` to grab the text between two `/` separators, but it doesn't seem to work inside the loop: it keeps raising "list index out of range". If I change `./div` to `//div` it runs, but then the scraped data is wrong. How should I fix this?

```python
size = div.xpath('./div[@class="content__list--item--main"]/p[2]/i[1]/following-sibling::node()[position() <count(./div[@class="content__list--item--main"]/p[2]/i[1]/following-sibling::node())-count(./div[@class="content__list--item--main"]/p[2]/i[2]/following-sibling::node())]')[0]
direction = div.xpath('./div[@class="content__list--item--main"]/p[2]/i[2]/following-sibling::node()[position() <count(./div[@class="content__list--item--main"]/p[2]/i[2]/following-sibling::node())-count(./div[@class="content__list--item--main"]/p[2]/i[3]/following-sibling::node())]')[0]
pattern = div.xpath('./div[@class="content__list--item--main"]/p[2]/i[3]/following-sibling::node()[position() <count(./div[@class="content__list--item--main"]/p[2]/i[3]/following-sibling::node())-count(./div[@class="content__list--item--main"]/p[2]/span[@class="hide"]/following-sibling::node())]')[0]
```
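One likely cause of the "list index out of range": inside a predicate, the `./div[...]` paths passed to `count()` are evaluated relative to each candidate sibling node, not relative to `div`, so they match nothing, every position test fails, and `xpath()` returns an empty list. A simpler workaround is to skip the `count()` arithmetic entirely and collect the bare `text()` children of `p[2]`, stripping whitespace. A minimal sketch against a made-up fragment imitating the Beike markup (the exact attribute values and text are assumptions):

```python
from lxml import etree

# Made-up fragment (assumption): area / orientation / layout appear as bare
# text nodes between <i>/</i> separators, not inside their own elements.
html = '''
<div class="content__list--item--main">
  <p class="content__list--item--des">
    <a href="/zufang/a.html">福田区</a>
    <i>/</i>89.00㎡
    <i>/</i>南
    <i>/</i>3室2厅2卫
    <span class="hide"><i>/</i>中楼层（32层）</span>
  </p>
</div>
'''
tree = etree.HTML(html)
p = tree.xpath('//div[@class="content__list--item--main"]/p')[0]

# ./text() selects the direct text-node children of <p>; whitespace-only
# nodes are dropped after stripping, leaving the three values in order.
parts = [t.strip() for t in p.xpath('./text()') if t.strip()]
size, direction, pattern = parts[0], parts[1], parts[2]
print(size, direction, pattern)  # 89.00㎡ 南 3室2厅2卫
```

An equivalent per-field query, if you prefer one expression per value, is `./p[2]/i[1]/following-sibling::text()[1]` (the first text node after the first `<i>`), again followed by `.strip()`.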

Problem 2: the floor value sits inside a hidden ("hide") span and I can never get it out. Page source screenshot:

img

When I query it with a normal XPath, the result always comes back empty:

```python
floor = div.xpath('./div[@class="content__list--item--main"]/p[2]/span[@class="hide"]/text()')[0]
```
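Without seeing the live markup I can only guess at the cause, but one common one is that the span carries more than one class token, in which case the exact comparison `@class="hide"` matches nothing; `contains(@class, "hide")` plus `string(.)` is more forgiving. A sketch under that assumption (the extra class name and the floor text are invented; if the value is injected by JavaScript instead, no XPath over the raw HTML will find it):

```python
from lxml import etree

# Hypothetical fragment (assumption): the span has an extra class token,
# so an exact @class="hide" comparison fails to match.
html = '<p><span class="oneline hide"><i>/</i>中楼层（32层）</span></p>'
tree = etree.HTML(html)

print(tree.xpath('//span[@class="hide"]/text()'))  # [] -- exact match fails

span = tree.xpath('//span[contains(@class, "hide")]')[0]
# string(.) concatenates all descendant text, including the <i> separator,
# which we then strip off the front.
floor = span.xpath('string(.)').strip().lstrip('/')
print(floor)  # 中楼层（32层）
```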

The full source code:

```python
import requests
import csv
from lxml import etree
if __name__ == "__main__":
    headers={
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0'
    }
    url='https://sz.zu.ke.com/zufang/futianqu/pg%drt200600000001/#contentList'
    fp = open('beike.csv','w',newline='',encoding='utf-8')
    writer = csv.writer(fp)
    writer.writerow(['名称','面积','户型','区','街道','地点','价格','朝向','楼层','维护时间','标签','链接'])
    for pageNum in range(0,1):
        new_url=format(url%pageNum)
        page_text = requests.get(url=new_url,headers=headers).text
        #解析页面
        tree = etree.HTML(page_text)
        div_list = tree.xpath('//div[@class="content__list"]/div[@class="content__list--item"]')
        for div in div_list:
            title = div.xpath('./div[@class="content__list--item--main"]/p[1]/a/text()')[0]
            link = 'https://sz.zu.ke.com'+div.xpath('./div[@class="content__list--item--main"]/p[1]/a/@href')[0]
            district_1=div.xpath('./div[@class="content__list--item--main"]/p[2]/a[1]/text()')[0]
            district_2=div.xpath('./div[@class="content__list--item--main"]/p[2]/a[2]/text()')[0]
            district_3=div.xpath('./div[@class="content__list--item--main"]/p[2]/a[3]/text()')[0]
            size = div.xpath('./div[@class="content__list--item--main"]/p[2]/i[1]/following-sibling::node()[position() <count(./div[@class="content__list--item--main"]/p[2]/i[1]/following-sibling::node())-count(./div[@class="content__list--item--main"]/p[2]/i[2]/following-sibling::node())]')[0]
            direction = div.xpath('./div[@class="content__list--item--main"]/p[2]/i[2]/following-sibling::node()[position() <count(./div[@class="content__list--item--main"]/p[2]/i[2]/following-sibling::node())-count(./div[@class="content__list--item--main"]/p[2]/i[3]/following-sibling::node())]')[0]
            pattern = div.xpath('./div[@class="content__list--item--main"]/p[2]/i[3]/following-sibling::node()[position() <count(./div[@class="content__list--item--main"]/p[2]/i[3]/following-sibling::node())-count(./div[@class="content__list--item--main"]/p[2]/span[@class="hide"]/following-sibling::node())]')[0]
            floor = div.xpath('./div[@class="content__list--item--main"]/p[2]/span[@class="hide"]/text()')[0]
            label_is = div.xpath('./div[@class="content__list--item--main"]/p[3]')[0]
            label = label_is.xpath('string(.)')
            time = div.xpath('./div[@class="content__list--item--main"]/p[4]/span[@class="content__list--item--time oneline"]/text()')[0]
            price = div.xpath('./div[@class="content__list--item--main"]/span[@class="content__list--item-price"]/em/text()')[0]
            print(title,link,district_1,district_2,district_3,size,floor,time,label,price)
            house=[title,size,district_1,district_2,district_3,price,floor,time,label,link]
            writer.writerow(house)
```
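A side note on the repeated `div.xpath(...)[0]` pattern in the loop above: any listing that is missing a single field raises IndexError and aborts the whole scrape. A small helper (hypothetical name `first`) that returns a default instead keeps the loop going:

```python
def first(node, xpath_expr, default=''):
    """Return the first XPath match as a stripped string, or default if empty."""
    result = node.xpath(xpath_expr)
    if not result:
        return default
    item = result[0]
    # xpath() may return strings (text()/@attr) or elements; only strip strings
    return item.strip() if isinstance(item, str) else item

# usage inside the loop (hypothetical):
# title = first(div, './div[@class="content__list--item--main"]/p[1]/a/text()')
```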


If you right-click directly in the page source and choose Copy XPath, you get:

```
//*[@id="content"]/div[1]/div[1]/div[10]/div/p[2]/text()[3]
```

Notice that the square-bracket index sits on `text()` itself — you can address individual text nodes by position.
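Following that copy-XPath hint, `text()[n]` selects the n-th text-node child directly, with no `following-sibling`/`count()` construction needed. A minimal demonstration (note that on the live page, whitespace-only text nodes also count toward the index, so a hard-coded position like `[3]` can be brittle across listings):

```python
from lxml import etree

# Three bare text nodes interleaved with elements: "one", "two", "three".
html = '<p><a>a</a>one<i>/</i>two<i>/</i>three</p>'
tree = etree.HTML(html)

# The predicate on text() picks the node by position among <p>'s text children.
second = tree.xpath('//p/text()[2]')[0]
third = tree.xpath('//p/text()[3]')[0]
print(second, third)  # two three
```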