When fetching li tags with XPath, PyCharm and Jupyter return different results

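I've recently been learning XPath for web scraping. While fetching all the li tags, I noticed that when the script runs in PyCharm, the HTML that `etree.tostring` returns for a single li element is not just that li tag: it runs from that li all the way to the closing html tag. The same program works correctly in Jupyter. The code is as follows: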
from lxml import etree

html_str = '''
<div class="level_one on">
    <ul>
        <li> <a href="/index/index/view/id/1.html" title="什么是Java" class="on">什么是Java</a> </li>
        <li> <a href="javascript:" onclick="login(0)" title="Java的版本">Java的版本</a> </li>
        <li> <a href="javascript:" onclick="login(0)" title="Java API文档">Java API文档</a> </li>
        <li> <a href="javascript:" onclick="login(0)" title="JDK的下载">JDK的下载</a> </li>
        <li> <a href="javascript:" onclick="login(0)" title="JDK的安装">JDK的安装</a> </li>
        <li> <a href="javascript:" onclick="login(0)" title="配置JDK">配置JDK</a> </li>
    </ul>
</div>
'''

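html = etree.HTML(html_str)  # parse; lxml wraps the fragment in <html><body>
node_all = html.xpath('//*')  # every element node in the document
print('数据类型:', type(node_all))
print('数据长度:', len(node_all))
print('数据内容:', node_all)
print('节点名称:', [i.tag for i in node_all])

li_all = html.xpath('//li')  # all li elements
print('所有li节点:', li_all)
print('获取指定li节点:', li_all[1])
# Serialize the second li; by default tostring() also appends the element's tail text
li_txt = etree.tostring(li_all[1], encoding='utf-8')
print('获取指定节点HTML代码:', li_txt.decode('utf-8'))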

PyCharm output:
D:\Anaconda\python.exe G:/Python/Code/网络爬虫/8_xpath/8.4.py
数据类型: <class 'list'>
数据长度: 16
数据内容: [<Element html at 0x1a22b141700>, <Element body at 0x1a22b4a4c80>, <Element div at 0x1a22b4a4cc0>, <Element ul at 0x1a22b4a4dc0>, <Element li at 0x1a22b4a4e00>, <Element a at 0x1a22b4a4e80>, <Element li at 0x1a22b4a4ec0>, <Element a at 0x1a22b4a4f00>, <Element li at 0x1a22b4a4f40>, <Element a at 0x1a22b4a4e40>, <Element li at 0x1a22b4a4f80>, <Element a at 0x1a22b4a4fc0>, <Element li at 0x1a22b4ae040>, <Element a at 0x1a22b4ae080>, <Element li at 0x1a22b4ae0c0>, <Element a at 0x1a22b4ae100>]
节点名称: ['html', 'body', 'div', 'ul', 'li', 'a', 'li', 'a', 'li', 'a', 'li', 'a', 'li', 'a', 'li', 'a']
所有li节点: [<Element li at 0x1a22b4a4e00>, <Element li at 0x1a22b4a4ec0>, <Element li at 0x1a22b4a4f40>, <Element li at 0x1a22b4a4f80>, <Element li at 0x1a22b4ae040>, <Element li at 0x1a22b4ae0c0>]
获取指定li节点: <Element li at 0x1a22b4a4ec0>
获取指定节点HTML代码: <li> <a href="javascript:" onclick="login(0)" title="Java的版本">Java的版本</a> </li>
        <li> <a href="javascript:" onclick="login(0)" title="Java API文档">Java API文档</a> </li>
        <li> <a href="javascript:" onclick="login(0)" title="JDK的下载">JDK的下载</a> </li>
        <li> <a href="javascript:" onclick="login(0)" title="JDK的安装">JDK的安装</a> </li>
        <li> <a href="javascript:" onclick="login(0)" title="配置JDK">配置JDK</a> </li>
    </ul>
</div>
</body></html>

Jupyter output:

```text
数据类型: <class 'list'>
数据长度: 16
数据内容: [<Element html at 0x7fbb2ea7b9c8>, <Element body at 0x7fbb2d596948>, <Element div at 0x7fbb2d596888>, <Element ul at 0x7fbb2d596988>, <Element li at 0x7fbb2d5969c8>, <Element a at 0x7fbb2d596a48>, <Element li at 0x7fbb2d596a88>, <Element a at 0x7fbb2d596ac8>, <Element li at 0x7fbb2d596b08>, <Element a at 0x7fbb2d596a08>, <Element li at 0x7fbb2d596b48>, <Element a at 0x7fbb2d596b88>, <Element li at 0x7fbb2d596bc8>, <Element a at 0x7fbb2d596c08>, <Element li at 0x7fbb2d596e88>, <Element a at 0x7fbb2d596ec8>]
节点名称: ['html', 'body', 'div', 'ul', 'li', 'a', 'li', 'a', 'li', 'a', 'li', 'a', 'li', 'a', 'li', 'a']
所有li节点: [<Element li at 0x7fbb2d5969c8>, <Element li at 0x7fbb2d596a88>, <Element li at 0x7fbb2d596b08>, <Element li at 0x7fbb2d596b48>, <Element li at 0x7fbb2d596bc8>, <Element li at 0x7fbb2d596e88>]
获取指定li节点: <Element li at 0x7fbb2d596a88>
获取指定节点HTML代码: <li> <a href="javascript:" onclick="login(0)" title="Java的版本">Java的版本</a> </li>
```

Screenshots of each run:

[screenshot]

[screenshot]

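In principle the two results should be identical, and both should match the Jupyter output, but I don't understand why PyCharm goes wrong. I also tried other HTML documents, including documents returned directly from a web response, and the behaviour is always the same: when I serialize a single tag, its content extends all the way to the closing html tag.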

[screenshot]


Your code runs correctly in my PyCharm. Is the Python interpreter configured in PyCharm the same one that your Jupyter uses?
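A quick way to compare the two environments is to print the interpreter path and the library versions from each of them. This is a minimal sketch using the version constants that `lxml.etree` exposes; since lxml delegates much of its serialization to the libxml2 C library, a different lxml or libxml2 version between PyCharm and Jupyter is a plausible (unconfirmed) explanation for the different `tostring()` output.

```python
import sys

from lxml import etree

# Which interpreter is actually running this code?
print('interpreter:', sys.executable)
print('python     :', sys.version.split()[0])

# lxml version, and the libxml2 version it runs with / was built against.
print('lxml       :', etree.LXML_VERSION)
print('libxml2    :', etree.LIBXML_VERSION)
print('libxml2 (compiled):', etree.LIBXML_COMPILED_VERSION)
```

Run it once in PyCharm and once in a Jupyter cell. If the interpreter paths or versions differ, pointing PyCharm at the interpreter Jupyter uses, or upgrading lxml in the PyCharm environment, is a reasonable first thing to try.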