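I'm learning bs4 by following the examples in a book. I passed a File object to bs4.BeautifulSoup() and checked the types of the results with type(), and ran into two problems: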
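import bs4

exampleFile = open('example.html')
# print(exampleFile.read())   # uncommenting this line triggers question 2 below
exampleSoup = bs4.BeautifulSoup(exampleFile, 'html.parser')  # parse the file object
print(type(exampleSoup))
elems = exampleSoup.select('#author')  # CSS selector: elements with id="author"
print(type(elems))
print(len(elems))
print(type(elems[0]))
print(elems[0].getText())   # text inside the tag
print(str(elems[0]))        # the tag rendered as an HTML string
print(elems[0].attrs)       # dict of the tag's attributes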
Question 1. The second line of output shows the type is bs4.element.ResultSet, not list as the book says:
<class 'bs4.BeautifulSoup'>
<class 'bs4.element.ResultSet'>
1
<class 'bs4.element.Tag'>
Song Wei
<span id="author">Song Wei</span>
{'id': 'author'}
[Finished in 327ms]
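A likely explanation for question 1, assuming a reasonably recent bs4 install: bs4.element.ResultSet is defined as a subclass of list, so what select() returns still behaves like a list (indexing, len(), iteration); type() just reports the more specific subclass. Which exact type select() returns has varied between bs4 releases. The sketch below inlines the example.html markup (shown in question 2) as a string instead of opening the file, and uses get_text(), the snake_case spelling of getText().

```python
import bs4

# Same markup as example.html, inlined as a string for this sketch
html = """<html>
<head><title>The Website Title</title></head>
<body>
<p>Download my <strong>Python</strong> book from <a href="http://www.baidu.com">my site</a>.</p>
<p class="slogan">Learn python the easy way!</p>
<p>By <span id="author">Song Wei</span></p>
</body>
</html>"""

soup = bs4.BeautifulSoup(html, 'html.parser')
elems = soup.select('#author')

print(bs4.__version__)          # the installed version decides the exact type you see
print(type(elems))              # ResultSet on newer bs4, a plain list on some older ones
print(isinstance(elems, list))  # True either way, because ResultSet subclasses list
print(elems[0].get_text())      # the author's name from the span
```

In other words, the book's "list" and the ResultSet you see are compatible: code that treats the result as a list keeps working.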
Question 2. After uncommenting print(exampleFile.read()), the program fails with a list index out of range error:
<html>
<head><title>The Website Title</title></head>
<body>
<p>Download my <strong>Python</strong> book from <a href="http://www.baidu.com">my site</a>.</p>
<p class="slogan">Learn python the easy way!</p>
<p>By <span id="author">Song Wei</span></p>
</body>
</html>
<class 'bs4.BeautifulSoup'>
<class 'bs4.element.ResultSet'>
0
Traceback (most recent call last):
  File "D:\chwlsw\py-test\chapter11_web\mapIt.py", line 38, in <module>
    print(type(elems[0]))
IndexError: list index out of range
[Finished in 301ms]
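A likely cause of question 2: a file object keeps a read position. print(exampleFile.read()) reads to the end of the file, so when the same object is then passed to bs4.BeautifulSoup(), reading it yields an empty string. The parsed document is empty, select('#author') finds nothing (len(elems) is 0), and elems[0] raises IndexError. The standard-library sketch below shows the read-position behavior on its own; the temporary file and its one-line content are hypothetical stand-ins for example.html.

```python
import os
import tempfile

# Hypothetical stand-in for example.html, written to a temporary file
with tempfile.NamedTemporaryFile('w', suffix='.html', delete=False, encoding='utf-8') as tmp:
    tmp.write('<p>By <span id="author">Song Wei</span></p>')
    path = tmp.name

with open(path, encoding='utf-8') as f:
    first = f.read()   # reads everything; the position is now at end of file
    second = f.read()  # nothing left to read, so this returns ''
    f.seek(0)          # rewind to the start of the file
    third = f.read()   # the full content again

os.remove(path)

print(repr(second))    # ''
print(first == third)  # True
```

So either call exampleFile.seek(0) before bs4.BeautifulSoup(exampleFile, 'html.parser'), or read the file once into a string (html = exampleFile.read()), print that string, and pass the same string to bs4.BeautifulSoup(html, 'html.parser').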
Please help me figure out what's wrong. Thanks!
Normally select() should return a list; this is the source code of select():
The exact behavior probably depends on your bs4 version; mine is 4.7.1.