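```python
from bs4 import BeautifulSoup, SoupStrainer


def html_processpage(path):
    # Parse only the <a> elements to save time on large pages.
    with open(path, 'rb') as h:
        soup = BeautifulSoup(h, 'lxml', parse_only=SoupStrainer("a"))
    # Replace every link except in-page "#..." anchors with a JavaScript call.
    for a in soup.find_all('a', href=True):
        if a['href'] and not a['href'].startswith("#"):
            print(a['href'])
            a['href'] = "javascript:hrefblocked('" + a['href'] + "');"
    # Because of parse_only, soup holds only the <a> elements, so this
    # writes just those tags back to the file, not the full page.
    with open(path, "w", encoding='utf-8') as file:
        file.write(str(soup))
```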
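When the page is large (around 100 KB, including base64-encoded images), this runs very slowly. How can I optimize it?
Environment: Huawei Cloud student server, 2 GB RAM, 1 CPU core, Ubuntu 18.04.2 LTS
Link: https://idealdoc.idealbroker.cn/1.page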
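One possible direction is to skip tree building altogether. Since the page is mostly base64 image data and only the href values of `<a>` tags change, the rewrite can be done as a text substitution on the raw HTML, copying everything else through verbatim. Below is a minimal sketch of that idea; the names `block_links` and `html_processpage_fast` are hypothetical, not from the question. It assumes the file is UTF-8 and that href values are quoted. A regex is not a real HTML parser, so `<a>` tags inside `<script>` blocks or comments, and attribute values containing `>` before the href, are not handled specially.

```python
import re

# Matches an <a ...> start tag up to and including its quoted href value.
# Group 1: everything up to "href=", group 2: the quote, group 3: the value.
# Assumption: href values are quoted; (?<![\w-]) skips names like data-href.
A_HREF = re.compile(
    r"""(<a\b[^>]*?(?<![\w-])href\s*=\s*)(["'])(.*?)\2""",
    re.IGNORECASE,
)


def _block_href(match):
    prefix, _quote, href = match.groups()
    # Same rule as the BeautifulSoup version: keep empty and "#..." links.
    if not href or href.startswith("#"):
        return match.group(0)
    js = "javascript:hrefblocked('" + href + "');"
    # Re-quote with double quotes and escape any literal " in the value.
    return prefix + '"' + js.replace('"', "&quot;") + '"'


def block_links(html: str) -> str:
    """Rewrite <a href> values with a regex instead of building a parse tree."""
    return A_HREF.sub(_block_href, html)


def html_processpage_fast(path: str) -> None:
    # Hypothetical replacement for html_processpage; assumes a UTF-8 file.
    with open(path, encoding="utf-8") as f:
        html = f.read()
    # The rest of the page, including base64 images, is copied unchanged.
    with open(path, "w", encoding="utf-8") as f:
        f.write(block_links(html))
```

For example, `block_links('<a href="/doc/2.page">next</a>')` returns `<a href="javascript:hrefblocked('/doc/2.page');">next</a>`, while `#top` links and `data:` image URLs are left as they are. Unlike the `parse_only` version, this writes the whole page back, not just the `<a>` tags.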
I'm not sure whether you've already solved this problem. If not: