How can I use BeautifulSoup to extract tags from a page's HTML source and write the extracted data, one value per row, into an Excel file?
import xlwt
from bs4 import BeautifulSoup

if __name__ == "__main__":
    wookbook = xlwt.Workbook()  # create the workbook
    sheet1 = wookbook.add_sheet('Sheet_one', cell_overwrite_ok=True)  # create a sheet named Sheet_one
    headlist = ['序号', '列名', '英文名']  # header row: index, column name, English name
    row = 0
    col = 0
    # write the header row
    for head in headlist:
        sheet1.write(row, col, head)
        col = col + 1
    html = """
<thead>
<tr>
<th class="bs-checkbox " style="width: 36px; " data-field="ck" tabindex="0"><div class="th-inner ">
<input name="btSelectAll" type="checkbox"></div><div class="fht-cell"></div></th>
<th style="" data-field="isInside" tabindex="0"><div class="th-inner ">属于移动网内还是网外号码</div><div class="fht-cell"></div></th>
<th style="" data-field="businessCategory" tabindex="0"><div class="th-inner ">业务类别(呼入接入方式)</div><div class="fht-cell"></div></th>
"""
    soup = BeautifulSoup(html, 'lxml')
    tr_list = soup.find_all('tr')[1:]
    for th in soup.select('th'):
        print(th['data-field'])
        headlist = th['data-field']
    row = 1  # start writing data from the second row of the sheet
    for c, top in enumerate(headlist):
        sheet1.write(row, 2, top)  # arguments: row index, column index, value
        row += 1
    wookbook.save(r'D:\test.xls')
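The data I need is printed correctly, but writing it into the Excel file fails.
I have tried several ways of pulling the data out, including turning it into lists, but each value ends up in its own list, and when I write them only the first one reaches the sheet; they are not written one after another.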
import xlwt
from bs4 import BeautifulSoup

if __name__ == "__main__":
    wookbook = xlwt.Workbook()  # create the workbook
    sheet1 = wookbook.add_sheet('Sheet_one', cell_overwrite_ok=True)  # create a sheet named Sheet_one
    headlist = ['序号', '列名', '英文名']  # header row: index, column name, English name
    row = 0
    col = 0
    # write the header row
    for head in headlist:
        sheet1.write(row, col, head)
        col = col + 1
    html = """
<thead>
<tr>
<th class="bs-checkbox " style="width: 36px; " data-field="ck" tabindex="0"><div class="th-inner ">
<input name="btSelectAll" type="checkbox"></div><div class="fht-cell"></div></th>
<th style="" data-field="isInside" tabindex="0"><div class="th-inner ">属于移动网内还是网外号码</div><div class="fht-cell"></div></th>
<th style="" data-field="businessCategory" tabindex="0"><div class="th-inner ">业务类别(呼入接入方式)</div><div class="fht-cell"></div></th>
"""
    soup = BeautifulSoup(html, 'lxml')
    for th in soup.select('th'):
        headlist = th['data-field']
        A = headlist.split()
        print(A)
    row = 1  # start writing data from the second row of the sheet
    for c, top in enumerate(A):
        sheet1.write(row, 2, top)  # arguments: row index, column index, value
        row += 1
    wookbook.save(r'D:\test.xls')
The result is that only one list was written, and it looks to me as if those three lists all share the same list name.
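headlist is the list of header names, so reusing it for each extracted value inside the loop is the problem. Collect the values in a separate list first: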
import xlwt
from bs4 import BeautifulSoup

if __name__ == "__main__":
    wookbook = xlwt.Workbook()  # create the workbook
    sheet1 = wookbook.add_sheet('Sheet_one', cell_overwrite_ok=True)  # create a sheet named Sheet_one
    headlist = ['序号', '列名', '英文名']  # header row: index, column name, English name
    row = 0
    col = 0
    # write the header row
    for head in headlist:
        sheet1.write(row, col, head)
        col = col + 1
    html = """
<thead>
<tr>
<th class="bs-checkbox " style="width: 36px; " data-field="ck" tabindex="0"><div class="th-inner ">
<input name="btSelectAll" type="checkbox"></div><div class="fht-cell"></div></th>
<th style="" data-field="isInside" tabindex="0"><div class="th-inner ">属于移动网内还是网外号码</div><div class="fht-cell"></div></th>
<th style="" data-field="businessCategory" tabindex="0"><div class="th-inner ">业务类别(呼入接入方式)</div><div class="fht-cell"></div></th>
"""
    datalist = []  # one dict per <th>, kept separate from headlist
    data = {}
    index = 0
    soup = BeautifulSoup(html, 'lxml')
    for th in soup.select('th'):
        index = index + 1
        data = {'序号': index, '列名': '列名', '英文名': th['data-field']}
        datalist.append(data)
        print(data)
    print(datalist)
    row = 1  # start writing data from the second row of the sheet
    for c in datalist:
        sheet1.write(row, 2, c['英文名'])  # arguments: row index, column index, value
        row += 1
    wookbook.save(r'D:\test.xls')
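Building on the code above, here is a minimal sketch that fills all three columns instead of only the English name. It rests on a few assumptions: the Chinese label is taken from the `div.th-inner` inside each `th`, the built-in `html.parser` is used instead of `lxml`, the HTML is trimmed to a small sample, and the workbook is saved to an in-memory buffer rather than `D:\test.xls`. `extract_rows` and `write_rows` are hypothetical helper names, not part of bs4 or xlwt.

```python
import io

import xlwt
from bs4 import BeautifulSoup

# Trimmed sample of the header markup from the question.
HTML = """
<table><thead><tr>
<th data-field="ck"><div class="th-inner"><input type="checkbox"></div></th>
<th data-field="isInside"><div class="th-inner">属于移动网内还是网外号码</div></th>
<th data-field="businessCategory"><div class="th-inner">业务类别(呼入接入方式)</div></th>
</tr></thead></table>
"""


def extract_rows(html):
    """Return one [index, label, data-field] list per <th> (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for index, th in enumerate(soup.select("th[data-field]"), start=1):
        inner = th.select_one("div.th-inner")
        label = inner.get_text(strip=True) if inner else ""
        rows.append([index, label, th["data-field"]])
    return rows


def write_rows(rows, target):
    """Write a header row plus the data rows; target is a path or a stream."""
    workbook = xlwt.Workbook()
    sheet = workbook.add_sheet("Sheet_one")
    for col, head in enumerate(["序号", "列名", "英文名"]):
        sheet.write(0, col, head)
    for row_index, row in enumerate(rows, start=1):
        for col, value in enumerate(row):
            sheet.write(row_index, col, value)  # row index, column index, value
    workbook.save(target)


rows = extract_rows(HTML)
buffer = io.BytesIO()  # in-memory target; a file path can be passed instead
write_rows(rows, buffer)
```

Keeping extraction and writing apart means the list can be printed and checked before anything touches the spreadsheet.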
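Since each value is its own list, you can simply handle it as a list.
For example, if item 1 is list1 = [name, age, class], you can take name with list1[0] and write it to Excel on its own. It would help if you posted your data so we can see what it actually looks like.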
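A small sketch of that idea, using the `data-field` values from the question. Calling `split()` on each string gives a one-item list per header, so taking element `[0]` from each one yields a single flat list that can be written row by row. The variable names are made up for illustration, and column 2 is assumed as the target column, as in the code above.

```python
# data-field values from the question's HTML
fields = ["ck", "isInside", "businessCategory"]

# What the second attempt produced: one single-item list per value.
per_item_lists = [field.split() for field in fields]

# Take element 0 of each small list to get one flat list of names.
names = [item[0] for item in per_item_lists]

# Pair each name with its sheet position (row 0 holds the header row).
cells = [(row, 2, name) for row, name in enumerate(names, start=1)]
for row, col, value in cells:
    print(row, col, value)
```

Each tuple in `cells` maps directly onto a `sheet1.write(row, col, value)` call.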
I use openpyxl: for a list a, sheet.append(a) adds it to the worksheet as a new row.
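For reference, a minimal openpyxl sketch of that approach. The row values are taken from the question's HTML, the sheet title is reused from the xlwt code, and the workbook is saved to an in-memory buffer only to keep the example self-contained. Note that openpyxl writes `.xlsx` files, not the `.xls` format that xlwt produces.

```python
import io

from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active  # the default first worksheet
sheet.title = "Sheet_one"

# Header row: index, column name, English name.
sheet.append(["序号", "列名", "英文名"])

data_rows = [
    [1, "属于移动网内还是网外号码", "isInside"],
    [2, "业务类别(呼入接入方式)", "businessCategory"],
]
for data_row in data_rows:
    sheet.append(data_row)  # each list becomes the next row

# Read the rows back as tuples to check what was written.
written = list(sheet.iter_rows(values_only=True))

buffer = io.BytesIO()  # pass a path such as "test.xlsx" to write a real file
workbook.save(buffer)
```

Because `append` always writes to the next free row, there is no row counter to maintain.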