How to extract tags from a web page's source code with BeautifulSoup and write the extracted data into an Excel file, one value after another


Code related to the problem
import xlwt
from bs4 import BeautifulSoup
if __name__=="__main__":
    wookbook  = xlwt.Workbook()  # create the workbook
    sheet1 = wookbook.add_sheet('Sheet_one',cell_overwrite_ok=True)  # create a sheet named Sheet_one
    headlist = ['序号','列名','英文名']  # header row: index, column name, English name
    row = 0
    col = 0
    # write the header row
    for head in headlist:
        sheet1.write(row, col, head)
        col = col + 1

html = """
<thead>
<tr>
<th class="bs-checkbox " style="width: 36px; " data-field="ck" tabindex="0"><div class="th-inner ">
<input name="btSelectAll" type="checkbox"></div><div class="fht-cell"></div></th>
<th style="" data-field="isInside" tabindex="0"><div class="th-inner ">属于移动网内还是网外号码</div><div class="fht-cell"></div></th>
<th style="" data-field="businessCategory" tabindex="0"><div class="th-inner ">业务类别(呼入接入方式)</div><div class="fht-cell"></div></th>

"""
soup = BeautifulSoup(html, 'lxml')
tr_list = soup.find_all('tr')[1:]
for th in soup.select('th'):
 print(th['data-field'])
headlist=th['data-field']

row = 1  # start writing data from the second row of the sheet
for c, top in enumerate(headlist):
 sheet1.write(row, 2, top)  # arguments: row, column, value
row += 1
#

wookbook.save(r'D:\test.xls')

Output and error messages

The data I need is printed correctly, but writing it into the Excel file fails.


My approach and what I have tried

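I tried various ways of pulling the data out, and also tried turning it into lists, but each value ended up in its own list. When I then write them to the file, only the first one is written; they are not written one after another.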

This is the attempt where I converted the data to a list before writing it:

import xlwt
from bs4 import BeautifulSoup
if __name__=="__main__":
    wookbook  = xlwt.Workbook()  # create the workbook
    sheet1 = wookbook.add_sheet('Sheet_one',cell_overwrite_ok=True)  # create a sheet named Sheet_one
    headlist = ['序号','列名','英文名']  # header row: index, column name, English name
    row = 0
    col = 0
    # write the header row
    for head in headlist:
        sheet1.write(row, col, head)
        col = col + 1

html = """
<thead>
<tr>
<th class="bs-checkbox " style="width: 36px; " data-field="ck" tabindex="0"><div class="th-inner ">
<input name="btSelectAll" type="checkbox"></div><div class="fht-cell"></div></th>
<th style="" data-field="isInside" tabindex="0"><div class="th-inner ">属于移动网内还是网外号码</div><div class="fht-cell"></div></th>
<th style="" data-field="businessCategory" tabindex="0"><div class="th-inner ">业务类别(呼入接入方式)</div><div class="fht-cell"></div></th>

"""
soup = BeautifulSoup(html, 'lxml')
for th in soup.select('th'):
    headlist=th['data-field']
    A = headlist.split()
    print(A)



row = 1  # start writing data from the second row of the sheet
for c, top in enumerate(A):
 sheet1.write(row, 2, top)  # arguments: row, column, value
row += 1
#

wookbook.save(r'D:\test.xls')


Only one list ended up being written, and I suspect those three lists all share a single list name.

The result I want

[Image: screenshot of the desired Excel output]

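headlist is supposed to be a list, but your code reassigns it to a single th['data-field'] string, so only the last value is left by the time you write to the sheet. Collect each value into a separate list inside the loop, then write that list row by row: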


 
import xlwt
from bs4 import BeautifulSoup
if __name__=="__main__":
    wookbook  = xlwt.Workbook()  # create the workbook
    sheet1 = wookbook.add_sheet('Sheet_one',cell_overwrite_ok=True)  # create a sheet named Sheet_one
    headlist = ['序号','列名','英文名']  # header row: index, column name, English name
    row = 0
    col = 0
    # write the header row
    for head in headlist:
        sheet1.write(row, col, head)
        col = col + 1
 
html = """
<thead>
<tr>
<th class="bs-checkbox " style="width: 36px; " data-field="ck" tabindex="0"><div class="th-inner ">
<input name="btSelectAll" type="checkbox"></div><div class="fht-cell"></div></th>
<th style="" data-field="isInside" tabindex="0"><div class="th-inner ">属于移动网内还是网外号码</div><div class="fht-cell"></div></th>
<th style="" data-field="businessCategory" tabindex="0"><div class="th-inner ">业务类别(呼入接入方式)</div><div class="fht-cell"></div></th>
 
"""
datalist = []
data = {}
index = 0
soup = BeautifulSoup(html, 'lxml')
for th in soup.select('th'):
    index = index+1
    data = {'序号':index, '列名':'列名', '英文名':th['data-field']}
    datalist.append(data)
    print(data)
 
print(datalist)
 
 
row = 1  # start writing data from the second row of the sheet
for c in datalist:
    sheet1.write(row, 2, c['英文名'])  # arguments: row, column, value
    row += 1
 
wookbook.save(r'D:\test.xls')
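The code above only fills the third column. If you also want the index and the visible column name, here is a rough sketch of one way to do it. Several things in it are my own assumptions rather than anything from the question: it uses the built-in html.parser instead of lxml, it takes the column name from the text of div.th-inner, the helper names extract_rows and write_sheet are made up, the HTML is a trimmed copy of the snippet from the question, and it saves to the temp directory instead of D:\test.xls.

```python
import os
import tempfile

import xlwt
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question (style/tabindex attributes dropped).
HTML = """
<thead>
<tr>
<th class="bs-checkbox " data-field="ck"><div class="th-inner "><input name="btSelectAll" type="checkbox"></div></th>
<th data-field="isInside"><div class="th-inner ">属于移动网内还是网外号码</div></th>
<th data-field="businessCategory"><div class="th-inner ">业务类别(呼入接入方式)</div></th>
</tr>
</thead>
"""


def extract_rows(html):
    """Return one [index, column name, English name] list per <th>."""
    soup = BeautifulSoup(html, "html.parser")  # assumption: html.parser instead of lxml
    rows = []
    for index, th in enumerate(soup.select("th"), start=1):
        label = th.select_one("div.th-inner")
        # Assumption: the visible column name is the text inside div.th-inner.
        name = label.get_text(strip=True) if label else ""
        rows.append([index, name, th.get("data-field", "")])
    return rows


def write_sheet(rows, path):
    """Write the header row followed by every data row, cell by cell."""
    workbook = xlwt.Workbook()
    sheet = workbook.add_sheet("Sheet_one", cell_overwrite_ok=True)
    header = ["序号", "列名", "英文名"]  # index, column name, English name
    for r, row in enumerate([header] + rows):
        for c, value in enumerate(row):
            sheet.write(r, c, value)  # arguments: row, column, value
    workbook.save(path)


rows = extract_rows(HTML)
out_path = os.path.join(tempfile.gettempdir(), "th_fields.xls")  # hypothetical output path
write_sheet(rows, out_path)
```

The key point is the same as in the answer: build the complete list first, then let the row counter advance once per item inside the write loop.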

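Since each piece of data is a list, you can simply handle it as a list.
For example, if item 1 is list1=[name,age,class], you can use list1[0] to take out name and store it in Excel on its own. It would be best if you could post your data so we can see what it actually looks like.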
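To illustrate the indexing idea, here is a small sketch with made-up sample data (the records and their values are hypothetical, not taken from the question). Note that class is a reserved word in Python, so the fields are plain strings here. The sheet.write call is only shown as a comment, so the snippet runs without any Excel library.

```python
# Hypothetical sample data: each record is its own list [name, age, class].
records = [
    ["Alice", 20, "Class 1"],
    ["Bob", 21, "Class 2"],
    ["Carol", 22, "Class 3"],
]

# record[0] picks the first field (the name) out of each list.
names = [record[0] for record in records]

# Pair each name with the row it should go to, starting below the header row.
cells = [(row, 0, name) for row, name in enumerate(names, start=1)]

for row, col, value in cells:
    # With xlwt this is where sheet.write(row, col, value) would be called.
    print(row, col, value)
```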


I use openpyxl. For a list a, sheet.append(a) adds it to the worksheet as a new row.
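A minimal sketch of that approach, assuming openpyxl is installed. The field values are copied from the HTML in the question, while the output path and variable names are just placeholders of mine. Keep in mind that openpyxl writes .xlsx files, not the .xls format that xlwt produces.

```python
import os
import tempfile

from openpyxl import Workbook

# (column name, English name) pairs, copied from the <th> elements in the question.
fields = [
    ("属于移动网内还是网外号码", "isInside"),
    ("业务类别(呼入接入方式)", "businessCategory"),
]

workbook = Workbook()
sheet = workbook.active
sheet.title = "Sheet_one"

# Each append() call adds one list as the next row of the sheet.
sheet.append(["序号", "列名", "英文名"])  # header: index, column name, English name
for index, (name, english_name) in enumerate(fields, start=1):
    sheet.append([index, name, english_name])

out_path = os.path.join(tempfile.gettempdir(), "th_fields.xlsx")  # hypothetical output path
workbook.save(out_path)
```

Because append always goes to the next free row, there is no row counter to maintain by hand.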