Group 2. The printed values are, in order: i, b1, P1, breakpoint, b2, P2
Linear relationship with biofuel
IL6 and IL10 have no breakpoint
] "此时变量y 为:"
[1] "IL1b"
[1] "IL1b" "-0.107648809523819" "0.339014229656206" "22.7457597598328" "0.103020721767575"
[6] "0.361098065608572"
[1] "breakpoint:"
[1] 22.74576
[1] "-----------------------(end)-----------------------"
[1] "-----------------------(start)-----------------?------"
[1] "此时变量y 为:"
[1] "IL5"
[1] "IL5" "0.0393546517271973" "0.507875620155004" "44.9984318923382" "-0.0739995182265517"
[6] "0.25727508099701"
[1] "breakpoint:"
[1] 44.99843
[1] "-----------------------(end)-----------------------"
[1] "-----------------------(start)-----------------?------"
[1] "此时变量y 为:"
[1] "IL13"
[1] "IL13" "0.376060723360758" "0.0653383537180709" "49.9999861005274" "-0.542856877790935"
[6] "0.0169611166504877"
[1] "breakpoint:"
[1] 49.99999
[1] "-----------------------(end)-----------------------"
[1] "-----------------------(start)-----------------?------"
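In this Word file, I want to extract the string contents of the line "IL1b" "-0.107648809523819" "0.339014229656206" "22.7457597598328" "0.103020721767575", together with the "0.361098065608572" that follows [6] on the next line,
and store these six items in one row of a CSV table.
Then I want to extract the strings following "IL5" and "IL13" in the same way and store them in the CSV as well. How should I do this?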
Read the Word document with the python-docx module,
then use a regular expression to extract the needed content into a list and write that list to a CSV file.
The code is as follows:
import re
import csv
# A string stands in for the document here; you can switch to reading the Word document with python-docx.
wordtext = '''
IL6,IL10无断点
Version:1.0 StartHTML:0000000107 EndHTML:0000004359 StartFragment:0000000127 EndFragment:0000004341
] "此时变量y 为:"
[1] "IL1b"
[1] "IL1b" "-0.107648809523819" "0.339014229656206" "22.7457597598328" "0.103020721767575"
[6] "0.361098065608572"
[1] "breakpoint:"
[1] 22.74576
[1] "-----------------------(end)-----------------------"
[1] "-----------------------(start)-----------------?------"
[1] "此时变量y 为:"
[1] "IL5"
[1] "IL5" "0.0393546517271973" "0.507875620155004" "44.9984318923382" "-0.0739995182265517"
[6] "0.25727508099701"
[1] "breakpoint:"
[1] 44.99843
[1] "-----------------------(end)-----------------------"
[1] "-----------------------(start)-----------------?------"
[1] "此时变量y 为:"
[1] "IL13"
[1] "IL13" "0.376060723360758" "0.0653383537180709" "49.9999861005274" "-0.542856877790935"
[6] "0.0169611166504877"
[1] "breakpoint:"
[1] 49.99999
[1] "-----------------------(end)-----------------------"
[1] "-----------------------(start)-----------------?------"
'''
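# Each match holds the five quoted fields after [1] plus the quoted field after [6]
li = re.findall(r'\[1\]\s*"(.+?)"\s*"(.+?)"\s*"(.+?)"\s*"(.+?)"\s*"(.+?)"\s*\[6\]\s*"(.+?)"', wordtext, re.M)
print(*li, sep="\n")
# newline="" keeps the csv module from writing blank rows on Windows
with open("data.csv", "w", newline="") as fileObj:
    csv.writer(fileObj).writerows(li)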
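To run the same extraction on the actual Word file instead of a pasted string, the text can be read with python-docx first. This is only a minimal sketch. It assumes python-docx is installed (`pip install python-docx`) and that the R output sits in ordinary paragraphs rather than tables or text boxes. The names `ROW_PATTERN`, `extract_rows`, `read_docx_text`, `docx_to_csv` and the file name `result.docx` are made up for illustration.

```python
import csv
import re

# Five quoted fields after [1], then one quoted field after [6].
# [^"]+ keeps every capture inside a single pair of quotes.
ROW_PATTERN = re.compile(
    r'\[1\]\s*' + r'\s*'.join(['"([^"]+)"'] * 5) + r'\s*\[6\]\s*"([^"]+)"'
)


def extract_rows(text):
    """Return one tuple (i, b1, P1, breakpoint, b2, P2) per result block."""
    return ROW_PATTERN.findall(text)


def read_docx_text(path):
    """Join the text of all paragraphs in a .docx file with newlines."""
    from docx import Document  # third-party package: python-docx
    return "\n".join(p.text for p in Document(path).paragraphs)


def docx_to_csv(docx_path, csv_path):
    """Extract the rows from the Word file and write them to a CSV file."""
    rows = extract_rows(read_docx_text(docx_path))
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return rows
```

A call such as `docx_to_csv("result.docx", "data.csv")` would then write one row per variable (IL1b, IL5, IL13). If the output in the Word file turns out to be inside a table, the paragraphs alone will not contain it and the reading step would need adjusting.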
Thank you so much. May I ask what the re.M parameter means?
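For reference, `re.M` is short for `re.MULTILINE`. It makes `^` and `$` match at the start and end of every line, rather than only at the start and end of the whole string. The pattern in the answer contains neither anchor, so the flag has no effect there and could be dropped. A small illustration with made-up sample text:

```python
import re

# Made-up three-line sample, one variable per line
text = "IL1b 22.7\nIL5 44.9\nIL13 49.9"

# Without re.M, ^ matches only at the very start of the string
first_only = re.findall(r"^IL\w+", text)

# With re.M, ^ also matches right after each newline
every_line = re.findall(r"^IL\w+", text, re.M)

print(first_only)  # ['IL1b']
print(every_line)  # ['IL1b', 'IL5', 'IL13']
```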