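I'm new to Python, a complete beginner, and I want to extract dates in bulk from a txt file. Could the experts on this forum help? Many thanks.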

Each date is 8 digits and comes right after collection_date:, for example /collection_date:20120403
I wrote this program in PyCharm:

import re

with open ('3215 HBV1.txt') as f:
   for dates in f:
      collection = re.findall(r'collection_date:(\d\d\d\d\d\d\d\d)',dates)
print(collection[0])


It always fails with this error:
Traceback (most recent call last):
File "C:/Users/Administrator/PycharmProjects/untitled2/TXTextract.py", line 7, in <module>
print(collection[0])
IndexError: list index out of range


Could someone advise how the program should be changed? Many thanks.

IndexError: list index out of range means your collection has no element at index 0. Print the whole collection first and look at what it actually contains.

Most likely no matching field was found, so collection is an empty list.
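
A minimal sketch of how this can happen, assuming the print really does sit after the loop as in the posted code: re.findall returns an empty list for a line with no match, and once the loop ends, collection holds only the result for the last line read. The sample lines are made up for illustration and are not taken from the real file.

```python
import re

# Hypothetical sample lines standing in for the real file contents.
lines = [
    "record 1 /collection_date:20120403\n",
    "a trailing line without any date\n",
]

pattern = r'collection_date:(\d{8})'

for line in lines:
    collection = re.findall(pattern, line)

# After the loop, collection only reflects the last line, which has no match,
# so collection[0] would raise IndexError here.
print(collection)

# Gathering the matches inside the loop keeps every date that was found.
all_dates = []
for line in lines:
    all_dates.extend(re.findall(pattern, line))
print(all_dates)
```

If the real file ends with a line that has no collection_date (a blank last line, for instance), the original program would fail in exactly this way.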

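Friend, as the two replies above said, collection is empty, which means your re.findall call did not match anything, so no data was extracted.
I have not used re.findall myself. If you can, please post one complete line from the txt file; only once we know how the data is laid out can we work out how to extract what you want.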

import re
from datetime import datetime

with open('HBV1.txt') as f:
    for line in f:
        # TODO: the 8 digits may not form a valid date; the regular expression could be tightened here
        collection_date = re.findall(r'collection_date:(\d{8})', line)

        if collection_date:
            print(collection_date[0])
            collection_date = datetime.strptime(collection_date[0], '%Y%m%d')
            print(datetime.strftime(collection_date, '%Y-%b-%d'))
        else:
            print('this line has no collection date, line content {}'.format(line))
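
One way to deal with the TODO in the code above, as a rough sketch: datetime.strptime raises ValueError when the eight digits are not a real calendar date, so that case can be caught instead of tightening the regular expression. The helper name parse_collection_date is hypothetical and only used for illustration.

```python
import re
from datetime import datetime


def parse_collection_date(line):
    """Return a datetime for the first collection_date in line, or None."""
    match = re.search(r'collection_date:(\d{8})', line)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), '%Y%m%d')
    except ValueError:
        # Eight digits that do not form a real calendar date, e.g. 20120231.
        return None


print(parse_collection_date('/collection_date:20120403'))
print(parse_collection_date('/collection_date:20120231'))
print(parse_collection_date('no date on this line'))
```

Returning None for both "no match" and "invalid date" keeps the calling loop simple; a stricter version could distinguish the two cases.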


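In re regular expressions, the parenthesized groups are numbered starting from 1. For example, if you have three pairs of parentheses, whatever the first pair captures is group 1, i.e. collect(1). You could give that a try.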
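
For comparison, a small sketch of where the two kinds of numbering apply, using a made-up sample line: group numbers starting from 1 belong to match objects (as returned by re.search), while re.findall returns an ordinary Python list that is indexed from 0.

```python
import re

# Hypothetical sample line for illustration.
line = 'sample /collection_date:20120403 more text'

# Match object: group(0) is the whole match, capture groups start at 1.
match = re.search(r'collection_date:(\d{4})(\d{2})(\d{2})', line)
if match:
    year, month, day = match.group(1), match.group(2), match.group(3)
    print(year, month, day)

# findall with one group: a plain list of the captured strings, indexed from 0.
found = re.findall(r'collection_date:(\d{8})', line)
print(found[0])

# findall with several groups: a list of tuples, one tuple per match.
parts = re.findall(r'collection_date:(\d{4})(\d{2})(\d{2})', line)
print(parts)
```

So collection[0] in the original program is a valid way to read the first match, provided the list is not empty.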