pd.to_datetime raises "Out of bounds nanosecond timestamp"

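I ran into the following problem when using pd.to_datetime:
First, here is my test code. It reads 200,000 rows with df = read_chunks.get_chunk() and then normalizes the dates with pd.to_datetime. It runs fine and gives the expected result.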

import pandas as pd
from datetime import datetime

read_chunks = pd.read_csv(r'D:训练用数据.csv', encoding='gbk', iterator=True, chunksize=200000)
df = read_chunks.get_chunk()        # read the current chunk (200,000 rows)
df['日期'] = pd.to_datetime(df['日期'])

# difference between the two dates, in whole days
cha = (df['日期'] - datetime(2018, 3, 1)).dt.days

df['day'] = df['日期'].dt.day
df['weekday'] = df['日期'].dt.weekday
df['week'] = (cha // 7) + 1
df['hour'] = df['时间'].apply(lambda x: int(x.split(':')[0]))      # hour = the part of '时间' before the first ':'

print(df)
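For reference, the feature derivation in the test code can be checked on a few hand-made rows. This is a minimal sketch: the sample values are hypothetical, the explicit format="%Y-%m-%d" assumes the dates are stored as YYYY-MM-DD, and the .str.split-based hour extraction is a vectorized substitute for the apply/lambda version, not something from the original code.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the real CSV:
# '日期' holds a date string, '时间' holds an "HH:MM:SS" time string.
df = pd.DataFrame({
    "日期": ["2018-03-01", "2018-03-08", "2018-03-15"],
    "时间": ["09:15:00", "18:40:12", "00:05:59"],
})

# Assumed format: YYYY-MM-DD
df["日期"] = pd.to_datetime(df["日期"], format="%Y-%m-%d")

# Whole days since the reference date 2018-03-01
cha = (df["日期"] - pd.Timestamp("2018-03-01")).dt.days

df["day"] = df["日期"].dt.day          # day of month
df["weekday"] = df["日期"].dt.weekday  # Monday=0 ... Sunday=6
df["week"] = cha // 7 + 1              # week 1 starts on 2018-03-01
# Vectorized alternative to apply(lambda x: int(x.split(':')[0]))
df["hour"] = df["时间"].str.split(":").str[0].astype(int)

print(df[["day", "weekday", "week", "hour"]])
```

All three sample dates are Thursdays one week apart, so weekday stays at 3 while week counts 1, 2, 3.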

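However, when I processed the full data set, looping over the chunks with a for loop and changing chunksize to 10000000 (the complete data has about 140 million rows), I got an error: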


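import pandas as pd
from datetime import datetime

read_chunks = pd.read_csv(r'D:训练用数据.csv', encoding='gbk', iterator=True, chunksize=10000000)
# With iterator=True (or chunksize), read_csv returns a TextFileReader rather than a DataFrame;
# it can be iterated over, or read piece by piece with read_chunks.get_chunk(chunksize).
# Parameters:
# iterator=True: return an iterator instead of a DataFrame
# chunksize=10000000: read the file in chunks of 10,000,000 rows each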

chunk_list = list()

# Iterate over the chunks, process each one, and collect it in chunk_list
for df in read_chunks:
    df['日期'] = pd.to_datetime(df['日期'])

    # difference between the two dates, in whole days
    cha = (df['日期'] - datetime(2018, 3, 1)).dt.days

    df['day'] = df['日期'].dt.day
    df['weekday'] = df['日期'].dt.weekday
    df['week'] = (cha // 7) + 1
    df['hour'] = df['时间'].apply(lambda x: int(x.split(':')[0]))  # hour = the part of '时间' before the first ':'

    chunk_list.append(df)

df_all = pd.concat(chunk_list, ignore_index=False)
print(df_all)
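The loop follows the usual chunked read_csv pattern: when chunksize is given, read_csv returns a TextFileReader (so iterator=True is optional), and iterating over it yields one DataFrame per chunk. The sketch below reproduces the pattern on a tiny in-memory CSV, using hypothetical data, io.StringIO in place of the real file, and chunksize=2 instead of 10000000.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file: 5 data rows.
csv_text = """日期,时间
2018-03-01,09:15:00
2018-03-02,10:00:00
2018-03-03,11:30:00
2018-03-04,12:45:00
2018-03-05,23:59:59
"""

# chunksize makes read_csv return a TextFileReader; each iteration yields a DataFrame.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=2)

chunk_list = []
for chunk in reader:
    # Assumed date format YYYY-MM-DD; conversion is done chunk by chunk
    chunk["日期"] = pd.to_datetime(chunk["日期"], format="%Y-%m-%d")
    chunk["hour"] = chunk["时间"].str.split(":").str[0].astype(int)
    chunk_list.append(chunk)

df_all = pd.concat(chunk_list)
print(len(chunk_list), len(df_all))  # 3 5
```

Because each chunk is converted independently with the default errors="raise", a single malformed date anywhere in the full file is enough to stop the whole loop, even if the first 200,000 rows convert cleanly.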

The error message is as follows:


Traceback (most recent call last):
  File "D:\05tools\pycharm\xxxx.py", line 2187, in objects_to_datetime64ns
    values, tz_parsed = conversion.datetime_to_datetime64(data.ravel("K"))
  File "pandas\_libs\tslibs\conversion.pyx", line 359, in pandas._libs.tslibs.conversion.datetime_to_datetime64
TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/05tools/pycharm/xxxx.py", line 16, in <module>
    df['日期'] = pd.to_datetime(df['日期'])
  File "D:\05tools\pycharm\PycharmProjects\venv\lib\site-packages\pandas\core\tools\datetimes.py", line 883, in to_datetime
    cache_array = _maybe_cache(arg, format, cache, convert_listlike)
  File "D:\05tools\pycharm\PycharmProjects\venv\lib\site-packages\pandas\core\tools\datetimes.py", line 195, in _maybe_cache
    cache_dates = convert_listlike(unique_dates, format)
  File "D:\05tools\pycharm\PycharmProjects\venv\lib\site-packages\pandas\core\tools\datetimes.py", line 401, in _convert_listlike_datetimes
    result, tz_parsed = objects_to_datetime64ns(
  File "D:\05tools\pycharm\PycharmProjects\venv\lib\site-packages\pandas\core\arrays\datetimes.py", line 2193, in objects_to_datetime64ns
    raise err
  File "D:\05tools\pycharm\PycharmProjects\venv\lib\site-packages\pandas\core\arrays\datetimes.py", line 2175, in objects_to_datetime64ns
    result, tz_parsed = tslib.array_to_datetime(
  File "pandas\_libs\tslib.pyx", line 379, in pandas._libs.tslib.array_to_datetime
  File "pandas\_libs\tslib.pyx", line 606, in pandas._libs.tslib.array_to_datetime
  File "pandas\_libs\tslib.pyx", line 602, in pandas._libs.tslib.array_to_datetime
  File "pandas\_libs\tslib.pyx", line 557, in pandas._libs.tslib.array_to_datetime
  File "pandas\_libs\tslibs\conversion.pyx", line 516, in pandas._libs.tslibs.conversion.convert_datetime_to_tsobject
  File "pandas\_libs\tslibs\np_datetime.pyx", line 120, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 18-04-01 00:00:00

Process finished with exit code 1
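Some context for the message: a nanosecond-resolution timestamp is a 64-bit count of nanoseconds, which limits it to roughly the years 1677 through 2262. The value in the error, 18-04-01 00:00:00, appears to be printed as year-month-day, i.e. year 18, which is far outside that range. The sketch below only prints the bounds for comparison; year_in_error is a hypothetical variable holding the year read off the message.

```python
import pandas as pd

# Earliest and latest instants representable at nanosecond resolution
lo = pd.Timestamp.min
hi = pd.Timestamp.max
print(lo)  # 1677-09-21 00:12:43.145224193
print(hi)  # 2262-04-11 23:47:16.854775807

# Year as it appears in the error message "18-04-01 00:00:00"
year_in_error = 18
print(lo.year <= year_in_error <= hi.year)  # False
```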

The 日期 column looks like this:

[Screenshot of sample 日期 values; image not available]
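Since the screenshot is not available, the exact format of 日期 is unknown. Assuming the values are meant to look like YYYY-MM-DD, one way to find the rows behind the error is to test each raw string against that pattern before converting; a value such as 18-04-01 (two-digit year) would then be flagged. The sample values and the regular expression below are hypothetical. Passing errors="coerce" to pd.to_datetime is another option: invalid or out-of-bounds values become NaT instead of raising, at the cost of silently discarding them.

```python
import pandas as pd

# Hypothetical raw values; the real ones come from the '日期' column of each chunk.
raw = pd.Series(["2018-03-01", "2018-04-01", "18-04-01", "2018/4/2"], name="日期")

# Assumed expected format: four-digit year, two-digit month and day.
# na=False treats missing values as non-matching, so they are flagged too.
ok = raw.str.fullmatch(r"\d{4}-\d{2}-\d{2}", na=False)
bad = raw[~ok]
print(bad)

# Convert only the rows that match the expected format
parsed = pd.to_datetime(raw[ok], format="%Y-%m-%d")
print(parsed)
```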

Could anyone explain what causes this error and how to fix it? Thanks!