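When I use duplicated to find duplicate rows in a table imported from an external file, specifying a column raises an error. For example, checking the name column of an external Excel file for duplicates.
The program is as follows: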
import pandas as pd
df = pd.read_excel(r'C:\Users\35059\Desktop\爬虫\333.xlsx')
print(df)
# Check for duplicate rows
print(df.duplicated(subset=['name']))
Error:
Traceback (most recent call last):
  File "C:\Users\35059\Desktop\爬虫\数据案例处理.py", line 13, in <module>
    print(df.duplicated(subset=['name']))
  File "D:\ANACONDA\lib\site-packages\pandas\core\frame.py", line 4885, in duplicated
    raise KeyError(diff)
KeyError: Index(['name'], dtype='object')
But if I type the table directly in the program, there is no error:
from pandas import DataFrame
from pandas import Series
df = DataFrame({'age':Series(([1,5,6,5,5])),'name':Series(['ben','john','jerry','john','john'])})
print(df.duplicated('name'))
Result:
0 False
1 False
2 False
3 True
4 True
dtype: bool
https://jingyan.baidu.com/article/02027811287e4e1bcd9ce55c.html
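
The KeyError raised inside duplicated means pandas found no column labelled exactly 'name' in the imported table. The post does not show the Excel file, so the real cause is unknown. One common possibility is stray whitespace in the header cell; others are a header that is not in the first row (see the header parameter of pd.read_excel) or a column that is actually labelled something else. A minimal sketch of how this could be checked, assuming the whitespace case; the DataFrame below is a hypothetical stand-in for 333.xlsx, not the real data:

```python
import pandas as pd

# Hypothetical stand-in for the imported sheet: the header cell is "name "
# with a trailing space (an assumption; the real file is not shown).
df = pd.DataFrame({
    'age': [1, 5, 6, 5, 5],
    'name ': ['ben', 'john', 'jerry', 'john', 'john'],
})

# Print the exact labels; repr() makes stray whitespace visible.
print([repr(c) for c in df.columns])

# With the untouched labels, the lookup fails just as in the traceback.
try:
    df.duplicated(subset=['name'])
except KeyError as exc:
    print('KeyError:', exc)

# Strip whitespace from every label, then retry the duplicate check.
df.columns = df.columns.astype(str).str.strip()
dup_mask = df.duplicated(subset=['name'])
print(dup_mask)
```

On the real file, printing df.columns right after pd.read_excel should show which label to pass to subset.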