There are many ways to do this. Here is one example:
from pandas import DataFrame

# Named `crops` rather than `list`, so the built-in type is not shadowed
crops = ["canola", "soybeans", "soybeans", "wheat", "canola", "wheat", "canola", "soybeans", "canola", "wheat", "canola", "soybeans", "canola", "soybeans", "wheat", "canola",
         "soybeans", "canola", "wheat", "canola", "wheat", "soybeans", "soybeans", "wheat", "soybeans", "canola", "soybeans", "canola"]
frame = DataFrame(crops)
df = frame.drop_duplicates()  # keeps the first occurrence of each value
print(df)
# Output:
#           0
# 0    canola
# 1  soybeans
# 3     wheat
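For a plain Python list, building a DataFrame is not strictly necessary. The following is a minimal sketch using only the standard library; the shortened `crops` list is illustrative sample data, not the full list from the example above. It relies on the fact that dict keys are unique and keep insertion order (Python 3.7+).

```python
# Illustrative sample data (a shortened version of the list above)
crops = ["canola", "soybeans", "soybeans", "wheat", "canola", "wheat"]

# dict.fromkeys keeps one key per distinct value, in order of first appearance
unique_crops = list(dict.fromkeys(crops))
print(unique_crops)  # ['canola', 'soybeans', 'wheat']
```

Unlike `set(crops)`, this keeps the order in which the values first appear.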
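Deduplication snippet (cell.value is assumed to hold a comma-separated string):
print('Original content: ' + cell.value)
seen = []
for item in cell.value.split(','):  # the separator is a comma
    item = item.strip()  # Python strings use strip(); there is no trim()
    if item not in seen:
        seen.append(item)
print('Result: ' + ','.join(seen))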
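The same idea can be wrapped in a small reusable function. This is only a sketch: the name `dedupe_delimited` and its `sep` parameter are made up for illustration and do not come from any library.

```python
def dedupe_delimited(text, sep=","):
    """Remove repeated items from a delimited string, keeping first-seen order."""
    # Strip surrounding whitespace so "a" and " a" count as the same item
    items = (part.strip() for part in text.split(sep))
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return sep.join(dict.fromkeys(items))


print(dedupe_delimited("a, b,a ,c,b"))  # a,b,c
```

With a spreadsheet cell, the call would look like `dedupe_delimited(cell.value)`, assuming `cell.value` is a string.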
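You can also use NumPy's unique, for example on a list of strings. Note that it returns the unique values in sorted order, not in order of first appearance:
import numpy as np
np.unique(strlist)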
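Because `np.unique` sorts its result, the original order is lost. The sketch below shows the sorted result and one way to recover the order of first appearance with the `return_index` option; the contents of `strlist` are sample data chosen for illustration.

```python
import numpy as np

# Illustrative sample data
strlist = ["wheat", "canola", "wheat", "soybeans", "canola"]

# Distinct values, sorted alphabetically
values = np.unique(strlist)
print(values)

# return_index=True also returns the index of each value's first occurrence
values, first_idx = np.unique(strlist, return_index=True)

# Sorting those indices and indexing the original data restores first-seen order
in_order = np.asarray(strlist)[np.sort(first_idx)]
print(in_order)
```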
The key line:
df2_name = df1_name.drop_duplicates(subset=['username'], keep='first', inplace=False)
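These three parameters are the ones used most often:
subset
    The column or columns used to identify duplicates.
    ['a']: deduplicate on column a
    ['a', 'b']: deduplicate on the combination of a and b
keep {'first', 'last', False}
    Which of the duplicated rows to keep.
    'first': keep the first occurrence
    'last': keep the last occurrence
    False: drop every row that has a duplicate
inplace {True, False}
    True modifies the original DataFrame in place and returns None; False leaves the original untouched and returns a deduplicated copy.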
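To make the effect of each parameter concrete, here is a small sketch. The contents of `df1_name` and the `score` column are invented sample data; only the `username` column name comes from the line of code above.

```python
import pandas as pd

# Invented sample data with repeated usernames
df1_name = pd.DataFrame({
    "username": ["amy", "bob", "amy", "cal", "bob"],
    "score": [1, 2, 3, 4, 5],
})

# keep='first': the first row for each username survives (rows 0, 1, 3)
first = df1_name.drop_duplicates(subset=["username"], keep="first")

# keep='last': the last row for each username survives (rows 2, 3, 4)
last = df1_name.drop_duplicates(subset=["username"], keep="last")

# keep=False: every username that appears more than once is dropped (row 3 only)
no_dupes = df1_name.drop_duplicates(subset=["username"], keep=False)

# inplace=True modifies the DataFrame itself and returns None
working = df1_name.copy()
returned = working.drop_duplicates(subset=["username"], inplace=True)

print(first, last, no_dupes, working, sep="\n\n")
```

Note that the surviving rows keep their original index labels; chain `.reset_index(drop=True)` if a fresh 0-based index is wanted.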
————————————————
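Copyright notice: This is an original article by CSDN blogger "limingxin007", released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/limingxin007/article/details/118699404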