How can I remove duplicate words from a single DataFrame cell in Python?

Problem description and background



For example, a cell contains ("canola", "soybeans", "soybeans", "wheat", "canola", "wheat", "canola", "soybeans", "canola", "wheat", "canola", "soybeans", "canola", "soybeans", "wheat", "canola",
"soybeans", "canola", "wheat", "canola", "wheat", "soybeans", "soybeans", "wheat", "soybeans", "canola", "soybeans", "canola"),
and I would like that cell to become ("canola", "soybeans", "wheat"), with no duplicates.


There are many ways to do this; here is one example:


import pandas as pd

words = ["canola", "soybeans", "soybeans", "wheat", "canola", "wheat", "canola", "soybeans", "canola", "wheat", "canola", "soybeans", "canola", "soybeans", "wheat", "canola",
         "soybeans", "canola", "wheat", "canola", "wheat", "soybeans", "soybeans", "wheat", "soybeans", "canola", "soybeans", "canola"]

# Put the words into a one-column DataFrame, then drop the duplicate rows
frame = pd.DataFrame(words)

df = frame.drop_duplicates()

print(df)

# Output:
#           0
# 0    canola
# 1  soybeans
# 3     wheat
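
If the goal is to write the result back into a single cell, the deduplicated column can be turned back into a tuple. A small follow-up sketch reusing the frame built above:

# Series.drop_duplicates keeps the first occurrence of each word
unique_words = tuple(frame[0].drop_duplicates())
print(unique_words)  # ('canola', 'soybeans', 'wheat')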

Deduplication snippet (cell.value is assumed to hold a comma-separated string of words):

print('Original: ' + cell.value)
result = ""
seen = []
for i in cell.value.split(','):  # the delimiter is ','
    word = i.strip()  # Python has no trim(); str.strip() removes surrounding whitespace
    if seen.count(word) == 0:
        seen.append(word)
        result += word + ','
print('Deduplicated: ' + result[:-1])
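
The same per-cell logic can also be applied to every cell of a pandas column. A minimal sketch, assuming a made-up column named crops whose cells hold comma-separated words:

import pandas as pd

# Made-up example column; each cell is a comma-separated string with repeats
df = pd.DataFrame({"crops": ["canola, soybeans, soybeans, wheat, canola",
                             "wheat, wheat, soybeans"]})

def dedupe_cell(value):
    # Split on ',', strip whitespace, and keep only the first occurrence of each word
    seen = []
    for word in value.split(','):
        word = word.strip()
        if word not in seen:
            seen.append(word)
    return ', '.join(seen)

df["crops"] = df["crops"].apply(dedupe_cell)
print(df["crops"].tolist())
# ['canola, soybeans, wheat', 'wheat, soybeans']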

numpy's unique also works. For example, given a list of strings:

np.unique(strlist)
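
A small runnable version (strlist is filled in with the example words here; note that np.unique returns the unique values sorted, so the original word order is not preserved):

import numpy as np

strlist = ["canola", "soybeans", "soybeans", "wheat", "canola"]
print(np.unique(strlist))
# ['canola' 'soybeans' 'wheat']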

The main code:

df2_name = df1_name.drop_duplicates(subset=['username'], keep='first', inplace=False)

The three commonly used parameters are the following (a short example follows the list):

subset
    which column or columns to deduplicate on
    ['a'] deduplicates on column a
    ['a', 'b'] deduplicates on the combination of a and b
keep {'first', 'last', False}
    which of the duplicated rows to keep
    first: keep the first occurrence
    last: keep the last occurrence
    False: drop every row that has a duplicate
inplace {True, False}
    whether to modify the original DataFrame in place or return a modified copy
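
A minimal sketch of these parameters in action (the username/age columns and their values are made up for illustration):

import pandas as pd

# Made-up example table with a duplicated username
df1_name = pd.DataFrame({
    "username": ["alice", "bob", "alice"],
    "age": [30, 25, 31],
})

# Keep only the first row for each username; df1_name itself is left untouched
df2_name = df1_name.drop_duplicates(subset=['username'], keep='first', inplace=False)
print(df2_name["username"].tolist())  # ['alice', 'bob']

# keep=False drops every row whose username appears more than once
df3_name = df1_name.drop_duplicates(subset=['username'], keep=False)
print(df3_name["username"].tolist())  # ['bob']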
Copyright notice: this part is quoted from an original article by the CSDN blogger "limingxin007", licensed under CC 4.0 BY-SA; reposts should include the original source link and this notice.
Original link: https://blog.csdn.net/limingxin007/article/details/118699404