Python data processing and cleaning

```
df['collection'] = df['collection'].astype('string').str.strip()
df['collection'] = [int(str(i).replace('万','0000')) for i in df['collection']]
df['text'] = [str(i)[3:] for i in df['text']]
df['comments'] = [0 if '评论' in str(i).strip() else int(i) for i in df['comments']]
```

```
ValueError Traceback (most recent call last)
Input In [44], in()
1 df['collection'] = df['collection'].astype('string').str.strip()
----> 2 df['collection'] = [int(str(i).replace('万','0000')) for i in df['collection']]
3 df['text'] = [str(i)[3:] for i in df['text']]
4 df['comments'] = [0 if '评论' in str(i).strip() else int(i) for i in df['comments']]

Input In [44], in (.0)
1 df['collection'] = df['collection'].astype('string').str.strip()
----> 2 df['collection'] = [int(str(i).replace('万','0000')) for i in df['collection']]
3 df['text'] = [str(i)[3:] for i in df['text']]
4 df['comments'] = [0 if '评论' in str(i).strip() else int(i) for i in df['comments']]

ValueError: invalid literal for int() with base 10: '收藏'
```

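小魔女 drew partly on Bing and GPT when writing this answer:
Data processing and cleaning in Python is a routine step in data analysis: it turns raw data into usable data and reshapes it into a format that is easier to analyze.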

Some common Python data processing and cleaning techniques:

Data type conversion: use astype() to convert strings to numeric types, or numbers to strings. For example:
```
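# Convert to string and trim surrounding whitespace (strip('0') would also remove meaningful zeros, e.g. '100' -> '1')
df['collection'] = df['collection'].astype(str).str.strip()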
```
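astype() stops at the first value it cannot convert, which is the same kind of failure as the int() error in the question. A minimal sketch, on a small made-up column, of the more forgiving pd.to_numeric(..., errors='coerce'): unparseable entries such as '收藏' become NaN, so they can be listed and then filled explicitly. Filling them with 0 is an assumption about what the placeholder means.

```python
import pandas as pd

# Hypothetical sample data: plain counts plus the placeholder text '收藏'
df = pd.DataFrame({'collection': [' 12 ', '收藏', '305']})

# Strip whitespace first, then convert; errors='coerce' maps non-numeric strings to NaN
df['collection_num'] = pd.to_numeric(df['collection'].astype(str).str.strip(), errors='coerce')

# List the raw values that failed to convert (the ones int() would reject)
bad = df.loc[df['collection_num'].isna(), 'collection']
print(bad.tolist())  # ['收藏']

# Assumption: the placeholder means "no count", so fill with 0 and switch to an integer dtype
df['collection_num'] = df['collection_num'].fillna(0).astype(int)
print(df['collection_num'].tolist())  # [12, 0, 305]
```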
Data formatting: use replace() to normalize values into a form that is easier to analyze, for example expanding the unit 万 (10,000):
```
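# '收藏' means no count is shown -> 0 (same idea as '评论' below); 'X万' -> X * 10,000, and float() also handles '1.2万'
df['collection'] = [0 if '收藏' in str(i) else int(round(float(str(i).replace('万', '')) * 10000)) if '万' in str(i) else int(i) for i in df['collection']]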
```
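A one-line comprehension gets hard to read once several cases are involved. A minimal sketch that moves the rules into a small named function, parse_count (a hypothetical helper, not part of pandas), applied with Series.map. Treating '收藏' and any other non-numeric text as 0 is an assumption; the sample values are made up.

```python
import pandas as pd

def parse_count(value):
    """Hypothetical helper: turn a display count such as '3', '1.2万' or '收藏' into an int.

    Assumption: non-numeric text such as '收藏' means no count is shown, so it maps to 0.
    """
    text = str(value).strip()
    if text.endswith('万'):
        # '万' = 10,000; float() accepts decimals like '1.2', round() removes float error
        return int(round(float(text[:-1]) * 10000))
    if text.isdigit():
        return int(text)
    return 0  # placeholder text such as '收藏'

df = pd.DataFrame({'collection': ['3', '1.2万', '收藏', ' 45万 ']})
df['collection'] = df['collection'].map(parse_count)
print(df['collection'].tolist())  # [3, 12000, 0, 450000]
```

Keeping the rules in one function also makes it easy to test individual raw values before running them over the whole column.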
Data cleaning: use str() and strip() to clean values into a form that is easier to analyze. For example:
```
# Drop the first 3 characters of each text value (str(i)[3] would keep only the 4th character)
df['text'] = [str(i)[3:] for i in df['text']]
df['comments'] = [0 if '评论' in str(i).strip() else int(i) for i in df['comments']]
```
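The same two cleaning steps can also be written with pandas' vectorized .str accessor instead of list comprehensions. A sketch with invented sample rows: the 3-character prefix on text and the '评论' placeholder in comments come from the question's code, while the sample values themselves are hypothetical.

```python
import pandas as pd

# Hypothetical rows: 'text' carries a 3-character prefix, 'comments' may hold the placeholder '评论'
df = pd.DataFrame({
    'text': ['标题：你好', '标题：再见'],
    'comments': [' 评论 ', '27'],
})

# .str[3:] slices every string, dropping the first 3 characters (same as str(i)[3:])
df['text'] = df['text'].astype(str).str[3:]

# Strip whitespace, replace entries containing '评论' with '0', then convert to int
comments = df['comments'].astype(str).str.strip()
df['comments'] = comments.mask(comments.str.contains('评论'), '0').astype(int)

print(df['text'].tolist())      # ['你好', '再见']
print(df['comments'].tolist())  # [0, 27]
```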

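This error occurs because some values in the df['collection'] column consist only of the string "收藏" ("collect"/"favorite"), which cannot be converted to an integer. You can avoid the problem by adding a condition to the list comprehension, for example: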

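```
# Expand '万' and convert plain digit strings; anything else (e.g. '收藏') is kept unchanged
df['collection'] = [int(str(i).replace('万', '0000')) if '万' in str(i) else int(i) if str(i).isdigit() else i for i in df['collection']]
```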

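This converts the values that contain "万" and leaves rows whose value is only "收藏" unchanged. Note that the column still contains strings afterwards (at least "收藏"), and a decimal value such as "1.2万" would still raise ValueError, because "1.20000" is not a valid int literal. If you need a purely numeric column, map "收藏" to 0, as the question's code already does for "评论".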
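If a fully numeric column is preferred to keeping "收藏" as text, the conversion can also be vectorized. A sketch with hypothetical sample values, using Series.str.extract with a regular expression that accepts an integer or decimal number followed by an optional "万"; mapping non-matching rows such as "收藏" to 0 is an assumption.

```python
import pandas as pd

# Hypothetical raw values: plain numbers, '万' counts and the placeholder '收藏'
s = pd.Series(['856', '2.3万', '收藏', '10万'])

# Group 0: the numeric part; group 1: '万' or ''. Rows that do not match get NaN in both
parts = s.str.strip().str.extract(r'^(\d+(?:\.\d+)?)(万?)$')

number = pd.to_numeric(parts[0])              # float64, NaN for '收藏'
factor = parts[1].map({'万': 10000, '': 1})   # multiplier per row, NaN for '收藏'

# round() removes float error such as 2.3 * 10000; assumption: unmatched rows become 0
counts = (number * factor).round().fillna(0).astype(int)
print(counts.tolist())  # [856, 23000, 0, 100000]
```

Anchoring the pattern with ^ and $ means any unexpected text falls into the NaN branch instead of being partially parsed.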