UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 596: illegal multibyte sequence
```python
if __name__ == '__main__':
    ws = Word2Sequence()
    path = r"D:\data\Desktop\aclImdb_v1.tar\aclImdb_v1\aclImdb\train"
    temp_data_path = [os.path.join(path, "pos"), os.path.join(path, "neg")]
    for data_path in temp_data_path:
        file_paths = [os.path.join(data_path, file_name) for file_name in os.listdir(data_path) if file_name.endswith("txt")]
        for file_path in tqdm(file_paths):
            sentence = tokenlize(open(file_path).read())
            ws.fit(sentence)
    ws.build_vocab(min=10, max_feature=5000)
    pickle.dump(ws, open("../pythonProject/ws.pkl", 'rb'))
    print(len(ws))
```
Following answers I found online, I added an encoding argument to open():

```python
pickle.dump(ws, open("../pythonProject/ws.pkl", 'rb', encoding='utf-8'))
```

But whether I pass encoding='utf-8' or also add errors='ignore', the same error is still raised.
Could this be a problem with my environment?
After the changes, the code looks like this:
```python
import os
import pickle

from tqdm import tqdm

# Word2Sequence and tokenlize are assumed to be defined in your own module.

if __name__ == '__main__':
    ws = Word2Sequence()
    path = r"D:\data\Desktop\aclImdb_v1.tar\aclImdb_v1\aclImdb\train"
    temp_data_path = [os.path.join(path, "pos"), os.path.join(path, "neg")]
    for data_path in temp_data_path:
        file_paths = [os.path.join(data_path, file_name) for file_name in os.listdir(data_path) if file_name.endswith("txt")]
        for file_path in tqdm(file_paths):
            sentence = tokenlize(open(file_path, encoding='utf-8').read())
            ws.fit(sentence)
    ws.build_vocab(min=10, max_feature=5000)
    pickle.dump(ws, open("../pythonProject/ws.pkl", 'wb'))
    print(len(ws.vocab))
```
First, add encoding='utf-8' to the open() call that reads the text files.
Second, pickle.dump is used as dump(obj, file): it serializes a Python object and writes the serialized result to an already opened file, so that file must be opened for writing. Change 'rb' to 'wb'.
Give that a try and see if it works.
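For reference, here is a minimal round-trip sketch of that dump/load pairing. It uses a plain dict as a stand-in for the Word2Sequence object, just to show which mode each side needs:

```python
import pickle

# A stand-in for the ws object; any picklable object works the same way.
vocab = {"the": 0, "movie": 1, "great": 2}

# Serialize: pickle.dump writes bytes, so the file must be opened with 'wb'.
with open("ws.pkl", "wb") as f:
    pickle.dump(vocab, f)

# Deserialize: 'rb' is only used when reading the pickle back with pickle.load.
with open("ws.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored == vocab)  # True
```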