UnicodeDecodeError: utf-8 decoding fails

Symptoms and background

I am building a chatbot with bert-base-chinese. I previously used an English model.

Relevant code (please do not paste screenshots)


train_data, validation_data, test_data = TabularDataset.splits(
    path='datasets/',
    format='csv',
    train='sample_IM5000-6000.csv',
    validation='sample_IM5000-6000.csv',
    test='sample_IM5000-6000.csv',
    skip_header=False,
    fields=g_data_fields)

Output and error message

Some weights of the model checkpoint at /content/drive/MyDrive/Bert_GPT2/ChatBotBertTransformer/bert-base-chinese were not used when initializing BertModel: ['cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Traceback (most recent call last):
  File "model.py", line 70, in <module>
    fields=g_data_fields)
  File "/usr/local/lib/python3.7/dist-packages/torchtext/data/dataset.py", line 78, in splits
    os.path.join(path, train), **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torchtext/data/dataset.py", line 271, in __init__
    examples = [make_example(line, fields) for line in reader]
  File "/usr/local/lib/python3.7/dist-packages/torchtext/data/dataset.py", line 271, in <listcomp>
    examples = [make_example(line, fields) for line in reader]
  File "/usr/local/lib/python3.7/dist-packages/torchtext/utils.py", line 143, in unicode_csv_reader
    for line in csv.reader(unicode_csv_data, **kwargs):
  File "/usr/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 28: invalid continuation byte

My approach and what I have tried

The result I want to achieve

What does this error mean?
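
One possible reading of the traceback above: the reader is decoding the CSV as UTF-8, and the file contains a byte sequence that is not valid UTF-8. With Chinese data this often means the file was saved in a GBK-family encoding, but that is only a guess, since the post does not say how the CSV was produced. The sketch below tries a short list of candidate encodings and writes a UTF-8 copy of the file. The helper names `detect_encoding` and `convert_to_utf8`, the candidate list, and the output filename are all hypothetical, not part of torchtext.

```python
from pathlib import Path

# Candidate encodings, tried in order. This list is an assumption:
# gb18030 is a superset of GBK/GB2312, which Chinese CSV exports often use.
CANDIDATE_ENCODINGS = ("utf-8", "gb18030")


def detect_encoding(raw: bytes, candidates=CANDIDATE_ENCODINGS) -> str:
    """Return the first candidate encoding that decodes `raw` without error."""
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


def convert_to_utf8(src: str, dst: str) -> str:
    """Write a UTF-8 copy of `src` to `dst`; return the encoding that was detected."""
    raw = Path(src).read_bytes()
    enc = detect_encoding(raw)
    # Write bytes rather than text so line endings are left untouched.
    Path(dst).write_bytes(raw.decode(enc).encode("utf-8"))
    return enc


if __name__ == "__main__":
    # Hypothetical output name; the input name is the one from the question.
    src = Path("datasets/sample_IM5000-6000.csv")
    if src.exists():
        print(convert_to_utf8(str(src), "datasets/sample_IM5000-6000.utf8.csv"))
```

If the detected encoding is not utf-8, pointing `train`, `validation`, and `test` at the converted file may be enough to get past the decode step. A successful decode does not prove the guess was right, so it is worth opening the converted file and checking that the Chinese text is readable.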

Same problem here. Did you manage to solve it?