I am building a chatbot with bert-base-chinese; the project originally used an English model.
train_data, validation_data, test_data = TabularDataset.splits(
    path='datasets/',
    format='csv',
    train='sample_IM5000-6000.csv',
    validation='sample_IM5000-6000.csv',
    test='sample_IM5000-6000.csv',
    skip_header=False,
    fields=g_data_fields)
Some weights of the model checkpoint at /content/drive/MyDrive/Bert_GPT2/ChatBotBertTransformer/bert-base-chinese were not used when initializing BertModel: ['cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Traceback (most recent call last):
  File "model.py", line 70, in <module>
    fields=g_data_fields)
  File "/usr/local/lib/python3.7/dist-packages/torchtext/data/dataset.py", line 78, in splits
    os.path.join(path, train), **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torchtext/data/dataset.py", line 271, in __init__
    examples = [make_example(line, fields) for line in reader]
  File "/usr/local/lib/python3.7/dist-packages/torchtext/data/dataset.py", line 271, in <listcomp>
    examples = [make_example(line, fields) for line in reader]
  File "/usr/local/lib/python3.7/dist-packages/torchtext/utils.py", line 143, in unicode_csv_reader
    for line in csv.reader(unicode_csv_data, **kwargs):
  File "/usr/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 28: invalid continuation byte
What is this error about?
Same problem here, bro. Did you manage to solve it?
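The traceback shows the CSV being read as UTF-8 and failing on byte 0xd0, which means the file is not valid UTF-8. One possible cause, and this is only a guess since the file itself is not shown in the thread, is that the Chinese CSV was saved in a GBK-family encoding (for example by exporting from Excel on Windows). A minimal sketch of re-encoding the file to UTF-8 before loading it follows; the function name `reencode_to_utf8` and the candidate encoding list are assumptions for illustration, not part of torchtext.

```python
from pathlib import Path


def reencode_to_utf8(src, dst, candidates=("utf-8", "gb18030")):
    """Decode `src` with the first candidate encoding that works and
    write the text to `dst` as UTF-8. Returns the encoding that was used.

    `candidates` is a guess: gb18030 is a superset of GBK and GB2312, so it
    covers the common legacy encodings for Simplified Chinese text.
    """
    raw = Path(src).read_bytes()
    for enc in candidates:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # not this encoding, try the next candidate
        # Write bytes directly so line endings are preserved unchanged.
        Path(dst).write_bytes(text.encode("utf-8"))
        return enc
    raise ValueError(f"none of {candidates} could decode {src}")


# Example usage (file names are hypothetical):
# reencode_to_utf8("datasets/sample_IM5000-6000.csv",
#                  "datasets/sample_IM5000-6000.utf8.csv")
```

If the conversion succeeds, pointing `train`, `validation`, and `test` at the converted file should avoid the `UnicodeDecodeError`. It is worth opening the converted file and checking that the Chinese text looks right, because a wrong encoding guess can decode without raising an error and still produce garbled characters.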