a bytes-like object is required, not 'str' error

Original code:


Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\notepad2.txt", line 5, in <module>
    article = file.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x84 in position 61: illegal multibyte sequence

The error is: UnicodeDecodeError: 'gbk' codec can't decode byte 0x84 in position 61: illegal multibyte sequence

So I changed it to:


with open('file.txt', 'r') as file:
    keyword = [word.strip() for word in file.readlines()]

with open('file.docx', 'rb') as file:
    article = file.read()

for word in keyword:
    article = article.replace(word, f"<b>{word}</b>")

print(article)

  File "C:\Users\Administrator\Desktop\notepad2.txt", line 8, in <module>
    article = article.replace(word, f"<b>{word}</b>")
TypeError: a bytes-like object is required, not 'str'

How can I make this run without errors?

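Oh dear. Look at these two lines:
with open('file.docx', 'rb') as file:
    article = file.read()
A Word document (.docx) is a binary file (it is actually a ZIP archive of XML files), so it can't be read directly like this; my code only works with plain text files. Opening it in 'rb' mode also makes file.read() return bytes, and bytes.replace() only accepts bytes arguments, which is why passing the str word raised the TypeError.
Reading a Word file requires a dedicated library.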
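The answer doesn't name a library; python-docx is one common choice. As a dependency-free alternative, here is a minimal sketch that relies on the .docx file being a ZIP archive whose body is stored in word/document.xml, and pulls out each paragraph's text with the standard library only. The helper names docx_to_text and highlight_keywords are hypothetical, and the sketch ignores headers, footers, footnotes and all formatting: it produces plain text with <b> tags, not a Word file with bold runs.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML elements (w:p, w:t) inside document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path: str) -> str:
    """Return the body text of a .docx file, one line per paragraph (hypothetical helper)."""
    # A .docx file is a ZIP archive; the main document body lives in word/document.xml
    with zipfile.ZipFile(path) as zf:
        xml_bytes = zf.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    lines = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph's text is split across runs; join every w:t text node in order
        lines.append("".join(node.text or "" for node in para.iter(f"{W_NS}t")))
    return "\n".join(lines)


def highlight_keywords(text: str, keywords: list[str]) -> str:
    """Wrap every occurrence of each keyword in <b>...</b> (hypothetical helper)."""
    for word in keywords:
        # Skip empty strings (e.g. from a trailing blank line in file.txt):
        # str.replace("", ...) would insert the tag between every character
        if word:
            text = text.replace(word, f"<b>{word}</b>")
    return text
```

With the question's file names, the keywords would be read with an explicit encoding, e.g. open('file.txt', 'r', encoding='utf-8'), and the result produced by highlight_keywords(docx_to_text('file.docx'), keyword). Like the original loop, replacing keywords one after another can re-match text inside earlier <b> tags if a keyword is something like "b".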

  • Here is a similar question you can refer to: https://ask.csdn.net/questions/7616559
  • This blog post is also worth a look: UnicodeDecodeError: 'gbk' codec can't decode byte 0xb4 in position 14: illegal multibyte sequence
  • In addition, the "Problem description" section of the blog post "Fixing UnicodeDecodeError: 'gbk' codec can't decode byte 0x84 in position 285: illegal multibyte sequence" may solve your problem. You can read it carefully below or go to the original post:
  • yolov5's train.py used to run without errors, but running it again today fails with UnicodeDecodeError: 'gbk' codec can't decode byte 0x84 in position 285: illegal multibyte sequence

    D:\Anaconda\envs\pytorch\python.exe H:/PycharmProject/yolov5-5.0/train.py
    github: skipping check (not a git repository)
    YOLOv5  2022-3-29 torch 1.11.0 CUDA:0 (NVIDIA GeForce 940MX, 2047.875MB)
    
    Namespace(weights='yolov5s.pt', cfg='', data='data/coco128.yaml', hyp='data/hyp.scratch.yaml', epochs=300, batch_size=8, img_size=[640, 640], rect=False, resume=False, nosave=False, notest=False, noautoanchor=False, evolve=False, bucket='', cache_images=False, image_weights=False, device='', multi_scale=False, single_cls=False, adam=False, sync_bn=False, local_rank=-1, workers=0, project='runs/train', entity=None, name='exp', exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, upload_dataset=False, bbox_interval=-1, save_period=-1, artifact_alias='latest', world_size=1, global_rank=-1, save_dir='runs\\train\\exp9', total_batch_size=8)
    tensorboard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
    hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0
    Traceback (most recent call last):
      File "H:\PycharmProject\yolov5-5.0\train.py", line 546, in <module>
        train(hyp, opt, device, tb_writer)
      File "H:\PycharmProject\yolov5-5.0\train.py", line 64, in train
        data_dict = yaml.load(f, Loader=yaml.SafeLoader)  # data dict
      File "D:\Anaconda\envs\pytorch\lib\site-packages\yaml\__init__.py", line 79, in load
        loader = Loader(stream)
      File "D:\Anaconda\envs\pytorch\lib\site-packages\yaml\loader.py", line 34, in __init__
        Reader.__init__(self, stream)
      File "D:\Anaconda\envs\pytorch\lib\site-packages\yaml\reader.py", line 85, in __init__
        self.determine_encoding()
      File "D:\Anaconda\envs\pytorch\lib\site-packages\yaml\reader.py", line 124, in determine_encoding
        self.update_raw()
      File "D:\Anaconda\envs\pytorch\lib\site-packages\yaml\reader.py", line 178, in update_raw
        data = self.stream.read(size)
    UnicodeDecodeError: 'gbk' codec can't decode byte 0x84 in position 285: illegal multibyte sequence
    
    Process finished with exit code 1
    

  • The following answer was generated by ChatGPT as a summary of the related blog posts:

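    According to the referenced material, the error is UnicodeDecodeError: 'gbk' codec can't decode byte 0x84 in position 61: illegal multibyte sequence, which is a character-encoding problem. When open() is called without an encoding argument, Python uses the locale's preferred encoding, which is GBK on Chinese-language Windows, so a file saved as UTF-8 cannot be decoded. The fix is to specify the file's actual encoding when opening it.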

    The solution is as follows:

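    1. Open the file that fails to load and locate the error. According to the error message, decoding fails at position 61, a 0-based byte offset (not a character count).
    2. Read the file with the correct encoding. Given the error message, try 'utf-8' first.
    3. Open the file with open() and pass encoding='utf-8', as shown below:
    with open('your_file.txt', 'r', encoding='utf-8') as f:
        content = f.read()  # do your file processing here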
    

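    Replace 'your_file.txt' with your own file path. Opening the file this way decodes it as UTF-8 instead of the locale default, which avoids the error as long as the file really is UTF-8-encoded.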
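    If it is not certain whether a file is UTF-8 or GBK, a small fallback can try both. This is a minimal sketch assuming those are the only two candidates; the name read_text and the order of encodings are assumptions. UTF-8 is tried first because GBK-encoded Chinese text rarely forms valid UTF-8, whereas UTF-8 bytes can often be decoded as GBK without an error, just as garbled text.

```python
def read_text(path: str, encodings: tuple[str, ...] = ("utf-8-sig", "gbk")) -> str:
    """Decode a text file by trying each candidate encoding in order (hypothetical helper)."""
    with open(path, "rb") as f:
        data = f.read()
    for enc in encodings:
        try:
            # utf-8-sig reads plain UTF-8 too, and strips a leading BOM if present
            return data.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next candidate
    # No candidate matched: fall back to UTF-8, replacing undecodable bytes with U+FFFD
    return data.decode("utf-8", errors="replace")
```

    Because it is only a heuristic, specifying the known encoding directly, as in step 3, remains the more reliable option.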

    Once the encoding is handled this way, your code should run without this error.
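    Step 1 of the answer (locating the failing position) can also be done from the exception itself: UnicodeDecodeError exposes start, end, reason and object (the raw bytes being decoded). The sketch below is a hypothetical helper, describe_decode_error; GBK is its default only because that is the codec in the error message. It could be called as describe_decode_error(open('your_file.txt', 'rb').read()).

```python
def describe_decode_error(data: bytes, encoding: str = "gbk", context: int = 8) -> str:
    """Report where decoding `data` fails, with a few surrounding bytes (hypothetical helper)."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start / err.end are 0-based byte offsets into err.object, the raw input bytes
        lo = max(err.start - context, 0)
        window = err.object[lo:err.end + context]
        return (f"{err.encoding} codec failed at byte {err.start} "
                f"({err.reason}); nearby bytes: {window!r}")
    return "decoded without errors"
```

    Looking at the nearby bytes usually shows whether the file is really UTF-8 (for example, runs of bytes starting with 0xE4 to 0xE9 for Chinese characters).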