Python encoding error

Problem to solve:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 3114: invalid continuation byte

import pandas as pd

movies_title = ['MovieID', 'Title', 'Genres']
movies = pd.read_table('./ml-1m/movies.dat', sep='::', header=None, names=movies_title, engine='python')
movies.head()
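Byte 0xe9 is the character 'é' in Latin-1 (ISO-8859-1). That suggests movies.dat is Latin-1 encoded rather than UTF-8, and the MovieLens 1M files are commonly read with encoding='ISO-8859-1' (or 'latin-1'). Under that assumption, the likely fix for the code above is to add encoding='ISO-8859-1' to the pd.read_table(...) call. The sketch below uses a made-up record (hypothetical, not taken from the real file) to show why UTF-8 rejects that byte and Latin-1 accepts it.

```python
# Hypothetical record in the movies.dat layout (MovieID::Title::Genres),
# encoded as Latin-1 (ISO-8859-1), which is assumed to match the ml-1m files.
raw = "1::Misérables, Les (1995)::Drama\n".encode("ISO-8859-1")

# In Latin-1, 'é' is the single byte 0xe9 (offset 6 in this record).
print(hex(raw[6]))  # 0xe9

try:
    raw.decode("utf-8")
except UnicodeDecodeError as e:
    # 0xe9 announces a 3-byte UTF-8 sequence, but the next byte ('r') is not a
    # continuation byte (10xxxxxx), so the decoder stops at that offset.
    print(e.start, e.reason)  # 6 invalid continuation byte

# Latin-1 maps every byte 0x00-0xff to one character, so decoding never fails.
title = raw.decode("ISO-8859-1").split("::")[1]
print(title)  # Misérables, Les (1995)
```

The "position 3114" in the real error is the same kind of byte offset as e.start here.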

Error:

File contents:

For an answer to this question, see: https://ask.csdn.net/questions/7487242
You can also refer to this article: Python encoding problem: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa8 in position
In addition, the "Error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte, and how to fix it" section of the blog post "Python: Error UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte" may solve your problem. You can read the excerpt below or go to the source blog:

from fastapi import FastAPI,Response
import uvicorn
app=FastAPI()

@app.get("/{path}")
def index(path):
    print(path)
    with open(f"WebServer_html/{path}","r",encoding="UTF-8") as f:
        content = f.read()
        return Response(content=content)
        
uvicorn.run(app,host="192.168.82.163",port=9999)

Running it produces an error:


Switching the encoding to GBK doesn't work either:

from fastapi import FastAPI,Response
import uvicorn
app=FastAPI()

@app.get("/{path}")
def index(path):
    print(path)
    with open(f"WebServer_html/{path}","r",encoding="GBK") as f:
        content = f.read()
        return Response(content=content)
        
uvicorn.run(app,host="192.168.82.163",port=9999)

I tried various methods found online, including the following:

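Adding these two lines at the top of the file did not help in practice. (A coding declaration only tells Python how to decode the .py source file itself; it has no effect on files read with open(), which use the encoding= argument or, if that is omitted, the locale's preferred encoding.)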

#-*- coding : utf-8-*-
# coding:utf-8
from fastapi import FastAPI,Response
import uvicorn
app=FastAPI()

@app.get("/{path}")
def index(path):
    print(path)
    with open(f"WebServer_html/{path}","r",encoding="UTF-8") as f:
        content = f.read()
        return Response(content=content)
        
uvicorn.run(app,host="192.168.82.163",port=9999)

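Using the “unicode_escape” encoding:
This does make the error go away, but the Chinese text comes out garbled (mojibake), so it isn't a solution either. (unicode_escape decodes every ordinary byte as Latin-1, so each byte of a multi-byte UTF-8 or GBK character becomes a separate, wrong character.)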

from fastapi import FastAPI,Response
import uvicorn
app=FastAPI()

@app.get("/{path}")
def index(path):
    print(path)
    with open(f"WebServer_html/{path}","r",encoding="unicode_escape") as f:
        content = f.read()
        return Response(content=content)
        
uvicorn.run(app,host="192.168.82.163",port=9999)
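Rather than guessing one encoding at a time (UTF-8, then GBK, then unicode_escape), the file can be read as bytes and decoded with a short list of candidates, stopping at the first that succeeds. This is a sketch: the helper name decode_with_fallback and the candidate list are assumptions, not part of the original post. UTF-8 is tried first because its byte structure is strict, so a successful UTF-8 decode is strong evidence; GBK accepts a much wider range of byte pairs.

```python
from typing import Iterable, Tuple


def decode_with_fallback(
    data: bytes, encodings: Iterable[str] = ("utf-8-sig", "gbk")
) -> Tuple[str, str]:
    """Decode data with the first candidate encoding that works.

    Returns (text, encoding_used). Decoding is strict, so a wrong guess raises
    and moves on to the next candidate instead of silently producing mojibake.
    """
    failures = []
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError as e:
            failures.append(f"{enc}: {e}")
    # Every candidate failed: the bytes are in another encoding, mixed, or not text at all
    raise ValueError("no candidate encoding worked:\n" + "\n".join(failures))


# Hypothetical HTML snippets saved in two different encodings
for raw in ("<p>中文</p>".encode("utf-8"), "<p>中文</p>".encode("gbk")):
    text, enc = decode_with_fallback(raw)
    print(enc, text)  # first "utf-8-sig <p>中文</p>", then "gbk <p>中文</p>"
```

In the FastAPI handler the bytes would come from opening the file with mode "rb" instead of "r". If every candidate fails, as UTF-8 and GBK both did here, the file may not be text at all: byte 0x89 at position 0 (the error in the blog title) is the first byte of the PNG file signature, which suggests an image was being opened in text mode and should be served as raw bytes instead. 'ISO-8859-1' always succeeds, so if it is used as a last resort it must come last in the list.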

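The following answer was generated by ChatGPT as a summary of related blog posts:
For UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 3114: invalid continuation byte, the cause is bytes in the file that are not valid in the chosen encoding. A small script can tell whether the wrong character set was chosen for the whole file or whether only some blocks of bytes cannot be decoded: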
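```
# Report every line of the file that cannot be decoded as UTF-8
with open("file_path", "rb") as f:  # "file_path" is a placeholder; "rb" reads raw bytes
    for i, line in enumerate(f, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as e:
            # If most lines fail, the whole file is in another encoding;
            # if only a few fail, those lines contain the offending bytes
            print(i, e, line)
```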
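The error message already gives a byte offset (position 3114 in the question), so instead of scanning line by line, the whole file can be read as bytes and the bytes around that offset printed directly. UnicodeDecodeError exposes the offset as e.start; first_decode_error below is a hypothetical helper name, and the sample bytes are made up.

```python
from typing import Optional, Tuple


def first_decode_error(
    data: bytes, encoding: str = "utf-8", width: int = 8
) -> Optional[Tuple[int, str, bytes]]:
    """Return (byte_offset, reason, surrounding_bytes) for the first decode error, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as e:
        # e.start is the "position" shown in the error message: a byte offset, not a character index
        lo, hi = max(0, e.start - width), e.end + width
        return e.start, e.reason, data[lo:hi]
    return None


# Hypothetical sample; for a real file use pathlib.Path("file_path").read_bytes()
sample = "Hello, café!".encode("ISO-8859-1")
print(first_decode_error(sample))  # (10, 'invalid continuation byte', b'llo, caf\xe9!')
```

Seeing the raw bytes next to the failure usually identifies the encoding: a lone 0xe9 between ASCII letters points to Latin-1, while pairs of bytes above 0x80 point to GBK.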
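Alternatively, open the file in binary mode (include 'b' in the mode argument) so the contents are read as bytes without any decoding, then decode with line.decode('gbk', errors='ignore'). Setting errors='ignore' makes the decoder skip bytes it cannot decode, at the cost of silently discarding them.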
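errors='ignore' "works" because it throws undecodable bytes away without warning. The sketch compares it with two other standard error handlers on a made-up byte string; it uses UTF-8 so the behaviour is predictable, and 'ignore' and 'replace' behave the same way in principle with 'gbk'.

```python
# Hypothetical bytes: a Latin-1 'é' (0xe9) inside otherwise ASCII text
bad = b"caf\xe9 ok"

ignored = bad.decode("utf-8", errors="ignore")            # undecodable byte silently dropped
replaced = bad.decode("utf-8", errors="replace")          # U+FFFD marks where data was lost
escaped = bad.decode("utf-8", errors="backslashreplace")  # byte kept as a literal \xe9 escape

for s in (ignored, replaced, escaped):
    print(ascii(s))  # 'caf ok', then 'caf\ufffd ok', then 'caf\\xe9 ok'
```

The built-in open() accepts the same errors= argument, e.g. open(path, encoding='gbk', errors='replace'), so reading in binary mode is only needed when you want to handle the bytes yourself. 'replace' or 'backslashreplace' are usually safer than 'ignore' because the damage stays visible.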
Also, for the error 'utf-8' codec can't decode byte 0xd6 in position xx: invalid continuation byte, you can pass encoding='ISO-8859-1' to read_csv:
```
import pandas as pd
data = pd.read_csv('file_path', encoding='ISO-8859-1')
```
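encoding='ISO-8859-1' removes the error because Latin-1 assigns a character to every one of the 256 byte values, not necessarily because it is the right encoding. For Western text such as accented movie titles that is fine; for Chinese text saved as UTF-8 or GBK it produces mojibake, much like the unicode_escape attempt in the question. The sketch (with a made-up string) shows that this decoding is nonetheless lossless and can be undone.

```python
text = "中文"
utf8_bytes = text.encode("utf-8")  # 6 bytes: 2 characters x 3 bytes each

# Latin-1 turns each byte into one character, so this never raises...
mojibake = utf8_bytes.decode("ISO-8859-1")
print(len(mojibake))  # 6 (one character per byte, not 2)

# ...and it is reversible: re-encoding as Latin-1 restores the exact bytes,
# which can then be decoded with the file's real encoding.
recovered = mojibake.encode("ISO-8859-1").decode("utf-8")
print(recovered == text)  # True
```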
For the error 'str' object has no attribute 'decode': in Python 3, decode() exists only on bytes, so the string has to be encoded to bytes first and then decoded:
```
'张俊'.encode('utf-8').decode('utf-8')
```
Note that '张俊'.encode('utf-8').decode('gbk') does not give back '张俊', because the bytes are decoded with a different codec than the one that produced them.
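The underlying rule is that bytes must be decoded with the codec that encoded them. The sketch shows the types involved; the byte counts reflect UTF-8 using 3 bytes and GBK using 2 bytes for these common Chinese characters, and errors="replace" is added only so the mismatched decode cannot raise.

```python
name = "张俊"

# In Python 3, str has no .decode(); only bytes does
try:
    name.decode("utf-8")
except AttributeError as e:
    print(e)  # 'str' object has no attribute 'decode'

utf8 = name.encode("utf-8")  # 6 bytes (3 per character)
gbk = name.encode("gbk")     # 4 bytes (2 per character)

print(utf8.decode("utf-8") == name)  # True: same codec both ways
print(gbk.decode("gbk") == name)     # True: same codec both ways

# Mismatched codecs: UTF-8 bytes read as GBK do not give the original name back
print(utf8.decode("gbk", errors="replace") == name)  # False
```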

http://t.csdn.cn/JfXx2