Problem to solve:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 3114: invalid continuation byte
import pandas as pd
movies_title = ['MovieID', 'Title', 'Genres']
movies = pd.read_table('./ml-1m/movies.dat', sep='::', header=None, names=movies_title, engine='python')
movies.head()
Error:
File contents:
from fastapi import FastAPI, Response
import uvicorn
app = FastAPI()
@app.get("/{path}")
def index(path):
    print(path)
    with open(f"WebServer_html/{path}", "r", encoding="UTF-8") as f:
        content = f.read()
    return Response(content=content)
uvicorn.run(app, host="192.168.82.163", port=9999)
Running it raises an error:
Switching the encoding to GBK doesn't work either:
from fastapi import FastAPI, Response
import uvicorn
app = FastAPI()
@app.get("/{path}")
def index(path):
    print(path)
    with open(f"WebServer_html/{path}", "r", encoding="GBK") as f:
        content = f.read()
    return Response(content=content)
uvicorn.run(app, host="192.168.82.163", port=9999)
I tried various fixes found online, including the following:
# -*- coding: utf-8 -*-
# coding:utf-8
from fastapi import FastAPI, Response
import uvicorn
app = FastAPI()
@app.get("/{path}")
def index(path):
    print(path)
    with open(f"WebServer_html/{path}", "r", encoding="UTF-8") as f:
        content = f.read()
    return Response(content=content)
uvicorn.run(app, host="192.168.82.163", port=9999)
from fastapi import FastAPI, Response
import uvicorn
app = FastAPI()
@app.get("/{path}")
def index(path):
    print(path)
    with open(f"WebServer_html/{path}", "r", encoding="unicode_escape") as f:
        content = f.read()
    return Response(content=content)
uvicorn.run(app, host="192.168.82.163", port=9999)
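For UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 3114: invalid continuation byte, the cause is bytes in the file that cannot be converted with the chosen codec. A small script can tell you whether the encoding argument is wrong for the file as a whole, or whether only a few isolated binary chunks fail to convert. For example: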
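with open("file_path", "rb") as f:  # binary mode: lines come back as raw bytes
    for i, line in enumerate(f, start=1):
        try:
            line.decode("utf-8")
            # Print nothing for lines that decode cleanly, so the failures stand out
        except UnicodeDecodeError:
            # Report the line number and the raw bytes that could not be decoded
            print(i, line)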
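The same check can be wrapped in a function that also reports where in each line decoding stops, using the start attribute of UnicodeDecodeError. This is a minimal sketch: the name find_undecodable_lines is hypothetical, it accepts any iterable of bytes lines (such as a file opened with "rb"), and the sample lines are illustrative, not taken from a real file.

```python
def find_undecodable_lines(lines, encoding="utf-8"):
    """Return (line_number, offset, bad_byte) for every line that fails to decode.

    `lines` is any iterable of bytes, e.g. a file opened in "rb" mode.
    Hypothetical helper for illustration, not a library function.
    """
    problems = []
    for line_number, raw in enumerate(lines, start=1):
        try:
            raw.decode(encoding)
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first offending byte within this line
            problems.append((line_number, exc.start, raw[exc.start]))
    return problems


# Illustrative sample: the second line contains a lone 0xe9 byte
sample = [b"MovieID::Title::Genres\n", b"1::Les Mis\xe9rables::Drama\n"]
print(find_undecodable_lines(sample))                # [(2, 10, 233)]
print(find_undecodable_lines(sample, "iso-8859-1"))  # []
```

To scan a real file, pass the open file object, e.g. with open("file_path", "rb") as f: print(find_undecodable_lines(f)). If nearly every line is reported, the encoding argument is probably wrong for the whole file; if only a few lines are, they contain isolated bytes that do not fit that encoding.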
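When reading, you can also open the file in binary mode (include 'b' in the mode argument, e.g. "rb"), which returns the contents as bytes objects without any decoding. Then decode each line yourself with line.decode('gbk', errors='ignore'): setting errors to 'ignore' makes the decoder skip bytes it cannot decode instead of raising, at the cost of silently dropping them.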
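The effect of the errors argument is easy to see on a short byte string. This sketch uses UTF-8 rather than GBK so the result is easy to predict, and the byte string is illustrative; lenient_decode is a hypothetical name. errors="ignore" silently drops the bad byte, while errors="replace" leaves a visible U+FFFD marker where data was lost.

```python
raw = b"caf\xe9 ok"  # 0xe9 starts a UTF-8 multi-byte sequence, but b" " is not a continuation byte


def lenient_decode(data: bytes, encoding: str = "utf-8", errors: str = "ignore") -> str:
    """Decode bytes (e.g. read from a file opened in "rb" mode) without raising on bad bytes."""
    return data.decode(encoding, errors=errors)


try:
    raw.decode("utf-8")  # default errors="strict" raises
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)  # strict: invalid continuation byte

print("ignore:", repr(lenient_decode(raw)))                     # ignore: 'caf ok'
print("replace:", repr(lenient_decode(raw, errors="replace")))  # replace: 'caf� ok'
```

For the FastAPI endpoint in the question, if the requested file is not text at all (an image, for example), no text encoding will decode it cleanly. In that case, reading it with open(..., "rb") and passing the bytes directly to Response(content=...) avoids decoding altogether.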
Separately, for the error 'utf-8' codec can't decode byte 0xd6 in position xx: invalid continuation byte, you can pass encoding='ISO-8859-1' to read_csv. For example:
import pandas as pd
data = pd.read_csv('file_path', encoding='ISO-8859-1')
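Why ISO-8859-1 stops the error: it maps each of the 256 byte values to the code point with the same number, so decoding can never fail, and 0xe9 becomes 'é'. The flip side is that it never complains either: if a file is really GBK or UTF-8, you get wrong characters instead of an exception. A small sketch with an illustrative byte string (decode_latin1 is a hypothetical name):

```python
raw = b"Les Mis\xe9rables (1995)"  # illustrative bytes containing 0xe9


def decode_latin1(data: bytes) -> str:
    # ISO-8859-1 (Latin-1) maps byte value N to code point U+00NN, for all 256 values
    return data.decode("iso-8859-1")


try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xe9 in position 7: invalid continuation byte

print(decode_latin1(raw))  # Les Misérables (1995)

# Every possible byte decodes, which is why this encoding never raises UnicodeDecodeError
assert decode_latin1(bytes(range(256))) == "".join(chr(n) for n in range(256))
```

Applied to the question, read_table accepts the same encoding argument as read_csv, so pd.read_table('./ml-1m/movies.dat', sep='::', header=None, names=movies_title, engine='python', encoding='ISO-8859-1') should avoid the error, assuming the file really is Latin-1 encoded; checking a few decoded titles is a quick way to confirm.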
For the error 'str' object has no attribute 'decode', encode the string to bytes first and then decode it. For example:
'张俊'.encode('utf-8').decode('utf-8')
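In Python 3, only bytes has a decode() method; a str is already decoded text. So the encode-then-decode line simply returns the same string, and when the value is already a str, the .decode() call can usually just be dropped. When a value may be either type, a small helper can normalize it; ensure_text is a hypothetical name used only for this sketch.

```python
def ensure_text(value, encoding="utf-8"):
    """Return a str whether `value` is bytes or already a str (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text: calling .decode() on it would raise AttributeError


data = "张俊".encode("utf-8")
print(data)                # b'\xe5\xbc\xa0\xe4\xbf\x8a'
print(ensure_text(data))   # 张俊
print(ensure_text("张俊"))  # 张俊
print(hasattr("张俊", "decode"), hasattr(data, "decode"))  # False True
```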
Note that '张俊'.encode('utf-8').decode('gbk') does not give back '张俊': the UTF-8 bytes are read as GBK, so the result is garbled text.
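This kind of garbling (mojibake) happens because the six UTF-8 bytes of '张俊' are regrouped into three two-byte GBK characters. Since nothing was dropped in this case, the damage can be reversed by encoding back to GBK and decoding as UTF-8. The sketch below checks that round trip rather than hard-coding the garbled characters; the reversal only works if the wrong decode did not use errors='ignore' or errors='replace'.

```python
original = "张俊"
utf8_bytes = original.encode("utf-8")  # 6 bytes: 3 per character

# Wrong codec: no exception, but the bytes are read as GBK double-byte characters
garbled = utf8_bytes.decode("gbk")
print(len(utf8_bytes), len(garbled))  # 6 3
print(garbled == original)            # False

# Undo the mistake: recover the original bytes with GBK, then decode them correctly
repaired = garbled.encode("gbk").decode("utf-8")
print(repaired == original)           # True
```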