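Symbols such as 🔥, 🌹, and 👏 show up as garbled characters in the Notepad that ships with Windows 10, and Notepad++ shows their raw codes instead of the glyphs. When I read a text file containing these symbols in Python (the file is supposed to be UTF-8 encoded), decoding it as UTF-8 raises a UnicodeDecodeError, yet decoding it as ISO-8859-1 succeeds. I then pulled out the byte sequences for these symbols and tried decoding them as UTF-8 on their own, and that also raised a decoding error.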
# 1. Decoding error when reading the file
with open('./utf8.csv', 'r', encoding='utf-8') as f:
    text = f.read()
Running it produces the following error:
Traceback (most recent call last):
  File "G:/PycharmProject/audit/utf8_to_ansi/test.py", line 27, in <module>
    text = f.read()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 4037: invalid continuation byte
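A note on why ISO-8859-1 appears to work: that codec maps each of the 256 possible byte values to a character, so it never raises, but multi-byte sequences come back as mojibake rather than as emoji. To see which bytes the UTF-8 codec is rejecting, one option is to read the file in binary mode and look at the region around the reported offset. A minimal sketch, where `locate_decode_error` and the `sample` bytes are hypothetical stand-ins and not part of the original script:

```python
def locate_decode_error(data: bytes, context: int = 8):
    """Return (offset, nearby bytes) for the first UTF-8 decode error, or None."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as err:
        lo = max(err.start - context, 0)
        return err.start, data[lo:err.end + context]
    return None

# Stand-in for: data = open('./utf8.csv', 'rb').read()
sample = 'name,comment\n'.encode('utf-8') + b'\xed\xa0\xbd\xed\xb4\xa5' + b'\n'

result = locate_decode_error(sample)
print(result)

# ISO-8859-1 never fails: one byte in, one character out.
latin1_text = sample.decode('iso-8859-1')
print(len(latin1_text) == len(sample))  # True
```

With the real file, the offset returned should match the position in the traceback (4037), and the surrounding bytes show what is actually stored there.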
# 2. Error when decoding the bytes on their own
b1 = b'\xF0\x9F\x91\x8F'
b2 = b'\xF0\x9F\x94\xA5'
print(b1)
print(b2)
print(b2.decode('utf-8'))
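Running it produces the following error (the line that failed, as shown in the traceback, decodes b'\xed\xa0\xbd\xed\xb4\xa5' rather than b2):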
Traceback (most recent call last):
  File "G:/PycharmProject/audit/utf8_to_ansi/test.py", line 24, in <module>
    print(b'\xed\xa0\xbd\xed\xb4\xa5'.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte
Your CSV file may have been saved as ANSI rather than UTF-8. Save it again as UTF-8, then decode it with utf-8 or utf-8-sig. For example:
b1 = b'\xF0\x9F\x91\x8F'
b2 = b'\xF0\x9F\x94\xA5'
print(b1)
print(b2)
print(b2.decode('utf-8-sig'))
# Output: 🔥
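Judging by the bytes in the second traceback, another possibility is worth checking. The sequence `b'\xed\xa0\xbd\xed\xb4\xa5'` is what you get when the UTF-16 surrogate pair for 🔥 (U+D83D, U+DD25) is encoded one half at a time, a form often called CESU-8. Python's strict UTF-8 codec rejects encoded surrogates, which matches the "invalid continuation byte" error at 0xed. If the file really contains such sequences, something like the following may recover the text. This is a sketch under that assumption, and `decode_surrogate_utf8` is a hypothetical helper name:

```python
def decode_surrogate_utf8(data: bytes) -> str:
    # 'surrogatepass' lets the UTF-8 codec accept encoded surrogates
    # (U+D800..U+DFFF); the result holds each half as its own code point.
    halves = data.decode('utf-8', errors='surrogatepass')
    # Round-trip through UTF-16 so every high/low pair is merged into
    # the single code point it stands for.
    return halves.encode('utf-16', errors='surrogatepass').decode('utf-16')

raw = b'\xed\xa0\xbd\xed\xb4\xa5'      # bytes taken from the traceback
fixed = decode_surrogate_utf8(raw)

print(fixed == '\U0001F525')           # True: this is the fire emoji
print(fixed.encode('utf-8'))           # b'\xf0\x9f\x94\xa5'
```

The re-encoded bytes match `b2` from the question, which is the standard four-byte UTF-8 form. An unpaired surrogate in the data would still make the final `decode('utf-16')` step fail, so this only helps when the halves come in complete pairs.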