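Hi everyone. I'm reading a txt file with Python, and some of the records come out garbled.
The file is ANSI-encoded and has a little over 400,000 records, 8 of which are garbled.
Here is my code: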
file_read = open(path, errors='ignore')
need = []
while True:
    text = file_read.readline()
    if not text:  # readline() returns '' at end of file
        break
    need.append(text)
file_read.close()
The output:
q
q * qs)#! P P P
" . L ) 9 8 9 8 -08-27@
" . L ) 9 8 9 8 17300@1100@-6000@99829811@24220@CNY@2021-08-27@
2021-U S B 8068086@SA201@08032272@S@S@5@2705@2695@2706@-1100@-100@18806631@24354@CNY@2021-08-27@
2021-09-02@8008068086@SA201@00498433@S@S@2@2714@2695@2706@-440@320@1880663 R ` r( $ " R U S B D i s k 2 . 0 (7 7 7 7 2 1 6 8 4 A 4 0 7 5 2 3 2 2 6 @xV VendorCoProductCode 2.00 q
2021-09-02@9080001233@CF201@00136384@B@S@1@17595@17245@17 R ` r( $ " R U S B D i s k 2 . 0 (7 7 7 7 2 1 6 8 4 A 4 0 7 5 2 3 2 2 6 @xV VendorCoProductCode 2.00 q
2021-09-02@9080001233@CF201@00165905@B@S@1@17595@17245@1U S B -1475@99829811@6055@CNY@2021-08-27@
I looked things up online and tried this:
file_read = open(path, "r", encoding='ISO-8859-1')
need = []
while True:
    text = file_read.readline()
    if not text:  # readline() returns '' at end of file
        break
    # Recover the raw bytes, then decode them as GBK
    text = text.encode("iso-8859-1").decode('gbk')
    need.append(text)
file_read.close()
It raised an error: UnicodeDecodeError: 'gbk' codec can't decode byte 0x9a in position 97: illegal multibyte sequence
Thanks, everyone.
file_read = open(path, "r", encoding='gbk')
Give that a try.
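If opening with gbk still raises UnicodeDecodeError, it may help to find out exactly which lines fail before choosing an encoding. The output above contains fragments such as "U S B D i s k 2 . 0" and "VendorCoProductCode", which might mean those 8 records hold bytes that are not text in any encoding, though that is only a guess from the output. Below is a minimal sketch, assuming the file is GBK with a few damaged lines; find_undecodable_lines is a hypothetical helper name, not something from the original post.

```python
def find_undecodable_lines(path, encoding="gbk"):
    """Return (line_number, raw_bytes) for each line that fails to decode.

    Hypothetical helper for diagnosis; "gbk" is an assumed default.
    """
    bad = []
    # Binary mode: no decoding happens until we ask for it line by line
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode(encoding)  # strict decoding, raises on bad bytes
            except UnicodeDecodeError:
                bad.append((lineno, raw))
    return bad
```

Printing the returned raw bytes (for example with repr) shows what is actually stored on those lines, which should make it clearer whether a different encoding could help at all.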
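from chardet import detect

# Read the raw bytes and let chardet guess the encoding
with open(file_path, "rb") as file:
    encoding = detect(file.read())["encoding"]

# Reopen the file as text using the detected encoding
with open(file_path, "r", encoding=encoding) as file:
    content = file.read()
print(content)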
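If detection still points to a GBK-family encoding and a handful of lines remain undecodable, one option is to read everything with errors="replace" and set the damaged records aside instead of silently dropping bytes with errors="ignore". This is only a sketch under that assumption; read_records and its "gbk" default are hypothetical, not part of the replies above.

```python
REPLACEMENT = "\ufffd"  # U+FFFD, the character errors="replace" substitutes

def read_records(path, encoding="gbk"):
    """Split a file into cleanly decoded records and damaged ones.

    Hypothetical helper; the encoding default is an assumption.
    """
    good, damaged = [], []
    with open(path, "r", encoding=encoding, errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            record = line.rstrip("\n")
            if REPLACEMENT in record:
                # At least one byte sequence could not be decoded
                damaged.append((lineno, record))
            else:
                good.append(record)
    return good, damaged
```

The damaged list keeps the line numbers, so the 8 affected records can be checked against the source system or re-exported rather than patched by hand.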