Python raises UnicodeDecodeError when modifying an XML file

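Symptoms and background

I am editing an XML file with Python: I locate a particular node and then delete it. However, the statement that deletes the node with remove keeps raising UnicodeDecodeError. The solutions to similar problems that I found online did not fix it. My Python version is 3.9.7.

Relevant code (please do not paste screenshots)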
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

with open(r'E:\03_ZD_Private\03_PyPrjs\createACfileAndWrite\createACfileAndWrite\MCAL_Modules.arxml',encoding='utf-8') as mcal_arxml:
    mcalTempLines=mcal_arxml.readlines() # load the contents of the arxml file into memory
   

with open(r'E:\03_ZD_Private\03_PyPrjs\createACfileAndWrite\createACfileAndWrite\McalModules000.xml','w',encoding='utf-8') as test_xml:
    test_xml.writelines(mcalTempLines)

test_tree = ET.parse('McalModules000.xml')  # load the file and return the parse tree
test_root = test_tree.getroot()  # get the root element
print("the test_root tag is ", test_root.tag)  # print only the root element's tag
print(test_root.tag, ":", test_root.attrib)  # print the root element's tag and attributes
yy = test_root[0][8][0].text  # root's first child -> its child at index 8 (the ninth) -> that node's first child
print('the first value short name: ', yy)  # the text of that node, i.e. its SHORT-NAME

ar_packages = 'AR-PACKAGES'# '{http://autosar.org/schema/r4.0}AR-PACKAGES'
ar_package  = 'AR-PACKAGE'# '{http://autosar.org/schema/r4.0}AR-PACKAGE'
shortName   = 'SHORT-NAME'# '{http://autosar.org/schema/r4.0}SHORT-NAME'

cnt1 = 0  # number of AR-PACKAGE elements visited so far
for ar_package in test_root.findall('.//AR-PACKAGE'):# ('.//{http://autosar.org/schema/r4.0}AR-PACKAGE'):
    print(ar_package.tag)
    # children=ar_package.getchildren()
    # print("these are the children of arpack: ------",children)
    packNm = ar_package[0].text
    packDefRef = ar_package[1][0][1].tag
    print(packNm)
    print(packDefRef)
    cnt1=cnt1+1
    if packNm=='Can':
        canModNum=cnt1
        print("find This Package !!!!!!!!!!",canModNum)
        test_root.remove(ar_package)  # remove() only deletes a direct child of test_root; otherwise it raises ValueError

test_tree.write('temp00.xml',encoding='utf-8',xml_declaration=True)  # write() belongs to the ElementTree, not to the root Element

Output and error message

Error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbd in position 732: invalid start byte
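
The message itself pinpoints the failing byte. As a minimal sketch (the sample bytes are an assumption for illustration, not taken from the actual file), the exception object exposes the offset and the value of the byte that could not be decoded. A byte such as 0xbd that cannot start a UTF-8 sequence is a hint that the data was saved in a different encoding:

```python
# Hypothetical sample: ASCII text followed by the two bytes 0xbd 0xda.
# In GBK these two bytes form one character; in UTF-8, 0xbd cannot start one.
data = b'# \xbd\xda'

bad_offset = None
bad_byte = None
reason = None
try:
    data.decode('utf-8')
except UnicodeDecodeError as err:
    bad_offset = err.start             # index of the first undecodable byte
    bad_byte = err.object[err.start]   # the byte value itself (an int)
    reason = err.reason                # short description of the failure
    print(f"cannot decode byte 0x{bad_byte:02x} at position {bad_offset}: {reason}")
```

Applied to the real input, the reported position (732 in the error above) shows where to look in the file that is being decoded.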

My approach and what I have tried

Try a different encoding:
https://www.cnblogs.com/huangchenggener/p/10983812.html
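
One way to "try a different encoding" is to read the raw bytes and attempt a short list of candidates in order. This is only a sketch: the helper name read_text_with_fallback and the candidate list ('utf-8', then 'gbk') are assumptions, not something the linked post prescribes.

```python
import os
import tempfile

def read_text_with_fallback(path, encodings=('utf-8', 'gbk')):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    with open(path, 'rb') as fh:
        raw = fh.read()                   # read bytes, so nothing is decoded yet
    for enc in encodings:
        try:
            return raw.decode(enc), enc   # strict decoding fails on any bad byte
        except UnicodeDecodeError:
            continue                      # try the next candidate
    raise ValueError(f"none of {encodings} could decode {path!r}")

# Usage on a throwaway file that contains GBK-encoded text
with tempfile.NamedTemporaryFile(delete=False, suffix='.xml') as tmp:
    tmp.write('<a>字节</a>'.encode('gbk'))
text, used = read_text_with_fallback(tmp.name)
os.remove(tmp.name)
print(used, text)
```

The order of the candidates matters: a permissive single-byte codec such as latin-1 decodes any byte sequence without error, so it should only ever come last, if at all.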

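Some of the data consists of bytes that UTF-8 cannot decode. If those bytes are unimportant, you can skip them by passing errors='ignore' where the file is read, for example
open(path, encoding='utf-8', errors='ignore')
(ElementTree.write() has no errors parameter, so the option cannot be added to the write('temp00.xml', ...) call.)
Alternatively, decode the bytes explicitly with decode: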


byte = b'\xd7\xd6\xbd\xda\xbd\xe2\xc2\xeb'
# print("UTF-8 decode:", byte.decode('UTF-8', 'strict'))
print("GBK decode:", byte.decode('GBK', 'strict'))
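
For comparison, here is a small sketch of the error handlers accepted by bytes.decode (open takes the same values through its errors parameter). The sample reuses two bytes from the snippet above behind an ASCII prefix; the prefix is an assumption added for illustration. Both 'ignore' and 'replace' lose data, while decoding with the correct codec does not:

```python
raw = b'name=' + b'\xbd\xda'   # ASCII prefix + one GBK-encoded character

ignored = raw.decode('utf-8', errors='ignore')    # undecodable bytes are dropped
replaced = raw.decode('utf-8', errors='replace')  # undecodable bytes become U+FFFD
correct = raw.decode('gbk')                       # the matching codec keeps everything

print(repr(ignored))
print(repr(replaced))
print(repr(correct))
```

Because the dropped or replaced bytes cannot be recovered afterwards, these handlers are best kept for data that really does not matter.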


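I opened my .py file in Notepad, and the encoding shown in the bottom-right corner was ANSI. Saving it under the same file name with the encoding changed to UTF-8 fixed the problem. (Python 3 reads source files as UTF-8 by default, so the non-ASCII comments saved as ANSI could not be decoded.)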
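
The same conversion can be scripted instead of done by hand in Notepad. This is a minimal sketch, assuming that "ANSI" on this machine means the GBK code page (cp936), which is usual on Simplified Chinese Windows but is not confirmed in this thread. The function name convert_to_utf8 and the file names are hypothetical.

```python
import os
import shutil
import tempfile

def convert_to_utf8(src_path, dst_path, source_encoding='gbk'):
    """Re-encode a text file from source_encoding to UTF-8."""
    # newline='' keeps the original line endings untouched in both directions
    with open(src_path, encoding=source_encoding, newline='') as src:
        text = src.read()                 # decode with the file's real encoding
    with open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        dst.write(text)                   # write the same text back as UTF-8

# Usage on a throwaway script whose comment is GBK-encoded
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'script_ansi.py')
dst = os.path.join(workdir, 'script_utf8.py')
with open(src, 'wb') as fh:
    fh.write('x = 1  # 载入\n'.encode('gbk'))

convert_to_utf8(src, dst)
with open(dst, encoding='utf-8') as fh:
    converted = fh.read()
shutil.rmtree(workdir)                    # clean up the temporary directory
print(converted)
```

If the source encoding is guessed wrong, the read step raises UnicodeDecodeError or silently produces garbled text, so it is worth checking the converted file before overwriting the original.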
