Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
601 views
in Technique[技术] by (71.8m points)

python - UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2

I am creating XML file in Python and there's a field on my XML that I put the contents of a text file. I do it by

f = open ('myText.txt',"r")
data = f.read()
f.close()

root = ET.Element("add")
doc = ET.SubElement(root, "doc")

field = ET.SubElement(doc, "field")
field.set("name", "text")
field.text = data

tree = ET.ElementTree(root)
tree.write("output.xml")

And then I get the UnicodeDecodeError. I already tried to put the special comment # -*- coding: utf-8 -*- on top of my script but still got the error. Also I tried already to enforce the encoding of my variable data.encode('utf-8') but still got the error. I know this issue is very common but all the solutions I got from other questions didn't work for me.

UPDATE

Traceback: Using only the special comment on the first line of the script

Traceback (most recent call last):
  File "D:Pythonlsecreatexml.py", line 151, in <module>
    tree.write("D:\python\lse\xmls" + items[ctr][0] + ".xml")
  File "C:Python27libxmletreeElementTree.py", line 820, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "C:Python27libxmletreeElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "C:Python27libxmletreeElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "C:Python27libxmletreeElementTree.py", line 937, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "C:Python27libxmletreeElementTree.py", line 1073, in _escape_cdata
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 243: ordina
l not in range(128)

Traceback: Using .encode('utf-8')

Traceback (most recent call last):
  File "D:Pythonlsecreatexml.py", line 148, in <module>
    field.text = data.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 227: ordina
l not in range(128)

I used .decode('utf-8') and the error message didn't appear and it successfully created my XML file. But the problem is that the XML is not viewable on my browser.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You need to decode data from input string into unicode, before using it, to avoid encoding problems.

field.text = data.decode("utf8")

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...