You are trying to encode bytestrings:
    >>> '<counter name="Entreé">'.encode('utf8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 20: ordinal not in range(128)
Python is trying to be helpful. You can only encode a Unicode string to bytes, so before encoding a bytestring, Python 2 first implicitly decodes it using the default encoding (ASCII). That implicit decode is what fails here.
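To make the implicit step visible, here is a small sketch that performs the ASCII decode explicitly on the same UTF-8 bytes. It uses a `b''` literal so it behaves the same on Python 2 and 3; the names `data`, `error` and `text` are only illustrative.

```python
# Sketch: the decode that Python 2's str.encode() performs behind your back.
# The bytes below are the UTF-8 encoding of '<counter name="Entreé">'.
data = b'<counter name="Entre\xc3\xa9">'

error = None
try:
    data.decode('ascii')  # the implicit step, done explicitly
except UnicodeDecodeError as exc:
    error = exc
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 20: ...

# Decoding with the codec the bytes are actually in works fine:
text = data.decode('utf-8')
```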
The solution is to not encode data that is already encoded. If the data was encoded with a different codec than the one you need, first decode it with the codec it was actually encoded with, then encode it again.
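As a minimal sketch of that decode-then-encode step, assume (purely for illustration) that the incoming bytes are Latin-1 and you need UTF-8; substitute whatever codecs your data actually uses.

```python
# Sketch: transcode bytes from one codec to another by decoding first.
# Assumption: the source bytes are Latin-1, where é is the single byte 0xe9.
latin1_bytes = b'<counter name="Entre\xe9">'

# Decode with the codec the data was actually encoded with...
text = latin1_bytes.decode('latin-1')
# ...then encode to the codec you need.
utf8_bytes = text.encode('utf-8')

print(repr(utf8_bytes))
```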
If you have a mix of unicode and bytestring values, decode just the bytestrings or encode just the unicode values; try to avoid mixing the types. The following decodes byte strings to unicode first:
    def ensure_unicode(v):
        if isinstance(v, str):
            v = v.decode('utf8')
        return unicode(v)  # convert anything not a string to unicode too

    output_string = u'\n'.join([ensure_unicode(line) for line in output_lines])
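The other direction, encoding just the unicode values and leaving bytestrings alone, could look roughly like this. `ensure_bytes` and the sample `output_lines` are hypothetical names for illustration; `type(u'')` is `unicode` on Python 2 and `str` on Python 3, so the sketch runs on both.

```python
# Sketch (hypothetical helper): encode only values that are not bytes already.
text_type = type(u'')  # unicode on Python 2, str on Python 3

def ensure_bytes(v, encoding='utf-8'):
    """Return v as bytes without ever re-encoding data that is already encoded."""
    if isinstance(v, bytes):
        return v  # already encoded: encoding it again is what triggers the error
    if not isinstance(v, text_type):
        v = u'%s' % (v,)  # turn non-string values (e.g. ints) into text first
    return v.encode(encoding)

# Hypothetical mixed input: UTF-8 bytes, a unicode string and an int.
output_lines = [b'<counter name="Entre\xc3\xa9">', u'<counter name="Entre\xe9">', 42]
output_bytes = b'\n'.join([ensure_bytes(line) for line in output_lines])
```

Whichever direction you pick, the point is the same: convert everything to one type before joining, instead of letting Python guess with ASCII.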