Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


json - Convert escaped characters with python

First, sorry for my English.

I have to convert strings from a json file like the following:

{"detalle":"el Expediente N\u00b0\u00a030 de la Resoluci\u00f3n 11..."}

into something like:

{"detalle":"el Expediente N° 30 de la Resolución 11..."}

and then write the result to a text file.

I tried:

json.dumps({"detalle":"el Expediente N\u00b0\u00a030 de la Resoluci\u00f3n 11..."}, ensure_ascii=False).encode('utf8')

that returns

'{"detalle": "el Expediente N\\\\u00b0\\\\u00a030 de la Resoluci\\\\u00f3n 11..."}'

How can I convert it?


1 Reply


(In this answer, I'm assuming you use Python 2.)

First, let me explain why your snippet returns something different than you expect:

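import json

# Case 1: plain (byte) string literal; \u00b0 stays as literal characters
r1 = json.dumps({"detalle":"el Expediente N\u00b0\u00a030 de la Resoluci\u00f3n 11..."}, ensure_ascii=False).encode('utf8')
print(r1)
# Case 2: unicode literal (u prefix); \u00b0 is interpreted as the character °
r2 = json.dumps({"detalle":u"el Expediente N\u00b0\u00a030 de la Resoluci\u00f3n 11..."}, ensure_ascii=False).encode('utf8')
print(r2)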

This outputs:

{"detalle": "el Expediente N\\u00b0\\u00a030 de la Resoluci\\u00f3n 11..."}
{"detalle": "el Expediente N°?30 de la Resolución 11..."}

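The difference is the type of the input string. In the first case it is a plain byte string (str in Python 2), where \u has no special meaning, so \u00b0 stays as six literal ASCII characters: a backslash followed by u00b0. json.dumps then treats that backslash as ordinary data and escapes it instead of producing the character. In the second case the u prefix makes it a unicode string, so \u00b0 is interpreted as the single character °. The second case is what you want.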
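To make the difference concrete, here is a small sketch. Note that it assumes Python 3 rather than the Python 2 used above: in Python 3 every string literal is unicode, so a raw literal (r"...") stands in for the Python 2 byte string of the first case. The names escaped and decoded are only for illustration.

```python
import json

# Raw literal: the backslash and 'u00b0' are kept as literal characters,
# like the plain Python 2 str in the first case.
escaped = r"N\u00b0\u00a030"
# Normal literal: each \uXXXX escape becomes one character,
# like the Python 2 unicode string in the second case.
decoded = "N\u00b0\u00a030"

print(len(escaped))  # 15: 'N' + two 6-character escapes + '30'
print(len(decoded))  # 5: 'N', '°', non-breaking space, '3', '0'

# The literal backslashes are escaped again by json.dumps ...
print(json.dumps({"detalle": escaped}, ensure_ascii=False))
# ... while the real characters are written as they are.
print(json.dumps({"detalle": decoded}, ensure_ascii=False))
```

The length check is usually the quickest way to tell which of the two kinds of string you actually have.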

Based on this, here is what I understand from your problem:

Normally when you read a JSON file with the json module, the strings (which are escaped in the JSON file) are unescaped by the parser. If you still see escaped characters, that indicates that the strings were (accidentally?) double escaped in the JSON file. In that case, try an extra unescape with s.decode('unicode-escape'):

data["detalle"] = data["detalle"].decode('unicode-escape')
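If you are on Python 3, str has no decode method, so the line above will not work as written. A rough equivalent, assuming the double-escaped string contains only ASCII characters at that point, could look like the sketch below. The raw content is a made-up stand-in for your file, and the variable names are illustrative.

```python
import json

# Hypothetical file content in which the escapes were double escaped
# (note the two backslashes before each 'u').
raw = r'{"detalle": "el Expediente N\\u00b0\\u00a030 de la Resoluci\\u00f3n 11..."}'

data = json.loads(raw)
# The parser removed one level of escaping, so the value still holds
# a literal backslash followed by 'u00b0'.
still_escaped = data["detalle"]

# Extra unescape: go through bytes, then decode the \uXXXX sequences.
# Assumption: the string is pure ASCII here; otherwise encode() raises.
fixed = still_escaped.encode("ascii").decode("unicode_escape")
print(fixed)
```

If the string can already contain non-ASCII characters mixed with escapes, this approach is not safe and the escapes would need to be replaced some other way.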

Once you have proper unicode strings loaded in Python, encoding them to bytes with s.encode('utf8') and writing the result to a file is the correct way to produce the text file.
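For the final step, a minimal sketch of writing the result to a text file might look like this. It again assumes Python 3, where passing encoding="utf-8" to open takes the place of the explicit s.encode('utf8'). The file name salida.txt and the temporary directory are placeholders, not something from your setup.

```python
import json
import os
import tempfile

# Data as it looks once the strings are proper unicode.
data = {"detalle": "el Expediente N\u00b0\u00a030 de la Resoluci\u00f3n 11..."}

# Hypothetical output location, chosen so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "salida.txt")

# ensure_ascii=False keeps the characters as they are instead of
# writing \uXXXX escapes; the file object encodes them as UTF-8.
with open(path, "w", encoding="utf-8") as f:
    f.write(json.dumps(data, ensure_ascii=False))

# Read the bytes back to check what actually landed on disk.
with open(path, "rb") as f:
    raw_bytes = f.read()
print(raw_bytes.decode("utf-8"))
```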

