python - In what world would \u00c3\u00a9 become é?

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have a JSON document from a source I do not control. It is probably improperly encoded, and it contains the following strings:

d\u00c3\u00a9cor

business\u00e2\u20ac\u2122 active accounts

the \u00e2\u20ac\u0153Made in the USA\u00e2\u20ac\u009d label

From this, I gather they intend \u00c3\u00a9 to become é, which in UTF-8 is the byte pair C3 A9. That makes some sense. For the others, I assume we are dealing with some kind of directional (curly) quotation marks.

My theory here is that this is either using some encoding I've never encountered before, or that it has been double-encoded in some way. I am fine writing some code to transform their broken input into something I can understand, as it is highly unlikely they would be able to fix the system if I brought it to their attention.
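The double-encoding theory can be checked with nothing but the standard library. The sketch below assumes the producer encoded the text as UTF-8 and then misread those bytes as Windows-1252 (cp1252); that choice of codec is a guess about their system, and `garble` is a hypothetical helper name used only for illustration.

```python
def garble(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


# Intended character -> escape sequence seen in the broken JSON.
samples = {
    "\u00e9": "\u00c3\u00a9",        # e with acute accent
    "\u2019": "\u00e2\u20ac\u2122",  # right single quotation mark
    "\u201c": "\u00e2\u20ac\u0153",  # left double quotation mark
}

for intended, seen in samples.items():
    # Each comparison holds if the UTF-8-read-as-cp1252 theory is right.
    print(ascii(intended), garble(intended) == seen)
```

All three comparisons print True, which supports the theory. The closing quote U+201D is a special case: its third UTF-8 byte is 0x9D, which cp1252 leaves unassigned, so Python's strict cp1252 decoder raises on it. The \u009d in the third string suggests the producer passed that byte through unchanged as a C1 control character.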

Any ideas how to force their input to something I can understand? For the record, I am working in Python.

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:39:40+0000

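You should try the ftfy module. The session below is written for Python 3 and calls ftfy.fix_text, which is the name current ftfy releases use; older releases exposed the same function as ftfy.ftfy.

>>> import ftfy
>>> print(ftfy.fix_text("d\u00c3\u00a9cor"))
décor
>>> print(ftfy.fix_text("business\u00e2\u20ac\u2122 active accounts"))
business' active accounts
>>> print(ftfy.fix_text("the \u00e2\u20ac\u0153Made in the USA\u00e2\u20ac\u009d label"))
the "Made in the USA" label
>>> print(ftfy.fix_text("the \u00e2\u20ac\u0153Made in the USA\u00e2\u20ac\u009d label", uncurl_quotes=False))
the “Made in the USA” label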
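If adding a dependency is not an option, the same repair can be approximated by hand. This is a minimal sketch, not what ftfy does internally: it assumes the damage is UTF-8 bytes misread as Windows-1252, with unassigned bytes such as 0x9D passed through as C1 control characters. The name `fix_mojibake` is hypothetical.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8-read-as-cp1252 damage; return text unchanged if it does not fit."""
    raw = bytearray()
    try:
        for ch in text:
            try:
                # Normal case: map the character back to its cp1252 byte.
                raw += ch.encode("cp1252")
            except UnicodeEncodeError:
                # Fallback for C1 controls such as U+009D, which cp1252
                # cannot encode; latin-1 maps them to the same byte value.
                raw += ch.encode("latin-1")
        return raw.decode("utf-8")
    except UnicodeError:
        # Not this kind of mojibake (or already-correct text): leave it alone.
        return text


print(fix_mojibake("d\u00c3\u00a9cor"))
print(fix_mojibake("business\u00e2\u20ac\u2122 active accounts"))
print(fix_mojibake("the \u00e2\u20ac\u0153Made in the USA\u00e2\u20ac\u009d label"))
```

Unlike ftfy's default, this keeps the curly quotes rather than replacing them with straight ones, and it treats each string as all-or-nothing, so a string that mixes broken and correct non-ASCII text is returned untouched.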
