I have a likely improperly encoded json document from a source I do not control, which contains the following strings:
du00c3u00a9cor
businessu00e2u20acu2122 active accounts
the u00e2u20acu0153Made in the USAu00e2u20acu009d label
From this, I am gathering they intend for u00c3u00a9
to beceom é
, which would be utf-8 hex C3 A9
. That makes some sense. For the others, I assume we are dealing with some types of directional quotation marks.
My theory here is that this is either using some encoding I've never encountered before, or that it has been double-encoded in some way. I am fine writing some code to transform their broken input into something I can understand, as it is highly unlikely they would be able to fix the system if I brought it to their attention.
Any ideas how to force their input to something I can understand? For the record, I am working in Python.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…