Python unicode escapes are either 4 hex digits (\uabcd) or 8 (\Uabcdabcd). For a codepoint beyond U+FFFF you need the latter (a capital U); make sure to left-pad with enough zeros to reach 8 digits:
>>> '\U0001D15D'
'𝅝'
>>> '\U0001D15D'.encode('unicode_escape')
b'\\U0001d15d'
(And yes, the U+1D15D codepoint (MUSICAL SYMBOL WHOLE NOTE) is in the above example, but your browser font may not be able to render it and may show a placeholder glyph, such as a box or question mark, instead.)
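If the font gets in the way, the character can be checked without rendering it at all. A minimal sketch, assuming Python 3 and the standard-library unicodedata module (the variable name note is only for illustration):

```python
import unicodedata

note = '\U0001D15D'  # 8-digit escape: a single codepoint beyond U+FFFF

print(len(note))               # number of codepoints in the string
print(hex(ord(note)))          # the codepoint value as hex
print(unicodedata.name(note))  # the official Unicode character name
```

This should report one codepoint, 0x1d15d, named MUSICAL SYMBOL WHOLE NOTE, regardless of what the terminal or browser draws.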
Because you used a \uabcd escape, you replaced the a in abc with two characters: the codepoint U+1D15 (ᴕ, LATIN LETTER SMALL CAPITAL OU) and the ASCII character D. Using an 8-digit \U escape works:
>>> import re
>>> print(re.sub('a', '\U0001D15D', 'abc'))
𝅝bc
>>> print(re.sub('a', u'\U0001D15D', 'abc').encode('unicode_escape'))
b'\\U0001d15dbc'
where again the U+1D15D codepoint could be displayed by your font as a placeholder glyph instead.
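To compare the two escape forms side by side, here is a small sketch, assuming Python 3 (the names short, long_ and from_chr are illustrative, not from the question). It also uses chr(), which sidesteps escape-width mistakes when the codepoint is only known as a number:

```python
import re

short = '\u1D15D'         # 4-digit escape: U+1D15 followed by a literal 'D'
long_ = '\U0001D15D'      # 8-digit escape: the single codepoint U+1D15D
from_chr = chr(0x1D15D)   # same codepoint, built from its integer value

print(len(short), len(long_))  # 2 1
print(short[1])                # D
print(from_chr == long_)       # True

# Substituting with the 8-digit form yields one codepoint plus 'bc'.
result = re.sub('a', long_, 'abc')
print(len(result))             # 3
```

Checking len() is a quick way to tell whether an escape produced one character or two, independent of how the glyph is displayed.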