For the missing hex-encoding in the related question:
$output = preg_replace_callback('/[x{80}-x{10FFFF}]/u', function ($match) {
list($utf8) = $match;
$binary = mb_convert_encoding($utf8, 'UTF-32BE', 'UTF-8');
$entity = vsprintf('&#x%X;', unpack('N', $binary));
return $entity;
}, $input);
This is similar to @Baba's answer using UTF-32BE and then unpack
and vsprintf
for the formatting needs.
If you prefer iconv
over mb_convert_encoding
, it's similar:
$output = preg_replace_callback('/[x{80}-x{10FFFF}]/u', function ($match) {
list($utf8) = $match;
$binary = iconv('UTF-8', 'UTF-32BE', $utf8);
$entity = vsprintf('&#x%X;', unpack('N', $binary));
return $entity;
}, $input);
I find this string manipulation a bit more clear then in Get hexcode of html entities.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…