revo's comment above was very helpful in finding a solution:
If your PHP isn't shipped with a PCRE build for UTF-16 then you can't perform such a match. From PHP 7.0 on, you're able to use Unicode code points following this syntax \u{XXXX}
e.g. preg_replace("~\u{1F600}~", '', $str);
(Mind the double quotes)
Since I am using PHP 7, echo "\u{1F602}";
outputs 😂, as described in this PHP RFC page on unicode escape. The proposal was, in essence:
A new escape sequence is added for double-quoted strings and heredocs.
\u{ codepoint-digits }
where codepoint-digits
is composed of hexadecimal digits.
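A minimal sketch of what that escape does in practice, assuming PHP 7.0 or later (the variable names are only illustrative). The double-quoted form expands to the four UTF-8 bytes of U+1F602, while the single-quoted form stays a literal nine-character string:

```php
<?php
// Double-quoted: PHP 7+ expands \u{...} to the UTF-8 encoding of the code point.
$expanded = "\u{1F602}";

// Single-quoted: no escape processing, so the backslash sequence stays literal.
$literal = '\u{1F602}';

var_dump(bin2hex($expanded)); // string(8) "f09f9882"
var_dump(strlen($expanded));  // int(4)
var_dump(strlen($literal));   // int(9)
```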
This implies that the pattern string passed to preg_replace
(normally single-quoted, so that it doesn't get mangled by double-quoted variable expansion) now has to be double-quoted, and needs some preg_quote
magic. This is the solution I came up with:
preg_replace(
    // single point unicode list
    "/[\x{2600}-\x{26FF}".
    // http://www.fileformat.info/info/unicode/block/miscellaneous_symbols/list.htm
    // concatenates with paired surrogates
    preg_quote("\u{1F600}", '/')."-".preg_quote("\u{1F64F}", '/').
    // https://www.fileformat.info/info/unicode/block/emoticons/list.htm
    "]/u",
    '',
    $str
);
Here's the proof of the above in 3v4l.
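For a quick local check, the same pattern can be wrapped in a small helper. This is only a sketch: the function name strip_emoji and the sample string are my own assumptions, not part of the original solution.

```php
<?php
// Hypothetical helper wrapping the pattern above; the name is illustrative.
function strip_emoji(string $str): string
{
    // \x{...} is left untouched by PHP and interpreted by PCRE (needs the /u flag);
    // "\u{...}" is expanded by PHP itself before PCRE ever sees the pattern.
    $pattern = "/[\x{2600}-\x{26FF}"
        . preg_quote("\u{1F600}", '/') . "-" . preg_quote("\u{1F64F}", '/')
        . "]/u";

    // preg_replace() returns null on error; fall back to the input in that case.
    return preg_replace($pattern, '', $str) ?? $str;
}

$result = strip_emoji("sun \u{2600} and grin \u{1F600}!");
echo $result, PHP_EOL; // sun  and grin !
```

Both the U+2600 and the U+1F600 characters are removed, which leaves the double space after "sun".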
EDIT: a simpler solution
In another comment, revo points out that placing the Unicode characters directly into the regex character class works with single-quoted strings and with earlier PHP versions (e.g. 4.3.4):
preg_replace('/[☀-⛿😀-🙏]/u','YOINK',$str);
To use PHP 7's new escape syntax, though, you still need double quotes:
preg_replace("/[\u{2600}-\u{26FF}\u{1F600}-\u{1F64F}]/u",'YOINK',$str);
Here's revo's proof in 3v4l.
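As a sanity check, here is a small sketch comparing the variants on one sample string (the string and variable names are assumptions of mine). It also includes a third form that is not in the answer above: a single-quoted pattern using PCRE's own \x{...} syntax for every endpoint, which I would expect to behave identically under the /u flag.

```php
<?php
// Sample input: U+2600 and U+1F64F fall inside the ranges, U+1F680 does not.
$str = "a\u{2600}b\u{1F64F}c\u{1F680}";

// Literal characters inside a single-quoted pattern (file must be saved as UTF-8).
$literal = preg_replace('/[☀-⛿😀-🙏]/u', 'YOINK', $str);

// PHP 7 \u{...} escapes, expanded by PHP inside a double-quoted pattern.
$escaped = preg_replace("/[\u{2600}-\u{26FF}\u{1F600}-\u{1F64F}]/u", 'YOINK', $str);

// PCRE \x{...} escapes, interpreted by the regex engine in a single-quoted pattern.
$pcre = preg_replace('/[\x{2600}-\x{26FF}\x{1F600}-\x{1F64F}]/u', 'YOINK', $str);

var_dump($literal === $escaped && $escaped === $pcre); // bool(true)
```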