Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


regex - PHP: reverting escaped unicode characters



1 Reply


First of all, as mentioned in "Right way to escape backslash [ \ ] in PHP regex?", four backslashes should be used in the PHP string to match a single literal backslash. The regex, therefore, becomes "/\\\\X\w{2}\\\\/".
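
To illustrate why four backslashes are needed, here is a minimal sketch. The sample string is made up for illustration and assumes each escaped character looks like \XE7\. PHP's string parser first reduces "\\\\" to two backslashes, and the regex engine then reads those two as one literal backslash.

```php
<?php
// Hypothetical sample: a nowdoc keeps every backslash exactly as typed.
$sample = <<<'TXT'
\XE7\
TXT;

// The PHP literal "\\\\" is the two-character string \\,
// which PCRE interprets as one literal backslash.
$pattern = "/\\\\X\w{2}\\\\/";

var_dump(strlen($sample));               // int(5)
var_dump(preg_match($pattern, $sample)); // int(1)
```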

As for the decoding, the easiest way I found was to convert the escaped characters to the HTML entity format and use the html_entity_decode() function. The code, therefore, ended up as follows:

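function unescapeText(string $str): string
{
    // Matches sequences such as \XE7\ (backslash, X, two word characters, backslash).
    return preg_replace_callback(
        "/\\\\X\w{2}\\\\/",
        // Characters 2-3 of the match hold the hex code; decode it via an HTML entity.
        fn($m) => html_entity_decode('&#x'.substr($m[0], 2, 2).';', ENT_NOQUOTES, 'UTF-8'),
        $str
    );
}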
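
A quick usage sketch follows, assuming the input wraps each escaped character in backslashes (as in \XE7\) and that PHP 7.4 or later is available for the arrow function. The sample word is an assumption chosen for illustration, not taken from the original data.

```php
<?php
function unescapeText(string $str): string
{
    return preg_replace_callback(
        "/\\\\X\w{2}\\\\/",
        // Turn the two hex digits into a numeric HTML entity and decode it as UTF-8.
        fn($m) => html_entity_decode('&#x' . substr($m[0], 2, 2) . ';', ENT_NOQUOTES, 'UTF-8'),
        $str
    );
}

// Hypothetical input, kept literal with nowdoc syntax.
$raw = <<<'TXT'
Aten\XE7\\XE3\o
TXT;

echo unescapeText($raw), PHP_EOL; // Atenção
```

Note that \w{2} also accepts non-hexadecimal characters; a stricter variant could use [0-9A-Fa-f]{2} instead.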

Lastly, a word of advice: I had some trouble at first because of how the input string was quoted. Double quotes converted the string to binary, and single quotes collapsed each double backslash into one (\XE7\\XE3\ would, therefore, become \XE7\XE3\). That caused all sorts of issues. Using nowdoc syntax finally made the text be interpreted literally, as I had intended.
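
The difference between single quotes and nowdoc can be checked with a small sketch like the one below. The sample string is hypothetical; the point is only that the single-quoted literal loses one backslash while the nowdoc keeps both.

```php
<?php
// Single-quoted literal: the \\ pair collapses into one backslash.
$single = 'Aten\XE7\\XE3\o';

// Nowdoc: nothing is interpreted, so both backslashes survive.
$nowdoc = <<<'TXT'
Aten\XE7\\XE3\o
TXT;

var_dump(strlen($single)); // int(14)
var_dump(strlen($nowdoc)); // int(15)
```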

