So, the problem is as presumed: you are not using the /u modifier. Without it, PCRE treats the pattern and the subject as single-byte strings and will not look for UTF-8 characters.
In any case, this is how it should be done:
var_dump(preg_match('/^[\p{L}\s]+$/u', "?"));
This works on all the PHP versions I have. There might be a bug in others, but that's not likely here.
Your problem is that this also works:
var_dump(preg_match('/^[\p{L}\s]+$/', utf8_decode("?")));
Notice that this uses ISO-8859-1 instead of UTF-8 and leaves out the /u modifier. The result is int(1). Evidently PCRE treats the Latin-1 ? as matching \p{L} when not in /u (Unicode) mode. (Most of the single-byte values \xA0-\xFF are letters in Latin-1, and the 8-bit code point is the same as in Unicode, so that's actually OK.)
Conclusion: Your input is actually ISO-8859-1. That's why it accidentally worked for you without the /u modifier. Change that, and be exact with input charsets.
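One way to be exact about input charsets is to normalise everything to UTF-8 before matching with /u. The following is only a minimal sketch, not part of the original answer: the helper name isLettersAndSpaces is hypothetical, it assumes the mbstring extension is loaded, and it assumes that any input which is not valid UTF-8 is ISO-8859-1 (which may not hold for your data).

```php
<?php
// Hypothetical helper for illustration: normalise to UTF-8, then match with /u.
function isLettersAndSpaces(string $input): bool
{
    // Assumption: input that is not valid UTF-8 is ISO-8859-1.
    if (!mb_check_encoding($input, 'UTF-8')) {
        $input = mb_convert_encoding($input, 'UTF-8', 'ISO-8859-1');
    }

    // With /u, PCRE reads the subject as UTF-8 and \p{L} matches Unicode letters.
    return preg_match('/^[\p{L}\s]+$/u', $input) === 1;
}

var_dump(isLettersAndSpaces("\xC3\xA9")); // "é" as UTF-8      -> bool(true)
var_dump(isLettersAndSpaces("\xE9"));     // "é" as ISO-8859-1 -> bool(true)
var_dump(isLettersAndSpaces("abc123"));   // digits are not \p{L} -> bool(false)
```

Guessing the source charset like this is a fallback at best. If you know where the data comes from (form encoding, database connection charset), convert explicitly at that boundary instead.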