php - preg_match and (non-English) Latin characters?

Question

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:56:03+0000

So, the problem is as presumed: you are not using the /u modifier. Without it, PCRE treats the pattern and the subject as sequences of single bytes rather than as UTF-8 characters.

In any case, this is how it should be done:

var_dump(preg_match('/^[\p{L}\s]+$/u', "é"));

This works on every PHP version I have. Other versions might have a bug, but that is unlikely to be the cause here.

Your problem is that this also works:

var_dump(preg_match('/^[\p{L}\s]+$/', utf8_decode("é")));

Notice that this uses ISO-8859-1 instead of UTF-8, and leaves out the /u modifier. The result is int(1). Evidently PCRE treats the Latin-1 byte for é as matching \p{L} when not in /u mode. (Most of the single-byte values \xA0-\xFF are letters in Latin-1, and the 8-bit value is the same as the Unicode code point, so that's actually ok.)
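To make the difference easier to see, here is a small sketch that runs the same pattern against both encodings, with and without /u. The bytes are written as escapes so the outcome does not depend on how the script file itself is saved. The variable names are mine, and it assumes a PHP build whose PCRE has Unicode property support (the bundled one does).

```php
<?php
// The letter "é" spelled out as raw bytes, so the result does not depend
// on the encoding this file happens to be saved in.
$utf8   = "\xC3\xA9"; // UTF-8: two bytes
$latin1 = "\xE9";     // ISO-8859-1: one byte

$withU    = '/^[\p{L}\s]+$/u';
$withoutU = '/^[\p{L}\s]+$/';

$results = [
    'UTF-8 with /u'      => preg_match($withU, $utf8),      // one valid UTF-8 letter
    'UTF-8 without /u'   => preg_match($withoutU, $utf8),   // bytes C3 and A9 are tested separately
    'Latin-1 without /u' => preg_match($withoutU, $latin1), // byte E9 is read as code point U+00E9
    'Latin-1 with /u'    => preg_match($withU, $latin1),    // a lone E9 byte is not valid UTF-8
];

var_dump($results);

// The last call failed outright rather than returning 0.
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR);
```

Under those assumptions I would expect 1, 0, 1 and false: the pattern only appears to work without /u when the input is Latin-1, and /u rejects Latin-1 input as malformed UTF-8 instead of silently not matching.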

Conclusion: Your input is actually ISO-8859-1. That's why it accidentally worked for you without the /u. Change that, and be exact with input charsets.
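One way to be exact about the charset is to normalise everything to UTF-8 before matching. The following is only a sketch: isLettersAndSpaces() is a hypothetical helper, the ISO-8859-1 fallback is an assumption about where the input comes from, and it relies on the mbstring extension being available.

```php
<?php
// Hypothetical helper: bring the input to UTF-8 first, then match with /u.
function isLettersAndSpaces(string $input, string $fallbackCharset = 'ISO-8859-1'): bool
{
    // Valid UTF-8 is used as-is. Anything else is ASSUMED to be in the
    // fallback charset and converted.
    if (!mb_check_encoding($input, 'UTF-8')) {
        $input = mb_convert_encoding($input, 'UTF-8', $fallbackCharset);
    }

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match('/^[\p{L}\s]+$/u', $input) === 1;
}

var_dump(isLettersAndSpaces("Jos\xC3\xA9")); // UTF-8 input
var_dump(isLettersAndSpaces("Jos\xE9"));     // Latin-1 input, converted first
var_dump(isLettersAndSpaces("abc123"));      // digits are not letters
```

Keep in mind that the validity check is a heuristic: some Latin-1 byte sequences also happen to be valid UTF-8, so if you know the real charset of your input (form accept-charset, database connection, file header), convert from that explicitly instead of guessing.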
