Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - Does \w match all alphanumeric characters defined in the Unicode standard?

Does Perl's \w match all alphanumeric characters defined in the Unicode standard?

For example, will \w match all (say) Chinese and Russian alphanumeric characters?

I wrote a simple test script (see below) which suggests that \w does indeed match "as expected" for the non-ASCII alphanumeric characters I tested. But the testing is obviously far from exhaustive.

#!/usr/bin/perl

use utf8;

binmode(STDOUT, ':utf8');

my @ok;
$ok[0] = "abcdefghijklmnopqrstuvwxyz";
$ok[1] = "éè?áà???????í?ń??áy?ó?????";
$ok[2] = "??ü??ai?ó?ń???íáυσνχατ???η";
$ok[3] = "τσιαιγολοχβ?αν???????тераб";
$ok[4] = "иневоаслк??иневоцеда?еволс";
$ok[5] = "рглсывызтоμ??κινα??γο";

foreach my $ok (@ok) {
    die unless ($ok =~ /^\w+$/);
}

1 Reply

perldoc perlunicode says:

Character classes in regular expressions match characters instead of bytes and match against the character properties specified in the Unicode properties database. \w can be used to match a Japanese ideograph, for instance.

So it looks like the answer to your question is "yes".

However, you might want to use the \p{} construct to directly access specific Unicode character properties. You can probably use \p{L} (or, shorter, \pL) for letters and \pN for numbers and feel a little more confident that you'll get exactly what you want.
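
To see how \w and the \p{} properties differ in practice, here is a minimal sketch. The classify helper and the sample characters are my own illustration, not part of the question. It assumes the source file is saved as UTF-8, so that "use utf8" turns the literals into character strings. One difference worth knowing is that \w also accepts the underscore, which is neither a letter nor a number.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                                # literals in this file are UTF-8

binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: list which of the three classes match one character.
sub classify {
    my ($ch) = @_;
    my @classes;
    push @classes, '\w'    if $ch =~ /^\w$/;      # "word" character
    push @classes, '\p{L}' if $ch =~ /^\p{L}$/;   # Unicode letter
    push @classes, '\p{N}' if $ch =~ /^\p{N}$/;   # Unicode number
    return join ' ', @classes;
}

# Latin, Cyrillic, CJK, ASCII digit, Arabic-Indic digit, underscore, hyphen
for my $ch ('a', 'я', '中', '7', '٣', '_', '-') {
    printf "%s => %s\n", $ch, classify($ch) || '(none)';
}
```

If you only want letters and digits, something like /^[\p{L}\p{N}]+$/ keeps the underscore out, whereas /^\w+$/ lets it through.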

