For this you have to use Unicode character properties and blocks. Every Unicode code point is assigned a set of properties, e.g. that it is a letter. Blocks are contiguous ranges of code points.
For more details, see:
Unicode properties and blocks are written \p{Name}, where "Name" is the name of the property or block.
With an uppercase "P", as in \P{Name}, it is the negation of the property/block, i.e. it matches anything else.
Some of the properties (only a short excerpt):
- L ==> All letter characters.
- Lu ==> Letter, Uppercase
- Ll ==> Letter, Lowercase
- N ==> All numbers. This includes the Nd, Nl, and No categories.
- Pc ==> Punctuation, Connector
- P ==> All punctuation characters. This includes the Pc, Pd, Ps, Pe, Pi, Pf, and Po categories.
- Sm ==> Symbol, Math
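To make the \p{...} / \P{...} distinction concrete, here is a minimal sketch using a few of the categories listed above. The class CategoryDemo and its two helpers are hypothetical names introduced only for this illustration; the non-ASCII letters are written as \u escapes so the result does not depend on the source file's encoding.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CategoryDemo
{
    // Hypothetical helper: true when s is non-empty and every character
    // belongs to the given Unicode general category (e.g. "L", "Lu", "N").
    public static bool AllInCategory(string s, string category) =>
        Regex.IsMatch(s, @"^\p{" + category + @"}+$");

    // Hypothetical helper: true when s is non-empty and no character
    // belongs to the given category (uppercase \P negates the property).
    public static bool NoneInCategory(string s, string category) =>
        Regex.IsMatch(s, @"^\P{" + category + @"}+$");

    public static void Main()
    {
        Console.WriteLine(AllInCategory("ABC", "Lu"));                 // True
        Console.WriteLine(AllInCategory("abc", "Lu"));                 // False
        // "fóóè": letters outside ASCII are still in category L.
        Console.WriteLine(AllInCategory("f\u00F3\u00F3\u00E8", "L"));  // True
        Console.WriteLine(NoneInCategory("123+-", "L"));               // True
    }
}
```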
Some of the blocks (only a short excerpt):
- 0000 - 007F ==> IsBasicLatin
- 0400 - 04FF ==> IsCyrillic
- 1000 - 109F ==> IsMyanmar
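Named blocks are used with the same \p{...} syntax. The following is a small sketch, assuming the .NET block names from the list above; BlockDemo and AllInBlock are hypothetical names used only here. It also shows that a block is purely a code point range: "Hello!" lies entirely in IsBasicLatin even though "!" is not a letter.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BlockDemo
{
    // Hypothetical helper: true when s is non-empty and every character
    // falls inside the named Unicode block (e.g. "IsCyrillic").
    public static bool AllInBlock(string s, string block) =>
        Regex.IsMatch(s, @"^\p{" + block + @"}+$");

    public static void Main()
    {
        // "Привет", written as escapes (all in 0400 - 04FF).
        string privet = "\u041F\u0440\u0438\u0432\u0435\u0442";

        Console.WriteLine(AllInBlock(privet, "IsCyrillic"));     // True
        Console.WriteLine(AllInBlock("Hello", "IsCyrillic"));    // False
        Console.WriteLine(AllInBlock("Hello!", "IsBasicLatin")); // True
        Console.WriteLine(AllInBlock(privet, "IsBasicLatin"));   // False
    }
}
```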
What I used in the solution:
\P{L} is a character property that matches any character that is not a letter ("L" for Letter).
\p{IsBasicLatin} is a Unicode block that matches the code points 0000 - 007F.
So your regex would be:
^[\P{L}\p{IsBasicLatin}]+$
In plain words:
This matches a string from start to end (^ and $) when it contains at least one character and every character is either a non-letter or a character from the ASCII table (code points 0000 - 007F).
A short C# test method:
using System;
using System.Text.RegularExpressions;

string[] myStrings = { "Foobar",
    "Foo@bar!\"§$%&/()",
    "Föobar",
    "fóóè"
};
// Every character must be a non-letter or lie in the Basic Latin block.
Regex reg = new Regex(@"^[\P{L}\p{IsBasicLatin}]+$");
foreach (string str in myStrings) {
    Match result = reg.Match(str);
    if (result.Success)
        Console.Out.WriteLine("matched ==> " + str);
    else
        Console.Out.WriteLine("failed ==> " + str);
}
Console.ReadLine();
Prints:
matched ==> Foobar
matched ==> Foo@bar!"§$%&/()
failed ==> Föobar
failed ==> fóóè