On page 74 of the ANTLR4 book it says that any Unicode character can be used in a grammar simply by specifying its codepoint in this manner:
'\uxxxx'
where xxxx is the hexadecimal value for the Unicode codepoint.
So I used that technique in a token rule for an ID token:
grammar ID;
id : ID EOF ;
ID : ('a' .. 'z' | 'A' .. 'Z' | '\u0100' .. '\u017E')+ ;
WS : [ \t\r\n]+ -> skip ;
When I tried to parse this input:
Gŭnter
ANTLR throws an error, saying that it does not recognize ŭ. (The ŭ character is hex 016D, so it is within the range specified.)
What am I doing wrong please?
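For reference, here is a small standalone sanity check of the claim that U+016D falls inside the range used by the ID rule (0x0100 to 0x017E). It does not use ANTLR at all; the class and method names (`CodepointCheck`, `inLetterRange`, `allInRange`, `reinterpret`) are made up for illustration. It also tries one guess I have not confirmed: that the input might be reaching the lexer decoded with an encoding other than UTF-8, which would change the codepoints the lexer actually sees.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CodepointCheck {
    // Bounds copied from the ID rule's Unicode range.
    static final int RANGE_START = 0x0100;
    static final int RANGE_END = 0x017E;

    // True if the codepoint matches one of the three alternatives in the ID rule.
    public static boolean inLetterRange(int cp) {
        return (cp >= 'a' && cp <= 'z')
            || (cp >= 'A' && cp <= 'Z')
            || (cp >= RANGE_START && cp <= RANGE_END);
    }

    // True if every codepoint of the string would be accepted by the ID rule.
    public static boolean allInRange(String s) {
        return s.codePoints().allMatch(CodepointCheck::inLetterRange);
    }

    // Hypothetical scenario: the text is stored as UTF-8 bytes but decoded
    // with a different charset before it reaches the lexer.
    public static String reinterpret(String text, Charset assumedCharset) {
        return new String(text.getBytes(StandardCharsets.UTF_8), assumedCharset);
    }

    public static void main(String[] args) {
        String input = "G\u016Dnter";

        // Correctly decoded input: is every character covered by the rule?
        System.out.println("decoded as UTF-8, all in range: " + allInRange(input));

        // Same bytes decoded as ISO-8859-1 (an assumption, not a diagnosis).
        String misread = reinterpret(input, StandardCharsets.ISO_8859_1);
        System.out.println("decoded as ISO-8859-1, all in range: " + allInRange(misread));
        misread.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

If the second check reports characters outside the range, the grammar itself may be fine and the way the input is read would be the thing to look at.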