First of all, ANTLR's lexer tokenizes the input by trying the lexer rules from top to bottom, so tokens defined first have a higher precedence than the ones below them. And in case rules have overlapping tokens, the rule that matches the most characters will take precedence (greedy match).
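The two tie-breaking rules above can be sketched in plain Java. This is only a simplified model, not ANTLR's actual DFA-based lexer: the rule names (IF, ID, WS) and their regex patterns are hypothetical, chosen just to show "longest match wins, and on a tie the rule defined first wins".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LexerPrecedenceSketch {

    // Hypothetical lexer rules in definition order (LinkedHashMap keeps insertion order).
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("IF", Pattern.compile("if"));
        RULES.put("ID", Pattern.compile("[a-z]+"));
        RULES.put("WS", Pattern.compile(" +"));
    }

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors the match at the start of the region.
                // The strict '>' keeps the earlier rule when two matches have equal length.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestName = rule.getKey();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            if (!bestName.equals("WS")) { // whitespace is skipped in this model
                out.add(bestName + "(" + input.substring(pos, pos + bestLen) + ")");
            }
            pos += bestLen;
        }
        return out;
    }

    public static void main(String[] args) {
        // "if" matches IF and ID with equal length, so the first-defined rule (IF) wins;
        // "iffy" is matched further by ID than by IF, so the longer match (ID) wins.
        System.out.println(tokenize("if iffy")); // prints [IF(if), ID(iffy)]
    }
}
```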
The same principle holds within parser rules. Rules defined first will also be matched first. For example, in rule foo, sub-rule a will be tried before b:
foo
: a
| b
;
Note that in your case, the 2nd rule isn't matched: the parser tries to match it and fails because there is no trailing line break, producing the error:
line 0:-1 mismatched input '<EOF>' expecting NEW_LINE
So, nothing is matched at all. But that is odd. Because you've set backtrack=true, it should at least backtrack and match:
first_rule ("First token here")
any_left_over_tokens ("line-break")
any_left_over_tokens ("Second token here")
if not match first_rule in the first place and not even try to match second_rule to begin with.
A quick demo when doing the predicates manually (and disabling the backtrack in the options { ... } section) would look like:
grammar T;
options {
output=AST;
//backtrack=true;
memoize=true;
}
rule_list_in_order
: ( (first_rule)=> first_rule {System.out.println("first_rule=[" + $first_rule.text + "]");}
| (second_rule)=> second_rule {System.out.println("second_rule=[" + $second_rule.text + "]");}
| any_left_over_tokens {System.out.println("any_left_over_tokens=[" + $any_left_over_tokens.text + "]");}
)+
;
first_rule
: FIRST_TOKEN
;
second_rule
: FIRST_TOKEN NEW_LINE SECOND_TOKEN NEW_LINE
;
any_left_over_tokens
: NEW_LINE
| FIRST_TOKEN
| SECOND_TOKEN
;
FIRST_TOKEN : 'First token here';
SECOND_TOKEN : 'Second token here';
NEW_LINE : ('\r'? '\n');
WS : (' '|'\t'|'\u000C') {$channel=HIDDEN;};
which can be tested with the class:
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
String source = "First token here\nSecond token here";
ANTLRStringStream in = new ANTLRStringStream(source);
TLexer lexer = new TLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
TParser parser = new TParser(tokens);
parser.rule_list_in_order();
}
}
which produces the expected output:
first_rule=[First token here]
any_left_over_tokens=[
]
any_left_over_tokens=[Second token here]
Note that it doesn't matter if you use:
rule_list_in_order
: ( (first_rule)=> first_rule
| (second_rule)=> second_rule
| any_left_over_tokens
)+
;
or
rule_list_in_order
: ( (second_rule)=> second_rule // <--+--- swapped
| (first_rule)=> first_rule // <-/
| any_left_over_tokens
)+
;
, both will produce the expected output.
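Why the order makes no difference for this input can be sketched with a small Java model of the predicated alternatives. This is an illustration under stated assumptions, not the code ANTLR generates: the class, the lookahead helper and the token-type strings are hypothetical stand-ins for the grammar above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PredicateOrderSketch {

    // Token types that second_rule expects, in sequence.
    static final List<String> SECOND_RULE =
            Arrays.asList("FIRST_TOKEN", "NEW_LINE", "SECOND_TOKEN", "NEW_LINE");

    // Number of tokens the named rule would consume at pos, or -1 if it does not match.
    // In this model it plays the role of the (rule)=> lookahead check.
    static int lookahead(String rule, List<String> tokens, int pos) {
        switch (rule) {
            case "first_rule":
                return tokens.get(pos).equals("FIRST_TOKEN") ? 1 : -1;
            case "second_rule": {
                int end = pos + SECOND_RULE.size();
                boolean ok = end <= tokens.size()
                        && tokens.subList(pos, end).equals(SECOND_RULE);
                return ok ? SECOND_RULE.size() : -1;
            }
            default:
                return 1; // any_left_over_tokens accepts any single token
        }
    }

    // Tries the alternatives in the given order at each position; the first one
    // whose lookahead succeeds is taken, like the ( ... )+ loop in the grammar.
    static List<String> parse(List<String> tokens, String... alternatives) {
        List<String> trace = new ArrayList<>();
        int pos = 0;
        while (pos < tokens.size()) {
            boolean matched = false;
            for (String alt : alternatives) {
                int consumed = lookahead(alt, tokens, pos);
                if (consumed > 0) {
                    trace.add(alt);
                    pos += consumed;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                throw new IllegalStateException("no alternative matches at token " + pos);
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        // Same token sequence as the test input: no trailing NEW_LINE.
        List<String> input = Arrays.asList("FIRST_TOKEN", "NEW_LINE", "SECOND_TOKEN");
        // Both lines print [first_rule, any_left_over_tokens, any_left_over_tokens]
        System.out.println(parse(input, "first_rule", "second_rule", "any_left_over_tokens"));
        System.out.println(parse(input, "second_rule", "first_rule", "any_left_over_tokens"));
    }
}
```

Since the second_rule check can never succeed without the trailing NEW_LINE, first_rule is chosen either way. In this model the order would start to matter only for input that does end with a line break, where whichever predicated alternative comes first is taken.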
So, my guess is that you may have found a bug.
You could try the ANTLR mailing list, in case you want a definitive answer (Terence Parr frequents it more often than he does here).
Good luck!
PS. I tested this with ANTLR v3.2