antlr3 - Antlr rule priorities

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

Firstly, I know this grammar doesn't make sense, but I created it to test ANTLR's rule priority behaviour:

grammar test;

options 
{

output=AST;
backtrack=true;
memoize=true;

}

rule_list_in_order :
    (
    first_rule
    | second_rule
    | any_left_over_tokens)+
    ;


first_rule
    :
     FIRST_TOKEN
    ;


second_rule:     
    FIRST_TOKEN NEW_LINE SECOND_TOKEN NEW_LINE;


any_left_over_tokens
    :
    NEW_LINE
    | FIRST_TOKEN
    | SECOND_TOKEN;



FIRST_TOKEN
    : 'First token here'
    ;   

SECOND_TOKEN
    : 'Second token here';

NEW_LINE
    : ('\r'? '\n')   ;

WS  : (' '|'\t'|'\u000C')
    {$channel=HIDDEN;}
    ;

When I give this grammar the input 'First token here\nSecond token here' (the two tokens separated by a line break), it matches the second_rule.

I would have expected it to match first_rule and then any_left_over_tokens, because first_rule appears before second_rule in rule_list_in_order, which is the start rule. Can anyone explain why this happens?

Cheers


深蓝 · Answer 1 · 2021-10-23T19:39:41+0000

First of all, ANTLR's lexer tokenizes the input by trying the lexer rules from top to bottom, so tokens defined first take precedence over the ones defined below them. When several lexer rules can match at the same position, the rule that matches the most characters takes precedence (greedy match).

The same principle holds within parser rules: alternatives listed first are tried first. For example, in rule foo, alternative a is tried before b:

foo
  :  a
  |  b
  ;

Note that in your case, the second rule isn't actually matched: the parser tries to match it and fails because there is no trailing line break, producing the error:

line 0:-1 mismatched input '<EOF>' expecting NEW_LINE

So nothing is matched at all. That is odd: because you've set backtrack=true, the parser should at least backtrack and match:

first_rule ("First token here")
any_left_over_tokens ("line-break")
any_left_over_tokens ("Second token here")

That is, if it doesn't simply match first_rule in the first place, without even trying second_rule to begin with.

A quick demo that adds the predicates manually (and disables backtrack in the options { ... } section) would look like:

grammar T;

options {
  output=AST;
  //backtrack=true;
  memoize=true;
}

rule_list_in_order
  :  ( (first_rule)=>  first_rule  {System.out.println("first_rule=[" + $first_rule.text + "]");}
     | (second_rule)=> second_rule {System.out.println("second_rule=[" + $second_rule.text + "]");}
     | any_left_over_tokens        {System.out.println("any_left_over_tokens=[" + $any_left_over_tokens.text + "]");}
     )+ 
  ;

first_rule
  :  FIRST_TOKEN
  ;

second_rule
  :  FIRST_TOKEN NEW_LINE SECOND_TOKEN NEW_LINE
  ;

any_left_over_tokens
  :  NEW_LINE
  |  FIRST_TOKEN
  |  SECOND_TOKEN
  ;

FIRST_TOKEN  : 'First token here';   
SECOND_TOKEN : 'Second token here';
NEW_LINE     : ('\r'? '\n');
WS           : (' '|'\t'|'\u000C') {$channel=HIDDEN;};

which can be tested with the class:

import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        String source = "First token here\nSecond token here";
        ANTLRStringStream in = new ANTLRStringStream(source);
        TLexer lexer = new TLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TParser parser = new TParser(tokens);
        parser.rule_list_in_order();
    }
}

which produces the expected output:

first_rule=[First token here]
any_left_over_tokens=[
]
any_left_over_tokens=[Second token here]

Note that it doesn't matter if you use:

rule_list_in_order
  :  ( (first_rule)=>  first_rule 
     | (second_rule)=> second_rule
     | any_left_over_tokens
     )+ 
  ;

or

rule_list_in_order
  :  ( (second_rule)=> second_rule // <--+--- swapped
     | (first_rule)=>  first_rule  // <-/
     | any_left_over_tokens
     )+ 
  ;

Both variants produce the expected output.
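To see why the order of the two predicated alternatives makes no difference for this input, here is a rough sketch in plain Java. It is a toy model only, not the code ANTLR generates: the class OrderedChoiceDemo and the methods lookahead and parse are hypothetical names, tokens are represented as plain strings, and any_left_over_tokens is assumed to accept any single token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of ordered alternatives guarded by syntactic predicates.
// All names are hypothetical; this is not code generated by ANTLR.
class OrderedChoiceDemo {

    // Token sequences that first_rule and second_rule require.
    static final List<String> FIRST_RULE = Arrays.asList("FIRST_TOKEN");
    static final List<String> SECOND_RULE =
            Arrays.asList("FIRST_TOKEN", "NEW_LINE", "SECOND_TOKEN", "NEW_LINE");

    // Predicate: does the whole pattern match starting at pos? Consumes nothing.
    static boolean lookahead(List<String> tokens, int pos, List<String> pattern) {
        if (pos + pattern.size() > tokens.size()) {
            return false; // ran into end of input, like the missing trailing NEW_LINE
        }
        return tokens.subList(pos, pos + pattern.size()).equals(pattern);
    }

    // Repeatedly picks the first guarded alternative whose predicate succeeds,
    // falling back to any_left_over_tokens (assumed to accept any one token).
    static List<String> parse(List<String> tokens, boolean secondRuleFirst) {
        String[] names = secondRuleFirst
                ? new String[] {"second_rule", "first_rule"}
                : new String[] {"first_rule", "second_rule"};
        List<String> matched = new ArrayList<>();
        int pos = 0;
        while (pos < tokens.size()) {
            String chosen = "any_left_over_tokens";
            int length = 1;
            for (String name : names) {
                List<String> pattern = name.equals("first_rule") ? FIRST_RULE : SECOND_RULE;
                if (lookahead(tokens, pos, pattern)) {
                    chosen = name;
                    length = pattern.size();
                    break;
                }
            }
            matched.add(chosen);
            pos += length;
        }
        return matched;
    }

    public static void main(String[] args) {
        // "First token here\nSecond token here" with no trailing line break.
        List<String> input = Arrays.asList("FIRST_TOKEN", "NEW_LINE", "SECOND_TOKEN");
        System.out.println(parse(input, false)); // first_rule listed first
        System.out.println(parse(input, true));  // second_rule listed first
    }
}
```

In this model the second_rule predicate can never succeed on the test input, because the trailing NEW_LINE is missing, so first_rule is chosen in either ordering. If the input did end with a line break, the ordering would start to matter in the toy model: with second_rule listed first, its predicate would succeed and consume all four tokens.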

So, my guess is that you may have found a bug.

You could try the ANTLR mailing list if you want a definitive answer (Terence Parr frequents it more often than he does here).

Good luck!

PS. I tested this with ANTLR v3.2
