Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
755 views
in Technique[技术] by (71.8m points)

antlr4 - Antlr Grammar for making boolean operators must if second operand exists

If user inputs A and B, then the parsing happens correctly. But if user inputs A B (without logical operator), then parser doesnot complain about missing "AND" token. I have tried to use this for parser rule: booleanAndExpression : simpleFilter ((AND)+ simpleFilter )* #SimF ;

But then parser stops considering second operand. Means if user enters "A B", AST is generated only for first operand "A" and B is not parsed.

I want the parser to complain about missing boolean operator if user enter "A B" instead of "A AND B" ANy help would e greatly appreciated as I have been trying to resolve this but in vain

Grammar file is mentioned below

grammar Filter;

/* Parser rules */

filter

    :   LPAREN logicalExpression RPAREN   #ParenthesizedFilter
    |   logicalExpression EOF           #CmopositeFilterHash;

logicalExpression
       : booleanAndExpression ( OR booleanAndExpression )*                     #LogExp
       ;

booleanAndExpression
    :  simpleFilter ((AND) ? simpleFilter )*                                  #SimF
    ;

simpleFilter
    :NOT simpleFilter                                                        #NegateFilter
    | key=STR op=EQ value=(STR | DECIMAL)                                   #EqualsFilter
    |   key=STR op=NE value=(STR | DECIMAL)                                         #NotEqualsFilter
    |   key=STR op=GT value=(STR | DECIMAL)                                         #GreaterThanFilter
    |   key=STR op=GE value=(STR | DECIMAL)                            #GreaterThanEqualsFilter
    |   key=STR op=LT value=(STR | DECIMAL)                                         #LessThanFilter
    |   key=STR op=LE value=(STR | DECIMAL)                                         #LessThanEqualsFilter
    |   key=STR op=IN referenceValue                                                #InReferenceFilter
    |   key=STR op=NOT op=IN referenceValue                                         #NotInReferenceFilter
    |   key=STR op=BETWEEN fromValue=(STR | DECIMAL) AND toValue=(STR | DECIMAL)    #RangeFilter
    |   key=STR NOT BETWEEN fromValue=(STR | DECIMAL) AND toValue=(STR | DECIMAL)   #OutsideRangeFilter
    |   key=STR op=IN array                                                         #InArrayFilter
    |   key=STR NOT IN array                                                        #NotInArrayFilter
    |   key=STR op=CONTAINS value=(STR | DECIMAL)                                   #ContainsFilter
    |   key=STR NOT CONTAINS value=(STR | DECIMAL)                                  #NotContainsFilter
    |   DOLLAR LCURLY FILTER SEP id=(DECIMAL | STR) SEP name=STR RCURLY             #ReferenceFilter;

array
    :   LBRACKET (arrayItem SEP*)+ RBRACKET ;

arrayItem
    :   value=(STR | DECIMAL)
    |   referenceValue ;

referenceValue
    :   DOLLAR LCURLY type=LIST SEP id=(STR | DECIMAL) SEP name=(STR | DECIMAL) RCURLY ;


/* Lexer Rules*/

SEP: 
',' | ';';

LIST
: 'List' ;

FILTER: 'Filter';

DOLLAR: '$';
LCURLY: '{';
RCURLY: '}';
LPAREN: '(';
RPAREN: ')';
LBRACKET: '[';
RBRACKET: ']';
GT: '>';
GE: '>=';
LT: '<';
LE: '<=';
EQ: '=';
NE: '!=';
NOT: '!' | 'not' | 'NOT';
AND: 'AND' | 'and';
OR: 'OR' | 'or';
IN: 'in' | 'IN';
BETWEEN: 'between' | 'BETWEEN';
CONTAINS: 'contains' | 'CONTAINS';
AND_MUST: 'AND';

DECIMAL: '-'? DIGIT+ ( '.' DIGIT+ )? ;
fragment DIGIT: [0-9] ;
STR: ALPHANUMERIC+ | '"' (ALPHANUMERIC | [ ] | '""')+ '"';
fragment ALPHANUMERIC: [a-z0-9A-Z!.:@#$%&^*'+/?_`~-];

WS  : [ 
f]+ -> skip ;

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This grammar does not parse "A and B", let alone "A B", starting from booleanAndExpression or filter. I suspect that your driver code--which you don't provide--does not check error conditions from the parse and lex. I cannot tell the target you are using, so it is hard to know what your build environment is, and whether it is working or not. But, I would also look into that.

Here is a complete C# app with your grammar as is (link). The driver code is generated by my Antlr4BuildTasks.Templates, and uses Antlr v4.9 and Antlr4BuildTasks, which I also wrote. The driver checks for parse errors, prints out the tokens and trees for your two examples. In both "A and B" and "A B", the parse fails. Here is the output from running your grammar on these inputs.

A and B
line 1:2 no viable alternative at input 'Aand'
line 1:2 no viable alternative at input 'Aand'
line 1:7 no viable alternative at input 'B'
line 1:7 no viable alternative at input 'B'
error in parse.
[@0,0:0='A',<25>,1:0]
[@1,2:4='and',<18>,1:2]
[@2,6:6='B',<25>,1:6]
[@3,7:6='<EOF>',<-1>,1:7]
( booleanAndExpression
  ( simpleFilter
  )
  ( simpleFilter
    ( STR i=0 txt=A tt=25 DEFAULT_TOKEN_CHANNEL
  ) )
  ( AND i=1 txt=and tt=18 DEFAULT_TOKEN_CHANNEL
  )
  ( simpleFilter
  )
  ( simpleFilter
    ( STR i=2 txt=B tt=25 DEFAULT_TOKEN_CHANNEL
) ) )

A B
line 1:2 no viable alternative at input 'AB'
line 1:2 no viable alternative at input 'AB'
error in parse.
[@0,0:0='A',<25>,1:0]
[@1,2:2='B',<25>,1:2]
[@2,3:2='<EOF>',<-1>,1:3]
( booleanAndExpression
  ( simpleFilter
  )
  ( simpleFilter
    ( STR i=0 txt=A tt=25 DEFAULT_TOKEN_CHANNEL
  ) )
  ( simpleFilter
    ( STR i=1 txt=B tt=25 DEFAULT_TOKEN_CHANNEL
) ) )

Here are a few suggestions to fix your grammar. I've made some of these changes here.

  1. You should get in the habit of looking at the warning messages from the Antlr tool and address them. The Antlr Java tool warns ...Filter.g4:76:0: One of the token AND_MUST values unreachable. AND is always overlapped by token AND. The rule AND_MUST: 'AND'; can never match because the lexer always matches the first rule. Since rules AND and AND_MUST match the same string, and AND occurs before AND_MUST, it will always match, and AND_MUST will never match. You should remove the AND_MUST rule. (a) Antlr tries to match as many input chars as possible when creating a token; (b) When two or more lexer rules can match the same input, the one defined first wins.

  2. Antlr parses the input string until the rule is satisfied. If a partial string works, it will do so and not complain about the rest of the input. A parse from the booleanAndExpression, which is not augmented with EOF, will parse "A" in the input "A B" and return a valid parse. To prevent this behavior, always parse with an EOF-augmented rule, e.g., filter. The original filter rule allowed partial input matching, so I adjusted the rule with parentheses (see link).

  3. When I read the rules for logicalExpression and booleanAndExpression, it looks like you are using factoring to set up the precedence of the OR and AND operators. But, later on, you use alt-rule precedence for EQ, NE, GT, GE, and other operators. It looks like you copied the expression rules from two different grammars and tried to combine them for your purpose. You should choose one way or the other to set up the precedence for your operators. Factoring is easier to control, but alt-rule precedence results in smaller trees and faster parses.

  4. As I mentioned before, the rule booleanAndExpression : simpleFilter ((AND)? simpleFilter )* ; is wrong. If you use AND?, then the sentential form simpleFilter simpleFilter is valid, so A B would be valid. I changed this in the modified grammar, but you can check the behavior by modifying the updated grammar I provide with (AND)? or just AND? and rerun.

  5. Rule simpleFilter contains an error: it must include the base case for STR.

  6. You sprinkle Antlr labels throughout the grammar. I would not recommend that you do this. It clutters the grammar and makes it unportable to other parse generators. To distinguish between one alt from another, use the Antlr accessor functions that the tool generates. For example, to distinguish between "( A )" vs. "A" in the first rule, check whether "context.LPAREN()" is not null.

  7. Instead of using "-> skip" in lexer rules, use an off-channel "-> channel(OFF_CHANNEL)". This allows the error messages to avoid "run-ins", e.g., instead of line 1:2 no viable alternative at input 'AB', it will appear line 1:2 no viable alternative at input 'A B'. However, this would require you to split the combined grammar into separate lexer and parser grammars, probably not required at this point.

  8. There was another "Answer" posted after my post, which recommends adding error productions to the grammar. I would not recommend that you do this. Instead, if you want better error messages, you should modify the error reporter for the parse. You can see the example I provide. A CFG should define the language. It should not be perverted for an implementation detail.

Regards, Ken


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...