Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
273 views
in Technique[技术] by (71.8m points)

antlr - How do I make a TreeParser in ANTLR3?

I'm attemping to learn language parsing for fun...

I've created a ANTLR grammar which I believe will match a simple language I am hoping to implement. It will have the following syntax:

<FunctionName> ( <OptionalArguments>+) {
     <OptionalChildFunctions>+
 }

Actual Example:

ForEach(in:[1,2,3,4,5] as:"nextNumber") {
   Print(message:{nextNumber})
}

I believe I have the grammar working correctly to match this construct, and now I am attemping to build an Abstract Syntax Tree for the language.

Firstly, I must admit I'm not exactly sure HOW this tree should look. Secondly, I'm at a complete loss how to do this in my Antlr grammar...I've been trying without much success for hours.

This is the current idea I'm going with for the tree:

                   FunctionName
                  /          
           Attributes         
               /           /   
            ID    /    ChildFunctions
           /    ID etc
          /   
  Attribute  AttributeValue
        Type

This is my current Antlr grammar file:

grammar Test;

options {output=AST;ASTLabelType=CommonTree;}

program : function ;
function : ID (OPEN_BRACKET (attribute (COMMA? attribute)*)? CLOSE_BRACKET)? (OPEN_BRACE function* CLOSE_BRACE)?;

attribute : ID COLON datatype;

datatype : NUMBER | STRING | BOOLEAN | array | lookup ;
array  :  OPEN_BOX (datatype (COMMA datatype)* )? CLOSE_BOX ;
lookup  : OPEN_BRACE (ID (PERIOD ID)*) CLOSE_BRACE;

NUMBER
 : ('+' | '-')? (INTEGER | FLOAT)
 ;

STRING
    :  '"' ( ESC_SEQ | ~('\'|'"') )* '"'
    ;

BOOLEAN
 : 'true' | 'TRUE' | 'false' | 'FALSE'
 ;

ID  : (LETTER|'_') (LETTER | INTEGER |'_')*
    ;

COMMENT
    :   '//' ~('
'|'
')* '
'? '
' {$channel=HIDDEN;}
    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

WHITESPACE : (' ' | '' | '
' | '
') {$channel=HIDDEN;} ;

COLON : ':' ;
COMMA : ',' ;
PERIOD  :  '.' ;

OPEN_BRACKET : '(' ;
CLOSE_BRACKET : ')' ;

OPEN_BRACE : '{' ; 
CLOSE_BRACE : '}' ;

OPEN_BOX : '[' ;
CLOSE_BOX : ']' ;

fragment
LETTER
 : 'a'..'z' | 'A'..'Z' 
 ;

fragment
INTEGER
 : '0'..'9'+
 ;

fragment
FLOAT
 : INTEGER+ '.' INTEGER*
 ;

fragment
ESC_SEQ
    :   '\' ('b'|'t'|'n'|'f'|'r'|'"'|'''|'\')
    ;

ANY help / advice would be great. I've tried reading dozens of tutorials and nothing about the AST generation seems to stick :(

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Step 1 is to make the tree look like the little graph that you posted. Right now, you don't have any tree construction operators, so you're going to end up with a flat list.

See tree construction on the antlr.org website.

You can use ANTLRWorks to see what your getting for a parse tree and AST. Start adding tree construction operators and watch how things change.

EDIT / Additional Info:

Here's a process you can follow to give you a rough idea of how to do it:

  1. Download ANTLRWorks and use it's graphing facilities. You will definitely want to see the parse tree and the AST before and after you make changes. Once you understand how everything works, then you can use any IDE or editor you want.
  2. There are two basic operators for tree construction - The exclamation mark ! which tells the compiler to not place the node within the AST, and the carot ^, which tells ANTLR to make something the root node. Start by going through each non-terminal rule and deciding which elements don't need to be in the AST. For example, you don't need commas or parenthesis. Once you have all the information you can populate the a structure (or create your own AST structure) that provides all the information. Commas don't help any more, so add a ! to them. For example:

    function: ID (OPEN_BRACKET! (attribute (COMMA!? attribute)*)? CLOSE_BRACKET!)? (OPEN_BRACE! function* CLOSE_BRACE!)?;

  3. Take a look at the AST in ANTLRWorks before and after. Compare.

  4. Now decide which element should be the root node. It looks like you want ID to be the root node, so add a ^ after ID and compare in ANTLRWorks.

Here's a few changes that bring it closer to what I think you want:

program : function ;
function : ID^ (OPEN_BRACKET! attributeList? CLOSE_BRACKET!)? (OPEN_BRACE! function* CLOSE_BRACE!)?;
attributeList:  (attribute (COMMA!? attribute)*);
attribute : ID COLON! datatype;
datatype : NUMBER | STRING | BOOLEAN | array | lookup ;
array  :  OPEN_BOX! (datatype^ (COMMA! datatype)* )? CLOSE_BOX!;
lookup  : OPEN_BRACE! (ID (PERIOD! ID)*) CLOSE_BRACE!;

With that under your belt, now go look at some of the tutorials.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...