Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

java - Get original text of an Antlr rule

I am an ANTLR beginner and want to calculate a SHA-1 hash of symbols.

My simplified example grammar:

grammar Example;

method @after{calculateSha1($text); }: 'call' ID;

ID: 'A'..'Z'+;
WS: (' '|'\n'|'\r')+ {skip(); };
COMMENT: '/*' (options {greedy=false;}: .)* '*/' {$channel=HIDDEN;};

Because the lexer removes all whitespace, the different strings callABC and call /* DEF */ ABC unfortunately get the same SHA-1 hash value.

Is it possible to get the "original" text of a rule between the start- and end-token with all the skipped whitespaces and the text of the other channels?

(One possibility that comes to mind is to remember all characters in the WS and COMMENT lexer rules, but there are many more rules, so this isn't very practical.)

I use the standard ANTLRInputStream to feed the lexer, but I don't know how to retrieve the original text.


1 Reply


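Instead of skip()-ping the WS token, put it on the HIDDEN channel as well. A skipped token is discarded by the lexer and never reaches the token stream, whereas a token on the HIDDEN channel stays in the stream (the parser just doesn't see it), so $text can include it: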

grammar Example;

@parser::members {
  void calculateSha1(String text) {
    try {
      java.security.MessageDigest md = java.security.MessageDigest.getInstance("SHA-1");
      byte[] sha1 = md.digest(text.getBytes());
      System.out.println(text + "\n" + java.util.Arrays.toString(sha1) + "\n");
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
}

parse 
  :  method+ EOF
  ;

method
@after{calculateSha1($text);}
  :  'call' ID
  ;

ID      : 'A'..'Z'+;
WS      : (' ' | '\t' | '\r' | '\n')+ {$channel=HIDDEN;};
COMMENT : '/*' .* '*/' {$channel=HIDDEN;};
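
To make the difference between skip() and the HIDDEN channel more concrete, here is a small toy model in plain Java. It does not use the ANTLR runtime: Tok, lex and textOf are hypothetical names invented for this sketch, and the "lexer" only knows whitespace, /* ... */ comments and words. It merely illustrates the idea that text dropped by the lexer cannot be recovered from the token stream, while hidden tokens can.

```java
import java.util.ArrayList;
import java.util.List;

public class ChannelSketch {
    // Hypothetical stand-in for a lexer token; NOT ANTLR's Token class.
    public static final class Tok {
        final String text;
        final boolean hidden;
        Tok(String text, boolean hidden) { this.text = text; this.hidden = hidden; }
    }

    // Toy lexer. Comments always become hidden tokens. Whitespace becomes a
    // hidden token when keepWhitespace is true (like $channel=HIDDEN) and is
    // dropped entirely when it is false (like skip()).
    public static List<Tok> lex(String src, boolean keepWhitespace) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            int start = i;
            if (Character.isWhitespace(src.charAt(i))) {
                while (i < src.length() && Character.isWhitespace(src.charAt(i))) i++;
                if (keepWhitespace) out.add(new Tok(src.substring(start, i), true));
            } else if (src.startsWith("/*", i)) {
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? src.length() : end + 2;
                out.add(new Tok(src.substring(start, i), true));
            } else {
                while (i < src.length()
                        && !Character.isWhitespace(src.charAt(i))
                        && !src.startsWith("/*", i)) i++;
                out.add(new Tok(src.substring(start, i), false));
            }
        }
        return out;
    }

    // Concatenates every token from the first visible one to the last visible
    // one, hidden tokens included. This mimics what a rule's text would cover.
    public static String textOf(List<Tok> tokens) {
        int first = -1, last = -1;
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).hidden) {
                if (first < 0) first = i;
                last = i;
            }
        }
        StringBuilder sb = new StringBuilder();
        for (int i = first; i >= 0 && i <= last; i++) sb.append(tokens.get(i).text);
        return sb.toString();
    }

    public static void main(String[] args) {
        String src = "call /* DEF */ ABC";
        System.out.println(textOf(lex(src, false))); // prints: call/* DEF */ABC
        System.out.println(textOf(lex(src, true)));  // prints: call /* DEF */ ABC
    }
}
```

Only the second variant reproduces the input exactly, which is the property the hash needs.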

The grammar above can be tested with:

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String source = "call ABC call /* DEF */ ABC";
    ExampleLexer lexer = new ExampleLexer(new ANTLRStringStream(source));
    ExampleParser parser = new ExampleParser(new CommonTokenStream(lexer));
    parser.parse();
  }
}

which will print the following to the console:

call ABC
[48, -45, 113, 5, -52, -128, -78, 75, -52, -97, -35, 25, -55, 59, -85, 96, -58, 58, -96, 10]

call /* DEF */ ABC
[-57, -2, -115, -104, 77, -37, 4, 93, 116, -123, -47, -4, 33, 42, -68, -95, -43, 91, 94, 77]

That is, the same parser rule now yields a different $text for each input, and therefore a different SHA-1 hash.
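
The signed byte arrays printed above are hard to compare by eye. If a conventional hex digest is preferred, the hashing step could be written roughly as follows. This is only a sketch: the class name Sha1Hex and the method sha1Hex are made up for illustration, and the explicit UTF-8 charset is an assumption (the calculateSha1 method above uses the platform default via getBytes(), so its output may differ for non-ASCII input).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1Hex {
    // Returns the SHA-1 digest of the given text as 40 lowercase hex digits.
    // The text is encoded as UTF-8 (an assumption of this sketch).
    public static String sha1Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes print as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a standard algorithm on the Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The two rule texts from the example differ only in hidden tokens.
        System.out.println(sha1Hex("call ABC"));
        System.out.println(sha1Hex("call /* DEF */ ABC"));
    }
}
```

Inside the grammar, the body of calculateSha1 could call such a helper with $text instead of printing the raw byte array.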

