Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

unit testing - How to test ANTLR translation without adding EOF to every rule

I am in the middle of re-writing my translator and I am being much more disciplined about tests this time, since this version is likely to live for more than a few weeks.

Because you can run a visitor starting at any node, you can almost write beautiful small tests like this ...

expect(parse("some test code", "startGrammarRule")).toEqual(new ASTForGrammarRule())

and then write one (or a few) of these for each visitor function

EXCEPT that the rule you are invoking is a sub-rule, and so does not have "EOF" in it. So if my grammar has this somewhere in it

numberList: NUMBER ( ',' NUMBER )* ;

... then parse("1,2,3", "numberList") only parses "1" (because only an "EOF" would make the parser hungry enough to consume the whole string).

Editing the rule to add EOF is a non-starter. I could, for every rule I write a test for, add a test version of the rule ...

numberList: NUMBER ( ',' NUMBER )* ;
numberList_TEST: numberList EOF ;

... but that is going to clutter the grammar and add the worry that the _TEST rules must always be scrupulously maintained ...

I want a flag when I create a parser which constructs that faux TEST rule dynamically and then parses from there, or something like that ...

Is there a better way to write tests for my parser that I haven't figured out yet?


He who fights with dragons for too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back…

1 Reply

In a Java project, I use a custom matcher that checks whether the tokens consumed by the parser make up 100% of the token stream, and fails the test if they do not.

You seem to use the TypeScript target, so in TypeScript that could look like this:

T.g4

grammar T;

parse      : numberList EOF;
numberList : NUMBER ( ',' NUMBER )*;

NUMBER : [0-9]+;
ID     : [a-zA-Z]+;
WS     : [ \t\r\n]+ -> channel(HIDDEN);

parserMatchers.ts

import { TLexer } from '../src/parser/TLexer';
import { BailErrorStrategy, CharStreams, CommonTokenStream } from 'antlr4ts';
import { TParser } from '../src/parser/TParser';
import { Lexer } from 'antlr4ts/Lexer';

expect.extend({
  toBeCompletelyParsedBy: (source: string, ruleName: string) => {
    const lexer = new TLexer(CharStreams.fromString(source));
    lexer.removeErrorListeners();
    const tokenStream = new CommonTokenStream(lexer);
    const parser = new TParser(tokenStream);
    parser.removeErrorListeners();
    parser.errorHandler = new BailErrorStrategy();
    const context = parser[ruleName]();

    // Collect the real tokens: non-HIDDEN and non-EOF tokens
    const realTokens = tokenStream.getTokens().filter((t) =>
      t.channel === Lexer.DEFAULT_TOKEN_CHANNEL && t.type !== Lexer.EOF);

    let indexOfStop = realTokens.indexOf(context.stop);
    let pass = realTokens.length === (indexOfStop + 1);

    let message = () => {

      if (pass) {
        return `Expected '${source}' not to be completely parsed by rule '${ruleName}', but it was.`;
      }

      let offending = realTokens[indexOfStop + 1];

      return `Expected '${source}' to be completely parsed by rule '${ruleName}', but '${offending.text}' ` +
        `(${offending.line}:${offending.charPositionInLine}) was not included!`;
    };

    return { pass, message };
  }
});

declare global {
  namespace jest {
    interface Matchers<R> {
      toBeCompletelyParsedBy(ruleName: string): R
    }
  }
}

export {};
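
The completeness check at the heart of the matcher can also be pulled out into a pure function, which makes it easy to test without a generated parser. This is only a sketch: `SimpleToken`, `DEFAULT_CHANNEL`, `EOF_TYPE` and `firstUnparsedToken` are hypothetical names that are not part of antlr4ts, and the two constants are assumed to mirror ANTLR's default channel (0) and EOF token type (-1).

```typescript
// Minimal stand-in for the token fields the check needs
// (hypothetical type, not the antlr4ts Token interface).
interface SimpleToken {
  type: number;
  channel: number;
  text: string;
}

const DEFAULT_CHANNEL = 0; // assumed to mirror ANTLR's default token channel
const EOF_TYPE = -1;       // assumed to mirror ANTLR's EOF token type

// Returns the first "real" token (default channel, not EOF) that comes after
// the rule's stop token, or undefined if the rule consumed every real token.
function firstUnparsedToken(
  tokens: SimpleToken[],
  stop: SimpleToken | undefined,
): SimpleToken | undefined {
  const realTokens = tokens.filter(
    (t) => t.channel === DEFAULT_CHANNEL && t.type !== EOF_TYPE,
  );
  // With no stop token (the rule matched nothing) the first real token,
  // if there is one, counts as unparsed.
  const indexOfStop = stop === undefined ? -1 : realTokens.indexOf(stop);
  return realTokens[indexOfStop + 1];
}
```

Under those assumptions, the matcher would pass exactly when `firstUnparsedToken(tokenStream.getTokens(), context.stop)` returns `undefined`, and the returned token would otherwise be the one named in the failure message.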

And in your unit tests, you can now do this:

import './parserMatchers';

test('the numberList parser rule', () => {
  expect('3, 4, 5').toBeCompletelyParsedBy('numberList');
  expect('3, 4, 5 FOO').not.toBeCompletelyParsedBy('numberList');
});
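
If you would rather get a parse tree back than a pass/fail matcher, a lighter-weight variant is to invoke the rule and then check that the next token the stream would hand out is EOF. The sketch below is not taken from the matcher above: `LookaheadStream`, `EOF_TOKEN_TYPE` and `parseToEnd` are hypothetical names, and it assumes that the token stream's `LA(1)` returns the type of the next on-channel token and that EOF has type -1.

```typescript
// Structural stand-in for the one token-stream method the helper uses
// (hypothetical interface; an antlr4ts CommonTokenStream is assumed to fit).
interface LookaheadStream {
  LA(offset: number): number;
}

const EOF_TOKEN_TYPE = -1; // assumed value of ANTLR's EOF token type

// Runs a rule and throws unless the rule stopped at the end of the input.
// This plays the role of a "ruleName EOF" wrapper rule without touching the grammar.
function parseToEnd<T>(stream: LookaheadStream, invokeRule: () => T): T {
  const tree = invokeRule();
  const next = stream.LA(1);
  if (next !== EOF_TOKEN_TYPE) {
    throw new Error(
      `Rule stopped before the end of the input; next token type is ${next}`,
    );
  }
  return tree;
}

// Intended use with the generated parser (not executed here):
//   const tree = parseToEnd(tokenStream, () => parser.numberList());
```

The trade-off is that this reports only the type of the leftover token, whereas the matcher above can name its text and position.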
