Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
186 views
in Technique[技术] by (71.8m points)

Thread-safe / reentrant bison + flex

I would really prefer a working example to any explanation. Whatever I read so far on Bison's documentation site contradicts whatever Flex says. One says to declare yylex as

int yylex (yyscan_t yyscanner);

another one wants it to be:

int yylex(YYSTYPE *lvalp, YYLTYPE *llocp);

What I really need is the location information. I'm not sure as of yet if I need YYSTYPE (I don't have a use for this information right now, but maybe in the future I will).


Unrelated to the above, and as a bonus, I'd be interesting to know why this infrastructure is so bad. It seems like such a straight-forward thing to do, and yet it's otherwordly bad. It never works with defaults. Even writing a simplest textbook example of calculator requires a many days of fixing configuration errors... why?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

1. Sample code

A kind of explanation of how reentrancy is configured into bison and flex is provided in section 2 of this answer. Other annotations of the sample code are in section 3.

1.1 eval.l

%option noinput nounput noyywrap 8bit nodefault                                 
%option yylineno
%option reentrant bison-bridge bison-locations                                  

%{
  #include <stdlib.h>                                                           
  #include <string.h>
  #include "eval.tab.h"                                                   
  
  #define YY_USER_ACTION                                             
    yylloc->first_line = yylloc->last_line;                          
    yylloc->first_column = yylloc->last_column;                      
    if (yylloc->last_line == yylineno)                               
      yylloc->last_column += yyleng;                                 
    else {                                                           
      yylloc->last_line = yylineno;                                  
      yylloc->last_column = yytext + yyleng - strrchr(yytext, '
'); 
    }
%}                                                                              
%%
[ ]+            ;                                                  
#.*               ;                                                  

[[:digit:]]+      *yylval = strtol(yytext, NULL, 0); return NUMBER;  

.|
              return *yytext;                                    

1.2 eval.y

%define api.pure full
%locations
%param { yyscan_t scanner }

%code top {
  #include <stdio.h>
} 
%code requires {
  typedef void* yyscan_t;
}
%code {
  int yylex(YYSTYPE* yylvalp, YYLTYPE* yyllocp, yyscan_t scanner);
  void yyerror(YYLTYPE* yyllocp, yyscan_t unused, const char* msg);
}

%token NUMBER UNOP
%left '+' '-'
%left '*' '/' '%'
%precedence UNOP
%%
input: %empty
     | input expr '
'      { printf("[%d]: %d
", @2.first_line, $2); }
     | input '
'
     | input error '
'     { yyerrok; }
expr : NUMBER
     | '(' expr ')'         { $$ = $2; }
     | '-' expr %prec UNOP  { $$ = -$2; }
     | expr '+' expr        { $$ = $1 + $3; }
     | expr '-' expr        { $$ = $1 - $3; }
     | expr '*' expr        { $$ = $1 * $3; }
     | expr '/' expr        { $$ = $1 / $3; }
     | expr '%' expr        { $$ = $1 % $3; }

%%

void yyerror(YYLTYPE* yyllocp, yyscan_t unused, const char* msg) {
  fprintf(stderr, "[%d:%d]: %s
",
                  yyllocp->first_line, yyllocp->first_column, msg);
}

1.3 eval.h

See 3.1 for an explanation of the need for this file.

#include "eval.tab.h"
#include "eval.lex.h"

1.4 main.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "eval.h"
#if !YYDEBUG
  static int yydebug;
#endif

int main(int argc, char* argv[]) {
  yyscan_t scanner;          
  yylex_init(&scanner);
  
  do {
    switch (getopt(argc, argv, "sp")) {
      case -1: break;
      case 's': yyset_debug(1, scanner); continue;
      case 'p': yydebug = 1; continue;
      default: exit(1);
    }
    break;
  } while(1);

  yyparse(scanner);          
  yylex_destroy(scanner);    
  return 0;
}

1.5 Makefile

all: eval

eval.lex.c: eval.l
        flex -o $@ --header-file=$(patsubst %.c,%.h,$@) --debug $<

eval.tab.c: eval.y
        bison -o $@ --defines=$(patsubst %.c,%.h,$@) --debug $<

eval: main.c eval.tab.c eval.lex.c eval.h
        $(CC) -o $@ -Wall --std=c11 -ggdb -D_XOPEN_SOURCE=700 $(filter %.c,$^)

clean:
        rm -f eval.tab.c eval.lex.c eval.tab.h eval.lex.h main

2. Re-entrancy issues

The most important thing to remember is that Bison/Yacc and Flex/Lex are two independent code generators. While they are frequently used together, this is not necessary; either one can be used by itself or with other tools.

Note: The following discussion only applies to normal "pull" parsers. Bison can generate push parsers (similar to Lemon) and that allows a useful control flow inversion, which actually simplifies several of the issues mentioned below. In particular, it completely avoids the circular dependency analysed in 3.1. I usually prefer push parsers, but they seemed out of scope for this particular question.

2.1 Bison / Yacc re-entrancy

A Bison/Yacc generated parser is called once to parse an entire body of text, so it has no need to maintain mutable persistent data objects between calls. It does rely on a number of tables which guide the progress of the parser, but the fact that these immutable tables have static lifetime does not affect re-entrancy. (With Bison, at least, these tables do not have external linkage but of course they are still visible by user-written code inserted into the parser.)

The main issue, then, are the externally-visible mutable globals yylval and yylloc, used to augment the parser-lexer interface. These globals are definitely part of Bison/Yacc; Flex-generated code does not even mention them, and all use of them is explicitly performed in user actions in the Flex definition files. To make a bison parser re-entrant, it is necessary to modify the API which the parser uses to collect information from the lexer about each token, and the solution adopted by Bison is the classic one of providing additional parameters which are pointers to the data structures being "returned" to the parser. So this re-entrancy requirement changes the way the Bison-generated parser calls yylex; instead of invoking

int yylex(void);

the prototype becomes either:

int yylex(YYSTYPE* yylvalp);

or

int yylex(YYSTYPE* yylvalp, YYLTYPE* yyllocp);

depending on whether or not the parser requires the location information stored in yylloc. (Bison will automatically detect use of location information in actions, but you can also insist that a location object be provided to yylex.)

That means that the scanner must be modified in order to correctly communicate with a re-entrant bison parser, even if the lexer itself is not re-entrant. (See below.)

There are a small number of additional Bison/Yacc variables which are intended for use by user code, which might force source code changes if used:

  • yynerrs counts the number of syntax errors which have been encountered; with a re-entrant parser, yynerrs is local to the yyparse and therefore can only be used in actions. (In legacy applications, it is sometimes referenced by yyparse's caller; such uses need to be modified for re-entrant parsers.)

  • yychar is the token type of the lookahead symbol, and is sometimes used in error reporting. In a re-entrant parser, it is local to yyparse so if it is needed by an error reporting function, it will have to be passed explicitly.

  • yydebug controls whether a parse trace is produced, if debugging code has been enabled. yydebug is still global in a re-entrant parser, so it is not possible to enable debugging traces only for a single parser instance. (I regard this as a bug, but it could be considered a feature request.)

    Debugging code is enabled by defining the preprocessor macro YYDEBUG or by using the -t command-line flag. These are defined by Posix; Flex also provides the --debug command line flag; the %debug directive and the parse.trace configuration directive (which can set with -Dparse.trace on the bison command line.

2.2 Flex / Lex re-entrancy

yylex is called repeatedly over the course of the parse; each time it is called, it returns a single token. It needs to maintain a large amount of persistent state between calls, including its current buffer and various pointers tracking lexical progress.

In a default lexer, this information is kept in a global struct which is not intended to be referenced by user code, except for specific global variables (which are mostly macros in modern Flex templates).

In a re-entrant lexer, all of Flex's persistent information is collected into an opaque data structure pointed to by a variable of type yyscan_t. This variable must be passed to every call to Flex functions, not just yylex. (The list includes, for example, the various buffer management functions.) The Flex convention is that the persistent state object is always the last argument to a function. Some globals which have been relocated into this data structure have associated macros, so that it is possible to refer to them by their traditional names Flex actions. Outside of yylex, all accesses (and modifications, in the case of mutable variables) must be done with getter and setter functions documented in the Flex manual. Obviously, the list of getter/setter functions does not include accessors for Bison variables, such as yylval.

So yylex in a re-entrant scanner has the prototype

int yylex(yyscan_t state);

2.3 Communication between parser and scanner

Flex/lex itself only recognizes tokens; it is up to the user action associated with each pattern to communicate the result of the match. Conventionally, parsers expect that yylex will return a small integer representing the token's syntactic type or 0 to indicate that the end of input has been reached. The token's text is stored in the variable (or yyscan_t member) yytext (and its length in yyleng) but since yytext is a pointer to an internal buffer in the generated scanner, the string value can only be used before the next call to yylex. Since LR parsers do not generally process semantic information until several tokens have been read, yytext is not an appropriate mechanism for passing semantic information.

As mentioned above, non-reentrant Bison/Yacc generated parsers provide assume the use of the global yylval to communicate semantic information, as well as the yylloc global to communicate source location information, if that is desired (Bison only).

But, as noted above, in a re-entrant parser these variables are local to yyparse and the parser passes pointers to the variables on each call to the lexer. This requires changes to the prototype of yylex, as well as to any scanner actions which use yylval and/or yylloc.

The prototype expected by a reentrant bison-generated parser is:

int yylex(YYSTYPE* yylvalp, YYLTYPE* yyllocp, yyscan_t state);

(If locations are not used, the yyllocp argument is eliminated.)

Flex's %bison-bridge directive (or the combination of %bison-bridge and %bison-locations if location tracking is being used) will ensure that the yylex prototype is correct.

All references to yylval in scanner actions also need to be modified, since bison's reentrant API passes poin


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...