Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c - Flex/Bison: yytext skips over a value

I've been racking my brain for two days trying to figure out why the program is behaving this way. For a class project, I'm trying to write a program that parses an address and outputs it a certain way. Before I actually get to the output portion of the program, I just wanted to make sure my Bison-fu was actually correct and outputting some debugging information correctly.

It looks as if Flex and Bison are cooperating with each other nicely, as expected, but for some reason, when I get to the parsing of the third line of the address, yytext just skips over the zip code and goes straight to the new line.

Below is a stripped-down version of my Flex and Bison files. I tested it, and it still produces the same output as the full version:

[19:45]<Program4> $ cat scan.l
%option noyywrap
%option nounput
%option noinput

%{
#include <stdlib.h>
#include "y.tab.h"
#include "program4.h"
%}

%%

[ \t]+                  { /* Eat whitespace */}
[\n]                    { return EOLTOKEN; }
","                     { return COMMATOKEN; }
[0-9]+                  { return INTTOKEN; }
[A-Za-z]+               { return NAMETOKEN; }
[A-Za-z0-9]+            { return IDENTIFIERTOKEN; }

%%

/*This area just occupies space*/
[19:45]<Program4> $ cat parse.y


%{
#include <stdlib.h>
#include <stdio.h>
#include "program4.h"

%}

%union {int num; char id[20]; }
%start locationPart
%expect 0
%token <num> NAMETOKEN
%token <num> EOLTOKEN
%token <num> INTTOKEN
%token <num> COMMATOKEN
%type <id> townName zipCode stateCode

%%

/* Entire block */
locationPart:           townName COMMATOKEN stateCode zipCode EOLTOKEN
                        { printf("Rule 12: LP: TN COMMA SC ZC EOL: %s\n", yytext); }
                    | /* bad location part */
                        { printf("Rule 13: LP: Bad location part: %s\n", yytext); }
                    ;

/* Lil tokens */
townName:               NAMETOKEN
                        { printf("Rule 23: TN: NAMETOKEN: %s\n", yytext); }
                    ;

stateCode:              NAMETOKEN
                        { printf("Rule 24: SC: NAMETOKEN: %s\n", yytext); }
                    ;

zipCode:                INTTOKEN DASHTOKEN INTTOKEN
                        { printf("Rule 25: ZC: INT DASH INT: %s\n", yytext); }
                    | INTTOKEN
                        { printf("Rule 26: ZC: INT: %s\n", yytext); }
                    ;

%% 

int yyerror (char const *s){
  extern int yylineno; //Defined in lex

  fprintf(stderr, "ERROR: %s at symbol \"%s\"\n at line %d.\n", s, yytext, yylineno);
  exit(1);
}
[19:45]<Program4> $ cat addresses/zip.txt
Rockford, HI 12345
[19:45]<Program4> $ parser < addresses/zip.txt
Operating in parse mode.

Rule 23: TN: NAMETOKEN: Rockford
Rule 24: SC: NAMETOKEN: HI
Rule 26: ZC: INT:

Rule 12: LP: TN COMMA SC ZC EOL:

Parse successful!
[19:46]<Program4> $

As you can see near the bottom, it prints Rule 26: ZC: INT: but fails to print the five-digit zip code. It's as if the program skips the number and stores the newline instead. Any ideas why it won't store and print the zip code?

Notes:

  • yytext is defined as an extern in my .h file (not posted here);
  • I am using the -vdy flags to compile the parse.c file

1 Reply

If you want to trace the workings of your parser, you are much better off enabling bison's trace feature. It's really easy. Just add the -t or --debug flag to the bison command to generate the code, and then add a line to actually produce the tracing:

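/* This assumes you have #included the parse.tab.h header */
int main(void) {
#if YYDEBUG
   yydebug = 1;   /* turn on the parser trace before parsing starts */
#endif
   /* ... the rest of your main(), including the call to yyparse() ... */
}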

This is explained in the Bison manual; the #if lets your program compile if you leave off the -t flag. While on the subject of flags, I strongly suggest you do not use the -y flag; it is for compiling old Yacc programs which relied on certain obsolete features. If you don't use -y, then bison will use the basename of your .y file with extensions .tab.c and .tab.h for the generated files.

Now, your bison file says that some of your tokens have semantic types, but your flex actions do not set semantic values for these tokens and your bison actions don't use the semantic values. Instead, you simply print the value of yytext. If you think about this a bit, you should be able to see why it won't work. Bison is a lookahead parser; it makes its parsing decisions based on the current parsing state and a peek at the next token (if necessary). It peeks at the next token by calling the lexer. And when you call the lexer, it changes the value of yytext.

Bison (unlike other yacc implementations) doesn't always peek at the next token. But in your zipcode rule, it has no alternative, since it cannot tell whether the next token is a - or not without looking at it. In this case, it is not a dash; it is a newline. So guess what yytext contains when you print it out in the zipcode action.

If your tokenizer were to save the text in the id semantic value member (which is what it is for) then your parser would be able to access the semantic values as $1, $2, ...
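
A minimal sketch of that change, assuming the %union from your parse.y with its 20-byte id member (longer lexemes are truncated here rather than overflowing the buffer). Only the lines that differ are shown, and the rule numbers in the messages are yours:

```
/* scan.l: copy the lexeme into the semantic value before returning.
   snprintf needs #include <stdio.h> in the %{ ... %} block. */
[0-9]+                  { snprintf(yylval.id, sizeof yylval.id, "%s", yytext);
                          return INTTOKEN; }
[A-Za-z]+               { snprintf(yylval.id, sizeof yylval.id, "%s", yytext);
                          return NAMETOKEN; }

/* parse.y: give the text-carrying tokens the id type... */
%token <id> NAMETOKEN INTTOKEN
%token EOLTOKEN COMMATOKEN

/* ...and print the saved value instead of yytext. */
townName:               NAMETOKEN
                        { printf("Rule 23: TN: NAMETOKEN: %s\n", $1); }
                    ;

zipCode:                INTTOKEN
                        { printf("Rule 26: ZC: INT: %s\n", $1); }
                    ;
```

Because each token's text is copied when the token is scanned, a later lookahead call to the lexer no longer affects what the action prints.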

