Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - yytext contains characters not in match

Background

I am using flex to generate a lexer for a programming language I am implementing.

I have some problems with this rule for identifiers:

[a-zA-Z_][a-zA-Z_0-9]* {
    printf("yytext is %s\n", yytext);
    yylval.s = yytext;
    return TOK_IDENTIFIER;
}

The rule works as it should when my parser is parsing expressions like this:

var0 = var1 + var2;

The printf statement will print out this:

yytext is 'var0'
yytext is 'var1'
yytext is 'var2'

Which is what it should print.

The problem

But things go wrong when my parser is parsing function declarations like this one:

func(array[10] type, arg2 wef, arg3 afe);

Now the printf statement will print this:

yytext is 'array['
yytext is 'arg2 wef'
yytext is 'arg3 afe'

The problem is that yytext contains characters that are not in the match.

Question

Why does flex include these characters in yytext and how can I solve this problem?

1 Reply

I don't see how that output could be produced from your lexer, but it is easy to see how it could be produced in your parser.

Basically, it is not correct to retain the value of yytext:

yylval.s = yytext;  /* DON'T DO THIS */

In effect, that is a dangling pointer because yytext is pointing to private memory inside the lexer framework, and the pointer is only valid until the next time the lexer is called. Since the parser generally needs to look at the next input token before executing a reduction action, it is almost certain that the pointer in the s member of each terminal in the production will have been invalidated by the time the action is executed.
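To make the effect concrete, here is a minimal sketch of a toy scanner that behaves the way flex-generated scanners commonly do: the token pointer aims into the input buffer, and the token is terminated by temporarily overwriting the following character with '\0', which is restored on the next call. This is a simplified model under that assumption, not flex's actual code; `struct toy_lexer` and `toy_next` are hypothetical names invented for the illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical toy scanner: a simplified model, not flex's real code. */
struct toy_lexer {
    char *buf;    /* writable input buffer */
    size_t pos;   /* where the next scan starts */
    char held;    /* character overwritten by the temporary '\0' */
    int holding;  /* nonzero while buf[pos] holds that temporary '\0' */
};

/* Returns a pointer INTO lx->buf for the next token, or NULL at end of
   input. What the pointer reads as is only meaningful until the next call. */
static char *toy_next(struct toy_lexer *lx)
{
    if (lx->holding) {                /* undo the previous token's terminator */
        lx->buf[lx->pos] = lx->held;
        lx->holding = 0;
    }
    while (isspace((unsigned char)lx->buf[lx->pos]))
        lx->pos++;                    /* skip whitespace between tokens */
    if (lx->buf[lx->pos] == '\0')
        return NULL;

    size_t start = lx->pos;
    if (isalpha((unsigned char)lx->buf[lx->pos]) || lx->buf[lx->pos] == '_') {
        /* identifier: [a-zA-Z_][a-zA-Z_0-9]* */
        while (isalnum((unsigned char)lx->buf[lx->pos]) || lx->buf[lx->pos] == '_')
            lx->pos++;
    } else {
        lx->pos++;                    /* anything else is a one-character token */
    }
    lx->held = lx->buf[lx->pos];      /* remember what follows the match */
    lx->buf[lx->pos] = '\0';          /* terminate the token in place */
    lx->holding = 1;
    return lx->buf + start;
}
```

With the input `func(array[10] type, arg2 wef);`, the pointer returned for `array` reads as `array` right after the call, but as `array[` once the next token has been scanned, and a pointer kept for `arg2` reads as `arg2 wef` after `wef` has been scanned. That is the same shape as the output in the question, which is what you would expect if the parser prints the saved pointers after the lexer has moved on.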

If you want to keep the string value of the token pointed to by yytext, you must copy it:

yylval.s = strdup(yytext);

and then you will be responsible for freeing the copy when you no longer need it.
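As a sketch of the copying step, the helper below does the same job as strdup using only standard C (malloc and memcpy), which can be handy where strdup is not declared by the toolchain. `token_copy` is a hypothetical name, not part of flex; in a flex action the length argument would come from yyleng.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated, NUL-terminated copy of the
   first len bytes of text, or NULL if allocation fails.
   The caller owns the result and must free() it. */
static char *token_copy(const char *text, size_t len)
{
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, len);
    copy[len] = '\0';
    return copy;
}
```

In the identifier rule this would be used as `yylval.s = token_copy(yytext, yyleng);`, and the parser action that consumes the identifier would call `free()` on it once the string is no longer needed. If the parser is generated by Bison, a `%destructor` declaration is one way to release values that are discarded during error recovery.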

