Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c++ - Boost::Spirit Expression Parser

I have another problem with my boost::spirit parser.

template<typename Iterator>
struct expression: qi::grammar<Iterator, ast::expression(), ascii::space_type> {
    expression() :
        expression::base_type(expr) {
        number %= lexeme[double_];
        varname %= lexeme[alpha >> *(alnum | '_')];

        binop = (expr >> '+' >> expr)[_val = construct<ast::binary_op<ast::add>>(_1,_2)]
              | (expr >> '-' >> expr)[_val = construct<ast::binary_op<ast::sub>>(_1,_2)]
              | (expr >> '*' >> expr)[_val = construct<ast::binary_op<ast::mul>>(_1,_2)]
              | (expr >> '/' >> expr)[_val = construct<ast::binary_op<ast::div>>(_1,_2)] ;

        expr %= number | varname | binop;
    }

    qi::rule<Iterator, ast::expression(), ascii::space_type> expr;
    qi::rule<Iterator, ast::expression(), ascii::space_type> binop;
    qi::rule<Iterator, std::string(), ascii::space_type> varname;
    qi::rule<Iterator, double(), ascii::space_type> number;
};

This was my parser. It parsed "3.1415" and "var" just fine, but when I tried to parse "1+2" it told me the parse failed. I then tried changing the binop rule to

    binop = expr >>
           (('+' >> expr)[_val = construct<ast::binary_op<ast::add>>(_1, _2)]
          | ('-' >> expr)[_val = construct<ast::binary_op<ast::sub>>(_1, _2)]
          | ('*' >> expr)[_val = construct<ast::binary_op<ast::mul>>(_1, _2)]
          | ('/' >> expr)[_val = construct<ast::binary_op<ast::div>>(_1, _2)]);

But now it is of course not able to build the AST, because _1 and _2 are bound differently. I have only seen something like _r1 mentioned, but as a Boost newbie I don't quite understand how boost::phoenix and boost::spirit interact.

How to solve this?

1 Reply

It isn't entirely clear to me what you are trying to achieve. Most importantly, are you not worried about operator associativity? I'll just show simple answers based on right recursion. Note that this groups every operator to the right (right-associative parsing) and gives all four operators the same precedence.

The straight answer to your visible question would be to juggle a fusion::vector2<char, ast::expression>, which isn't really any fun, especially in Phoenix lambda semantic actions. (I'll show what that looks like further down.)

Meanwhile I think you should read up on the Spirit docs:

  • the old Spirit docs (eliminating left recursion): though the syntax no longer applies, Spirit still generates LL recursive descent parsers, so the concept behind left recursion still applies. The code below shows this applied to Spirit Qi.
  • the Qi examples, which contain three calculator samples: these should give you a hint on why operator associativity matters, and how you would express a grammar that captures the associativity of binary operators. They also show how to support parenthesized expressions to override the default evaluation order.
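
To make the left-recursion point concrete, here is a minimal hand-written recursive-descent sketch in plain C++. It is not Spirit code: the names GroupingParser and group_right are made up for illustration, and operands are assumed to be single letters or digits. A rule that begins by calling itself recurses before consuming any input; the rewritten rule consumes an operand first and only then recurses on the right.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hand-written analogue of a right-recursive rule (illustrative, not Spirit).
struct GroupingParser {
    std::string input;
    std::size_t pos = 0;

    // simple := letter | digit   (single-character operands, an assumption)
    std::string simple() {
        if (pos < input.size() &&
            std::isalnum(static_cast<unsigned char>(input[pos])))
            return std::string(1, input[pos++]);
        throw std::runtime_error("operand expected");
    }

    // expr := simple (op expr)?
    // An operand is consumed before the recursive call, so each call makes
    // progress. A rule shaped like expr := expr op expr would call itself
    // at the same input position and never return.
    std::string expr() {
        std::string left = simple();
        if (pos < input.size() &&
            std::string("+-*/").find(input[pos]) != std::string::npos) {
            char op = input[pos++];
            std::string right = expr();  // right recursion
            return "(" + left + op + right + ")";
        }
        return left;
    }
};

// Returns the input with explicit parentheses showing the grouping.
std::string group_right(const std::string& text) {
    GroupingParser parser;
    parser.input = text;
    std::string result = parser.expr();
    if (parser.pos != text.size()) throw std::runtime_error("trailing input");
    return result;
}
```

Under these assumptions, group_right("1/2+3-4*5") returns "(1/(2+(3-(4*5))))": everything nests to the right, which is what right recursion gives you.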

Code:

I have three versions of code that work, parsing input like:

std::string input("1/2+3-4*5");

into an ast::expression grouped like (using BOOST_SPIRIT_DEBUG):

<expr>
  ....
  <success></success>
  <attributes>[[1, [2, [3, [4, 5]]]]]</attributes>
</expr>
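
Assuming the operators sit between the operands in input order, that attribute reads as 1/(2+(3-(4*5))): every operator groups to the right, and '*' and '/' get no priority over '+' and '-'. A small sketch in plain C++ (the function names are illustrative only) shows how far that is from the conventional reading of the same input:

```cpp
#include <cassert>
#include <cmath>

// Grouping produced by the right-recursive grammar:
// 1/(2+(3-(4*5))) = 1/(2+(3-20)) = 1/(-15)
double right_nested() { return 1.0 / (2.0 + (3.0 - (4.0 * 5.0))); }

// Conventional precedence and left associativity, as C++ itself applies:
// (1/2) + 3 - (4*5) = 0.5 + 3 - 20 = -16.5
double conventional() { return 1.0 / 2.0 + 3.0 - 4.0 * 5.0; }
```

The two results differ (roughly -0.0667 versus -16.5), which is why associativity and precedence need to be settled before the AST is evaluated.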

Step 1: Reduce semantic actions

First thing, I'd get rid of the alternative parse expressions per operator; this leads to excessive backtracking (check that using BOOST_SPIRIT_DEBUG!). Also, as you've found out, it makes the grammar hard to maintain. So, here is a simpler variation that uses a function for the semantic action:

static ast::expression make_binop(char discriminant, 
     const ast::expression& left, const ast::expression& right)
{
    switch(discriminant)
    {
        case '+': return ast::binary_op<ast::add>(left, right);
        case '-': return ast::binary_op<ast::sub>(left, right);
        case '/': return ast::binary_op<ast::div>(left, right);
        case '*': return ast::binary_op<ast::mul>(left, right);
    }
    throw std::runtime_error("unreachable in make_binop");
}

// rules:
number %= lexeme[double_];
varname %= lexeme[alpha >> *(alnum | '_')];

simple = varname | number;
binop = (simple >> char_("-+*/") >> expr) 
    [ _val = phx::bind(make_binop, qi::_2, qi::_1, qi::_3) ]; 

expr = binop | simple;
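
To see why one alternative per operator costs extra work, here is a hand-written sketch in plain C++ that counts how often an operand gets parsed. It only imitates ordered alternatives with backtracking; the struct Counter and its members are hypothetical names, operands are single characters, and the exact counts in a real Qi grammar may differ.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Counts operand parses for two hand-written grammars (illustrative only).
struct Counter {
    std::string in;
    int operand_parses = 0;

    // simple := letter | digit; every attempt is counted.
    bool simple(std::size_t& pos) {
        ++operand_parses;
        if (pos < in.size() &&
            std::isalnum(static_cast<unsigned char>(in[pos]))) {
            ++pos;
            return true;
        }
        return false;
    }

    // expr := simple '+' expr | simple '-' expr
    //       | simple '*' expr | simple '/' expr | simple
    // Each alternative restarts at the same position and re-parses the operand.
    bool expr_per_operator(std::size_t& pos) {
        for (char op : std::string("+-*/")) {
            std::size_t p = pos;
            if (simple(p) && p < in.size() && in[p] == op) {
                ++p;
                if (expr_per_operator(p)) { pos = p; return true; }
            }
        }
        return simple(pos);
    }

    // expr := simple [-+*/] expr | simple
    // One character set replaces the four alternatives.
    bool expr_char_set(std::size_t& pos) {
        std::size_t p = pos;
        if (simple(p) && p < in.size() &&
            std::string("+-*/").find(in[p]) != std::string::npos) {
            ++p;
            if (expr_char_set(p)) { pos = p; return true; }
        }
        return simple(pos);
    }
};
```

For the input "1*2", the per-operator version parses an operand 8 times in this sketch, while the character-set version needs 3. The gap grows with longer inputs because the repeated work happens at every nesting level.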

Step 2: Remove redundant rules, use _val

As you can see, this has the potential to reduce complexity. It is now only a small step to remove the intermediate binop rule (which has become quite redundant):

number %= lexeme[double_];
varname %= lexeme[alpha >> *(alnum | '_')];

simple = varname | number;
expr = simple [ _val = _1 ] 
    > *(char_("-+*/") > expr) 
            [ _val = phx::bind(make_binop, qi::_1, _val, qi::_2) ]
    > eoi;

As you can see,

  • within the expr rule, the _val lazy placeholder is used as a pseudo-local variable that accumulates the binops. Across rules, you'd have to carry the value explicitly, for example in a qi::locals<ast::expression> (placeholder _a) that is passed to a sub-rule as an inherited attribute; inside that sub-rule it is visible as _r1, which is the placeholder you asked about.
  • there are now explicit expectation points (the > operator), making the grammar more robust
  • the expr rule no longer needs to be an auto-rule (expr = instead of expr %=)
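
The accumulation in _val can be pictured as an ordinary loop. The sketch below is a hand-written analogue in plain C++, not Spirit code; make_binop_text and group_left are hypothetical names, and operands are assumed to be single characters. It also makes one deliberate change that is not in the grammar above: the loop reads a simple operand instead of recursing into expr, which is one way to get left-associative grouping.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for make_binop: builds a parenthesized string
// instead of an AST node.
std::string make_binop_text(char op, const std::string& left,
                            const std::string& right) {
    return "(" + left + op + right + ")";
}

// expr := simple (op simple)*
std::string group_left(const std::string& text) {
    std::size_t pos = 0;
    auto simple = [&]() -> std::string {
        if (pos < text.size() &&
            std::isalnum(static_cast<unsigned char>(text[pos])))
            return std::string(1, text[pos++]);
        throw std::runtime_error("operand expected");
    };

    std::string val = simple();  // plays the role of _val = _1
    while (pos < text.size() &&
           std::string("+-*/").find(text[pos]) != std::string::npos) {
        char op = text[pos++];
        // plays the role of _val = make_binop(op, _val, right-hand side)
        val = make_binop_text(op, val, simple());
    }
    if (pos != text.size()) throw std::runtime_error("trailing input");
    return val;
}
```

With this variation, group_left("1/2+3-4*5") returns "((((1/2)+3)-4)*5)". All four operators still share one precedence level; separating '*' and '/' from '+' and '-' takes one rule per level, which is the layering the Qi calculator samples use.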

Step 0: Wrestle fusion types directly

Finally, for fun and glory, let me show how you could have handled your suggested code, along with the shifting bindings of _1, _2 etc.:

static ast::expression make_binop(
        const ast::expression& left, 
        const boost::fusion::vector2<char, ast::expression>& op_right)
{
    switch(boost::fusion::get<0>(op_right))
    {
        case '+': return ast::binary_op<ast::add>(left, boost::fusion::get<1>(op_right));
        case '-': return ast::binary_op<ast::sub>(left, boost::fusion::get<1>(op_right));
        case '/': return ast::binary_op<ast::div>(left, boost::fusion::get<1>(op_right));
        case '*': return ast::binary_op<ast::mul>(left, boost::fusion::get<1>(op_right));
    }
    throw std::runtime_error("unreachable in make_binop");
}

// rules (inside the grammar constructor, after expression::base_type(expr)):
number %= lexeme[double_];
varname %= lexeme[alpha >> *(alnum | '_')];

simple = varname | number;
binop %= (simple >> (char_("-+*/") > expr)) 
    [ _val = phx::bind(make_binop, qi::_1, qi::_2) ]; // note _2!!!

expr %= binop | simple;

As you can see, not nearly as much fun writing the make_binop function that way!

