Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
340 views
in Technique[技术] by (71.8m points)

c++ - Using boost::spirit::qi to parse numbers with separators

I am attempting to use boost::spirit::qi to do some parsing. It's actually going quite well, and I successfully have managed to parse numbers in various bases based on a suffix. Examples: 123, c12h, 777o, 110101b.

I then wanted to add the ability to allow a completely ignored separator character, to allow values like 123_456 or 1101_0011b to parse. I tried using the skip parser, but I highly suspect that I completely misunderstood how it was to be used. It compiles just fine, but my attempt to make it ignore the underscore does absolutely nothing at all. Any suggestions on how to make this do what I want would be appreciated. My test code is included below:

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
using qi::_val;
using qi::_1;
using qi::skip;
using qi::uint_parser;
using ascii::char_;

template <typename Iterator>
struct unsigned_parser : qi::grammar<Iterator, uint64_t()> {

    unsigned_parser() : unsigned_parser::base_type(start) {
        uint_parser<uint64_t, 10> dec_parser;
        uint_parser<uint64_t, 16> hex_parser;
        uint_parser<uint64_t, 8> oct_parser;
        uint_parser<uint64_t, 2> bin_parser;

        start = skip(char_('_'))[
            /* binary with suffix */
            (bin_parser[_val=_1] >> char_("bByY"))
            /* octal with suffix */
            | (oct_parser[_val=_1] >> char_("qQoO"))
            /* hexadecimal with suffix */
            | (hex_parser[_val=_1] >> char_("hHxX"))
            /* decimal with optional suffix */
            | (dec_parser[_val=_1] >> -char_("dDtT"))
            ];
    }

    qi::rule<Iterator, uint64_t()> start;
};

int main(int argv, const char *argc[]) {
    typedef std::string::const_iterator iter;
    unsigned_parser<iter> up;
    uint64_t val;
    if (argv != 2) {
        std::cerr << "Usage: " << argc[0] << " <input>" << std::endl;
        return 1;
    }
    std::string test(argc[1]);
    iter i = test.begin();
    iter end = test.end();
    bool rv = parse(i, end, up, val);
    if (rv && i == end) {
        std::cout << "Succeeded: " << val << std::endl;
        return 0;
    }
    if (rv) {
        std::cout << "Failed partial parse: " << val << std::endl;
        return 1;
    }
    std::cout << "Failed." << std::endl;
    return 1;
}
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Aw. Nobody should have to bother with implementation details like Spirit parser contexts unless you're extending the library and implementing your own parser directives.

Until that time, phoenix::function<>, phoenix::bind or even BOOST_PHOENIX_ADAPT_FUNCTION should be plenty for anyone.

Here are two approaches to your question without any patches to the library.

  1. Straightforward parsing Live On Coliru

    This could be viewed as the "naive" way of parsing the different styles of integers using just Qi and simple semantic actions:

    start = 
          eps [_val=0] >> +(char_("0-9a-fA-F") [ _val = _val*16 + _decode(_1) ] | '_')>>  char_("hHxX") /* hexadecimal with suffix */
        | eps [_val=0] >> +(char_("0-7")       [ _val = _val* 8 + _decode(_1) ] | '_')>>  char_("qQoO") /* octal       with suffix */
        | eps [_val=0] >> +(char_("01")        [ _val = _val* 2 + _decode(_1) ] | '_')>>  char_("bByY") /* binary      with suffix */
        | eps [_val=0] >> +(char_("0-9")       [ _val = _val*10 + _decode(_1) ] | '_')>> -char_("dDtT") /* decimal     with optional suffix */
        ;
    

    Of course, you will want to know what _decode looks like. Well you define it yourself:

    struct decode {
        template <typename> struct result { typedef int type; };
        template <typename Ch> int operator()(Ch ch) const {
            if (ch>='0' && ch<='9') return ch - '0';
            if (ch>='a' && ch<='z') return ch - 'a' + 10;
            if (ch>='A' && ch<='Z') return ch - 'A' + 10;
            assert(false);
        }
    };
    boost::phoenix::function<decode> _decode;
    
  2. Using BOOST_PHOENIX_ADAPT_FUNCTION macro Live On Coliru

    Instead of defining the function object you can use the macro

    int decode(char ch) {
        if (ch>='0' && ch<='9') return ch - '0';
        if (ch>='a' && ch<='z') return ch - 'a' + 10;
        if (ch>='A' && ch<='Z') return ch - 'A' + 10;
        assert(false);
    }
    
    BOOST_PHOENIX_ADAPT_FUNCTION(int, _decode, decode, 1)
    
  3. Using std::strtoul Live On Coliru

    Of course, the above may be a tad "complex" because it requires you to deal with nitty gritty details of integer arithmetics and digit decoding.

    Also, the "naive" approach does some duplicate work in case the literal is a decimal value like "101_101". It will calculate the subresult for the hex, octal and binary branches before realizing it was a decimal.

    So we could change the order around:

    start = 
            (raw[+char_("_0-9a-fA-F")] >>  char_("hHxX")) [ _val = _strtoul(_1,16) ] /* hexadecimal with suffix */
          | (raw[+char_("_0-7")]       >>  char_("qQoO")) [ _val = _strtoul(_1, 8) ] /* octal       with suffix */
          | (raw[+char_("_01")]        >>  char_("bByY")) [ _val = _strtoul(_1, 2) ] /* binary      with suffix */
          | (raw[+char_("_0-9")]       >> -char_("dDtT")) [ _val = _strtoul(_1,10) ] /* decimal     with optional suffix */
          ;
    

    Again you will be curious how we implemented _evaluate? It's a function that takes the synthesized attributes from raw (which is an iterator range) and the base, which is definitely known by then:

    struct strtoul_f {
        template <typename, typename> struct result { typedef uint64_t type; };
        template <typename Raw, typename Int> uint64_t operator()(Raw raw, Int base) const {
            std::string s(raw.begin(), raw.end());
            s.erase(std::remove(s.begin(), s.end(), '_'), s.end());
            char *f(&s[0]), *l(f+s.size());
            return std::strtoul(f, &l, base);
        }
    };
    boost::phoenix::function<strtoul_f> _strtoul;
    

    As you can see, the only complexity is removing the _ from the range first.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...