c++ - Spirit X3, Is this error handling approach useful?

After reading the Spirit X3 tutorial on error handling and doing some experimentation, I was drawn to a conclusion.

I believe there is some room for improvement in how X3 handles errors. An important goal, from my perspective, is to provide a meaningful error message. First and foremost, adding a semantic action that sets _pass(ctx) to false won't do it, because X3 will then try to match something else. Only throwing an x3::expectation_failure quits the parse function prematurely, i.e. without trying to match anything else. So what is left are the parser directive expect[a] and the parser operator>, as well as manually throwing x3::expectation_failure from a semantic action. I believe this error-handling vocabulary is too limited. Please consider the following lines of X3 PEG grammar:

const auto a = a1 >> a2 >> a3;
const auto b = b1 >> b2 >> b3;
const auto c = c1 >> c2 >> c3;

const auto main_rule__def =
(
 a |
 b |
 c );

Now, for expression a I cannot use expect[] or operator>, as other alternatives might still be valid. I could be wrong, but I think X3 requires me to spell out alternative wrong expressions that can match, and to throw x3::expectation_failure when they do, which is cumbersome.

The question is, is there a good way of checking for error conditions in my PEG construct with the ordered alternatives for a, b and c using current X3 facilities?

If the answer is no, I would like to present my idea for a reasonable solution. I believe I would need a new parser directive for that. What should this directive do? It should call the attached semantic action when the parse fails, rather than when it succeeds. The attribute is obviously unused, but I would need the _where member to be set to the iterator position of the first parsing mismatch. So if a2 fails, _where should be set one past the end of a1. Let's call the parser directive neg_sa, short for "negated semantic action".

pseudocode

// semantic actions
auto a_sa = [&](auto& ctx)
{
  // add _where to vector v
};

auto b_sa = [&](auto& ctx)
{
  // add _where to vector v
};

auto c_sa = [&](auto& ctx)
{
  // add _where to vector v

  // now we know we have a *real* error.
  // find the peak iterator value in the vector v
  // the position tells whether it belongs to a, b or c.
  // now we can formulate an error message like: "cannot make sense of b up to this position."
  // lastly throw x3::expectation_failure
};

// PEG
const auto a = a1 >> a2 >> a3;
const auto b = b1 >> b2 >> b3;
const auto c = c1 >> c2 >> c3;

const auto main_rule__def =
(
 neg_sa[a][a_sa] |
 neg_sa[b][b_sa] |
 neg_sa[c][c_sa] );
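
To make the "peak iterator" idea concrete without relying on a directive that X3 does not have, here is a minimal standard-library sketch of the bookkeeping only. It is not X3 code: Step, Alternative, Failure, lit and parse_alternatives are hypothetical names made up for this illustration, and positions are plain offsets instead of iterators.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// A step tries to match at `pos` and advances `pos` only on success.
using Step = std::function<bool(std::string_view in, std::size_t& pos)>;

// One ordered alternative: its steps are matched in sequence (like a1 >> a2 >> a3).
struct Alternative {
    std::string name;
    std::vector<Step> steps;
};

// Which alternative got furthest, and the offset where it stopped matching.
struct Failure {
    std::string name;
    std::size_t where = 0;
};

// Hypothetical helper: a step that matches a fixed piece of text.
Step lit(std::string text) {
    return [text = std::move(text)](std::string_view in, std::size_t& pos) {
        if (in.substr(pos, text.size()) != text) return false;
        pos += text.size();
        return true;
    };
}

// Tries the alternatives in order. Returns true as soon as one matches completely.
// Otherwise returns false and reports the alternative with the furthest mismatch
// (on a tie, the earliest alternative wins).
bool parse_alternatives(std::string_view in, const std::vector<Alternative>& alts,
                        Failure& furthest) {
    assert(!alts.empty());
    furthest = Failure{alts.front().name, 0};
    for (const auto& alt : alts) {
        std::size_t pos = 0;
        bool ok = true;
        for (const auto& step : alt.steps) {
            if (!step(in, pos)) { ok = false; break; }
        }
        if (ok) return true;
        if (pos > furthest.where) furthest = Failure{alt.name, pos};
    }
    return false;
}
```

With alternatives a = "ab" "cd" "ef", b = "ab" "xy" and c = "zz", the input "abcdXX" fails everywhere, and the reported failure names a at offset 4, which is the information the c_sa action above would need to build its message.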

I hope I presented this idea clearly. Let me know in the comment section if I need to explain something further.


1 Reply


Okay, risking conflating too many things in an example, here goes:

namespace square::peg {
    using namespace x3;

    const auto quoted_string = lexeme['"' > *(print - '"') > '"'];
    const auto bare_string   = lexeme[alpha > *alnum] > ';';
    const auto two_ints      = int_ > int_;

    const auto main          = quoted_string | bare_string | two_ints;

    const auto entry_point   = skip(space)[ expect[main] > eoi ];
} // namespace square::peg

That should do. The key is that the only things that should be expectation points are those that make the respective branch fail BEYOND the point where it was unambiguously the right branch. (Otherwise, there would literally not be a hard expectation.)

With two minor get_info specializations for prettier messages, this could lead to decent error messages even when manually catching the exception:


int main() {
    using It = std::string::const_iterator;

    for (std::string const input : {
            "   -89 0038  ",
            "   \"-89 0038\"  ",
            "   something123123      ;",
            // undecidable
            "",
            // violate expectations, no successful parse
            "   -89 oops  ",   // not an integer
            "   \"-89 0038  ", // missing "
            "   bareword ",    // missing ;
            // trailing debris, successful "main"
            "   -89 3.14  ",   // followed by .14
        })
    {
        std::cout << "====== " << std::quoted(input) << "\n";

        It iter = input.begin(), end = input.end();
        try {
            if (parse(iter, end, square::peg::entry_point)) {
                std::cout << "Parsed successfully\n";
            } else {
                std::cout << "Parsing failed\n";
            }
        } catch (x3::expectation_failure<It> const& ef) {
            // ef.where() is the position of the failed expectation
            auto pos = std::distance(input.begin(), ef.where());
            std::cout << "Expect " << ef.which() << " at "
                << "\n" << input
                << "\n" << std::setw(pos) << std::setfill('-') << "" << "^\n";
        }
    }
}

Prints

====== "   -89 0038  "
Parsed successfully
====== "   \"-89 0038\"  "
Parsed successfully
====== "   something123123      ;"
Parsed successfully
====== ""
Expect quoted string, bare string or integer number pair at

    ^
====== "   -89 oops  "
Expect integral number at
       -89 oops 
    -------^
====== "   \"-89 0038  "
Expect '"' at
       "-89 0038 
    --------------^
====== "   bareword "
Expect ';' at
       bareword
    ------------^
====== "   -89 3.14  "
Expect eoi at
       -89 3.14 
    --------^

This is already beyond what most people expect from their parsers.
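
The caret line in that output is just the failure offset rendered as dashes. As a small standalone sketch (render_diagnostic is a hypothetical helper, not part of X3 or of the program above), the formatting could be factored out and tested on its own:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds a three-line diagnostic: a headline, the input itself,
// and a caret placed under offset `pos` of the input.
std::string render_diagnostic(const std::string& input, std::size_t pos,
                              const std::string& expected) {
    assert(pos <= input.size()); // the caret may point one past the end, not further
    std::string out = "Expect " + expected + " at\n";
    out += input + "\n";
    out += std::string(pos, '-') + "^\n"; // `pos` dashes, then the caret
    return out;
}
```

Here std::string(pos, '-') does the same job as the std::setw/std::setfill combination in the catch block; for pos == 0 both produce just the caret.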

But: Automate That, Also, More Flexible

We might not be content with reporting just the one expectation and bailing out. Indeed, you can report and continue parsing as if there were just a regular mismatch: this is where on_error comes in.

Let's create a tag base:

struct with_error_handling {
    template<typename It, typename Ctx>
        x3::error_handler_result on_error(It f, It l, expectation_failure<It> const& ef, Ctx const&) const {
            std::string s(f,l);
            auto pos = std::distance(f, ef.where());

            std::cout << "Expecting " << ef.which() << " at "
                << "\n" << s
                << "\n" << std::setw(pos) << std::setfill('-') << "" << "^\n";

            return error_handler_result::fail;
        }
};

Now, all we have to do is derive our rule ID from with_error_handling and BAM!, we don't have to write any exception handlers: rules will simply "fail" with the appropriate diagnostics. What's more, some inputs can lead to multiple (hopefully helpful) diagnostics:

auto const eh = [](auto p) {
    struct _ : with_error_handling {};
    return rule<_> {} = p;
};

const auto quoted_string = eh(lexeme['"' > *(print - '"') > '"']);
const auto bare_string   = eh(lexeme[alpha > *alnum] > ';');
const auto two_ints      = eh(int_ > int_);

const auto main          = quoted_string | bare_string | two_ints;
using main_type = std::remove_cv_t<decltype(main)>;

const auto entry_point   = skip(space)[ eh(expect[main] > eoi) ];

Now, main becomes just:


for (std::string const input : {
        "   -89 0038  ",
        "   \"-89 0038\"  ",
        "   something123123      ;",
        // undecidable
        "",
        // violate expectations, no successful parse
        "   -89 oops  ",   // not an integer
        "   \"-89 0038  ", // missing "
        "   bareword ",    // missing ;
        // trailing debris, successful "main"
        "   -89 3.14  ",   // followed by .14
    })
{
    std::cout << "====== " << std::quoted(input) << "\n";

    It iter = input.begin(), end = input.end();
    if (parse(iter, end, square::peg::entry_point)) {
        std::cout << "Parsed successfully\n";
    } else {
        std::cout << "Parsing failed\n";
    }
}

And the program prints:

====== "   -89 0038  "
Parsed successfully
====== "   \"-89 0038\"  "
Parsed successfully
====== "   something123123      ;"
Parsed successfully
====== ""
Expecting quoted string, bare string or integer number pair at 

    ^
Parsing failed
====== "   -89 oops  "
Expecting integral number at 
       -89 oops  
    -------^
Expecting quoted string, bare string or integer number pair at 
       -89 oops  
    ^
Parsing failed
====== "   \"-89 0038  "
Expecting '"' at 
       "-89 0038  
    --------------^
Expecting quoted string, bare string or integer number pair at 
       "-89 0038  
    ^
Parsing failed
====== "   bareword "
Expecting ';' at 
       bareword 
    ------------^
Expecting quoted string, bare string or integer number pair at 
       bareword 
    ^
Parsing failed
====== "   -89 3.14  "
Expecting eoi at 
       -89 3.14  
    --------^
Parsing failed

Attribute Propagation, on_success

Parsers aren't very useful when they don't actually parse anything, so let's add some constructive value handling, also showcasing on_success:

Defining some AST types to receive the attributes:

struct quoted : std::string {};
struct bare   : std::string {};
using  two_i  = std::pair<int, int>;
using Value = boost::variant<quoted, bare, two_i>;

Make sure we can print Values:

static inline std::ostream& operator<<(std::ostream& os, Value const& v) {
    struct {
        std::ostream& _os;
        void operator()(quoted const& v) const { _os << "quoted(" << std::quoted(v) << ")";             } 
        void operator()(bare const& v) const   { _os << "bare(" << v << ")";                            } 
        void operator()(two_i const& v) const  { _os << "two_i(" << v.first << ", " << v.second << ")"; } 
    } vis{os};

    boost::apply_visitor(vis, v);
    return os;
}

Now, use the old as<> trick to coerce attribute types, this time with error-handling:

As icing on the cake, let's demonstrate on_success in with_error_handling:

    template<typename It, typename Ctx>
        void on_success(It f, It l, two_i const& v, Ctx const&) const {
            std::cout << "Parsed " << std::quoted(std::string(f,l)) << " as integer pair " << v.first << ", " << v.second << "\n";
        }

Now with largely unmodified main program (just prints the result value as well):


    It iter = input.begin(), end = input.end();
    Value v;
    if (parse(iter, end, square::peg::entry_point, v)) {
        std::cout << "Result value: " << v << "\n";
    } else {
        std::cout << "Parsing failed\n";
    }

Prints

====== "   -89 0038  "
Parsed "-89 0038" as integer pair -89, 38
Result value: two_i(-89, 38)
====== "   \"-89 0038\"  "
Result value: quoted("-89 0038")
====== "   something123123      ;"
Result value: bare(something123123)
====== ""
Expecting quoted string, bare string or integer number pair at 

    ^
Parsing failed
====== "   -89 oops  "
Expecting integral number at 
       -89 oops  
    -------^
Expecting quoted string, bare string or integer number pair at 
       -89 oops  
    ^
Parsing failed
====== "   \"-89 0038  "
Expecting '"' at 
       "-89 0038  
    --------------^
Expecting quoted string, bare string or integer number pair at 
       "-89 0038  
    ^
Parsing failed
====== "   bareword "
Expecting ';' at 
       bareword 
    ------------^
Expecting quoted string, bare string or integer number pair at 
       bareword 
    ^
Parsing failed
====== "   -89 3.14  "
Parsed "-89 3" as integer pair -89, 3
Expecting eoi at 
       -89 3.14  
    --------^
Parsing failed

Really Overdoing Things

I don't know about you, but I hate doing side-effects, let alone printing to the console from a parser. Let's use x3::with instead.

We want to append to the diagnostics via the Ctx& argument instead of writing to std::cout in the on_error handler:

struct with_error_handling {
    struct diags;

    template<typename It, typename Ctx>
        x3::error_handler_result on_error(It f, It l, expectation_failure<It> const& ef, Ctx const& ctx) const {
            std::string s(f,l);
            auto pos = std::distance(f, ef.where());

            std::ostringstream oss;
            oss << "Expecting " << ef.which() << " at "
                << "\n" << s
                << "\n" << std::setw(pos) << std::setfill('-') << "" << "^";

            x3::get<diags>(ctx).push_back(oss.str());

            return error_handler_result::fail;
        }
};
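
Outside of X3, the pattern in that handler (record the message in caller-owned storage, then report an ordinary failure) can be sketched with the standard library alone. Everything named here is hypothetical and only loosely mirrors X3: ExpectationError stands in for x3::expectation_failure, Parser for a rule, and collect_errors for deriving the rule ID from with_error_handling. Resetting the position on failure is an assumption of this sketch, not a statement about X3.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for an expectation failure: what was expected, and where.
struct ExpectationError : std::runtime_error {
    std::size_t where;
    ExpectationError(const std::string& which, std::size_t where_)
        : std::runtime_error(which), where(where_) {}
};

// A parser advances `pos` on success, returns false on a soft mismatch,
// and throws ExpectationError on a hard one.
using Parser = std::function<bool(std::string_view in, std::size_t& pos)>;

// Wraps `p` so that a hard failure is appended to `diags` (owned by the caller)
// and then reported as a plain mismatch, so surrounding alternatives can go on.
Parser collect_errors(Parser p, std::vector<std::string>& diags) {
    assert(p); // an empty std::function cannot be wrapped
    return [p = std::move(p), &diags](std::string_view in, std::size_t& pos) {
        const std::size_t start = pos;
        try {
            return p(in, pos);
        } catch (const ExpectationError& e) {
            diags.push_back("Expecting " + std::string(e.what()) +
                            " at offset " + std::to_string(e.where));
            pos = start; // assumption: a failed parser consumes nothing
            return false;
        }
    };
}
```

The wrapped parser never prints; the caller decides what to do with the collected messages after the parse returns. The vector must outlive the wrapped parser, since it is captured by reference.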

And on the call site, we can pass the context:

// D is the context tag; it has to be the same type that on_error passes
// to x3::get, i.e. with_error_handling::diags.
std::vector<std::string> diags;

if (parse(iter, end, x3::with<D>(diags) [square::peg::entry_point], v)) {
    std::cout << "Result value: " << v;
} else {
    std::cout << "Parsing failed";
}

std::cout << " with " << diags.size() << " diagnostics messages: \n";

The full program also prints the diagnostics:


Full Listing

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include 
