Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
138 views
in Technique[技术] by (71.8m points)

a simple C++11 regex that throws regex_error while searching

I'm using VS2015 community, why the code below throws std::regex_error at a specific length of string on my computer.

#include <string>
#include <regex>
#include <iostream>

int main()
{
    try {
        std::string subject(497, '-');
        std::regex  pattern("(.)+");
        bool match = std::regex_search(subject, pattern);
        std::cout << std::boolalpha << match << std::endl;
    }
    catch(const std::regex_error& e) {
        std::cerr << "1: " << e.what() << std::endl;
    }

    try {
        std::string subject(498, '-');
        std::regex  pattern("(.)+");
        bool match = std::regex_search(subject, pattern);
        std::cout << std::boolalpha << match << std::endl;
    }
    catch(const std::regex_error& e) {
        std::cerr << "2: " << e.what() << std::endl;
    }
}

The output is:

true
2: regex_error(error_stack): There was insufficient memory to determine whether the regular expression could match the specified character sequence.

Thanks.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

It causes error_stack even when using cluster group there (?:.)+

I can tell you how to curtail this exception if you want.
And I can tell you what is causing it too.

Use regex.h as a reference..

First, you'll notice these defines near the top

#ifndef _REGEX_MAX_COMPLEXITY_COUNT
  #define _REGEX_MAX_COMPLEXITY_COUNT   10000000L   /* set to 0 to disable */
 #endif /* _REGEX_MAX_COMPLEXITY_COUNT */


#ifndef _REGEX_MAX_STACK_COUNT
  #ifdef _WIN64
   #define _REGEX_MAX_STACK_COUNT   600L    /* set to 0 to disable */
  #else /* _WIN64 */
   #define _REGEX_MAX_STACK_COUNT   1000L   /* set to 0 to disable */
  #endif /* _WIN64 */
#endif /* _REGEX_MAX_STACK_COUNT */

To disable either complexity or max stack, at compile time put the definition
you'd like before the include file <regex>.

For example, set to 0 to disable these protections.
For your test, we can set the stack to 200,000 like this

#define _REGEX_MAX_STACK_COUNT 200000
#include <regex>

This now will actually not throw an exception now for your example.

Second, what causes this in the code.

I just did a glance with a debug session, and found out a general cause.

It all comes in the _Matcher class.

Inside _Match_pat() it firsts decrements _Max_stack_count,
before exiting it increments _Max_stack_count.

_Max_stack_count gets set when _Matcher is instantiated.

template<>class _Matcher
{ // provides ways to match a regular expression to a text sequence

  _Max_stack_count = _REGEX_MAX_STACK_COUNT;

}

The recursion comes from _Match_pat() being called recursively by either
_Do_if() and _Do_rep() which are initially called from _Match_pat()

And the behavior comes mainly from open ended quantified groups (capture or cluster) that can match many, many times.

In your case, its (.)+
Unusually though, all this is doing is matching a character and putting it
in the capture group 1, but appending it to group 0.

In my opinion, the closure should register a increment of the stack count available
(pop), but instead the recursion is being used as a crutch to support
other states that would have to be duplicated to take account of this.

Anyway, you can set breakpoints and try it out for yourself.

Below are the code segments of interest.

    bool _Matcher<>::_Match_pat(_Node_base *_Nx)
    {
        if (0 < _Max_stack_count && --_Max_stack_count <= 0)
            _Xregex_error(regex_constants::error_stack);
        while (_Nx != 0)
        {

            case _N_if:
                if (!_Do_if((_Node_if *)_Nx))
                    _Failed = true;
                _Nx = 0;
                break;

            case _N_endif:
                break;

            case _N_rep:
                if (!_Do_rep((_Node_rep *)_Nx,
                    (_Nx->_Flags & _Fl_greedy) != 0, 0))
                    _Failed = true;
                _Nx = 0;
                break;

            case _N_end_rep:
                {   // return at end of loop
                _Node_rep *_Nr = ((_Node_end_rep *)_Nx)->_Begin_rep;
                _Loop_vals_t *_Psav = &_Loop_vals[_Nr->_Loop_number];

                if (_Nr->_Simple_loop == 0 && !_Do_rep(_Nr,
                    (_Nr->_Flags & _Fl_greedy) != 0, _Psav->_Loop_idx))
                    _Failed = true; // recurse only if loop contains if/do
                _Nx = 0;
                break;
                }


        }
        if (0 < _Max_stack_count)
        ++_Max_stack_count;
    }


    bool _Matcher<>::_Do_rep0()
    {   // apply repetition to loop with no nested if/do

        if (!_Match_pat(_Node->_Next))      // Several of these
    }

    bool _Matcher<>::_Do_if()
    {   // apply if node
        if (_Match_pat(_Node->_Next)) // try to match this branch
    }    

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...