Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
429 views
in Technique[技术] by (71.8m points)

c++ - Extracting class from demangled symbol

I'm trying to extract the (full) class names from demangled symbol output of nm using boost::regex. This sample program

#include <vector>

namespace Ns1
{
namespace Ns2
{
    template<typename T, class Cont>
    class A
    {
    public:
        A() {}
        ~A() {}
        void foo(const Cont& c) {}
        void bar(const A<T,Cont>& x) {}

    private:
        Cont cont;
    };
}
}

int main()
{
    Ns1::Ns2::A<int,std::vector<int> > a;
    Ns1::Ns2::A<int,std::vector<int> > b;
    std::vector<int> v;

    a.foo(v);
    a.bar(b);
}

will produce the following symbols for class A

Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::A()
Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::bar(Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > > const&)
Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::foo(std::vector<int, std::allocator<int> > const&)
Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::~A()

I want to extract the class (instance) name Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > > preferably using a single regular expression pattern, but I have problems to parse the recursively occuring class specifiers within the <> pairs.

Does anyone know how to do this with a regular expression pattern (that's supported by boost::regex)?

My solution (based on David Hammen's answer, thus the accept):

I don't use (single) regular expressions to extract class and namespace symbols. I have created a simple function that strips off bracketing character pairs (e.g. <> or ()) from the tail of symbol strings:

std::string stripBracketPair(char openingBracket,char closingBracket,const std::string& symbol, std::string& strippedPart)
{
    std::string result = symbol;

    if(!result.empty() &&
       result[result.length() -1] == closingBracket)
    {
        size_t openPos = result.find_first_of(openingBracket);
        if(openPos != std::string::npos)
        {
            strippedPart = result.substr(openPos);
            result = result.substr(0,openPos);
        }
    }
    return result;
}

This is used in two other methods that extract the namespace / class from the symbol:

std::string extractNamespace(const std::string& symbol)
{
    std::string ns;
    std::string strippedPart;
    std::string cls = extractClass(symbol);
    if(!cls.empty())
    {
        cls = stripBracketPair('<','>',cls,strippedPart);
        std::vector<std::string> classPathParts;

        boost::split(classPathParts,cls,boost::is_any_of("::"),boost::token_compress_on);
        ns = buildNamespaceFromSymbolPath(classPathParts);
    }
    else
    {
        // Assume this symbol is a namespace global function/variable
        std::string globalSymbolName = stripBracketPair('(',')',symbol,strippedPart);
        globalSymbolName = stripBracketPair('<','>',globalSymbolName,strippedPart);
        std::vector<std::string> symbolPathParts;

        boost::split(symbolPathParts,globalSymbolName,boost::is_any_of("::"),boost::token_compress_on);
        ns = buildNamespaceFromSymbolPath(symbolPathParts);
        std::vector<std::string> wsSplitted;
        boost::split(wsSplitted,ns,boost::is_any_of(" 	"),boost::token_compress_on);
        if(wsSplitted.size() > 1)
        {
            ns = wsSplitted[wsSplitted.size() - 1];
        }
    }

    if(isClass(ns))
    {
        ns = "";
    }
    return ns;
}

std::string extractClass(const std::string& symbol)
{
    std::string cls;
    std::string strippedPart;
    std::string fullSymbol = symbol;
    boost::trim(fullSymbol);
    fullSymbol = stripBracketPair('(',')',symbol,strippedPart);
    fullSymbol = stripBracketPair('<','>',fullSymbol,strippedPart);

    size_t pos = fullSymbol.find_last_of(':');
    if(pos != std::string::npos)
    {
        --pos;
        cls = fullSymbol.substr(0,pos);
        std::string untemplatedClassName = stripBracketPair('<','>',cls,strippedPart);
        if(untemplatedClassName.find('<') == std::string::npos &&
        untemplatedClassName.find(' ') != std::string::npos)
        {
            cls = "";
        }
    }

    if(!cls.empty() && !isClass(cls))
    {
        cls = "";
    }
    return cls;
}

the buildNamespaceFromSymbolPath() method simply concatenates valid namespace parts:

std::string buildNamespaceFromSymbolPath(const std::vector<std::string>& symbolPathParts)
{
    if(symbolPathParts.size() >= 2)
    {
        std::ostringstream oss;
        bool firstItem = true;
        for(unsigned int i = 0;i < symbolPathParts.size() - 1;++i)
        {
            if((symbolPathParts[i].find('<') != std::string::npos) ||
               (symbolPathParts[i].find('(') != std::string::npos))
            {
                break;
            }
            if(!firstItem)
            {
                oss << "::";
            }
            else
            {
                firstItem = false;
            }
            oss << symbolPathParts[i];
        }
        return oss.str();
    }
    return "";
}

At least the isClass() method uses a regular expression to scan all symbols for a constructor method (which unfortunately doesn't seem to work for classes only containing member functions):

std::set<std::string> allClasses;

bool isClass(const std::string& classSymbol)
{
    std::set<std::string>::iterator foundClass = allClasses.find(classSymbol);
    if(foundClass != allClasses.end())
    {
        return true;
    }

std::string strippedPart;
    std::string constructorName = stripBracketPair('<','>',classSymbol,strippedPart);
    std::vector<std::string> constructorPathParts;

    boost::split(constructorPathParts,constructorName,boost::is_any_of("::"),boost::token_compress_on);
    if(constructorPathParts.size() > 1)
    {
        constructorName = constructorPathParts.back();
    }
    boost::replace_all(constructorName,"(","[\(]");
    boost::replace_all(constructorName,")","[\)]");
    boost::replace_all(constructorName,"*","[\*]");

    std::ostringstream constructorPattern;
    std::string symbolPattern = classSymbol;
    boost::replace_all(symbolPattern,"(","[\(]");
    boost::replace_all(symbolPattern,")","[\)]");
    boost::replace_all(symbolPattern,"*","[\*]");
    constructorPattern << "^" << symbolPattern << "::" << constructorName << "[\(].+$";
    boost::regex reConstructor(constructorPattern.str());

    for(std::vector<NmRecord>::iterator it = allRecords.begin();
        it != allRecords.end();
        ++it)
    {
        if(boost::regex_match(it->symbolName,reConstructor))
        {
            allClasses.insert(classSymbol);
            return true;
        }
    }
    return false;
}

As mentioned the last method doesn't safely find a class name if the class doesn't provide any constructor, and is quite slow on big symbol tables. But at least this seems to cover what you can get out of nm's symbol information.

I have left the tag for the question, that other users may find regex is not the right approach.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This is hard to do with perl's extended regular expressions, which are considerably more powerful than anything in C++. I suggest a different tack:

First get rid of the things that don't look like functions such as data (look for the D designator). Stuff like virtual thunk to this, virtual table for that, etc., will also get in your way; get rid of them before you do you the main parsing. This filtering is something where a regexp can help. What you should have left are functions. For each function,

  • Get rid of the stuff after the final closing parenthesis. For example, Foo::Bar(int,double) const becomes Foo::Bar(int,double).

  • Strip the function arguments. The problem here is that you can have parentheses inside the parentheses, e.g., functions that take function pointers as arguments, which might in turn take function pointers as arguments. Don't use a regexp. Use the fact that parentheses match. After this step, Foo::Bar(int,double) becomes Foo::Bar while a::b::Baz<lots<of<template>, stuff>>::Baz(int, void (*)(int, void (*)(int))) becomes a::b::Baz<lots<of<template>, stuff>>::Baz.

  • Now work on the front end. Use a similar scheme to parse through that template stuff. With this, that messy a::b::Baz<lots<of<template>, stuff>>::Baz becomes a::b::Baz::Baz.

  • At this stage, your functions will look like a::b:: ... ::ClassName::function_name. There is a slight problem here with free functions in some namespace. Destructors are a dead giveaway of a class; there's no doubt that you have a class name if the function name starts with a tilde. Constructors are a near giveaway that you have a class at hand -- so long as you don't have a namespace Foo in which you have defined a function Foo.

  • Finally, you may want to re-insert the template stuff you cut out.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...