Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
288 views
in Technique[技术] by (71.8m points)

c++ - Is returning a dereferenced pointer as a reference from a function undefined behavior?

I'm writing a parser for the first time. I'm following this tutorial on Pratt parers. I've got it to work, but I've come up with sort of a problem.

The original tutorial is written in Java. I prefer C++, so that's what I wrote mine with. I was able to basically port most of the code to C++ (although, I did make it "mine" in the sense that there are some non-language related differences). The only real problem I have is with this line of code:

public Expression parse(Parser parser, Token token) {
   Expression operand = parser.parseExpression();
?    return new PrefixExpression(token.getType(), operand);

This works fine in Java (I'm assuming. I've never really worked with Java before, but I assume the guy knows what he's doing), but in C++ not so much. I was able to accomplish the same thing by using pointers like so:

Expression* parse(Parser& parser, Token token) {
    Expression* operand = parser.parseExpression();
    return new PrefixExpression(token.getType(), operand);

Which (although I am unfamiliar with the semantics of Java) seems to do the exact same thing in C++, only with pointers instead of normal objects.

However, the problem with working with pointers like this is that it gets messy kind of fast. Now it become much easier for everything to work with pointers, which means I have to worry about deallocation, and maybe memory leaks if I don't do it right. It just becomes a mess.

Now, the solution seems easy. I could just return PrefixExpression like this:

Expression parse(Parser& parser, Token token) {
    Expression operand = parser.parseExpression();
    return PrefixExpression(token.getType(), operand);

Here's my problem: if I do it like this, I lose the vtable and any extra data in this new Expression. That's a problem since Expression is actually just a base class for many types of expressions. Parse can parse anything it wants to, not just a PrefixExpression. That's how the original was designed. Generally, I like that design, but, as you can see, it's causing problems. Simply returning a new Expression right here loses things I need from that object later.

Now, I can try to solve this by returning a reference:

Expression& parse(Parser& parser, Token token) {
    // ...
    return PrefixExpression(token.getType(), operand);

That solves the vtable and extra data problem, but now that creates a new one. I'm returning a reference to a variable that will be destroyed instantly, which is of no help.

All of this to say, that's why I originally ultimately went with pointers. Pointers let me keep the data I needed later, but they are really hard to work with. I can squeeze by, but personally I'd like something better.

I think I could use std::move, but I'm not familiar with that enough to be certain I'd be using it properly. If I have to I will, but implementing that properly takes some skill and knowledge I just don't have. Besides, that is a lot of work to rework everything I have to work that way up to this point.

All of that to lead to the main point of my question: can I simply just return a reference to a new object safely? Let me just show an example:

Expression& parse(Parser& parser, Token token) {
    //...
    return *(new PrefixExpression(token.getType(), operand));

This would be nice and solve most of my problems because, if it does what I think it does, I get a reference to a new object, keep the vtable and extra data, and it doesn't get destroyed immediately. This would let me have my cake and eat it too.

However, my problem is can I actually do this? While I feel I have a good reason to do this, this to me seems very weird. I'm allocating new data inside a function, and expecting it to get deallocated outside the function automatically like any normal variable. Even if that did work, would that behave as I would expect it to outside this function completely? I am scared that this might be invoking undefined behavior or something like that. What does the standard think of this?

Edit: So here is a requested minimal sample:

Expression:

    // A (not really pure) purely virtual base class that holds all types of expressions
    class Expression {
        protected:
            const std::string type;
        public:
            Expression() : type("default") {}
            virtual ~Expression() {} //Because I'm dealing with pointers, I *think* I need a virtual destructor here. Otherwise, I don't really need 

            virtual operator std::string() {
                // Since I am working with a parser, I want some way to debug and make sure I'm parsing correctly. This was the easiest.
                throw ("ERROR: No conversion to std::string implemented for this expression!");
            }
            // Keep in mind, I may do several other things here, depending on how I want to use Expression
};

A child Expression, for Parenthesis:

    class Paren : public Expression {
        private:
            // Again, Pointer is not my preferred way, but this was just easier, since Parse() was returning a pointer anyway.
            Expression* value;
        public:
            Paren(Expression *e) {
                // I know this is also sketchy. I should be trying to perform a copy here. 
                // However, I'm not sure how to do this, since Expression could be anything.
                // I just decided to write my code so the new object takes ownership of the  pointer. I could and should do better 
                value = e;
            }

            virtual operator std::string() {
                return "(" + std::string(*value) + ")";
            }

            // Because again, I'm working with pointers
            ~Paren() {delete value;}
    };

And a parser:

class Parser {
    private:
        Grammar::Grammar grammar;
    public:
        // this is just a function that creates a unique identifier for each token.
        // Tokens normally have types identifier, number, or symbol.
        // This would work, except I'd like to make grammar rules based off
        // the type of symbol, not all symbols in general
        std::string GetMapKey(Tokenizer::Token token) {
                if(token.type == "symbol") return token.value;
                return token.type;
        }
        // the parsing function
        Expression * parseExpression(double precedence = 0) {
            // the current token
            Token token = consume();

                // detect and throw an error here if we have no such prefix
                if(!grammar.HasPrefix(GetMapKey(token))) {
                    throw("Error! Invalid grammar! No such prefix operator.");
                }

                // get a prefix parselet 
                Grammar::PrefixCallback preParse = grammar.GetPrefixCallback(GetMapKey(token));

                // get the left side
                Expression * left = preParse(token,*this);

                token = peek();

                double debug = peekPrecedence();

                while(precedence < peekPrecedence() && grammar.HasInfix(GetMapKey(token))) {
                    // we peeked the token, now we should consume it, now that we know there are no errors
                    token = consume();

                    // get the infix parser
                    Grammar::InfixCallback inParse = grammar.GetInfixCallback(GetMapKey(token));


                    // and get the in-parsed token
                    left = inParse(token,left,*this);
                }

                return left;
            }

After I posted the parser code, I realized I should mention that I put all the grammar related stuff into its own class. It just has some nice utilities related to grammar, as well as allows us to write a grammar independent parser and worry about the grammar later:

    class Grammar {
        public:
            // I'm in visual studio 2010, which doesn't seem to like the using type = value; syntax, so this instead
            typedef std::function<Expression*(Tokenizer::Token,Parser&)> PrefixCallback;
            typedef std::function<Expression*(Tokenizer::Token, Expression*, Parser&)> InfixCallback;
        private:
            std::map<std::string, PrefixCallback> prefix;
            std::map<std::string, InfixCallback> infix;
            std::map<std::string, double> infixPrecedence; // we'll use double precedence for more flexabillaty
        public:
            Grammar() {
                prefixBindingPower = std::numeric_limits<double>::max();
            }

            void RegisterPrefix(std::string key, PrefixCallback c) {
                prefix[key] = c;
            }

            PrefixCallback GetPrefixCallback(std::string key) {
                return prefix[key];
            }

            bool HasPrefix(std::string key) {
                return prefix.find(key) != prefix.end();
            }

            void RegisterInfix(std::string key, InfixCallback c, double p) {
                infix[key] = c;
                infixPrecedence[key] = p;
            }

            InfixCallback GetInfixCallback(std::string key) {
                return infix[key];
            }

            double GetInfixPrecedence(std::string key) {
                return infixPrecedence[key];
            }

            bool HasInfix(std::string key) {
                return infix.find(key) != infix.end();
            }
    };

Finally, I probably need to show a parsing callback to complete the set:

    Expression* ParenPrefixParselet(Tokenizer::Token token, Parser& parser) {
        Expression* value = parser.parseExpression(0);
        Expression* parenthesis = new Paren(value); // control of value gets given to  our new expression. No need to delete
        parser.consume(")");

        return parenthesis;
    }

That allows me to write a grammar that allows for things in parenthesis like this:

Grammar g;
g.RegisterPrefix("(", &ParenPrefixParselet);

Finally, a main():

int main() {
    Grammar g;
    g.RegisterPrefix("(", &ParenPrefixParselet);
    Parser parser(g);

    Expression* e = parser.parseExpression(0);

    std::cout << static_cast<std::string>(*e);

    return 0;
}

Believe it or not, I think that's pretty minimal. Remember, this is a parser. Keep in mind, that as a minimal example, I plan on it being expanded, but hopefully you get the idea.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You're right - you want a pointer, and to get around scoping you'll need dynamic allocation.

Java's already doing that for you behind the scenes.

Don't use new, though, use a smart pointer, so that it doesn't get messy.

We can't provide "a quote from the standard" about this because we'd have to quote 20 or 30 pages of rules, from how automatic storage duration works, to how dereferencing works, to how lvalues work, to how copying works, to how inheritance works, to how virtual member functions work, etc, etc.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...