Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c++ - Store different data types in map - with info on type

I need to parse a moderately complex stream and store the parsed result. The stream essentially contains name-value pairs, where the value type can differ from name to name. Basically, I end up with a map from a key (always a string) to a pair <type, value>.

I started with something like this:

typedef enum ValidType {STRING, INT, FLOAT, BINARY} ValidType;
map<string, pair<ValidType, void*>> Data;

However I really dislike void* and storing pointers. Of course, I can always store the value as binary data (vector<char> for example), in which case the map would end up being

map<string, pair<ValidType, vector<char>>> Data;

Yet, in this case I would have to parse the binary data every time I need the actual value, which would be quite expensive in terms of performance.

Considering that I am not too worried about memory footprint (the amount of data is not large), but I am concerned about performance, what would be the right way to store such data?

Ideally, I'd like to avoid using boost, as that would increase the size of the final app by a factor of 3 if not more and I need to minimise that.


1 Reply


You're looking for a discriminated (or tagged) union.

Boost.Variant is one example, and Boost.Any is another. Are you sure Boost will increase your final app size by a factor of 3? Boost.Variant is header-only, so there is no library to link.

If you really can't use Boost, implementing a simple discriminated union isn't so hard (a general and fully-correct one is another matter), and at least you know what to search for now.


For completeness, a naive discriminated union might look like:

class DU
{
public:
    enum TypeTag { None, Int, Double };
    class DUTypeError {};
private:
    TypeTag type_;
    union {
        int i;
        double d;
    } data_;

    void typecheck(TypeTag tt) const { if(type_ != tt) throw DUTypeError(); }
public:
    DU() : type_(None) {}
    DU(DU const &other) : type_(other.type_), data_(other.data_) {}
    DU& operator= (DU const &other) {
        type_=other.type_; data_=other.data_; return *this;
    }

    TypeTag type() const { return type_; }
    bool istype(TypeTag tt) const { return type_ == tt; }

#define CONVERSIONS(TYPE, ENUM, MEMBER) \
    explicit DU(TYPE val) : type_(ENUM) { data_.MEMBER = val; } \
    operator TYPE & () { typecheck(ENUM); return data_.MEMBER; } \
    operator TYPE const & () const { typecheck(ENUM); return data_.MEMBER; } \
    DU& operator=(TYPE val) { type_ = ENUM; data_.MEMBER = val; return *this; }

    CONVERSIONS(int, Int, i)
    CONVERSIONS(double, Double, d)
};

Now, there are several drawbacks:

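  • you can't store non-POD types in the union (this is the C++03 rule; C++11 allows them, but you then have to construct and destroy the active member yourself)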
  • adding a type means modifying the enum, and the union, and remembering to add a new CONVERSIONS line (it would be even worse without the macro)
  • you can't use the visitor pattern with this (or, you'd have to write your own dispatcher for it), which means lots of switch statements in the client code
    • every one of these switches may also need updating if you add a type
    • if you did write a visitor dispatch, that needs updating if you add a type, and so may every visitor
  • you need to manually reproduce something like the built-in C++ type-conversion rules if you want to do anything like arithmetic with these (i.e., operator double could promote an Int instead of only handling Double, but only if you hand-roll every operator)
  • I haven't implemented operator== precisely because it needs a switch. You can't just memcmp the two unions if the types match, because identical 32-bit integers could still compare different if the extra space required for the double holds a different bit pattern

Some of these issues can be addressed if you care about them, but it's all more work. Hence my preference for not re-inventing this particular wheel if it can be avoided.
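
If a C++17 compiler is available (an assumption, since the question doesn't name a language standard), the standard library's std::variant is a discriminated union that needs no Boost at all. The sketch below shows how the map from the question might look with it. The aliases Binary, Value and Data and the helpers get_int_or and type_name are hypothetical names chosen for this example; the alternatives mirror the question's STRING, INT, FLOAT and BINARY tags.

```cpp
#include <map>
#include <string>
#include <variant>
#include <vector>

// Hypothetical aliases for illustration.
using Binary = std::vector<char>;
using Value  = std::variant<std::string, int, float, Binary>;
using Data   = std::map<std::string, Value>;

// Returns the int stored under key, or fallback if the key is missing
// or currently holds a different type.
inline int get_int_or(Data const &data, std::string const &key, int fallback) {
    auto it = data.find(key);
    if (it == data.end()) return fallback;
    if (auto const *p = std::get_if<int>(&it->second)) return *p;
    return fallback;
}

// Visitor with one overload per alternative; std::visit picks the
// overload matching the active type, replacing a hand-written switch.
struct TypeName {
    char const *operator()(std::string const &) const { return "string"; }
    char const *operator()(int) const { return "int"; }
    char const *operator()(float) const { return "float"; }
    char const *operator()(Binary const &) const { return "binary"; }
};

inline std::string type_name(Value const &v) {
    return std::visit(TypeName{}, v);
}
```

Values are stored in parsed form, so reading one back is a tag check rather than a re-parse. A visitor that lacks an overload for one of the alternatives fails to compile, which catches the "forgot to update a switch" problem listed above.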

