Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c++ - Using char array inside union

I'm able to print the addresses and values of the ints but not the chars of the union. Why is that so?

#include <iostream>

using namespace std;

union Endian
{
    int i;
    char c[sizeof(int)];
    int j;
};

int main(int argc, char *argv[]) {
    Endian e;
    e.i = 20;
    cout << &e.j;
    cout << &e.i;
    cout << &e.c[0]; //Why can't I print this address
    cout << e.c[1]; // Why can't I print this value

}

Output: 0x7fff5451ab68 0x7fff5451ab68

1 Reply

Disclaimer: OP's tags are quite ambiguous, so this answer uses the code as a frame of reference, which is C++ (use of iostream, pulling in the std namespace, cout).

You're using the union in an inappropriate way. But we'll get back to that later.

e.i = 20;

Your code first uses the union as i, an integer, which is okay. But what you did afterwards is really not a good idea. First, you did two acceptable things:

cout << &e.j;
cout << &e.i;

You queried the addresses of the two ints in the union. That's fine: all members of a union share storage, so they all start at the same address.

cout << &e.c[0]; //Why can't I print this address
cout << e.c[1]; // Why can't I print this value

Now, here's where you're crossing the line. Indexing into the char[] array performs implicit pointer arithmetic and a dereference on c, which is not the member you last assigned to. Even though you only want the address of the first element, you are evaluating an element of a union member that is not the last one set. So that's a big no-no.

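Furthermore, &e.c[0] has type char*, and std::ostream's operator<< has an overload for const char* that treats the pointer as a null-terminated C-style string: it prints the characters the pointer points to, not the address. With e.i set to 20 and a 4-byte int, the first byte is either 20 (an invisible control character, on little-endian machines) or 0 (on big-endian ones), so nothing visible appears. To print the address itself, cast it to a generic pointer: static_cast<const void*>(&e.c[0]).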

cout << e.c[1]; // Why can't I print this value

Undefined behavior. "But, but!", I can hear some of you say. Yes, it is UB in C++. It is valid in C99 (6.5/7), though only just, by means of a footnote and some duct tape. It's a simple matter, already explained by LightnessRacesInSpace and Mysticial in the comments of this answer and others.

Yes, you can access the bytes of any object through a char (or unsigned char) pointer and mess with them for whatever purpose you have in mind. But type-punning through unions is illegal in C++, no ifs or buts. Yes, it may work. Yes, if you're not bothered by it, you may continue to use it. But per the C++ standard, it is clearly illegal: the behavior is undefined.

Unless that member was the last member of the union to which you assigned a value, you shall not retrieve its value. It's as simple as that.

Unions in C++ have a purpose, described below. They can also have member functions (including constructors and destructors) and access specifiers. They cannot have virtual functions, nor (before C++11) static data members. They can neither be used as a base class nor inherit from one. And they are not to be used for type-punning. It's illegal in C++.

Read further!

Understanding unions

A union is:

  • A way to allow memory reuse.
  • That's it.

A union is not:

  • A way to cowboy-cast between elements of the union.
  • A way to cheat strict aliasing.

Even MSDN's got it right:

A union is a user-defined data or class type that, at any given time, contains only one object from its list of members (although that object can be an array or a class type).

What does this mean? It means that you can define something along the lines of this:

union stuff {

    int i;
    double d;
    float f;    

} m;

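The idea is that all of them sit in the same space in memory. A union is at least as large as its largest member, plus whatever padding the implementation needs for alignment, and the sizes and representations of int, double and float are themselves implementation-defined. Platforms have a lot of freedom here, freedom the specifications do not pin down. Not C. Not C++.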

You must not write to the union as an int and then read it as a float (or anything else) as some weird cowboy reinterpret_cast.

The use of std::cout is for example purposes and simplicity.

This is illegal:

m.i = 5;
std::cout << m.f; // NO. NO. NO. Please, no.

This is legal:

m.i = 5;
std::cout << m.i;

// Now I'm done with i, I have no intention of using it
// If I do, I'll make sure I properly set it.

m.f = 3.0f;
std::cout << m.f; // No "cowboy-interpreting", defined.

// I've got an idea, but I need it to be an int.

m.i = 3; // m.f and m.d are hereby invalidated.
int lol = 5;
m.i += lol;

Notice how there's no "cross-fire". This is the intended usage. Slim memory storage for three variables used at three different times with no fighting.

How did the misconception arise? Some very bad people woke up one day, and I bet one of them was a 3D programmer who thought about doing this:

// This is wrong on so many different levels.
union {

    float arr[4];
    struct {
        float x,y,z,w;
    };

};

He undoubtedly had a "noble idea": to access a 4-tuple both as a float array and as individual xyzw members. Now you know why this is wrong as far as unions go, but there is one more failure in here:

Standard C++ does not have anonymous structs (some compilers accept them as an extension). It does have anonymous unions, which serve the intended usage illustrated above: their members are accessed directly, without the m. "prefix", and you can surely see how that benefits the general idea behind unions.

Don't do this. Please.

