Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

c++ - reinterpret_cast, char*, and undefined behavior

What are the cases where reinterpret_casting a char* (or char[N]) is undefined behavior, and when is it defined behavior? What is the rule of thumb I should be using to answer this question?


As we learned from this question, the following is undefined behavior:

alignas(int) char data[sizeof(int)];
int *myInt = new (data) int;           // OK
*myInt = 34;                           // OK
int i = *reinterpret_cast<int*>(data); // <== UB! have to use std::launder

But at what point can we do a reinterpret_cast on a char array and have it NOT be undefined behavior? Here are a few simple examples:

  1. No new, just reinterpret_cast:

    alignas(int) char data[sizeof(int)];
    *reinterpret_cast<int*>(data) = 42;    // is the first cast write UB?
    int i = *reinterpret_cast<int*>(data); // how about a read?
    *reinterpret_cast<int*>(data) = 4;     // how about the second write?
    int j = *reinterpret_cast<int*>(data); // or the second read?
    

    When does the lifetime for the int start? Is it with the declaration of data? If so, when does the lifetime of data end?

  2. What if data were a pointer?

    char* data_ptr = new char[sizeof(int)];
    *reinterpret_cast<int*>(data_ptr) = 4;     // is this UB?
    int i = *reinterpret_cast<int*>(data_ptr); // how about the read?
    
  3. What if I'm just receiving structs on the wire and want to conditionally cast them based on what the first byte is?

    // bunch of handle functions that do stuff with the members of these types
    void handle(MsgType1 const& );
    void handle(MsgTypeF const& );
    
    char buffer[100]; 
    ::recv(some_socket, buffer, 100, 0);
    
    switch (buffer[0]) {
    case '1':
        handle(*reinterpret_cast<MsgType1*>(buffer)); // is this UB?
        break;
    case 'F':
        handle(*reinterpret_cast<MsgTypeF*>(buffer));
        break;
    // ...
    }
    

Are any of these cases UB? Are all of them? Does the answer to this question change between C++11 and C++1z?


1 Reply


There are two rules at play here:

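  1. [basic.lval]/8, aka the strict aliasing rule: simply put, you can't access an object through a pointer/reference to the wrong type. The rule is asymmetric: any object may be inspected through a char or unsigned char pointer, but a char array may not be accessed through an int pointer.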

  2. [basic.life]/8: simply put, if you reuse storage for an object of a different type, you can't use pointers to the old object(s) without laundering them first.

These rules are an important part of making a distinction between "a memory location" or "a region of storage" and "an object".
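As a rough illustration of the second rule, the sketch below creates an int inside a char array with placement new and then reads it in two ways: through the pointer that placement new returned, and through the array name after std::launder. The function name read_via_launder is made up for this example, and std::launder requires C++17.

```cpp
#include <cassert>
#include <new>   // placement new and std::launder (C++17)

// Hypothetical helper, not from the original post.
int read_via_launder() {
    alignas(int) char data[sizeof(int)];

    // Placement new starts the lifetime of an int in the storage of data.
    int* created = new (data) int(34);

    // The pointer returned by placement new points at the int itself.
    int direct = *created;

    // data still names the char array, so a pointer derived from it has to
    // be laundered before it can be used to reach the int.
    int laundered = *std::launder(reinterpret_cast<int*>(data));

    assert(direct == laundered);
    return laundered;
}
```

Keeping the pointer returned by placement new is usually the simpler choice; std::launder is only needed when that pointer is no longer available.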

All of your code examples fall prey to the same problem: the storage you access does not hold an object of the type you cast it to:

alignas(int) char data[sizeof(int)];

That creates an object of type char[sizeof(int)]. That object is not an int. Therefore, you may not access it as if it were. It doesn't matter if it is a read or a write; you still provoke UB.

Similarly:

char* data_ptr = new char[sizeof(int)];

That also creates an object of type char[sizeof(int)].
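For the two int-sized cases, one way to store and load a value without touching the aliasing rule is to copy bytes with std::memcpy, since int is trivially copyable. This is only a sketch; store_int, load_int, and round_trip are names invented for the example.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helpers, not from the original post.

// Copies the bytes of value into the char storage at dst.
void store_int(char* dst, int value) {
    std::memcpy(dst, &value, sizeof value);
}

// Copies sizeof(int) bytes from src into a real int object and returns it.
int load_int(const char* src) {
    int value;
    std::memcpy(&value, src, sizeof value);
    return value;
}

// Mirrors the write/read/write/read sequence from the question.
int round_trip() {
    char data[sizeof(int)];   // memcpy does not need alignas(int)
    store_int(data, 42);
    int i = load_int(data);
    store_int(data, 4);
    int j = load_int(data);
    assert(i == 42);
    return j;
}
```

The same helpers would work on storage obtained from new char[sizeof(int)], because only char-wise copies ever touch the buffer.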

char buffer[100];

This creates an object of type char[100]. That object is neither a MsgType1 nor a MsgTypeF. So you cannot access it as if it were either.

Note that the UB here is when you access the buffer as one of the Msg* types, not when you check the first byte. If all your Msg* types are trivially copyable, it's perfectly acceptable to read the first byte, then copy the buffer into an object of the appropriate type.

switch (buffer[0]) {
case '1':
    {
        MsgType1 msg;
        memcpy(&msg, buffer, sizeof(MsgType1)); // needs <cstring>
        handle(msg);
    }
    break;
case 'F':
    {
        MsgTypeF msg;
        memcpy(&msg, buffer, sizeof(MsgTypeF)); // needs <cstring>
        handle(msg);
    }
    break;
// ...
}
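A slightly fuller sketch of the same idea might check the trivially-copyable requirement at compile time and make sure enough bytes actually arrived before copying. The struct layouts, decode, and dispatch below are assumptions made for illustration; the original post does not define MsgType1 or MsgTypeF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical message layouts (assumed, not from the original post).
struct MsgType1 { char tag; char payload[7]; };
struct MsgTypeF { char tag; char code; };

// Copies the leading bytes of buffer into out. Returns false when fewer
// bytes were received than a T occupies.
template <typename T>
bool decode(const char* buffer, std::size_t received, T& out) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    assert(buffer != nullptr);
    if (received < sizeof(T)) return false;
    std::memcpy(&out, buffer, sizeof(T));
    return true;
}

// Returns the tag of the decoded message, or 0 if nothing could be decoded.
// A real handler would call handle(msg) where the tag is returned here.
char dispatch(const char* buffer, std::size_t received) {
    if (received == 0) return 0;
    switch (buffer[0]) {
    case '1': {
        MsgType1 msg;
        return decode(buffer, received, msg) ? msg.tag : 0;
    }
    case 'F': {
        MsgTypeF msg;
        return decode(buffer, received, msg) ? msg.tag : 0;
    }
    default:
        return 0;
    }
}
```

Passing the byte count returned by recv as received keeps the copy from reading past the data that was actually delivered.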

Note that we're talking about what the language standard states is undefined behavior. Odds are good that a given compiler will generate the expected code for any of these, but nothing in the standard guarantees it.

Does the answer to this question change between C++11 and C++1z?

There have been some significant rule clarifications since C++11 (particularly [basic.life]). But the intent behind the rules hasn't changed.

