Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
623 views
in Technique[技术] by (71.8m points)

c++ - Copying big endian float data directly into a vector<float> and byte swapping in place. Is it safe?

I'd like to be able to copy big endian float arrays directly from an unaligned network buffer into a std::vector<float> and perform the byte swapping back to host order "in place", without involving an intermediate std::vector<uint32_t>. Is this even safe? I'm worried that the big endian float data may accidentally be interpreted as NaNs and trigger unexpected behavior. Is this a valid concern?

For the purposes of this question, assume that the host machine receiving the data is little endian.

Here's some code that demonstrates what I'm trying to do:

std::vector<float> source{1.0f, 2.0f, 3.0f, 4.0f};
std::size_t number_count = source.size();

// Simulate big-endian float values being received from network and stored
// in byte buffer. A temporary uint32_t vector is used to transform the
// source data to network byte order (big endian) before being copied
// to a byte buffer.
std::vector<uint32_t> temp(number_count, 0);
std::size_t byte_length = number_count * sizeof(float);
std::memcpy(temp.data(), source.data(), byte_length);
for (uint32_t& datum: temp)
    datum = ::htonl(datum);
std::vector<uint8_t> buffer(byte_length, 0);
std::memcpy(buffer.data(), temp.data(), byte_length);
// buffer now contains the big endian float data, and is not aligned at word boundaries

// Copy the received network buffer data directly into the destination float vector
std::vector<float> numbers(number_count, 0.0f);
std::memcpy(numbers.data(), buffer.data(), byte_length); // IS THIS SAFE??

// Perform the byte swap back to host order (little endian) in place,
// to avoid needing to allocate an intermediate uint32_t vector.
auto ptr = reinterpret_cast<uint8_t*>(numbers.data());
for (size_t i=0; i<number_count; ++i)
{
    // IS THIS SAFE??
    uint32_t datum;
    std::memcpy(&datum, ptr, sizeof(datum));
    *datum = ::ntohl(*datum);
    std::memcpy(ptr, &datum, sizeof(datum));
    ptr += sizeof(datum);
}

assert(numbers == source);

Note the two "IS THIS SAFE??" comments above.

Motivation: I'm writing a CBOR serialization library with support for typed arrays. CBOR allows typed arrays to be transmitted as either big endian or little endian.

EDIT: Replaced illegal reinterpret_cast<uint32_t*> type punning in endian swap loop with memcpy.

question from:https://stackoverflow.com/questions/65910802/copying-big-endian-float-data-directly-into-a-vectorfloat-and-byte-swapping-in

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

After your edit:

Regarding the auto datum = reinterpret_cast<uint32_t*>(numbers.data());: This is not allowed in C++, one can only safely type-pun to uint8_t (only if CHAR_BIT == 8, more precisely this type-punning exception only holds for the char types)

Old answer: Below is for the question before the edit (the one with bit_cast).

This is safe, provided sizeof(float) == sizeof(uint32_t)

Dont worry about signaling NaNs. The exceptions are usually disabled, and even if they are enabled, they are only happening when a signaling NaN is generated. The move instructions do not generate exceptions.

Accessing the vector elements via data() pointer is supported (for both reading and writing). vector is guarantueed to have a contiguous storage.

But why aren't you doing all in only a single loop without the temp buffers?

Just have the float vector (input or output) and the data buffer (uint8_t vector). For sending just iterate over the float input vector, for each element perform the byte swapping and write the 4 bytes to the data buffer. One at a time. Then you do not need any intermediate buffers. It will probably not be slower. For receiving do the reverse.

Use std::bit_cast for conversion of float from/to std::array<uint8_t,4>. This would be the "correct" way in C++20 (you cant use C arrays directly with bit_cast). With this approach you do not need to invoke ntohl, just copy the bytes in correct order from/to buffer.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...