cuda - Reading from an unaligned uint8_t recast as a uint32_t array - not getting all values

Question

Welcome To Ask or Share your Answers For Others

cuda - Reading from an unaligned uint8_t recast as a uint32_t array - not getting all values

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

cuda - Reading from an unaligned uint8_t recast as a uint32_t array - not getting all values

I am trying to cast a uint8_t array to uint32_t array. However, when i try to do this, I cant seem to be able to access every consecutive 4 bytes.

Let us say I have a uint8_t array with 8 bytes. I would like to access byte 2 -> 6 as one uint32_t.

These all get the same value *((uint32_t*)&uint8Array[0]), *((uint32_t*)&uint8Array[1]), *((uint32_t*)&uint8Array[2]), *((uint32_t*)&uint8Array[3])

While *((uint32_t*)&uint8Array[4]) gets the bytes 4 -> 8 as expected.

So it seem like I can not access 4 consecutive bytes from any address?

Is there any way that I can do this?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T21:43:10+0000

While unaligned accesses are not allowed in CUDA, the prmt PTX instruction has a handy mode to emulate the effect of unaligned reads within registers. This can be exposed with a bit of inline PTX assembly. If you can tolerate a read past the end of the array, the code becomes quite simple:

// WARNING! Reads past ptr!
__device__ uint32_t read_unaligned(void* ptr)
{
    uint32_t result;
    asm("{
"
        "   .reg .b64    aligned_ptr;
"
        "   .reg .b32    low, high, alignment;
"
        "   and.b64      aligned_ptr, %1, 0xfffffffffffffffc;
"
        "   ld.u32       low, [aligned_ptr];
"
        "   ld.u32       high, [aligned_ptr+4];
"
        "   cvt.u32.u64  alignment, %1;
"
        "   prmt.b32.f4e %0, low, high, alignment;
"
        "}"
        : "=r"(result) : "l"(ptr));
    return result;
}

To ensure the access past the end of the array remains harmless, round up the number of allocated byte to a multiple of 4, and add another 4 bytes.

Above device code has the same effect as the following code on a little-endian host that tolerates unaligned accesses:

__host__ uint32_t read_unaligned_host(void* ptr)
{
    return *(uint32_t*)ptr;
}

Categories

cuda - Reading from an unaligned uint8_t recast as a uint32_t array - not getting all values

cuda - Reading from an unaligned uint8_t recast as a uint32_t array - not getting all values

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags