This is slow for the reasons given in your second link, and the solution is actually pretty simple: bypass the (slow) RawArray slice assignment code. In this case it inefficiently reads one raw C value at a time from the source array to create a Python object, converts it straight back to raw C for storage in the shared array, discards the temporary Python object, and repeats 1e8 times.
But you don't need to do it that way; like most C-level things, RawArray implements the buffer protocol, which means you can convert it to a memoryview, a view of the underlying raw memory that implements most operations in C-like ways, using raw memory operations if possible. So instead of doing:
# assign memory, very slow
%time temp[:] = np.arange(1e8, dtype = np.uint16)
Wall time: 9.75 s # Updated to what my machine took, for valid comparison
use memoryview to manipulate it as a raw bytes-like object and assign that way (the array returned by np.arange already implements the buffer protocol, and memoryview's slice assignment operator seamlessly uses it):
# C-like memcpy effectively, very fast
%time memoryview(temp)[:] = np.arange(1e8, dtype = np.uint16)
Wall time: 74.4 ms # Takes 0.76% of original time!!!
Note that the time for the latter is in milliseconds, not seconds; copying through a memoryview wrapper to perform raw memory transfers takes less than 1% of the time of the plodding way RawArray does it by default!
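For reference, here is a minimal standard-library sketch of the same idea. The helper name fast_fill is hypothetical, and array.array stands in for the NumPy array as the buffer-protocol source. It casts both sides to plain bytes first; that step is my assumption rather than part of the timings above, added because on Python 3 a ctypes-backed buffer may report its format as '<H' instead of 'H', which can make a direct slice assignment fail with a structure mismatch.

```python
from array import array
from multiprocessing import RawArray


def fast_fill(shared, source):
    """Copy the raw bytes of `source` into `shared` in one bulk transfer.

    Both arguments must support the buffer protocol and be C-contiguous.
    """
    # Byte views sidestep any mismatch between the two format strings.
    dst = memoryview(shared).cast('B')
    src = memoryview(source).cast('B')
    if dst.nbytes != src.nbytes:
        raise ValueError("source and destination must be the same size in bytes")
    # Slice assignment between two byte views is a straight memory copy,
    # with no per-element Python objects created along the way.
    dst[:] = src


n = 100_000
shared = RawArray('H', n)                            # shared unsigned 16-bit buffer
source = array('H', (i % 65536 for i in range(n)))   # stand-in for the NumPy array
fast_fill(shared, source)
```

A C-contiguous NumPy array should be usable as source in the same way, since it also exposes its memory through the buffer protocol.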