Checking the documentation on memoryview:
memoryview objects allow Python code to access the internal data of an
object that supports the buffer protocol without copying.
class memoryview(obj)
Create a memoryview that references obj. obj must support the
buffer protocol. Built-in objects that support the buffer protocol
include bytes and bytearray.
Then we are given the sample code:
>>> v = memoryview(b'abcefg')
>>> v[1]
98
>>> v[-1]
103
>>> v[1:4]
<memory at 0x7f3ddc9f4350>
>>> bytes(v[1:4])
b'bce'
Quotation over, now let's take a closer look:
>>> b = b'long bytes stream'
>>> b.startswith(b'long')
True
>>> v = memoryview(b)
>>> vsub = v[5:]
>>> vsub.startswith(b'bytes')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'memoryview' object has no attribute 'startswith'
>>> bytes(vsub).startswith(b'bytes')
True
>>>
So what I gather from the above:
We create a memoryview object to expose the internal data of a buffer object without
copying. However, to do anything useful with it (by calling the methods the underlying
object provides), we have to create a copy!
Usually a memoryview (or the old buffer object) is needed when we have a large object,
and its slices can be large too. Better efficiency matters when we are making large
slices, or making small slices a large number of times.
With the above scheme, I don't see how it can be useful in either situation, unless
someone can explain what I'm missing here.
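For reference, here is a minimal sketch of what slicing a view does and does not copy. It only uses the built-in memoryview; the variable names are illustrative. It does not give back startswith(), but it shows that a sub-view can be compared by value against bytes, and that a view over a writable buffer writes through to the original:

```python
# Read-only case: slices of a view share the original bytes object.
b = b'long bytes stream'
v = memoryview(b)
vsub = v[5:]                        # no bytes are copied here

shares_buffer = vsub.obj is b       # the slice still refers to the original object
starts_with = vsub[:5] == b'bytes'  # value comparison on a 5-byte sub-view

# Writable case: a change made through the view shows up in the bytearray.
ba = bytearray(b'abc')
mv = memoryview(ba)[1:]
mv[0] = ord('x')                    # ba is now bytearray(b'axc')
```

So the copy is only forced when a bytes method is called on the data, not when the view is sliced, compared, or handed to something that accepts the buffer protocol.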
Edit1:
We have a large chunk of data and want to process it by advancing through it from start to
end, for example extracting tokens from the start of a string buffer until the buffer is
consumed. In C terms, this is advancing a pointer through the buffer, and the pointer can be
passed to any function expecting the buffer type. How can something similar be done in Python?
People suggest workarounds; for example, many string and regex functions take position
arguments that can be used to emulate advancing a pointer. There are two issues with this:
first, it's a workaround, so you are forced to change your coding style to overcome the
shortcomings; and second, not all functions have position arguments. For example, regex
functions and startswith do, while encode()/decode() don't.
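To make the position-argument workaround concrete, here is a small sketch. The pattern and the offsets are made up for illustration. The last line relies on str() accepting a bytes-like object when an encoding is given, which sidesteps the missing position argument on decode() without an intermediate bytes copy:

```python
import re

b = b'long bytes stream'
word = re.compile(rb'\w+')      # illustrative token pattern

# Position arguments emulate an advanced pointer without slicing.
at_five = b.startswith(b'bytes', 5)
m = word.match(b, 5)            # matching starts at offset 5
matched = m.group()

# decode() has no position argument, but str() accepts a bytes-like object,
# so a view slice can be decoded without first copying it into bytes.
text = str(memoryview(b)[5:10], 'ascii')
```

The resulting str is of course a new object; only the intermediate bytes slice is avoided.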
Others might suggest loading the data in chunks, or processing the buffer in small
segments larger than the max token. Okay, so we are aware of these possible
workarounds, but we are supposed to work in a more natural way in Python without
bending the coding style to fit the language, aren't we?
Edit2:
A code sample would make things clearer. This is what I want to do, and what I assumed memoryview would allow me to do at first glance. Let's use pmview (proper memory view) for the functionality I'm looking for:
tokens = []
xlarge_str = get_string()
xlarge_str_view = pmview(xlarge_str)
while True:
    token = get_token(xlarge_str_view)
    if token:
        xlarge_str_view = xlarge_str_view.vslice(len(token))
        # vslice: view slice: default stop parameter at end of buffer
        tokens.append(token)
    else:
        break
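For comparison, the same loop can be approximated with the real memoryview when the data is bytes rather than str. This is only a sketch under assumptions: the whitespace-separated token rule and the helper names get_token and tokenize are hypothetical, and it relies on the re module accepting bytes-like objects (such as a memoryview) as the subject of a bytes pattern:

```python
import re

# Hypothetical token rule: optional leading whitespace, then one word.
TOKEN = re.compile(rb'\s*(\S+)')

def get_token(view):
    """Return (token, consumed_length), or None when no token is left."""
    m = TOKEN.match(view)           # matches directly against the view
    if m is None:
        return None
    # Only the token itself is copied out of the buffer.
    return bytes(view[m.start(1):m.end(1)]), m.end()

def tokenize(data):
    tokens = []
    view = memoryview(data)
    while True:
        result = get_token(view)
        if result is None:
            break
        token, consumed = result
        tokens.append(token)
        view = view[consumed:]      # advance: new view, same buffer, no copy
    return tokens

tokens = tokenize(b'long bytes stream')
```

Slicing the view plays the role of vslice here: the stop defaults to the end of the buffer, and only the extracted tokens are ever copied.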