Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
581 views
in Technique[技术] by (71.8m points)

python - loading arrays saved using numpy.save in append mode

I save arrays using numpy.save() in append mode:

f = open("try.npy", 'ab')
sp.save(f,[1, 2, 3, 4, 5])
sp.save(f,[6, 7, 8, 9, 10])
f.close()

Can I then load the data in LIFO mode? Namely, if I wish to now load the 6-10 array, do I need to load twice (use b):

f = open("try.npy", 'r')
a = sp.load(f)
b = sp.load(f)
f.close()

or can I straightforward load the second appended save?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I'm a little surprised that this sequential save and load works. I don't think it is documented (please correct me). But evidently each save is a self contained unit, and load reads to the end of that unit, as opposed to the end of the file.

Think of each load as a readline. You can't read just the last line of a file; you have to read all the ones before it.

Well - there is a way of reading the last - using seek to move the file read to a specific point. But to do that you have to know exactly where the desired block starts.

np.savez is the intended way of saving multiple arrays to a file, or rather to a zip archive.


save saves two parts, a header that contains information like dtype, shape and strides, and a copy of the array's data buffer. The nbytes attribute gives the size of the data buffer. At least this is the case for numeric and string dtypes.

save doc has an example of using an opened file - with seek(0) to rewind the file for use by load.

np.lib.npyio.format has more information on the saving format. Looks like it is possible to determine the length of the header by reading its first few bytes. You could probably use functions in the module to perform all these reads and calculations.


If I read the whole file from the example, I get:

In [696]: f.read()
Out[696]: 
b"x93NUMPYx01x00Fx00
{'descr': '<i4', 'fortran_order': False, 'shape': (5,), }

 x01x00x00x00x02x00x00x00x03x00x00x00x04x00x00x00x05x00x00x00
x93NUMPYx01x00Fx00
{'descr': '<i4', 'fortran_order': False, 'shape': (5,), }

 x06x00x00x00x07x00x00x00x08x00x00x00x00x00x00
x00x00x00"

I added line breaks to highlight the distinct pieces of this file. Notice that each save starts with x93NUMPY.

With an open file f, I can read the header (or the first array) with:

In [707]: np.lib.npyio.format.read_magic(f)
Out[707]: (1, 0)
In [708]: np.lib.npyio.format.read_array_header_1_0(f)
Out[708]: ((5,), False, dtype('int32'))

and I can load the data with:

In [722]: np.fromfile(f, dtype=np.int32, count=5)
Out[722]: array([1, 2, 3, 4, 5])

I deduced this from np.lib.npyio.format.read_array function code.

Now the file is positioned at:

In [724]: f.tell()
Out[724]: 100

which is the head of the next array:

In [725]: np.lib.npyio.format.read_magic(f)
Out[725]: (1, 0)
In [726]: np.lib.npyio.format.read_array_header_1_0(f)
Out[726]: ((5,), False, dtype('int32'))
In [727]: np.fromfile(f, dtype=np.int32, count=5)
Out[727]: array([ 6,  7,  8,  9, 10])

and we are at EOF.

And knowing that int32 has 4 bytes, we can calculate that the data occupies 20 bytes. So we could skip over an array by reading the header, calculating the size of the data block, and seek past it to get to the next array. For small arrays that work isn't worth it; but for very large ones, it may be useful.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...