I'm a little surprised that this sequential save and load works. I don't think it is documented (please correct me if I'm wrong). But evidently each save is a self-contained unit, and load reads to the end of that unit, as opposed to the end of the file.
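As a minimal sketch of that behaviour (the arrays and the temporary file here are my own illustration, not taken from the question), two saves to one open file can be read back with two loads:

```python
import tempfile

import numpy as np

a = np.arange(1, 6, dtype=np.int32)   # illustrative data: 1..5
b = np.arange(6, 11, dtype=np.int32)  # illustrative data: 6..10

with tempfile.TemporaryFile() as f:
    # Each save appends one self-contained block (header + data).
    np.save(f, a)
    np.save(f, b)

    # Rewind, then load the blocks back in the order they were written.
    f.seek(0)
    first = np.load(f)
    second = np.load(f)
```

Each load stops at the end of its own block, so the second call picks up where the first one left off.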
Think of each load as a readline. You can't read just the last line of a file; you have to read all the ones before it.
Well, there is a way of reading just the last one: use seek to move the file position to a specific point. But to do that you have to know exactly where the desired block starts.
np.savez is the intended way of saving multiple arrays to a file, or rather to a zip archive.
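For comparison, here is a small sketch of the savez route. The key names first and second are made up for illustration, and an in-memory buffer stands in for a real file:

```python
import io

import numpy as np

a = np.arange(1, 6, dtype=np.int32)
b = np.arange(6, 11, dtype=np.int32)

buf = io.BytesIO()                 # stand-in for a file opened in binary mode
np.savez(buf, first=a, second=b)   # keyword names become the archive keys

buf.seek(0)
with np.load(buf) as archive:
    names = sorted(archive.files)     # keys stored in the archive
    second_only = archive["second"]   # fetch one array by name
```

Because the archive is indexed by name, you can pull out one array without loading the others first.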
save writes two parts: a header that contains information like dtype, shape and memory order (fortran_order), and a copy of the array's data buffer. The nbytes attribute gives the size of the data buffer. At least this is the case for numeric and string dtypes.
The save doc has an example of using an opened file, with seek(0) to rewind the file for use by load.
np.lib.npyio.format (the same module is importable as np.lib.format) has more information on the saving format. It looks like it is possible to determine the length of the header by reading its first few bytes. You could probably use functions in the module to perform all these reads and calculations.
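As a rough sketch of reading those first few bytes by hand, assuming the layout described in the format module's docs (a 6-byte magic string, two version bytes, then a little-endian header length that is 2 bytes for version 1.x and 4 bytes for later versions):

```python
import io
import struct

import numpy as np

a = np.arange(1, 6, dtype="<i4")   # illustrative array, little-endian int32
buf = io.BytesIO()
np.save(buf, a)
raw = buf.getvalue()

magic = raw[:6]                    # expected to be b"\x93NUMPY"
major, minor = raw[6], raw[7]      # format version bytes

# Assumption: version 1.x uses a 2-byte length field, later versions 4 bytes.
if major == 1:
    (header_len,) = struct.unpack("<H", raw[8:10])
    data_start = 10 + header_len
else:
    (header_len,) = struct.unpack("<I", raw[8:12])
    data_start = 12 + header_len

header_text = raw[data_start - header_len:data_start].decode("latin1")
data = np.frombuffer(raw[data_start:], dtype="<i4")
```

The header is padded, and the amount of padding differs between NumPy versions, so data_start should be computed from the length field rather than hard-coded.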
If I read the whole file from the example, I get:
In [696]: f.read()
Out[696]:
b"\x93NUMPY\x01\x00F\x00
{'descr': '<i4', 'fortran_order': False, 'shape': (5,), }
\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00\x05\x00\x00\x00
\x93NUMPY\x01\x00F\x00
{'descr': '<i4', 'fortran_order': False, 'shape': (5,), }
\x06\x00\x00\x00\x07\x00\x00\x00\x08\x00\x00\x00\t\x00\x00\x00\n\x00\x00\x00"
I added line breaks to highlight the distinct pieces of this file. Notice that each save starts with \x93NUMPY.
With an open file f, I can read the header of the first array with:
In [707]: np.lib.npyio.format.read_magic(f)
Out[707]: (1, 0)
In [708]: np.lib.npyio.format.read_array_header_1_0(f)
Out[708]: ((5,), False, dtype('int32'))
and I can load the data with:
In [722]: np.fromfile(f, dtype=np.int32, count=5)
Out[722]: array([1, 2, 3, 4, 5])
I deduced this from the code of the np.lib.npyio.format.read_array function.
Now the file is positioned at:
In [724]: f.tell()
Out[724]: 100
which is the head of the next array:
In [725]: np.lib.npyio.format.read_magic(f)
Out[725]: (1, 0)
In [726]: np.lib.npyio.format.read_array_header_1_0(f)
Out[726]: ((5,), False, dtype('int32'))
In [727]: np.fromfile(f, dtype=np.int32, count=5)
Out[727]: array([ 6, 7, 8, 9, 10])
and we are at EOF.
And knowing that int32 has 4 bytes, we can calculate that the data occupies 20 bytes. So we could skip over an array by reading the header, calculating the size of the data block, and using seek to move past it to get to the next array. For small arrays that work isn't worth it; but for very large ones, it may be useful.
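Here is one way that skipping idea might look. skip_array is a hypothetical helper of my own, not part of NumPy. It assumes format versions 1.0 or 2.0 and a fixed-size (non-object) dtype, and it imports the module as numpy.lib.format:

```python
import math
import tempfile

import numpy as np
from numpy.lib import format as npy_format


def skip_array(f):
    """Hypothetical helper: move f past one saved array without reading its data."""
    version = npy_format.read_magic(f)
    if version == (1, 0):
        shape, fortran_order, dtype = npy_format.read_array_header_1_0(f)
    elif version == (2, 0):
        shape, fortran_order, dtype = npy_format.read_array_header_2_0(f)
    else:
        raise ValueError(f"unhandled .npy format version {version}")
    # Data size = number of elements * bytes per element (fixed-size dtypes only).
    nbytes = math.prod(shape) * dtype.itemsize
    f.seek(nbytes, 1)  # whence=1: seek relative to the current position
    return shape, dtype


with tempfile.TemporaryFile() as f:
    np.save(f, np.arange(1, 6, dtype=np.int32))
    np.save(f, np.arange(6, 11, dtype=np.int32))
    f.seek(0)

    skipped_shape, skipped_dtype = skip_array(f)  # jump over the first array
    second = np.load(f)                           # read only the second one
```

Object arrays are pickled rather than stored as a raw buffer, so this size calculation would not apply to them.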