python - How to store an array in hdf5 file which is too big to load in memory?

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

Is there any way to store an array in an hdf5 file, which is too big to load in memory?

If I do something like this:

import h5py
import numpy as np

f = h5py.File('test.hdf5', 'w')
f['mydata'] = np.zeros(2**32)  # builds the whole array in memory before writing

I get a memory error.

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:53:44+0000

According to the h5py documentation, you can use create_dataset to create a chunked dataset that lives in the HDF5 file, so the full array never has to be built in memory. Example:

>>> import h5py
>>> f = h5py.File('test.h5', 'w')
>>> arr = f.create_dataset('mydata', (2**32,), chunks=True)
>>> arr
<HDF5 dataset "mydata": shape (4294967296,), type "<f4">
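As a rough sketch of the same idea with an explicit dtype and chunk length, the snippet below creates a chunked dataset, writes two elements, and compares the logical size of the dataset with the size of the file on disk. The file name, the dataset length of 2**24, and the chunk length of 2**16 are assumptions chosen so the demo runs quickly. They are not taken from the question.

```python
import os
import tempfile

import h5py

# Hypothetical file inside a temporary directory, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "sketch.h5")

n = 2**24  # far smaller than the 2**32 in the question, so the demo is quick

with h5py.File(path, "w") as f:
    # Explicit dtype and chunk length instead of relying on the defaults.
    dset = f.create_dataset("mydata", shape=(n,), dtype="f8", chunks=(2**16,))
    dset[3:5] = 3.0  # touches only the chunk that holds these indices
    logical_bytes = dset.size * dset.dtype.itemsize

# Reopen read-only and pull a small slice back as a NumPy array.
with h5py.File(path, "r") as f:
    head = f["mydata"][:6]

file_bytes = os.path.getsize(path)
print("logical size:", logical_bytes, "bytes; file on disk:", file_bytes, "bytes")
print(head)
```

With the default settings for chunked datasets, the file on disk should stay much smaller than the logical size, because chunks that were never written are not stored.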

Slicing the HDF5 dataset returns NumPy arrays:

>>> arr[:10]
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.], dtype=float32)
>>> type(arr[:10])
<class 'numpy.ndarray'>

You can assign values just as you would with a NumPy array:

>>> arr[3:5] = 3
>>> arr[:6]
array([ 0.,  0.,  0.,  3.,  3.,  0.], dtype=float32)

I don't know if this is the most efficient way, but you can iterate over the whole dataset one chunk at a time. For instance, to fill it with random values:

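>>> import numpy as np
>>> step = arr.chunks[0]  # chunk length chosen by h5py
>>> for i in range(0, arr.size, step):
...     stop = min(i + step, arr.size)  # the last block may be shorter
...     arr[i:stop] = np.random.randn(stop - i)
...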
>>> arr[:5]
array([ 0.62833798,  0.03631227,  2.00691652, -0.16631022,  0.07727782], dtype=float32)
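Reading works block by block in the same way. The following is a minimal sketch, not a tuned implementation: it fills a small dataset with random values and then computes the mean without ever holding the whole dataset in memory. The file name, the length of 1,000,003 elements (deliberately not a multiple of the chunk length), and the seed are assumptions made for the demo.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "blocks.h5")  # hypothetical file name
n = 1_000_003  # deliberately not a multiple of the chunk length
rng = np.random.default_rng(0)

with h5py.File(path, "w") as f:
    dset = f.create_dataset("mydata", shape=(n,), dtype="f4", chunks=True)
    step = dset.chunks[0]  # chunk length chosen by h5py

    # Write one block at a time; the final block may be shorter.
    for start in range(0, n, step):
        stop = min(start + step, n)
        dset[start:stop] = rng.standard_normal(stop - start)

    # Read one block at a time and accumulate the sum in float64.
    total = 0.0
    for start in range(0, n, step):
        stop = min(start + step, n)
        total += float(dset[start:stop].sum(dtype=np.float64))

mean = total / n
print("mean over", n, "values:", mean)
```

Only one block of `step` elements is in memory at any time. Reading several chunks per iteration (a multiple of `step`) usually cuts the number of I/O calls, at the cost of a larger temporary array.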
