According to the documentation, you can use create_dataset
to create a chunked array stored in an HDF5 file. Example:
>>> import h5py
>>> f = h5py.File('test.h5', 'w')
>>> arr = f.create_dataset('mydata', (2**32,), chunks=True)
>>> arr
<HDF5 dataset "mydata": shape (4294967296,), type "<f4">
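With chunks=True, h5py picks a chunk shape automatically. You can inspect it, along with the default float32 dtype, directly on the dataset object; the chunk shape mentioned in the comment below is only illustrative, since the value depends on h5py's auto-chunking heuristic:
>>> arr.chunks     # auto-chosen chunk shape, e.g. (524288,) -- varies by h5py version
>>> arr.dtype      # default dtype is 32-bit float
dtype('float32')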
Slicing the HDF5 dataset returns a NumPy array.
>>> arr[:10]
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
>>> type(arr[:10])
numpy.ndarray
You can set values just as you would with a NumPy array.
>>> arr[3:5] = 3
>>> arr[:6]
array([ 0., 0., 0., 3., 3., 0.], dtype=float32)
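If you need a specific dtype or chunk shape instead of the automatic choices, you can pass them to create_dataset explicitly. A minimal sketch (the name 'mydata64' and the chunk size 1024**2 are just illustrative values):
>>> arr64 = f.create_dataset('mydata64', (2**32,), dtype='f8', chunks=(1024**2,))
>>> arr64.chunks, arr64.dtype
((1048576,), dtype('float64'))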
I don't know whether this is the most efficient way, but you can iterate over the whole array chunk by chunk, for instance to fill it with random values. (The loop below assumes arr.size is an exact multiple of the chunk size; otherwise the last assignment would need trimming.)
>>> import numpy as np
>>> for i in range(0, arr.size, arr.chunks[0]):
...     arr[i:i + arr.chunks[0]] = np.random.randn(arr.chunks[0])  # write one chunk at a time
>>> arr[:5]
array([ 0.62833798, 0.03631227, 2.00691652, -0.16631022, 0.07727782], dtype=float32)
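Recent h5py versions (2.10+) also provide Dataset.iter_chunks, which yields the slice covering each stored chunk (clipped to the dataset's extent), so it also handles a trailing partial chunk without you computing offsets yourself. A sketch under that assumption:
>>> for chunk in arr.iter_chunks():           # each item is a tuple of slice objects
...     n = chunk[0].stop - chunk[0].start    # number of elements in this chunk
...     arr[chunk] = np.random.randn(n)
>>> f.close()                                 # flush everything to test.h5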