I am looking for a way to append data to an existing dataset inside a .h5 file using Python (h5py).
A short intro to my project: I am trying to train a CNN on medical image data. Because of the huge amount of data and the heavy memory usage during the transformation of the data to NumPy arrays, I need to split the transformation into chunks: load and preprocess the first 100 medical images and save the NumPy arrays to an HDF5 file, then load the next 100 images and append them to the existing .h5 file, and so on.
Now, I tried to store the first 100 transformed NumPy arrays as follows:
import h5py
from LoadIPV import LoadIPV

X_train_data, Y_train_data, X_test_data, Y_test_data = LoadIPV()

with h5py.File('.PreprocessedData.h5', 'w') as hf:
    # maxshape=(None, ...) marks the first axis as unlimited,
    # so the datasets can be resized (grown) later.
    hf.create_dataset("X_train", data=X_train_data, maxshape=(None, 512, 512, 9))
    hf.create_dataset("X_test", data=X_test_data, maxshape=(None, 512, 512, 9))
    hf.create_dataset("Y_train", data=Y_train_data, maxshape=(None, 512, 512, 1))
    hf.create_dataset("Y_test", data=Y_test_data, maxshape=(None, 512, 512, 1))
As can be seen, the transformed NumPy arrays are split into four groups that are stored in four HDF5 datasets: X_train, X_test, Y_train and Y_test.
The LoadIPV() function performs the preprocessing of the medical image data.
My problem is that I would like to store the next 100 NumPy arrays in the same .h5 file, appended to the existing datasets: for example, the existing X_train dataset of shape [100, 512, 512, 9] should grow to shape [200, 512, 512, 9] once the next 100 arrays are appended. The same should work for the other three datasets X_test, Y_train and Y_test.
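Since the datasets are created with maxshape=(None, ...), I suspect that appending boils down to resizing each dataset along its first axis and writing the new batch into the newly added rows. Here is a minimal sketch of what I have in mind (X_train_new is a hypothetical placeholder for the next batch of 100 preprocessed arrays):

import h5py
import numpy as np

# Hypothetical placeholder for the next 100 preprocessed arrays.
X_train_new = np.zeros((100, 512, 512, 9), dtype=np.float32)

with h5py.File('.PreprocessedData.h5', 'a') as hf:  # 'a' = read/write, keeps existing data
    dset = hf["X_train"]
    old_size = dset.shape[0]
    # Grow the unlimited first axis, then write the new batch into the new rows.
    dset.resize(old_size + X_train_new.shape[0], axis=0)
    dset[old_size:] = X_train_new

Is dataset.resize the intended way to do this, or is there a more idiomatic pattern?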