This is actually one of the use-cases of HDF5.
If you just want to be able to access all the datasets from a single file, and don't care how they're actually stored on disk, you can use external links. From the HDF5 website:
External links allow a group to include objects in another HDF5 file and enable the library to access those objects as if they are in the current file. In this manner, a group may appear to directly contain datasets, named datatypes, and even groups that are actually in a different file. This feature is implemented via a suite of functions that create and manage the links, define and retrieve paths to external objects, and interpret link names:
Here's how to do it in h5py:
import h5py
myfile = h5py.File('foo.hdf5', 'a')  # 'a': read/write, creates the file if it is missing
myfile['ext link'] = h5py.ExternalLink("otherfile.hdf5", "/path/to/resource")
myfile.close()
Be careful: when opening myfile, you should open it with 'a' if it is an existing file. If you open it with 'w', it will erase its contents.
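To make the difference between the two modes concrete, here is a minimal sketch. The scratch file location and the dataset name `existing` are made up for illustration; the link target is the same placeholder as above and does not need to exist for the link to be created.

```python
import os
import tempfile

import h5py

# Hypothetical scratch file so the example does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "foo.hdf5")

# Create the file with one dataset already in it.
with h5py.File(path, "w") as f:
    f.create_dataset("existing", data=[1, 2, 3])

# 'a' keeps the existing contents and adds the link next to them.
with h5py.File(path, "a") as f:
    f["ext link"] = h5py.ExternalLink("otherfile.hdf5", "/path/to/resource")
    keys_after_append = sorted(f.keys())

# 'w' truncates the file: both the dataset and the link are gone.
with h5py.File(path, "w") as f:
    keys_after_write = list(f.keys())
```

After the 'a' step the file holds both `existing` and `ext link`; after the 'w' step it is empty.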
This would be much faster than copying all the datasets into a new file. I don't know how fast access to otherfile.hdf5 would be, but operating on all the datasets would be transparent - that is, h5py would see all the datasets as residing in foo.hdf5.
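If you have several files, the same idea extends to a loop. The following is only a sketch under assumptions: the helper `link_datasets`, the file names `part0.hdf5` to `part2.hdf5`, `master.hdf5`, and the dataset path `/data` are all hypothetical names chosen for the demo, and the source files are generated on the fly so the example is self-contained.

```python
import os
import tempfile

import h5py
import numpy as np


def link_datasets(master_path, sources):
    """Add one external link to master_path per (link name, file, object path)."""
    # Hypothetical helper, not part of h5py.
    with h5py.File(master_path, "a") as master:
        for link_name, filename, obj_path in sources:
            master[link_name] = h5py.ExternalLink(filename, obj_path)


workdir = tempfile.mkdtemp()

# Build three small source files, each holding a dataset at /data.
sources = []
for i in range(3):
    path = os.path.join(workdir, f"part{i}.hdf5")
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=np.full(4, i))
    sources.append((f"part{i}", path, "/data"))

master_path = os.path.join(workdir, "master.hdf5")
link_datasets(master_path, sources)

# Read everything through the master file as if it were stored there.
with h5py.File(master_path, "r") as master:
    totals = {name: int(master[name][...].sum()) for name in master}
    # getlink=True returns the link object itself instead of following it.
    link = master.get("part0", getlink=True)
    is_external = isinstance(link, h5py.ExternalLink)
```

The links here store absolute paths to keep the demo simple. If the files are going to be moved around together, storing relative file names in the links is probably the better choice.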