The HDF5 file must be written in table format (as opposed to fixed format) in order to be queryable with pd.read_hdf's where argument. Furthermore, A must be declared as a data_column:
df.to_hdf('/tmp/out.h5', key='results_table', mode='w', data_columns=['A'],
          format='table')
or, to specify all columns as (queryable) data columns:
df.to_hdf('/tmp/out.h5', key='results_table', mode='w', data_columns=True,
          format='table')
Then you could use
pd.read_hdf('/tmp/out.h5', 'results_table', where='A in [1,3,4]')
to select rows where the value in column A is 1, 3 or 4. For example,
import pandas as pd
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2],
'B': [0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1],
'C': [34, 32, 35, 34, 31, 34, 29, 34, 12, 34, 32, 34],
'D': [11, 15, 22, 15, 9, 15, 11, 15, 14, 15, 13, 15]})
df.to_hdf('/tmp/out.h5', key='results_table', mode='w', data_columns=['A'],
          format='table')
print(pd.read_hdf('/tmp/out.h5', 'results_table', where='A in [1,3,4]'))
yields
    A  B   C   D
0   1  0  34  11
2   3  1  35  22
3   4  1  34  15
5   1  0  34  15
7   3  0  34  15
8   4  1  12  14
10  1  0  32  13
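As a sanity check, the same rows can be selected in memory with a boolean mask. This is only a sketch for comparison and does not touch the HDF5 file; the name `expected` is an assumption introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2],
    'B': [0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1],
    'C': [34, 32, 35, 34, 31, 34, 29, 34, 12, 34, 32, 34],
    'D': [11, 15, 22, 15, 9, 15, 11, 15, 14, 15, 13, 15]})

# In-memory counterpart of where='A in [1,3,4]': keep rows whose A is 1, 3 or 4.
expected = df[df['A'].isin([1, 3, 4])]
print(expected)
```

The printed frame should have the same index labels (0, 2, 3, 5, 7, 8, 10) as the result read back from the file.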
If you have a very long list of values, vals, then you could use string formatting to compose the right where argument:
where='A in {}'.format(vals)
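A minimal sketch of that string formatting, assuming vals holds plain Python integers. The helper name `build_where` is hypothetical and not part of pandas; it only builds the string that would be passed as the where argument.

```python
def build_where(column, vals):
    """Compose a where string such as 'A in [1, 3, 4]' (hypothetical helper)."""
    # list(...) makes tuples, ranges and other iterables render with list
    # syntax. A NumPy array should be converted with .tolist() first, since
    # formatting an array directly omits the commas.
    return '{} in {}'.format(column, list(vals))

vals = range(1, 4)
where = build_where('A', vals)
print(where)  # A in [1, 2, 3]

# The string can then be passed on, for example:
# pd.read_hdf('/tmp/out.h5', 'results_table', where=where)
```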