Even though this is an old question, I was wondering the same thing and I didn't see a solution I liked.
When reading binary data with Python, I have found `numpy.fromfile` or `numpy.fromstring` to be much faster than using the Python `struct` module. Binary data with mixed types can be efficiently read into a numpy array using these methods, as long as the record format is constant and can be described with a numpy data type object (`numpy.dtype`).
import numpy as np
import pandas as pd

# Create a dtype with the binary data format and the desired column names
dt = np.dtype([('a', 'i4'), ('b', 'i4'), ('c', 'i4'), ('d', 'f4'), ('e', 'i4'),
               ('f', 'i4', (256,))])

# 'file' can be a filename or an open file object
data = np.fromfile(file, dtype=dt)
df = pd.DataFrame(data)

# Or, if you want to explicitly set the column names
df = pd.DataFrame(data, columns=data.dtype.names)
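As a side note, newer NumPy releases deprecate `numpy.fromstring` for binary input; `numpy.frombuffer` is the equivalent when the bytes are already in memory (e.g. read from a socket or a decompressed stream). A minimal, self-contained sketch with a made-up two-field layout:

import numpy as np
import pandas as pd

# Hypothetical layout: one int32 and one float64 per record
dt = np.dtype([('x', 'i4'), ('y', 'f8')])

# Build a small binary buffer in memory just for demonstration
raw = np.array([(1, 2.5), (3, 4.25)], dtype=dt).tobytes()

# frombuffer parses the in-memory bytes with the same dtype description
data = np.frombuffer(raw, dtype=dt)
df = pd.DataFrame(data, columns=data.dtype.names)
print(df)

The resulting DataFrame is the same as you would get from `fromfile`, so only the input step changes between file-based and in-memory data.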
Edits:

- Removed unnecessary conversion of `data.to_list()`. Thanks fxx
- Added an example of leaving off the `columns` argument