I need to convert the data stored in a pandas.DataFrame
into a byte string where each column can have a separate data type (integer or floating point). Here is a simple set of data:
import numpy as np
import pandas as pd

df = pd.DataFrame([10, 15, 20], dtype='u1', columns=['a'])
df['b'] = np.array([np.iinfo('u8').max, 230498234019, 32094812309], dtype='u8')
df['c'] = np.array([1.324e10, 3.14159, 234.1341], dtype='f8')
and df looks something like this:
    a                     b             c
0  10  18446744073709551615  1.324000e+10
1  15          230498234019  3.141590e+00
2  20           32094812309  2.341341e+02
The DataFrame knows the type of each column (df.dtypes), so I'd like to do something like this:
data_to_pack = [tuple(record) for _, record in df.iterrows()]
data_array = np.array(data_to_pack, dtype=list(zip(df.columns, df.dtypes)))
data_bytes = data_array.tobytes()
This typically works fine, but in this case, because of the maximum uint64 value stored in df['b'][0], the second line above, which converts the list of tuples into an np.array with the given set of types, raises the following error:
OverflowError: Python int too large to convert to C long
I believe the error originates in the first line, which extracts each record as a Series with a single data type (defaulting to float64), and the float64 representation chosen for the maximum uint64 value is not directly convertible back to uint64.
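The round trip through float64 can be seen overshooting the uint64 range directly:

>>> big = np.iinfo('u8').max   # 18446744073709551615
>>> float(big)                 # float64 rounds this up to 2**64
1.8446744073709552e+19
>>> int(float(big))            # one past the largest uint64 value
18446744073709551616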
1) Since the DataFrame already knows the types of each column, is there a way to avoid building a list of tuples as input to the typed numpy.array constructor? Or is there a better way than the one outlined above to preserve the type information in such a conversion?
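For what it's worth, one workaround I can think of is to build the structured array column by column instead of row by row, so no value ever passes through float64; this is just a sketch, and I don't know if it's the idiomatic approach:

# Build an empty structured array with the per-column dtypes, then fill it
# one column at a time so nothing is coerced through float64.
dt = np.dtype(list(zip(df.columns, df.dtypes)))
data_array = np.empty(len(df), dtype=dt)
for col in df.columns:
    data_array[col] = df[col].values
data_bytes = data_array.tobytes()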
2) Is there a way to go directly from a DataFrame to a byte string representing the data, using the type information for each column?
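I also noticed DataFrame.to_records, which appears to preserve each column's dtype; if that is reliable, something like the following might be enough (assuming the index should not be part of the packed data):

# to_records keeps the per-column dtypes; index=False leaves the index out.
data_bytes = df.to_records(index=False).tobytes()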