Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
597 views
in Technique[技术] by (71.8m points)

python - Compress numpy arrays efficiently

I tried various methods to do data compression when saving to disk some numpy arrays.

These 1D arrays contain sampled data at a certain sampling rate (can be sound recorded with a microphone, or any other measurment with any sensor) : the data is essentially continuous (in a mathematical sense ; of course after sampling it is now discrete data).

I tried with HDF5 (h5py) :

f.create_dataset("myarray1", myarray, compression="gzip", compression_opts=9)

but this is quite slow, and the compression ratio is not the best we can expect.

I also tried with

numpy.savez_compressed()

but once again it may not be the best compression algorithm for such data (described before).

What would you choose for better compression ratio on a numpy array, with such data ?

(I thought about things like lossless FLAC (initially designed for audio), but is there an easy way to apply such an algorithm on numpy data ?)

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

What I do now:

import gzip
import numpy

f = gzip.GzipFile("my_array.npy.gz", "w")
numpy.save(file=f, arr=my_array)
f.close()

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...