python - Why does pickle take so much longer than np.save?

Question

Welcome To Ask or Share your Answers For Others

python - Why does pickle take so much longer than np.save?

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - Why does pickle take so much longer than np.save?

I want to save a dict or arrays.

I try both with np.save and with pickle and see that the former always take much less time.

My actual data is much bigger but I just present a small piece here for demonstration purposes:

import numpy as np
#import numpy.array as array
import time
import pickle

b = {0: [np.array([0, 0, 0, 0])], 1: [np.array([1, 0, 0, 0]), np.array([0, 1, 0, 0]), np.array([0, 0, 1, 0]), np.array([0, 0, 0, 1]), np.array([-1,  0,  0,  0]), np.array([ 0, -1,  0,  0]), np.array([ 0,  0, -1,  0]), np.array([ 0,  0,  0, -1])], 2: [np.array([2, 0, 0, 0]), np.array([1, 1, 0, 0]), np.array([1, 0, 1, 0]), np.array([1, 0, 0, 1]), np.array([ 1, -1,  0,  0]), np.array([ 1,  0, -1,  0]), np.array([ 1,  0,  0, -1])], 3: [np.array([1, 0, 0, 0]), np.array([0, 1, 0, 0]), np.array([0, 0, 1, 0]), np.array([0, 0, 0, 1]), np.array([-1,  0,  0,  0]), np.array([ 0, -1,  0,  0]), np.array([ 0,  0, -1,  0]), np.array([ 0,  0,  0, -1])], 4: [np.array([2, 0, 0, 0]), np.array([1, 1, 0, 0]), np.array([1, 0, 1, 0]), np.array([1, 0, 0, 1]), np.array([ 1, -1,  0,  0]), np.array([ 1,  0, -1,  0]), np.array([ 1,  0,  0, -1])], 5: [np.array([0, 0, 0, 0])], 6: [np.array([1, 0, 0, 0]), np.array([0, 1, 0, 0]), np.array([0, 0, 1, 0]), np.array([0, 0, 0, 1]), np.array([-1,  0,  0,  0]), np.array([ 0, -1,  0,  0]), np.array([ 0,  0, -1,  0]), np.array([ 0,  0,  0, -1])], 2: [np.array([2, 0, 0, 0]), np.array([1, 1, 0, 0]), np.array([1, 0, 1, 0]), np.array([1, 0, 0, 1]), np.array([ 1, -1,  0,  0]), np.array([ 1,  0, -1,  0]), np.array([ 1,  0,  0, -1])], 7: [np.array([1, 0, 0, 0]), np.array([0, 1, 0, 0]), np.array([0, 0, 1, 0]), np.array([0, 0, 0, 1]), np.array([-1,  0,  0,  0]), np.array([ 0, -1,  0,  0]), np.array([ 0,  0, -1,  0]), np.array([ 0,  0,  0, -1])], 8: [np.array([2, 0, 0, 0]), np.array([1, 1, 0, 0]), np.array([1, 0, 1, 0]), np.array([1, 0, 0, 1]), np.array([ 1, -1,  0,  0]), np.array([ 1,  0, -1,  0]), np.array([ 1,  0,  0, -1])]}


start_time = time.time()
with open('testpickle', 'wb') as myfile:
    pickle.dump(b, myfile)
print("--- Time to save with pickle: %s milliseconds ---" % (1000*time.time() - 1000*start_time))

start_time = time.time()
np.save('numpy', b)
print("--- Time to save with numpy: %s milliseconds ---" % (1000*time.time() - 1000*start_time))

start_time = time.time()
with open('testpickle', 'rb') as myfile:
    g1 = pickle.load(myfile)
print("--- Time to load with pickle: %s milliseconds ---" % (1000*time.time() - 1000*start_time))

start_time = time.time()
g2 = np.load('numpy.npy')
print("--- Time to load with numpy: %s milliseconds ---" % (1000*time.time() - 1000*start_time))

which gives an output:

--- Time to save with pickle: 4.0 milliseconds ---
--- Time to save with numpy: 1.0 milliseconds ---
--- Time to load with pickle: 2.0 milliseconds ---
--- Time to load with numpy: 1.0 milliseconds ---

The time difference is even more pronounced with my actual size (~100,000 keys in the dict).

Why does pickle take longer than np.save, both for saving and for loading?

When should I use pickle?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T21:35:28+0000

Because as long as the written object contains no Python data,

numpy objects are represented in memory in a much simpler way than Python objects
numpy.save is written in C
numpy.save writes in a supersimple format that needs minimal processing

meanwhile

Python objects have a lot of overhead
pickle is written in Python
pickle transforms the data considerably from the underlying representation in memory to the bytes being written on the disk

Note that if a numpy array does contain Python objects, then numpy just pickles the array, and all the win goes out the window.

Categories

python - Why does pickle take so much longer than np.save?

python - Why does pickle take so much longer than np.save?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags