Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - sys.getsizeof() results don't quite correlate to structure size

I am trying to create a list of size 1 MB. The following code works:

dummy = ['a' for i in xrange(0, 1024)]
sys.getsizeof(dummy)
Out[1]: 9032

The following code does not work.

import os
import sys

dummy = []
dummy.append(os.urandom(1024))
sys.getsizeof(dummy)
Out[1]: 104

Can someone explain why?

If you're wondering why I am not using the first code snippet, I am writing a program to benchmark my memory by writing a for loop that writes blocks (of size 1 B, 1 KB and 1 MB) into memory.

import time

start = time.time()
for i in xrange(1, (1024 * 10)):
    # each iteration appends one 1 KB block: os.urandom(1024) returns 1024 bytes
    dummy.append(os.urandom(1024))
end = time.time()

1 Reply


If you check the size of a list, you get the size of the list data structure itself, including the pointers to its constituent elements. It does not include the size of the elements those pointers refer to.

str1_size = sys.getsizeof(['a' for i in xrange(0, 1024)])
str2_size = sys.getsizeof(['abc' for i in xrange(0, 1024)])
int_size = sys.getsizeof([123 for i in xrange(0, 1024)])
none_size = sys.getsizeof([None for i in xrange(0, 1024)])
str1_size == str2_size == int_size == none_size

The size of empty list: sys.getsizeof([]) == 72
Add an element: sys.getsizeof([1]) == 80
Add another element: sys.getsizeof([1, 1]) == 88
So each element adds 8 bytes (80 - 72 = 8, one pointer on a 64-bit build).
To get 1024 bytes, we need (1024 - 72) / 8 = 119 elements.
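
The numbers above come from one particular build (they look like 64-bit Python 2), so here is a minimal sketch that measures them at runtime instead of hardcoding them. The names empty_size, pointer_size and n_elements are only illustrative, and the printed values will differ between Python versions and platforms.

```python
import sys

# Measure rather than hardcode: header and slot sizes differ between
# Python versions and between 32-bit and 64-bit builds.
empty_size = sys.getsizeof([])                      # header of an empty list
pointer_size = sys.getsizeof([None]) - empty_size   # cost of one element slot

target = 1024  # bytes
# Largest element count whose header + slots still fit in the target.
n_elements = (target - empty_size) // pointer_size

print(empty_size, pointer_size, n_elements)
```

The integer division (//) keeps n_elements an int on both Python 2 and Python 3.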

The size of the list with 119 elements: sys.getsizeof([None for i in xrange(0, 119)]) == 1080.
This is more than the expected 1024 because a list keeps extra spare capacity for inserting more items, so that it doesn't have to resize on every append. (The size is the same, 1080, for any number of elements between 107 and 126.)
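
To see this spare capacity for yourself, a small sketch like the following records the lengths at which the reported size actually changes while appending. growth_points is a made-up helper name, and the exact step sizes are an implementation detail of CPython that varies between versions, so no particular output is assumed here.

```python
import sys

def growth_points(n):
    """Return (length, size) pairs at which the list's reported size changes."""
    items = []
    last = sys.getsizeof(items)
    changes = [(0, last)]
    for _ in range(n):
        items.append(None)
        size = sys.getsizeof(items)
        if size != last:          # the list was reallocated with more capacity
            changes.append((len(items), size))
            last = size
    return changes

for length, size in growth_points(130):
    print(length, size)
```

You should see far fewer lines than appends: the size stays flat over a run of lengths and then jumps when the buffer is regrown.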

So what we need is an immutable data structure, which doesn't need to keep this buffer - tuple.

empty_tuple_size = sys.getsizeof(())                     # 56
single_element_size = sys.getsizeof((1,))                # 64
pointer_size = single_element_size - empty_tuple_size    # 8
n_1kb = (1024 - empty_tuple_size) // pointer_size        # (1024 - 56) // 8 = 121
tuple_1kb = (1,) * n_1kb
sys.getsizeof(tuple_1kb) == 1024                         # True

So this is your answer to get a 1 KB (1024-byte) data structure: (1,)*121

But note that this is only the size of tuple and the constituent pointers. For the total size, you actually need to add up the size of individual elements.
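
As a rough sketch of that adding-up, assuming a flat container (no nesting), something like the following works. total_size is a hypothetical helper, not a standard library function; it only goes one level deep and counts an object referenced several times only once.

```python
import os
import sys

def total_size(container):
    """Shallow size of the container plus the sizes of its distinct elements."""
    seen = set()
    total = sys.getsizeof(container)
    for item in container:
        if id(item) not in seen:   # count a shared object only once
            seen.add(id(item))
            total += sys.getsizeof(item)
    return total

blocks = [os.urandom(1024) for _ in range(10)]
print(sys.getsizeof(blocks))   # list header and pointers only
print(total_size(blocks))      # also includes the ten 1 KB blocks
```

For (1,)*121 the total is just the tuple plus a single int, because all 121 slots point at the same object.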


Alternate:

sys.getsizeof('') == 37
sys.getsizeof('1') == 38     # each character adds 1 byte

For 1 KB (1024 bytes), we need 1024 - 37 = 987 characters:

sys.getsizeof('1'*987) == 1024

And this is the actual size, not just the size of pointers.
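
The 37-byte overhead is specific to that build, so a hedged sketch that measures the overhead instead might look like this. It uses a bytes literal because on Python 3 a str is Unicode and its per-character cost can vary, whereas on Python 2 b'' is just str. bytes_of_total_size is an illustrative name, and the one-byte-per-item assumption is about CPython.

```python
import sys

def bytes_of_total_size(target):
    """Build a bytes object whose sys.getsizeof() equals target (CPython)."""
    overhead = sys.getsizeof(b'')   # header cost of an empty bytes object
    if target < overhead:
        raise ValueError("target is smaller than the object overhead")
    return b'1' * (target - overhead)

block = bytes_of_total_size(1024)
print(len(block), sys.getsizeof(block))
```

The same function with target = 1024 * 1024 gives an object that really occupies 1 MB, payload included.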

