Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
306 views
in Technique[技术] by (71.8m points)

python - Is freeing handled differently for small/large numpy arrays?

I am trying to debug a memory problem with my large Python application. Most of the memory is in numpy arrays managed by Python classes, so Heapy etc. are useless, since they do not account for the memory in the numpy arrays. So I tried to manually track the memory usage using the MacOSX (10.7.5) Activity Monitor (or top if you will). I noticed the following weird behavior. On a normal python interpreter shell (2.7.3):

import numpy as np # 1.7.1
# Activity Monitor: 12.8 MB
a = np.zeros((1000, 1000, 17)) # a "large" array
# 142.5 MB
del a
# 12.8 MB (so far so good, the array got freed)
a = np.zeros((1000, 1000, 16)) # a "small" array
# 134.9 MB
del a
# 134.9 MB (the system didn't get back the memory)
import gc
gc.collect()
# 134.9 MB

No matter what I do, the memory footprint of the Python session will never go below 134.9 MB again. So my question is:

Why are the resources of arrays larger than 1000x1000x17x8 bytes (found empirically on my system) properly given back to the system, while the memory of smaller arrays appears to be stuck with the Python interpreter forever?

This does appear to ratchet up, since in my real-world applications, I end up with over 2 GB of memory I can never get back from the Python interpreter. Is this intended behavior that Python reserves more and more memory depending on usage history? If yes, then Activity Monitor is just as useless as Heapy for my case. Is there anything out there that is not useless?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Reading from Numpy's policy for releasing memory it seems like numpy does not have any special handling of memory allocation/deallocation. It simply calls free() when the reference count goes to zero. In fact it's pretty easy to replicate the issue with any built-in python object. The problem lies at the OS level.

Nathaniel Smith has written an explanation of what is happening in one of his replies in the linked thread:

In general, processes can request memory from the OS, but they cannot give it back. At the C level, if you call free(), then what actually happens is that the memory management library in your process makes a note for itself that that memory is not used, and may return it from a future malloc(), but from the OS's point of view it is still "allocated". (And python uses another similar system on top for malloc()/free(), but this doesn't really change anything.) So the OS memory usage you see is generally a "high water mark", the maximum amount of memory that your process ever needed.

The exception is that for large single allocations (e.g. if you create a multi-megabyte array), a different mechanism is used. Such large memory allocations can be released back to the OS. So it might specifically be the non-numpy parts of your program that are producing the issues you see.

So, it seems like there is no general solution to the problem .Allocating many small objects will lead to a "high memory usage" as profiled by the tools, even thou it will be reused when needed, while allocating big objects wont show big memory usage after deallocation because memory is reclaimed by the OS.

You can verify this allocating built-in python objects:

In [1]: a = [[0] * 100 for _ in range(1000000)]

In [2]: del a

After this code I can see that memory is not reclaimed, while doing:

In [1]: a = [[0] * 10000 for _ in range(10000)]

In [2]: del a

the memory is reclaimed.

To avoid memory problems you should either allocate big arrays and work with them(maybe use views to "simulate" small arrays?), or try to avoid having many small arrays at the same time. If you have some loop that creates small objects you might explicitly deallocate objects not needed at every iteration instead of doing this only at the end.


I believe Python Memory Management gives good insights on how memory is managed in python. Note that, on top of the "OS problem", python adds another layer to manage memory arenas, which can contribute to high memory usage with small objects.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...