A couple of things to point out:
- In "Check memory after changing size", you haven't deleted the original DataFrame yet, so at that point the process holds both the original and the resized copy and will use strictly more memory.
- The Python interpreter is a bit greedy about holding onto OS memory.
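To illustrate the first point, here's a minimal sketch using only the standard library (Python 3). It is not your pandas code: a `bytearray` stands in for the DataFrame, and `measure_copy_then_delete` and `traced_mb` are hypothetical helper names I made up. Note that `tracemalloc` only counts allocations Python makes, not what the OS reports for the process, so it sidesteps the second point entirely.

```python
import gc
import tracemalloc


def traced_mb():
    """Return the memory currently tracked by tracemalloc, in MB."""
    current, _peak = tracemalloc.get_traced_memory()
    return current / (1024.0 * 1024.0)


def measure_copy_then_delete(nbytes=8 * 1000 * 1000):
    """Measure traced memory with both objects alive, then after each del."""
    tracemalloc.start()
    try:
        baseline = traced_mb()
        # Stand-in for the original DataFrame (an assumption for illustration).
        original = bytearray(nbytes)
        # Stand-in for df[from_:to_].copy(): an independent copy of the middle 80%.
        subset = bytearray(original[nbytes // 10: 9 * nbytes // 10])
        both_alive = traced_mb() - baseline

        del original
        gc.collect()
        subset_only = traced_mb() - baseline

        del subset
        gc.collect()
        all_freed = traced_mb() - baseline
    finally:
        tracemalloc.stop()
    return both_alive, subset_only, all_freed


if __name__ == "__main__":
    both_alive, subset_only, all_freed = measure_copy_then_delete()
    print("both alive:  %.2f MB" % both_alive)
    print("subset only: %.2f MB" % subset_only)
    print("all freed:   %.2f MB" % all_freed)
```

The "both alive" figure should be the largest of the three, which is the situation you are in when you measure before deleting the original.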
I looked into this and can assure you that pandas is not leaking memory. I'm using the memory_profiler (http://pypi.python.org/pypi/memory_profiler) package:
import time, string, pandas, numpy, gc
from memory_profiler import LineProfiler, show_results
import memory_profiler as mprof

prof = LineProfiler()

@prof
def test(nrow=1000000, ncol = 4, timetest = 5):
    from_ = nrow // 10
    to_ = 9 * nrow // 10
    df = pandas.DataFrame(numpy.random.randn(nrow, ncol),
                          index = numpy.random.randn(nrow),
                          columns = list(string.letters[0:ncol]))
    df_new = df[from_:to_].copy()
    del df
    del df_new
    gc.collect()

test()
# for _ in xrange(10):
#     print mprof.memory_usage()

show_results(prof)
And here's the output:
10:15 ~/tmp $ python profmem.py
Line #    Mem usage  Increment   Line Contents
==============================================
     7                           @prof
     8     28.77 MB    0.00 MB   def test(nrow=1000000, ncol = 4, timetest = 5):
     9     28.77 MB    0.00 MB       from_ = nrow // 10
    10     28.77 MB    0.00 MB       to_ = 9 * nrow // 10
    11     59.19 MB   30.42 MB       df = pandas.DataFrame(numpy.random.randn(nrow, ncol),
    12     66.77 MB    7.58 MB                             index = numpy.random.randn(nrow),
    13     90.46 MB   23.70 MB                             columns = list(string.letters[0:ncol]))
    14    114.96 MB   24.49 MB       df_new = df[from_:to_].copy()
    15    114.96 MB    0.00 MB       del df
    16     90.54 MB  -24.42 MB       del df_new
    17     52.39 MB  -38.15 MB       gc.collect()
So indeed, there is more memory in use than when we started. But is it leaking?
for _ in xrange(20):
    test()
    print mprof.memory_usage()
And output:
10:19 ~/tmp $ python profmem.py
[52.3984375]
[122.59375]
[122.59375]
[122.59375]
[122.59375]
[122.59375]
[122.59375]
[122.59375]
[122.59375]
[122.59375]
[122.59375]
[122.59375]
[122.59375]
[122.59375]
[122.59375]
[122.59375]
[122.59375]
[122.59765625]
[122.59765625]
[122.59765625]
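So what's actually going on is that the Python process holds on to a pool of memory, sized by what it has recently been using, so that it doesn't have to keep requesting memory from the host OS and then freeing it again. Usage jumps to about 122.6 MB on the second call and then stays flat for the remaining calls; a real leak would keep climbing with every call. I don't know all the technical details behind this, but that is at least what is going on.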
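If you want to make that "flat vs. climbing" judgement mechanical, here's a rough sketch (Python 3, standard library only). `has_plateaued`, the warm-up count, and the 1 MB tolerance are my own assumptions, not anything from memory_profiler, and the "leaky" series is made-up data for contrast.

```python
def has_plateaued(readings_mb, warmup=1, tolerance_mb=1.0):
    """Return True if the readings after the warm-up stay within tolerance_mb."""
    steady = readings_mb[warmup:]
    if len(steady) < 2:
        raise ValueError("need at least two readings after the warm-up")
    return max(steady) - min(steady) <= tolerance_mb


# Readings copied from the output above (MB after each call to test()).
posted = [52.3984375] + [122.59375] * 16 + [122.59765625] * 3

# Hypothetical series for contrast: about 24 MB lost on every call.
leaky = [52.0 + 24.0 * i for i in range(20)]

if __name__ == "__main__":
    print(has_plateaued(posted))  # True
    print(has_plateaued(leaky))   # False
```

The warm-up skips the first reading, since the pool is still growing to its working size at that point.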