When I run a query over a large set of small entities (15k entities with only a few short string and boolean properties), without doing anything with the results, I see my instance's memory usage grow continuously (roughly a 70 MB increase). The increase looks far out of proportion to the amount of data the query should ever need to keep in memory.
The loop I use is the following:
cursor = None
while True:
    query = MyModel.all()
    if cursor:
        query.with_cursor(cursor)
    fetched = 0
    for result in query.run(batch_size=500):
        fetched += 1
        # Do something with 'result' here. Actually leaving it empty for
        # testing, to be sure I don't retain anything myself.
        if fetched == 500:
            cursor = query.cursor()
            break
    else:
        # The for loop ran out of results without hitting the batch
        # limit, so there is nothing left to fetch.
        break
To rule out appstats as the cause, I call appstats.recording.dont_record() so that no stats are recorded.
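For completeness, the wiring looks roughly like this (a sketch: the ExportHandler and run_export_loop names are made up for illustration; only the dont_record() call is what I actually use):

import webapp2
from google.appengine.ext.appstats import recording

class ExportHandler(webapp2.RequestHandler):  # hypothetical handler
    def get(self):
        # Disable appstats recording for this request so its own
        # bookkeeping cannot be the source of the memory growth.
        recording.dont_record()
        run_export_loop()  # hypothetical: the query loop shown above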
Does anyone have any clue what might be going on? Or any pointers on how to debug/profile this?
Update 1: I turned on gc.set_debug(gc.DEBUG_STATS) in the production code, and I see the garbage collector being called regularly, so it is at least trying to collect garbage. Calling gc.collect() at the end of the loop (which is also the end of the request) returns 0 and doesn't help.
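Concretely, the debugging I added amounts to this (a sketch; the logging helper is illustrative, not my exact code):

import gc
import logging

# Print collection statistics on every GC run, so the logs show that
# the collector really is being invoked regularly.
gc.set_debug(gc.DEBUG_STATS)

def after_request():  # hypothetical hook at the end of the request
    # Force a full collection and log how many unreachable objects it
    # found; in my case this consistently reports 0.
    unreachable = gc.collect()
    logging.info('gc.collect() found %d unreachable objects', unreachable)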
Update 2: I did some hacking to get guppy working under dev_appserver, and it seems to show that, even after an explicit gc.collect() at the end of the loop, most of the memory is consumed by a 'dict of google.appengine.datastore.entity_pb.Property'.
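The heap inspection itself is just the standard guppy usage, roughly like this (a sketch; the dump_heap helper is made up, and getting guppy to import at all required the dev_appserver hacking mentioned above):

import gc
import logging
from guppy import hpy

def dump_heap():  # hypothetical helper called at the end of the loop
    # Collect first so the snapshot only shows objects that are
    # genuinely still reachable.
    gc.collect()
    # hpy().heap() returns a summary of live objects grouped by type
    # and sorted by total size; 'dict of
    # google.appengine.datastore.entity_pb.Property' dominates here.
    logging.info('%s', hpy().heap())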