I am using Python, and indexing documents (for a search engine) takes a lot of RAM. After I stop the indexing process, the memory is still full (around 8 GB of RAM). This is bad because I need my search engine to run all the time, not to reboot the OS every time I finish indexing. Is there any efficient way to manage huge arrays, dictionaries and lists, and to free them? Any ideas?
I also saw some questions about this on Stack Overflow, but they are old:
Python memory footprint vs. heap size
Profile Memory Allocation in Python (with support for Numpy arrays)
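To make the question concrete, this is the kind of thing I mean by "freeing" the structures (a simplified sketch with placeholder names like term_postings and build_index, not my actual code):

import gc

# Placeholder structures standing in for the real index data.
term_postings = {}   # term -> list of (doc_id, position)
doc_lengths = {}     # doc_id -> token count

def build_index(documents):
    # documents is an iterable of (doc_id, text) pairs.
    for doc_id, text in documents:
        tokens = text.split()
        doc_lengths[doc_id] = len(tokens)
        for pos, token in enumerate(tokens):
            term_postings.setdefault(token, []).append((doc_id, pos))

def free_index():
    # Even after dropping every reference and forcing a collection,
    # the process memory stays high once indexing has finished.
    term_postings.clear()
    doc_lengths.clear()
    gc.collect()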
Info:
free -t
             total       used       free     shared    buffers     cached
Mem:          5839       5724        114          0         15       1011
-/+ buffers/cache:       4698       1141
Swap:         1021        186        835
Total:        6861       5910        950
top | grep python
3164 root 20 0 68748 31m 1404 R 17 0.5 53:43.89 python
6716 baddc0re 20 0 84788 30m 1692 S 0 0.5 0:06.81 python
ps aux | grep python
root 3164 57.1 0.4 64876 29824 pts/0 R+ May27 54:23 python SE_doc_parse.py
baddc0re 6693 0.0 0.2 53240 16224 pts/1 S+ 00:46 0:00 python index.py
uptime
01:02:40 up 1:43, 3 users, load average: 1.22, 1.46, 1.39
sysctl vm.min_free_kbytes
vm.min_free_kbytes = 67584
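In case it helps, I can also track the resident set size from inside the script (a small sketch using the standard resource module; on Linux ru_maxrss is reported in kilobytes):

import resource

def report_memory(label):
    # Peak resident set size of this process, in kilobytes on Linux.
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print("%s: peak RSS %.1f MB" % (label, peak_kb / 1024.0))

report_memory("after indexing")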
The real problem is that when I start the script the indexing is fast, but as memory usage grows it gets slower and slower.
Document wikidoc_18784 added on 2012-05-28 01:03:46 "fast"
wikidoc_18784
-----------------------------------
Document wikidoc_21934 added on 2012-05-28 01:04:00 "slower"
wikidoc_21934
-----------------------------------
Document wikidoc_22903 added on 2012-05-28 01:04:01 "slower"
wikidoc_22903
-----------------------------------
Document wikidoc_20274 added on 2012-05-28 01:04:10 "slower"
wikidoc_20274
-----------------------------------
Document wikidoc_23013 added on 2012-05-28 01:04:53 "even slower"
wikidoc_23013
The documents are one or two pages of text at most. Indexing 10 pages takes about 2-3 seconds.
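One thing I am wondering about (only a rough sketch with made-up names like index_in_batches; I don't know yet whether it addresses the slowdown) is flushing the in-memory postings to disk every N documents, e.g. with shelve, so the dictionary never grows without bound:

import shelve

BATCH_SIZE = 1000  # hypothetical batch size

def index_in_batches(documents, db_path="partial_index.db"):
    # documents is an iterable of (doc_id, text) pairs.
    postings = {}
    on_disk = shelve.open(db_path)
    try:
        for count, (doc_id, text) in enumerate(documents, 1):
            for pos, token in enumerate(text.split()):
                postings.setdefault(token, []).append((doc_id, pos))
            if count % BATCH_SIZE == 0:
                # Merge the in-memory batch into the on-disk index and
                # start over with an empty dict so RAM stays bounded.
                for term, plist in postings.items():
                    on_disk[term] = on_disk.get(term, []) + plist
                postings = {}
        # Flush whatever is left after the last full batch.
        for term, plist in postings.items():
            on_disk[term] = on_disk.get(term, []) + plist
    finally:
        on_disk.close()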
Thanks everyone for the help :)