Hey Vlad, you have a couple of simple strategies here regarding logs.
The first thing to know is that Mongo can generally handle lots of successive inserts without a lot of RAM. The reason is simple: you only insert or update recent documents. So the index size grows, but the data is constantly paged out.
Put another way, you can break out the RAM usage into two major parts: index & data.
If you're running typical logging, the data portion is constantly being flushed away, so only the index really stays in RAM.
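To make that concrete, here's a minimal sketch (Python with pymongo, assuming a local MongoDB, a db called mydb, and hypothetical log fields) of the typical append-only logging pattern: every write lands at the "end" of the time range, so only the most recent slice of the index and data stays hot in RAM.

```python
from datetime import datetime, timezone

from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
logs = client["mydb"]["logs"]                      # assumed db/collection names

# Index on the timestamp; only its most recent pages are touched by new writes.
logs.create_index([("ts", ASCENDING)])

# Append-only logging: each insert lands at the newest end of the time range,
# so older data pages can be evicted from RAM without ever being re-read.
logs.insert_one({
    "ts": datetime.now(timezone.utc),
    "action": "login",      # hypothetical log fields
    "user": "vlad",
})
```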
The second thing to know is that you can mitigate the index issue by putting logs into smaller buckets. Think of it this way: if you collect all of the logs into a date-stamped collection (call it logs20101206), then you can also control the size of the index in RAM.
As you roll over days, the old index will flush from RAM and it won't be accessed again, so it will simply go away.
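One way to wire that up is to derive the collection name from the current date, so each day gets its own collection and its own index. A sketch under the same assumptions as above (the helper name log_collection is mine):

```python
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
db = client["mydb"]                                # assumed db name

def log_collection(db, when=None):
    """Return the per-day bucket, e.g. db.logs20101206."""
    when = when or datetime.now(timezone.utc)
    return db["logs" + when.strftime("%Y%m%d")]

# Today's writes (and today's index growth) are confined to today's collection;
# yesterday's index can fall out of RAM because nothing touches it anymore.
log_collection(db).insert_one({
    "ts": datetime.now(timezone.utc),
    "action": "page_view",
    "path": "/index.html",
})
```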
"but I am also considering using a cron script that dumps and deletes old data"
This method of logging by days also helps delete old data. In three months, when you're done with the data, you simply do db.logs20101206.drop() and the collection instantly goes away. Note that you don't reclaim disk space (it's all pre-allocated), but new data will fill up the empty spot.
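The cron-style cleanup then becomes a short script that drops whole collections past the retention window. A sketch under the same assumptions (the 90-day retention and the logsYYYYMMDD naming are illustrative):

```python
from datetime import datetime, timedelta, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
db = client["mydb"]                                # assumed db name

RETENTION_DAYS = 90  # illustrative three-month retention

cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)

for name in db.list_collection_names():
    if not name.startswith("logs"):
        continue
    try:
        day = datetime.strptime(name[len("logs"):], "%Y%m%d")
    except ValueError:
        continue  # not one of the logsYYYYMMDD buckets
    if day.replace(tzinfo=timezone.utc) < cutoff:
        db[name].drop()  # a whole day of data and its index go away at once
```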
"Should I also consider using smaller keys, as suggested on other forums?"
Yes.
In fact, I have it built into my data objects. So I access data using logs.action or logs->action, but underneath, the data is actually saved to logs.a. It's really easy to spend more space on "fields" than on "values", so it's worth shrinking the "fields" and trying to abstract the mapping away elsewhere.
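A minimal sketch of that kind of abstraction (the field names here are just examples) is a pair of mapping helpers that translate between readable names in application code and the short keys that actually hit disk and the index:

```python
# Long, readable names in application code; short keys on disk and in the index.
FIELD_MAP = {"action": "a", "user": "u", "timestamp": "t"}   # hypothetical fields
REVERSE_MAP = {v: k for k, v in FIELD_MAP.items()}

def to_storage(doc):
    """Shrink field names before insert; unknown names pass through unchanged."""
    return {FIELD_MAP.get(k, k): v for k, v in doc.items()}

def from_storage(doc):
    """Restore readable names when reading documents back."""
    return {REVERSE_MAP.get(k, k): v for k, v in doc.items()}

# e.g. collection.insert_one(to_storage({"action": "login", "user": "vlad"}))
# stores {"a": "login", "u": "vlad"}, so each field name costs a single byte.
```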