R: Best practices for storing and using data frames too large for memory?

I'm working with a large data frame and have run up against RAM limits. At this point, I probably need to work with a serialized version stored on disk. There are a few packages that support out-of-memory operations, but I'm not sure which one will suit my needs. I'd prefer to keep everything in data frames, so the ff package looks encouraging, but there are still compatibility problems that I can't work around.

What's the first tool to reach for when you realize that your data has reached out-of-memory scale?


1 Reply


You probably want to look at these packages:

  • ff for 'flat-file' storage and very efficient retrieval (can hold data.frames with columns of different types); see the first sketch below
  • bigmemory for keeping data out of R's memory but still in RAM, or file-backed (matrices only, all of one type); see the second sketch below
  • biglm for out-of-memory model fitting with lm()- and glm()-style models; see the chunked-fitting sketch below

and also see the High-Performance and Parallel Computing with R task view on CRAN.
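
For the ff route the questioner is leaning toward, a minimal sketch could look like the following. The file name big.csv and the column some_column are placeholders, and ffbase is a companion package that adds data.frame-style verbs (subset, merge, ...) for ffdf objects:

    ## Minimal ff sketch -- big.csv and some_column are hypothetical.
    library(ff)
    library(ffbase)   # data.frame-style operations on ffdf objects

    ## Read the CSV into an on-disk ffdf; only small chunks live in RAM at a time.
    big <- read.csv.ffdf(file = "big.csv", header = TRUE)

    dim(big)                                # works much like a data.frame
    summary(big$some_column[])              # [] pulls one column into RAM
    small <- subset(big, some_column > 0)   # ffbase keeps the result on disk

    ## Persist the ff files so the import does not have to be repeated.
    ffsave(big, file = "big_ff")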
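
bigmemory is the one to reach for when the data can be treated as a matrix of a single numeric type; a rough sketch of a file-backed matrix, again assuming a hypothetical, all-numeric big.csv:

    ## Minimal bigmemory sketch -- all columns must share one type.
    library(bigmemory)

    ## Create a file-backed big.matrix; the data live on disk and are
    ## described by big.desc for later re-attachment.
    x <- read.big.matrix("big.csv", header = TRUE, type = "double",
                         backingfile = "big.bin", descriptorfile = "big.desc")

    ## In a later session (or another R process): reattach without re-reading.
    x <- attach.big.matrix("big.desc")
    x[1:5, ]    # indexing pulls just that slice into RAM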
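
And biglm fits a model in pieces, so the full data never has to sit in RAM; the sketch below uses placeholder column names (y, x1, x2) and a hand-rolled chunked CSV reader purely to illustrate the update() pattern:

    ## Minimal biglm sketch -- y, x1, x2 and the chunk size are placeholders.
    library(biglm)

    con <- file("big.csv", open = "r")
    header <- strsplit(readLines(con, n = 1), ",")[[1]]

    ## The first chunk initialises the fit ...
    chunk <- read.csv(con, nrows = 100000, header = FALSE, col.names = header)
    fit <- biglm(y ~ x1 + x2, data = chunk)

    ## ... and each further chunk updates it incrementally.
    repeat {
      chunk <- tryCatch(
        read.csv(con, nrows = 100000, header = FALSE, col.names = header),
        error = function(e) NULL)   # read.csv errors once the file is exhausted
      if (is.null(chunk) || nrow(chunk) == 0) break
      fit <- update(fit, chunk)
    }
    close(con)
    summary(fit)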

