I am trying to find a simple way to use something like Perl's hash functions in R (essentially caching), as I intended to do both Perl-style hashing and write my own memoization of calculations. However, others have beaten me to the punch and have packages for memoization. The more I dig, the more I find, e.g. memoise and R.cache, but the differences aren't readily clear. In addition, it's not clear how else one can get Perl-style hashes (or Python-style dictionaries) and write one's own memoization, other than to use the hash package, which doesn't seem to underpin the two memoization packages.
Since I can find no information on CRAN or elsewhere to distinguish between the options, perhaps this should be a community wiki question on SO: What are the options for memoization and caching in R, and what are their differences?
As a basis for comparison, here is a list of the options I've found. Also, it seems to me that all depend on hashing, so I'll note the hashing options as well. Key/value storage is somewhat related, but opens a huge can of worms regarding DB systems (e.g. BerkeleyDB, Redis, MemcacheDB and scores of others).
It looks like the options are:
Hashing
- digest - provides hashing for arbitrary R objects.
Memoization
- memoise - a very simple tool for memoization of functions.
- R.cache - offers more functionality for memoization, though it seems some of the functions lack examples.
Caching
- hash - provides caching functionality akin to Perl's hashes and Python dictionaries.
Key/value storage
These are basic options for external storage of R objects.
Checkpointing
Other
- Base R supports: named vectors and lists, row and column names of data frames, and names of items in environments. It seems to me that using a list is a bit of a kludge. (There's also pairlist, but it is deprecated.)
- The data.table package supports rapid lookups of elements in a data table.
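For reference, here is a minimal sketch of the base R environment option from the list above, used as a mutable, string-keyed hash. The variable names are made up for illustration, and this is only one way to do it, not a recommendation over the packages.

```r
# A base R environment used as a mutable, string-keyed hash table.
# parent = emptyenv() keeps lookups from falling through to other scopes.
h <- new.env(hash = TRUE, parent = emptyenv())

# Insert or overwrite a key
h[["apple"]] <- 1
assign("banana", 2, envir = h)

# Lookup: `[[` returns NULL when the key is absent
apple_value   <- h[["apple"]]
missing_value <- h[["cherry"]]

# Membership test that does not search enclosing environments
has_banana <- exists("banana", envir = h, inherits = FALSE)
has_cherry <- exists("cherry", envir = h, inherits = FALSE)

# Delete a key
rm("banana", envir = h)

# List the keys (names starting with "." need all.names = TRUE)
remaining_keys <- ls(h, all.names = TRUE)
```

Two caveats with this approach: the empty string cannot be used as a key, and keys are always character strings, so other objects have to be converted to a string first.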
Use case
Although I'm mostly interested in knowing the options, I have two basic use cases that arise:
- Caching: Simple counting of strings. [Note: This isn't for NLP, but general use, so NLP libraries are overkill; tables are inadequate because I prefer not to wait until the entire set of strings is loaded into memory. Perl-style hashes are at the right level of utility.]
- Memoization of monstrous calculations.
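To make the two use cases concrete, here is a rough base R sketch of both. The names count_strings, memoize, slow_square and stats are hypothetical, and building the cache key with deparse() is my assumption for illustration; it is not how memoise or R.cache work internally.

```r
# Use case 1: count strings one line at a time, so the full input never
# has to sit in memory. `con` must be an already-open connection.
count_strings <- function(con,
                          counts = new.env(hash = TRUE, parent = emptyenv())) {
  repeat {
    line <- readLines(con, n = 1L)
    if (length(line) == 0L) break   # end of input
    if (!nzchar(line)) next         # "" cannot be an environment key
    prev <- counts[[line]]
    counts[[line]] <- if (is.null(prev)) 1L else prev + 1L
  }
  counts
}

# Use case 2: a hand-rolled memoizer. The key is the deparsed argument
# list (an assumption for this sketch; fine for small, simple arguments).
memoize <- function(f) {
  cache <- new.env(hash = TRUE, parent = emptyenv())
  function(...) {
    key <- paste(deparse(list(...)), collapse = "")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(...), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Usage: count a few strings
con <- textConnection(c("a", "b", "a", "c", "a"))
counts <- count_strings(con)
close(con)
a_count <- counts[["a"]]

# Usage: the wrapped function body runs once per distinct argument list
stats <- new.env()
stats$calls <- 0L
slow_square <- function(x) {
  stats$calls <- stats$calls + 1L
  x^2
}
fast_square <- memoize(slow_square)
r1 <- fast_square(4)
r2 <- fast_square(4)
```

The deparse() key gets unwieldy for large arguments, which is presumably where a real hashing function such as the one in digest would come in.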
These really arise because I'm digging into the profiling of some slooooow code and I'd really like to just count simple strings and see if I can speed up some calculations via memoization. Being able to hash the input values, even if I don't memoize, would let me see if memoization can help.
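The "would memoization even help?" check could look something like the sketch below: wrap the function, reduce each call's arguments to a key, and tally how often keys repeat. track_calls is a hypothetical helper, and deparse() is used here only as a dependency-free stand-in for a proper hash of the arguments.

```r
# Wrap f so that each call's arguments are reduced to a key and tallied.
# No results are cached; this only measures how often inputs repeat.
track_calls <- function(f) {
  seen <- new.env(hash = TRUE, parent = emptyenv())
  wrapped <- function(...) {
    key <- paste(deparse(list(...)), collapse = "")  # stand-in for a hash
    prev <- seen[[key]]
    seen[[key]] <- if (is.null(prev)) 1L else prev + 1L
    f(...)
  }
  report <- function() {
    n <- unlist(mget(ls(seen, all.names = TRUE), envir = seen))
    c(calls    = sum(n),
      distinct = length(n),
      repeated = sum(n) - length(n))  # calls a cache could have served
  }
  list(call = wrapped, report = report)
}

# Usage: five calls, three distinct inputs
tracked <- track_calls(sqrt)
for (x in c(2, 3, 2, 2, 5)) tracked$call(x)
usage <- tracked$report()
```

If "repeated" is close to zero, memoization probably won't buy much for that function.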
Note 1: The CRAN Task View on Reproducible Research lists a couple of the packages (cacher and R.cache), but there is no elaboration on usage options.
Note 2: To aid others looking for related code, here are a few notes on some of the authors or packages. Some of the authors use SO. :)
- Dirk Eddelbuettel: digest - a lot of other packages depend on this.
- Roger Peng: cacher, filehash, stashR - these address different problems in different ways; see Roger's site for more packages.
- Christopher Brown: hash - seems to be a useful package, but the links to ODG are down, unfortunately.
- Henrik Bengtsson: R.cache & Hadley Wickham: memoise - it's not yet clear when to prefer one package over the other.
Note 3: Some people use memoise/memoisation; others use memoize/memoization. Just a note if you're searching around. Henrik uses "z" and Hadley uses "s".