To answer your edit, just run a benchmark:
library(data.table)
library(microbenchmark)

a = data.table(id = letters[1:2], var = 1:2)
b = copy(a)
c = copy(b) # let's also try modifying the same value in place,
            # to see how well changing existing values does
microbenchmark(a <- rbind(a, data.table(id = "c", var = 3)),
               b <- rbindlist(list(b, data.table(id = "c", var = 3))),
               c[1, var := 3L],
               set(c, 1L, 2L, 3L))
#Unit: microseconds
#                                                   expr     min        lq    median        uq      max neval
#           a <- rbind(a, data.table(id = "c", var = 3)) 865.460 1141.2585 1357.1230 1539.4300 6814.492   100
# b <- rbindlist(list(b, data.table(id = "c", var = 3))) 260.440  325.3835  445.4190  522.8825 1143.930   100
#                                    c[1, `:=`(var, 3L)] 482.147  626.5570  778.3135  904.3595 1109.539   100
#                                     set(c, 1L, 2L, 3L)   2.339    5.677     7.5140    9.5170   19.033   100
rbindlist is clearly better than rbind. Thanks to Matthew Dowle for pointing out the problems with using [ in a loop, I added another benchmark with set.
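
To see why [ hurts in a loop, here is a minimal sketch (the table and values are just illustrative): every [ call re-runs data.table's full query machinery, whereas set skips that overhead and writes in place.

library(data.table)
dt = data.table(id = letters[1:10], var = 1:10)

# [ pays the full query-dispatch overhead on every iteration
for (i in 1:10) dt[i, var := 0L]

# set() performs the same in-place write with minimal per-call overhead
for (i in 1:10) set(dt, i, "var", 1L)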
From the above, your best options are rbindlist, or creating the data.table at its final size to begin with and then just populating the values. If you don't know the size of the data up front, you can use a growth strategy similar to C++'s std::vector: double the number of rows every time you run out of space, then delete the extra rows once you're done filling it in.
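
A minimal sketch of that preallocate-and-double strategy (the starting capacity of 4 and the toy data are just for illustration):

library(data.table)

# preallocate more rows than currently needed
dt = data.table(id = character(4), var = integer(4))
n_used = 0L

for (k in 1:10) {
  # out of space: double the capacity, like std::vector's growth policy
  if (n_used == nrow(dt)) {
    dt = rbindlist(list(dt, data.table(id = character(nrow(dt)),
                                       var = integer(nrow(dt)))))
  }
  n_used = n_used + 1L
  set(dt, n_used, "id", letters[k])
  set(dt, n_used, "var", k)
}

dt = dt[seq_len(n_used)]  # once done, drop the unused extra rows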