Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

r - Use outer instead of expand.grid

I'm looking for the fastest possible way, staying in base R, to do what expand.grid does. In the past I have used outer for similar purposes to create a vector, something like this:

v <- outer(letters, LETTERS, paste0)
unlist(v[lower.tri(v)])
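The idiom is easier to see on a small case. The sketch below uses three letters instead of the full alphabet (an illustrative choice, not from the original). `outer` builds the full matrix of pasted pairs, and `lower.tri` keeps only the entries below the diagonal, read in column-major order. Because `v[lower.tri(v)]` is already a plain character vector, the `unlist` call above is harmless but not needed.

```r
# Small version of the outer + lower.tri idiom
v <- outer(c("a", "b", "c"), c("A", "B", "C"), paste0)
# v[i, j] is paste0(row label i, column label j)
pairs <- v[lower.tri(v)]  # entries with i > j, in column-major order
print(pairs)
# [1] "bA" "cA" "cB"
```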

Benchmarking has shown me that outer can be drastically faster than expand.grid. This time, however, I want to create two columns just like expand.grid does (all possible combinations of two vectors), and my outer-based methods don't benchmark as fast.

I'm hoping to take two vectors and create every possible combination, as two columns, as fast as possible. I think outer may be the route, but I'm wide open to any base method.

Here are the expand.grid method and the outer method:

dat <- cbind(mtcars, mtcars, mtcars)

expand.grid(seq_len(nrow(dat)), seq_len(ncol(dat)))

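# Paste each (row, column) index pair into a "row:col" string ...
FOO <- function(x, y) paste(x, y, sep=":")
x <- outer(seq_len(nrow(dat)), seq_len(ncol(dat)), FOO)
# ... then split the strings back apart and convert to an integer matrix
apply(do.call("rbind", strsplit(x, ":")), 2, as.integer)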
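A minimal sketch of one way to skip the paste/strsplit round trip, assuming only the integer index pairs are needed (as in the `dat` example): let `outer` produce a matrix of the right shape and read the indices back with base R's `row()` and `col()`. The values `outer` computes are never used, only its dimensions, so this is a shape trick rather than a use of its `FUN` argument. The names `n1`, `n2`, `m` and `idx` are illustrative.

```r
dat <- cbind(mtcars, mtcars, mtcars)
n1 <- nrow(dat)  # 32
n2 <- ncol(dat)  # 33
m <- outer(seq_len(n1), seq_len(n2))  # n1 x n2 matrix; only its shape matters
# row() and col() give each cell's indices; as.vector() reads them in
# column-major order, so Var1 varies fastest, as in expand.grid
idx <- cbind(Var1 = as.vector(row(m)), Var2 = as.vector(col(m)))
```

Whether this beats `expand.grid` on a given machine is something to benchmark rather than assume.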

Microbenchmarking shows that the outer approach is slower:

#     expr      min        lq    median        uq      max
# EXPAND.G  812.743  838.6375  894.6245  927.7505 27029.54
#    OUTER 5107.871 5198.3835 5329.4860 5605.2215 27559.08

I think my use of outer is slow because I don't know how to make outer directly return a length-2 vector for each pair that I could then combine with do.call('rbind', ...). Instead, I have to do a slow paste followed by a slow split. How can I do this with outer (or another base method) in a way that's faster than expand.grid?

EDIT: Adding the microbenchmark results.

Unit: microseconds
      expr     min       lq  median      uq       max
1   ERNEST  34.993  39.1920  52.255  57.854 29170.705
2     JOHN  13.997  16.3300  19.130  23.329   266.872
3 ORIGINAL 352.720 372.7815 392.377 418.738 36519.952
4    TOMMY  16.330  19.5960  23.795  27.061  6217.374
5  VINCENT 377.447 400.3090 418.505 451.864 43567.334

1 Reply

The documentation for rep.int isn't quite complete. rep.int isn't only the fastest choice for the most common case: like rep, it also accepts a vector for the times argument. That lets you use it directly for both sequences, cutting the time by another 40% or so compared with Tommy's version.
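A small illustration of the point about `times` (the values are arbitrary): with a scalar `times`, `rep.int` cycles the whole vector; with a vector `times` of the same length as `x`, it repeats each element its own number of times. Passing `rep.int(n, length(x))` as `times` therefore gives the same result as `rep(x, each = n)`, which is the pattern the `Var2` column needs.

```r
x <- 1:3
cycled <- rep.int(x, 2)                          # 1 2 3 1 2 3
per_element <- rep.int(c("a", "b"), c(2, 3))     # "a" "a" "b" "b" "b"
# A vector `times` reproduces rep(x, each = 2)
each_like <- rep.int(x, rep.int(2L, length(x)))  # 1 1 2 2 3 3
```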

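expand.grid.jc <- function(seq1, seq2) {
    # Var1 cycles through seq1 once per element of seq2 (varies fastest, as in expand.grid)
    cbind(Var1 = rep.int(seq1, length(seq2)),
          # Var2 repeats each element of seq2 length(seq1) times (vector `times`)
          Var2 = rep.int(seq2, rep.int(length(seq1), length(seq2))))
}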
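If more than two vectors are needed, the same `rep.int` idea can be extended. The function below, `expand_grid_rep`, is a hypothetical generalization written for illustration, not part of the answer above: for the i-th vector, each element is repeated once per combination of the earlier vectors, and that block is cycled once per combination of the later ones. Like `expand.grid.jc`, it returns a matrix (so mixed input types are coerced to a common type), and it assumes every input has at least one element.

```r
# Hypothetical generalization of expand.grid.jc to any number of vectors.
# Assumes all inputs are non-empty; returns a matrix, not a data frame.
expand_grid_rep <- function(...) {
  args <- list(...)
  lens <- lengths(args)
  total <- prod(lens)
  out <- vector("list", length(args))
  each <- 1  # number of combinations of the vectors before the i-th
  for (i in seq_along(args)) {
    n <- lens[i]
    # repeat every element `each` times ...
    block <- rep.int(args[[i]], rep.int(each, n))
    # ... then cycle the block once per combination of the later vectors
    out[[i]] <- rep.int(block, total / (each * n))
    each <- each * n
  }
  names(out) <- paste0("Var", seq_along(args))
  do.call(cbind, out)
}

res <- expand_grid_rep(1:2, 1:3, 1:2)
```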

...