r - Combinatorial iterator like expand.grid

Question

Posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

Is there a fast way to iterate through combinations like those returned by expand.grid or CJ (data.table)? The full grid gets too big to fit in memory when there are enough combinations. The itertools2 package (a port of Python's itertools) provides iproduct, but it is really slow, at least the way I'm using it (shown below). Are there other options?

Here is an example in which the idea is to apply a function to each combination of rows from two data.frames (as in a previous post).

library(data.table)  # CJ
library(itertools2)  # iproduct iterator
library(doParallel)

## Dimensions of the two data frames
dim1 <- 10
dim2 <- 100
df1 <- data.frame(a = 1:dim1, b = 1:dim1)
df2 <- data.frame(x= 1:dim2, y = 1:dim2, z = 1:dim2)

## function to apply to combinations
f <- function(...) sum(...)

## Too big to expand with bigger dimensions (e.g., 1e6, 1e5) -> errors
## test <- expand.grid(seq.int(dim1), seq.int(dim2))
## test <- CJ(indx1 = seq.int(dim1), indx2 = seq.int(dim2))
## Error: cannot allocate vector of size 3.7 Gb

## Create an iterator over the cartesian product of the two dims
it <- iproduct(x=seq.int(dim1), y=seq.int(dim2))

## Setup the parallel backend
cl <- makeCluster(4)
registerDoParallel(cl)

## Run
res <- foreach(i=it, .combine=c, .packages=c("itertools2")) %dopar% {
  f(df1[i$x, ], df2[i$y, ])
}
stopCluster(cl)

## Expand.grid results (different ordering)
expgrid <- expand.grid(x=seq(dim1), y=seq(dim2))
test <- apply(expgrid, 1, function(i) f(df1[i[["x"]],], df2[i[["y"]],]))

all.equal(sort(test), sort(res))  # TRUE

1 Reply

深蓝 · Answer 1 · 2021-10-23T21:28:51+0000

I think you'll get better performance if you give each of the workers a chunk of one of the data frames, have them each perform the computations, and then combine the results. This results in more efficient computation and reduced memory usage by the workers.

Here is an example that uses the isplitRows function from the itertools package:

library(doParallel)
library(itertools)
dim1 <- 10
dim2 <- 100
df1 <- data.frame(a = 1:dim1, b = 1:dim1)
df2 <- data.frame(x= 1:dim2, y = 1:dim2, z = 1:dim2)
f <- function(...) sum(...)

nw <- 4
cl <- makeCluster(nw)
registerDoParallel(cl)

res <- foreach(d2=isplitRows(df2, chunks=nw), .combine=c) %dopar% {
  expgrid <- expand.grid(x=seq(dim1), y=seq(nrow(d2)))
  apply(expgrid, 1, function(i) f(df1[i[["x"]],], d2[i[["y"]],]))
}

## Shut down the workers once the results are collected
stopCluster(cl)

I split df2 because it has more rows, but you could choose either.
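
If even the per-chunk expand.grid is more than you want to hold, the index pairs can be computed arithmetically from a running counter instead of being stored. The following is only a minimal sequential sketch, assuming the same toy data as above; grid_index and chunk_size are hypothetical names introduced here for illustration and are not part of itertools or foreach. It relies on the fact that expand.grid varies its first argument fastest.

```r
## Assumed example data, same shape as in the question
dim1 <- 10
dim2 <- 100
df1 <- data.frame(a = 1:dim1, b = 1:dim1)
df2 <- data.frame(x = 1:dim2, y = 1:dim2, z = 1:dim2)
f <- function(...) sum(...)

## Hypothetical helper: the k-th row of expand.grid(x = seq_len(n1), y = ...),
## computed without building the grid (x varies fastest, as in expand.grid).
## Doubles are used on purpose so k can exceed the integer range.
grid_index <- function(k, n1) {
  list(x = (k - 1) %% n1 + 1, y = (k - 1) %/% n1 + 1)
}

## Walk the dim1 * dim2 combinations in blocks, so that only
## chunk_size index pairs exist in memory at any one time
chunk_size <- 250
total <- dim1 * dim2
starts <- seq(1, total, by = chunk_size)

res_lazy <- unlist(lapply(starts, function(s) {
  k <- s:min(s + chunk_size - 1, total)
  idx <- grid_index(k, dim1)
  mapply(function(x, y) f(df1[x, ], df2[y, ]), idx$x, idx$y)
}))
```

The results come out in the same order as the rows of expand.grid(x = seq(dim1), y = seq(dim2)). Each block depends only on its starting index, so in principle the lapply over starts could be handed to foreach in the same way as the chunks above, though I have not benchmarked that.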
