In R data.table, how do I pass variable parameters to an expression?


posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)


I am stuck with a small R issue with data.table. Your help is much appreciated. How do I do this:

getResult <- function(dt, expr, gby) {
  e <- substitute(expr)
  b <- substitute(gby)
  return(dt[,eval(e),by=b])
}

v1 <- "Sepal.Length"
v2 <- "Species"

dt <- data.table(iris)
rDT <- getResult(dt, sum(v1, na.rm=TRUE), v2)

I get the following error:

Error in sum(v1, na.rm = TRUE) : invalid 'type' (character) of argument

Both v1 and v2 are passed in from another program as character variables, so I can't write v1 <- quote(Sepal.Length), even though that seems to work.


1 Reply

深蓝 · Answer 1 · 2021-10-17T00:20:11+0000

An alternative to flodel's answer in the comments could be

e <- parse(text = paste0("sum(", v1, ", na.rm = TRUE)"))

b <- parse(text = v2)

rDT2 <- dt[, eval(e), by = eval(b)]

#               b    V1
# [1,]     setosa 250.3
# [2,] versicolor 296.8
# [3,]  virginica 329.4

EDIT:

And to put this into a function,

getResult <- function(dt, expr, gby){
  return(dt[, eval(expr), by = eval(gby)])
}

(dtR <- getResult(dt = dt, expr = e, gby = b))
# gives the same result as above
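If you would rather not paste strings together, the same call can be built from the column name with as.name() and substitute(). This is only a sketch: getResultChr is a hypothetical name that is not part of data.table, and it assumes gby is a character vector of column names that does not itself match a column of dt.

```r
library(data.table)

# Hypothetical helper (not part of data.table): takes column names as strings.
getResultChr <- function(dt, col, gby) {
  # Turn the string into a symbol and splice it into sum(<col>, na.rm = TRUE)
  e <- substitute(sum(x, na.rm = TRUE), list(x = as.name(col)))
  # by= accepts a character vector of column names held in a variable
  dt[, eval(e), by = gby]
}

dt <- data.table(iris)
v1 <- "Sepal.Length"
v2 <- "Species"
rDT3 <- getResultChr(dt, v1, v2)
```

Because the call is assembled from a symbol rather than parsed text, a column name containing spaces or other non-syntactic characters should not need any quoting.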

EDIT from Matthew: There's a subtle reason why the paste0 and eval/quote methods can be faster than get() in some cases, too. One of the reasons grouping can be fast is that data.table inspects j to see which columns it uses, then subsets only those columns (FAQ 1.12 and 3.1). It uses base::all.vars(j) to do that. When get() is used in j, the column being used is hidden from all.vars, and data.table falls back to subsetting all the columns just in case the j expression needs them (much like when the .SD symbol is used in j, which is the problem .SDcols was added to solve).

If all the columns are used anyway then it makes no difference, but if DT is, say, 1e7 x 100, then a grouped j=sum(V1) should be much faster than a grouped j=sum(get("V1")) for that reason. At least, that's what's supposed to happen, and if it doesn't then it may be a bug. If, on the other hand, many queries are being constructed dynamically and repeated, then the time to paste0 and parse might come into it. It all depends, really.

Setting verbose=TRUE should print a message about which columns have been detected as used by j, so that can be checked.
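One way to check this on the iris example is to run both forms with verbose = TRUE and compare what gets printed. This is a minimal sketch, not a benchmark: the exact wording of the verbose messages depends on the data.table version, so no output is shown here.

```r
library(data.table)

dt <- data.table(iris)
v1 <- "Sepal.Length"
v2 <- "Species"

# Parsed expression: the column name is visible to all.vars()
e <- parse(text = paste0("sum(", v1, ", na.rm = TRUE)"))
r_eval <- dt[, eval(e), by = v2, verbose = TRUE]

# get(): the column name is hidden inside a string
r_get <- dt[, sum(get(v1), na.rm = TRUE), by = v2, verbose = TRUE]

# Both forms should return the same sums; only the work done internally differs
stopifnot(isTRUE(all.equal(r_eval$V1, r_get$V1)))
```

On a table as small as iris any timing difference is negligible; the verbose output is the thing to look at.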
