Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

In R data.table, how do I pass variable parameters to an expression?

I am stuck with a small R issue with data.table. Your help is much appreciated. How do I do this:

getResult <- function(dt, expr, gby) {
  e <- substitute(expr)
  b <- substitute(gby)
  return(dt[,eval(e),by=b])
}

v1 <- "Sepal.Length"
v2 <- "Species"

dt <- data.table(iris)
rDT <- getResult(dt, sum(v1, na.rm=TRUE), v2)

I get the following error:

Error in sum(v1, na.rm = TRUE) : invalid 'type' (character) of argument

Both v1 and v2 are passed in from another program as character variables, so I can't use v1 <- quote(Sepal.Length), which does seem to work.


1 Reply

An alternative to flodel's answer in the comments is to build the expressions from the character strings with parse():

e <- parse(text = paste0("sum(", v1, ", na.rm = TRUE)"))

b <- parse(text = v2)

rDT2 <- dt[, eval(e), by = eval(b)]

#               b    V1
# [1,]     setosa 250.3
# [2,] versicolor 296.8
# [3,]  virginica 329.4

EDIT:

And to put this into a function,

getResult <- function(dt, expr, gby){
  return(dt[, eval(expr), by = eval(gby)])
}

(dtR <- getResult(dt = dt, expr = e, gby = b))
# gives the same result as above
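
As a further variation, the call can be built from the string without pasting and parsing text. The following is only a sketch, not part of the original answer: the helper name getResultChr is hypothetical, and it assumes the column names arrive as length-one character vectors, as in the question. It uses base R's as.name() and bquote() to construct the call, and passes the grouping column to by as a character vector.

```r
library(data.table)

dt <- data.table(iris)
v1 <- "Sepal.Length"
v2 <- "Species"

# Hypothetical helper (an assumption for illustration): takes column names as strings.
getResultChr <- function(dt, col, gby) {
  # as.name() turns the string into a symbol; bquote() splices that symbol
  # into the call, giving sum(Sepal.Length, na.rm = TRUE).
  e <- bquote(sum(.(as.name(col)), na.rm = TRUE))
  # by is given the character vector of grouping column names directly.
  dt[, eval(e), by = gby]
}

res <- getResultChr(dt, v1, v2)
print(res)
```

Because the expression passed to eval() still names the column explicitly, this should behave like the parse() version with respect to column detection in j.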


EDIT from Matthew: There is also a subtle reason why the paste0 and eval(quote()) methods can be faster than get() in some cases. One of the reasons grouping can be fast is that data.table inspects j to see which columns it uses, and then subsets only those columns (FAQ 1.12 and 3.1). It uses base::all.vars(j) to do that.

When get() is used in j, the column being used is hidden from all.vars, so data.table falls back to subsetting all the columns in case the j expression needs them (much as it does when the .SD symbol is used in j, which is the problem .SDcols was added to solve). If all the columns are used anyway, it makes no difference, but if DT is, say, 1e7 x 100, then a grouped j=sum(V1) should be much faster than a grouped j=sum(get("V1")) for that reason. At least, that is what is supposed to happen; if it doesn't, it may be a bug.

On the other hand, if many queries are constructed dynamically and repeated, the time spent in paste0 and parse might become significant. It all depends. Setting verbose=TRUE should print a message about which columns have been detected as used by j, so that can be checked.
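
To try the verbose=TRUE check described above, something like the following could be run. It is a minimal sketch on a small made-up table (the columns g, V1 and V2 are assumptions for illustration). The exact wording of the verbose messages depends on the data.table version, so none is reproduced here.

```r
library(data.table)

# Small illustrative table; the column names are made up for this sketch.
DT <- data.table(g = rep(1:2, each = 3), V1 = 1:6, V2 = 6:1)

# Column named directly in j: data.table can see that only V1 is needed.
direct <- DT[, sum(V1), by = g, verbose = TRUE]

# Column reached through get(): the name is hidden from all.vars(j).
via_get <- DT[, sum(get("V1")), by = g, verbose = TRUE]

# Both forms return the same sums; only the column detection differs.
print(direct)
print(via_get)
```

Comparing the two sets of verbose messages shows which columns each query was detected as using.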

