Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
146 views
in Technique[技术] by (71.8m points)

What does .SD stand for in data.table in R

.SD looks useful but I do not really know what I am doing with it. What does it stand for? Why is there a preceding period (full stop). What is happening when I use it?

I read: .SD is a data.table containing the subset of x's data for each group, excluding the group column(s). It can be used when grouping by i, when grouping by by, keyed by, and _ad hoc_ by

Does that mean that the daughter data.tables is held in memory for the next operation?

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

.SD stands for something like "Subset of Data.table". There's no significance to the initial ".", except that it makes it even more unlikely that there will be a clash with a user-defined column name.

If this is your data.table:

DT = data.table(x=rep(c("a","b","c"),each=2), y=c(1,3), v=1:6)
setkey(DT, y)
DT
#    x y v
# 1: a 1 1
# 2: b 1 3
# 3: c 1 5
# 4: a 3 2
# 5: b 3 4
# 6: c 3 6

Doing this may help you see what .SD is:

DT[ , .SD[ , paste(x, v, sep="", collapse="_")], by=y]
#    y       V1
# 1: 1 a1_b3_c5
# 2: 3 a2_b4_c6

Basically, the by=y statement breaks the original data.table into these two sub-data.tables

DT[ , print(.SD), by=y]
# <1st sub-data.table, called '.SD' while it's being operated on>
#    x v
# 1: a 1
# 2: b 3
# 3: c 5
# <2nd sub-data.table, ALSO called '.SD' while it's being operated on>
#    x v
# 1: a 2
# 2: b 4
# 3: c 6
# <final output, since print() doesn't return anything>
# Empty data.table (0 rows) of 1 col: y

and operates on them in turn.

While it is operating on either one, it lets you refer to the current sub-data.table by using the nick-name/handle/symbol .SD. That's very handy, as you can access and operate on the columns just as if you were sitting at the command line working with a single data.table called .SD ... except that here, data.table will carry out those operations on every single sub-data.table defined by combinations of the key, "pasting" them back together and returning the results in a single data.table!


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...