.SD stands for something like "Subset of Data.table". There's no significance to the initial ".", except that it makes it even more unlikely that there will be a clash with a user-defined column name.
If this is your data.table:
library(data.table)
DT = data.table(x=rep(c("a","b","c"),each=2), y=c(1,3), v=1:6)
setkey(DT, y)
DT
# x y v
# 1: a 1 1
# 2: b 1 3
# 3: c 1 5
# 4: a 3 2
# 5: b 3 4
# 6: c 3 6
Doing this may help you see what .SD is:
DT[ , .SD[ , paste(x, v, sep="", collapse="_")], by=y]
# y V1
# 1: 1 a1_b3_c5
# 2: 3 a2_b4_c6
Basically, the by=y statement breaks the original data.table into these two sub-data.tables
DT[ , print(.SD), by=y]
# <1st sub-data.table, called '.SD' while it's being operated on>
# x v
# 1: a 1
# 2: b 3
# 3: c 5
# <2nd sub-data.table, ALSO called '.SD' while it's being operated on>
# x v
# 1: a 2
# 2: b 4
# 3: c 6
# <final output, since print() doesn't return anything>
# Empty data.table (0 rows) of 1 col: y
and operates on them in turn.
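A less noisy way to peek at each group is to ask .SD for its column names and row count instead of printing it. This is only a sketch, assuming DT is the table defined above; the names peek, cols and rows are illustrative labels, not anything data.table requires.

```r
library(data.table)

DT = data.table(x=rep(c("a","b","c"),each=2), y=c(1,3), v=1:6)
setkey(DT, y)

# For each value of y, summarise the current sub-data.table.
# .SD holds every column except the grouping column y.
peek = DT[ , .(cols = paste(names(.SD), collapse=","), rows = nrow(.SD)), by=y]
peek
# both groups report cols "x,v" and rows 3
```

The grouping column y is absent from .SD, which matches the printed sub-data.tables above: each shows only x and v.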
While it is operating on either one, it lets you refer to the current sub-data.table by using the nick-name/handle/symbol .SD. That's very handy, as you can access and operate on the columns just as if you were sitting at the command line working with a single data.table called .SD ... except that here, data.table will carry out those operations on every single sub-data.table defined by the distinct combinations of the by columns, "pasting" them back together and returning the results in a single data.table!
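To make that concrete, here are two common ways of treating .SD like an ordinary data.table inside a grouped call. This is a minimal sketch that reuses the DT from above; the result names top and tot are made up for illustration.

```r
library(data.table)

DT = data.table(x=rep(c("a","b","c"),each=2), y=c(1,3), v=1:6)
setkey(DT, y)

# Subset .SD by row, exactly as you would a stand-alone data.table:
# keep the row with the largest v within each group of y.
top = DT[ , .SD[which.max(v)], by=y]
# y = 1 gives x = "c", v = 5; y = 3 gives x = "c", v = 6

# Apply a function to every column of .SD.
# .SDcols restricts .SD to the listed columns (here just v).
tot = DT[ , lapply(.SD, sum), by=y, .SDcols="v"]
# y = 1 gives v = 9; y = 3 gives v = 12
```

In both calls the expression in j is written once, and data.table evaluates it on each group's .SD before stacking the pieces into one result.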