Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
330 views
in Technique[技术] by (71.8m points)

r - Row sums over columns with a certain pattern in their name

I have a data.table like this

dput(DT)
structure(list(ref = c(3L, 3L, 3L, 3L), nb = 12:15, i1 = c(3.1e-05, 
0.044495, 0.82244, 0.322291), i2 = c(0.000183, 0.155732, 0.873416, 
0.648545), i3 = c(0.000824, 0.533939, 0.838542, 0.990648), i4 = c(0.044495, 
0.82244, 0.322291, 0.393595)), .Names = c("ref", "nb", "i1", 
"i2", "i3", "i4"), row.names = c(NA, -4L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x0000000000320788>)

DT
#    ref nb       i1       i2       i3       i4
# 1:   3 12 0.000031 0.000183 0.000824 0.044495
# 2:   3 13 0.044495 0.155732 0.533939 0.822440
# 3:   3 14 0.822440 0.873416 0.838542 0.322291
# 4:   3 15 0.322291 0.648545 0.990648 0.393595

Now I want to calculate rows sums, but only including columns which start with an "i" ("i1", "i2", etc)

I have used grep to create a vector of the column names to be summed:

listCol <- colnames(DT)[grep("i", colnames(DT))]
listCol
# [1] "i1" "i2" "i3" "i4"

Then I have tried to loop over columns:

DT$sum <- rep.int(0, nrow(DT))
for (i in listCol){
    DT$sum = DT$sum + DT[ , get(i)]
}

...which gives the desired output:

DT
#    ref nb       i1       i2       i3       i4      sum
# 1:   3 12 0.000031 0.000183 0.000824 0.044495 0.045533
# 2:   3 13 0.044495 0.155732 0.533939 0.822440 1.556606
# 3:   3 14 0.822440 0.873416 0.838542 0.322291 2.856689
# 4:   3 15 0.322291 0.648545 0.990648 0.393595 2.355079

How can I improve my code?


Sub question:

This sub-question includes partially the answer to the previous one :

How to avoid this kind of strange notation :

myrowMeans = function (x){
    rowMeans(x, na.rm = TRUE)
}
DT[ , var := myrowMeans(.SD-myrowMeans(.SD)^2), .SDcols = grep("i", colnames(DT))]
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Use .SDcols to specify the columns, then take rowSums. Use := to assign new columns:

DT[ ,sum := rowSums(.SD), .SDcols = grep("i", names(DT))]

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...