Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
245 views
in Technique[技术] by (71.8m points)

r - Index a data frame row-by-row using column names selected from a variable

Consider the following data frame:

TEST <- structure(list(Value = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
  Select = structure(c(2L, 1L, 3L, 2L, 2L, 1L, 1L,
  2L, 1L, 1L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"),
  A = c(5L, 5L, 4L, 3L, 4L, 3L, 5L, 3L, 3L, 4L, 5L, 4L), 
  B = c(10L, 8L, 7L, 6L, 3L, 8L, 8L, 7L, 8L, 9L, 11L, 8L), 
  C = c(0L, 1L, 3L, 2L, 0L, 3L, 0L, 2L, 0L, 1L, 1L, 0L)), 
  .Names = c("Value", "Select", "A", "B", "C"), 
  row.names = c(NA, -12L), 
  class = "data.frame")

I want to efficiently assign the Value column, on a row-by-row basis, from the set of columns A, B and C based on the Select column.

For example, in row 1 I want Value to be equal to the element in column B - i.e. Value[1]=10.

My current method is to use a for loop:

for( idx in 1:nrow(TEST) ) {
  TEST$Value[idx] <- TEST[ idx, as.character(TEST$Select[idx]) ]
}

Which results in the desired output:

    Value Select A  B C
 1     10      B 5 10 0
 2      5      A 5  8 1
 3      3      C 4  7 3
 4      6      B 3  6 2
 5      3      B 4  3 0
 6      3      A 3  8 3
 7      5      A 5  8 0
 8      7      B 3  7 2
 9      3      A 3  8 0
 10     4      A 4  9 1
 11     1      C 5 11 1
 12     0      C 4  8 0

Is there a more efficient or alternative way of doing this? I feel like this is some sort of merge() or table join type operation.

P.S. I wasn't quite sure how to describe this operation - any suggestions for a better question/description also welcome.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I would use matrix indexing and match. That approach is vectorized, hence much faster than a for or apply loop would give you:

L <- c("A", "B", "C")
TEST$Value <- TEST[L][cbind(seq_len(nrow(TEST)), match(TEST$Select, L))]

If you are not familiar with matrix indexing, it is documented inside ?"[":

A third form of indexing is via a numeric matrix with the one column for each dimension: each row of the index matrix then selects a single element of the array, and the result is a vector


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...