Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Find values in data frame 2 that are also found in data frame 1, within a certain range

I want to find which values in df2 are also present in df1, within a certain range. A value consists of both a and b taken together; the two can't be split up, so each row stays together. For example, can I find 9,1 (the first row of df1) in df2? It doesn't have to be at the same position. We can also allow a difference of, for example, 1 for "a" and 1 for "b", so I want to find all values 9±1, 1±1 in df2. Does anyone have a suggestion for how to code this? Many thanks!

set.seed(1)
a <- sample(10, 5)
set.seed(1)
b <- sample(5, 5, replace = TRUE)
feature <- LETTERS[1:5]
df1 <- data.frame(feature, a, b)
df1
 feature a b
       A 9 1
       B 4 4
       C 7 1
       D 1 2
       E 2 5
set.seed(2)
a <- sample(10, 5)
b <- sample(5, 5, replace = TRUE)
feature <- LETTERS[1:5]
df2 <- data.frame(feature, a, b)
df2
 feature  a b
       A  5 1
       B  6 4
       C  9 5
       D  1 1
       E 10 2

This is not correct, but I imagine it can be done with a for loop somehow:

for(i in df1[,1]) {
  for(j in df1[,2]){
    s<- c(s,(df1[i,1] & df1[j,2]== df2[,1] & df2[,2]))# how to add certain allowed diff levels?
  }
}
s


Output wanted:
feature_df1 <- LETTERS[1:5] 
match <- c(1,0,0,1,0)
feature_df2 <- c("E","","","D", "")
df <- data.frame(feature_df1, match, feature_df2) 
df
 feature_df1 match feature_df2
           A     1           E
           B     0            
           C     0            
           D     1           D
           E     0            
question from: https://stackoverflow.com/questions/66060237/find-values-in-data-frame-2-which-is-found-in-data-frame-1-within-a-certain-ran


1 Reply


I love data.table, which is (imo) the weapon of choice for this kind of problem.

library(data.table)
# convert df1 and df2 to data.tables by reference; key df1 on feature
setDT(df1, key = "feature"); setDT(df2)
# self-join df1 on its key so that j runs once per row (by = .EACHI);
# inside j, i.a and i.b are the a and b values of the current df1 row
df1[df1, c("match", "feature_df2") := {
  # on-the-fly subset of df2: rows with a and b both within 1 of the current row
  val        = df2[a %between% c(i.a - 1, i.a + 1) & b %between% c(i.b - 1, i.b + 1), ]
  unique_val = sort(unique(val$feature))
  num_val    = length(unique_val)  # number of matching features, 0 if none
  list(num_val, paste0(unique_val, collapse = ";"))
}, by = .EACHI][]

#    feature a b match feature_df2
# 1:       A 9 1     1           E
# 2:       B 4 4     0            
# 3:       C 7 1     0            
# 4:       D 1 2     1           D
# 5:       E 2 5     0            
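
If you'd rather stay in base R, the for loop the question was reaching for could look roughly like the sketch below. It is not the data.table approach above, just a plain loop over the rows of df1. The names tol_a, tol_b, match_count and result are made up for illustration, the tolerance of 1 is taken from the question, and the sample data is typed in by hand from the printed tables so the result doesn't depend on the random number generator.

```r
# Sample data copied from the tables printed in the question
df1 <- data.frame(feature = LETTERS[1:5],
                  a = c(9, 4, 7, 1, 2),
                  b = c(1, 4, 1, 2, 5),
                  stringsAsFactors = FALSE)
df2 <- data.frame(feature = LETTERS[1:5],
                  a = c(5, 6, 9, 1, 10),
                  b = c(1, 4, 5, 1, 2),
                  stringsAsFactors = FALSE)

tol_a <- 1  # allowed difference for a (assumption: 1, as in the question)
tol_b <- 1  # allowed difference for b (assumption: 1, as in the question)

match_count <- integer(nrow(df1))    # number of matching df2 features per df1 row
feature_df2 <- character(nrow(df1))  # matching df2 features, ";"-separated

for (i in seq_len(nrow(df1))) {
  # TRUE for df2 rows where both a and b are within tolerance of df1 row i
  hit <- abs(df2$a - df1$a[i]) <= tol_a & abs(df2$b - df1$b[i]) <= tol_b
  matched <- sort(unique(df2$feature[hit]))
  match_count[i] <- length(matched)
  feature_df2[i] <- paste(matched, collapse = ";")  # "" when nothing matched
}

result <- data.frame(feature_df1 = df1$feature,
                     match = match_count,
                     feature_df2 = feature_df2,
                     stringsAsFactors = FALSE)
result
```

As in the data.table version, match holds the number of matching features rather than a strict 0/1 flag; wrap it in as.integer(match_count > 0) if you only want the flag.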
