r - Find overlapping regions and extract respective value

Question

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

How do you find the overlapping coordinates and extract the respective seg.mean values for the overlapping region?

data1

Rl    pValue  chr      start        end   CNA
 2  2.594433    6  129740000  129780000  gain
 2  3.941399    6  130080000  130380000  gain
 1  1.992114   10   80900000   81100000  gain
 1  7.175750   16   44780000   44920000  gain

data2

  ID  chrom  loc.start    loc.end  num.mark  seg.mean
8410      6  129750000  129760000      8430    0.0039
8410     10   80907000   81000000         5   -1.7738
8410     16   44790000   44910000        12    0.0110

dataoutput

Rl    pValue  chr      start        end   CNA  seg.mean
 2  2.594433    6  129750000  129760000  gain    0.0039
 1  1.992114   10   80907000   81000000  gain   -1.7738
 1  7.175750   16   44790000   44910000  gain    0.0110
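
To make the example reproducible, the two input tables could be rebuilt as plain data frames. This is only a sketch: the column types (integer coordinates, character CNA) are assumed from the printed values, not stated in the question.

```r
# Sample inputs from the question, rebuilt by hand (column types are assumed).
data1 <- data.frame(
  Rl     = c(2L, 2L, 1L, 1L),
  pValue = c(2.594433, 3.941399, 1.992114, 7.175750),
  chr    = c(6L, 6L, 10L, 16L),
  start  = c(129740000L, 130080000L, 80900000L, 44780000L),
  end    = c(129780000L, 130380000L, 81100000L, 44920000L),
  CNA    = "gain",
  stringsAsFactors = FALSE
)

data2 <- data.frame(
  ID        = 8410L,
  chrom     = c(6L, 10L, 16L),
  loc.start = c(129750000L, 80907000L, 44790000L),
  loc.end   = c(129760000L, 81000000L, 44910000L),
  num.mark  = c(8430L, 5L, 12L),
  seg.mean  = c(0.0039, -1.7738, 0.0110)
)
```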

1 Reply

深蓝 · Answer 1 · 2021-10-17T03:09:01+0000

As @Roland correctly suggested, here's a possible data.table::foverlaps solution:

library(data.table)
setDT(data1); setDT(data2) # convert both data sets to data.table objects by reference
setnames(data2, c("loc.start", "loc.end"), c("start", "end")) # rename the interval columns so they match in both sets
setkey(data2, start, end) # key data2 on its interval columns (foverlaps requires a keyed y); chr is not in the key, so chromosomes are not compared
foverlaps(data1, data2, nomatch = 0L)[, 1:5 := NULL][] # run foverlaps, keep matches only, and drop the first five (data2) columns
#    seg.mean Rl   pValue chr   i.start     i.end  CNA
# 1:   0.0039  2 2.594433   6 129740000 129780000 gain
# 2:  -1.7738  1 1.992114  10  80900000  81100000 gain
# 3:   0.0110  1 7.175750  16  44780000  44920000 gain

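Or, using the match indices:

indx <- foverlaps(data1, data2, nomatch = 0L, which = TRUE) # with which = TRUE, foverlaps returns the matching row indices (xid, yid)
data1[indx$xid][, seg.mean := data2[indx$yid]$seg.mean][] # take the matched data1 rows and add seg.mean from the matched data2 rows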
#    Rl   pValue chr     start       end  CNA seg.mean
# 1:  2 2.594433   6 129740000 129780000 gain   0.0039
# 2:  1 1.992114  10  80900000  81100000 gain  -1.7738
# 3:  1 7.175750  16  44780000  44920000 gain   0.0110
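
Two caveats on the code above. The key contains only start and end, so intervals on different chromosomes could be matched whenever their coordinates happen to overlap (this does not occur in the sample data). The returned coordinates are also those of data1, not the overlapping region shown in dataoutput. The following is a minimal sketch, not part of the original answer, that addresses both points. It assumes the wanted region is the intersection of each matched pair of intervals and that chromosomes are coded the same way in both tables. The names d1, d2, ov and res are hypothetical.

```r
library(data.table)

# Fresh copies of the sample data, so the objects used above stay untouched.
d1 <- data.table(
  Rl     = c(2L, 2L, 1L, 1L),
  pValue = c(2.594433, 3.941399, 1.992114, 7.175750),
  chr    = c(6L, 6L, 10L, 16L),
  start  = c(129740000L, 130080000L, 80900000L, 44780000L),
  end    = c(129780000L, 130380000L, 81100000L, 44920000L),
  CNA    = "gain"
)
d2 <- data.table(
  ID       = 8410L,
  chr      = c(6L, 10L, 16L),   # "chrom" in the question, named chr here to match d1
  start    = c(129750000L, 80907000L, 44790000L),
  end      = c(129760000L, 81000000L, 44910000L),
  num.mark = c(8430L, 5L, 12L),
  seg.mean = c(0.0039, -1.7738, 0.0110)
)

# Key on chromosome first; foverlaps treats the last two key columns as the
# interval and requires the preceding key columns to match exactly.
setkey(d2, chr, start, end)

ov <- foverlaps(d1, d2, type = "any", nomatch = 0L)

# In the result, start/end come from d2 and i.start/i.end from d1, so the
# shared region runs from the larger start to the smaller end.
res <- ov[, .(Rl, pValue, chr,
              start = pmax(start, i.start),
              end   = pmin(end, i.end),
              CNA, seg.mean)]
res
```

On the sample data this should give the three rows listed under dataoutput. Selecting the columns by name, rather than dropping them by position, keeps the result independent of the column order foverlaps returns.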
