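IRanges
is a good candidate for this job. In this example the intervals on different chromosomes do not overlap, so the chrom variable is not needed yet (see the EDIT further down for the per-chromosome version).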
library(IRanges)
ir <- IRanges(my.df$start, my.df$stop)
## Create a new grouping variable. Note the use of reduce() here (for performance).
my.df$group2 <- subjectHits(findOverlaps(ir, reduce(ir)))
#   chrom name    start     stop group2
# 1     1    a    70001    71200      2
# 2     1    b    70203    80001      2
# 3     1    c    70060    71051      2
# 4    14    d    40004    42004      1
# 5    16    e 50000872 50000890      3
# 6    16    f 50000872 51000952      3
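To make it clearer what the reduce/findOverlaps step computes, here is a minimal base R sketch of the same idea with no IRanges dependency. The helper name `interval_groups` is hypothetical (not part of any package), and the `+ 1` reflects my understanding that reduce() also merges directly adjacent integer ranges by default; treat both as assumptions rather than a drop-in replacement.

```r
# Hypothetical helper: label each interval with the index of the merged
# range it falls into, numbering merged ranges by ascending start.
interval_groups <- function(start, stop) {
  ord <- order(start, stop)
  s <- start[ord]
  e <- stop[ord]
  # Furthest end seen among all earlier intervals (in sorted order)
  prev_max_end <- c(-Inf, cummax(e)[-length(e)])
  # A new merged range begins when an interval starts past everything before it
  # (assumption: + 1 so that directly adjacent integer ranges are merged too)
  new_group <- s > prev_max_end + 1
  groups <- integer(length(start))
  groups[ord] <- cumsum(new_group)
  groups
}

# Same start/stop values as in the example data
start <- c(70001, 70203, 70060, 40004, 50000872, 50000872)
stop  <- c(71200, 80001, 71051, 42004, 50000890, 51000952)
interval_groups(start, stop)
# → 2 2 2 1 3 3
```

On this data the labels agree with the group2 column above. For large inputs I would still prefer IRanges, which is built for this kind of range arithmetic.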
The new group2 variable is the range indicator. Now, using data.table,
I can transform my data to the desired output:
library(data.table)
DT <- as.data.table(my.df)
DT[, list(start = min(start), stop = max(stop),
          name = list(name), chrom = unique(chrom)),
   by = group2]
#    group2    start     stop  name chrom
# 1:      2    70001    80001 a,b,c     1
# 2:      1    40004    42004     d    14
# 3:      3 50000872 51000952   e,f    16
PS: the collapsed variable name here is not a string but a list of factors. This is more efficient and easier to access than a collapsed character string built with paste, for example.
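As a rough illustration of why the list column is convenient, here is a small sketch on a toy table. The values are illustrative only, and the names are plain character here, whereas in the data above they may be factors depending on stringsAsFactors.

```r
library(data.table)

# Toy table in the same shape as the grouped data (illustrative values)
DT <- data.table(group2 = c(2, 2, 2, 1),
                 name   = c("a", "b", "c", "d"))

res <- DT[, list(name = list(name)), by = group2]

# Each cell of the list column is an ordinary vector
res$name[[1]]                 # members of the first group: "a" "b" "c"

# Per-group counts come straight from the list column
res[, n := lengths(name)]

# A display string can still be derived on demand
res[, label := sapply(name, paste, collapse = ",")]
```

Going the other way, from a pasted string back to the individual names, would need strsplit, which is why I keep the list and only paste when printing.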
EDIT: after the OP's clarification, I now create the group variable by chrom, i.e. the IRanges code is called once for each chrom group. I slightly modified your data to create several groups of intervals on the same chromosome.
my.df <- data.frame(chrom = c(1, 1, 1, 1, 14, 16, 16),
                    name  = c("a", "b", "c", "d", "e", "f", "g"),
                    start = as.numeric(c(0, 3000, 70203, 70060, 40004, 50000872, 50000872)),
                    stop  = as.numeric(c(1, 5000, 80001, 71051, 42004, 50000890, 51000952)))
library(data.table)
DT <- as.data.table(my.df)
## find the interval groups within each chromosome
DT[, group := {
  ir <- IRanges(start, stop)
  subjectHits(findOverlaps(ir, reduce(ir)))
}, by = chrom]
## Now group by both group and chrom (the by columns are already part of the
## result, so chrom does not need to be computed again in j)
DT[, list(start = min(start), stop = max(stop), name = list(name)),
   by = list(group, chrom)]
#    group chrom    start     stop name
# 1:     1     1        0        1    a
# 2:     2     1     3000     5000    b
# 3:     3     1    70060    80001  c,d
# 4:     1    14    40004    42004    e
# 5:     1    16 50000872 51000952  f,g