Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

dataframe - R creating a sequence table from two columns

I have a data frame like this:

product=c("a","b","c")
min=c(1,5,3)
max=c(1,7,7)
dd=data.frame(product,min,max)
> dd
  product min max
1       a   1   1
2       b   5   7
3       c   3   7

I want to create a table like the one below, with one row for each value from min to max (inclusive) for each product:

product mm
a 1
b 5
b 6
b 7
c 3
c 4
c 5
c 6
c 7
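As a quick sanity check on the expected size, each product contributes max - min + 1 rows. This short base R snippet simply rebuilds the question's data frame and does that arithmetic.

```r
# The question's data
dd <- data.frame(product = c("a", "b", "c"),
                 min = c(1, 5, 3),
                 max = c(1, 7, 7))

# Rows each product contributes: one per value from min to max, inclusive
n <- dd$max - dd$min + 1
n        # 1 3 5
sum(n)   # 9, the number of rows in the target table
```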

How can I do this in R? Is there a package that would give fast results?


1 Reply


Try

library(data.table)
setDT(dd)[, list(mm=min:max), by = product]
#   product mm
#1:       a  1
#2:       b  5
#3:       b  6
#4:       b  7
#5:       c  3
#6:       c  4
#7:       c  5
#8:       c  6
#9:       c  7
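To see what the grouped call computes, here is a plain base R sketch that does the same thing one row at a time: build a small data frame holding min:max for each row, then stack the pieces with rbind(). This is only an illustration, not how data.table works internally. It matches the grouped result only under the assumption that each product appears in exactly one row, as in the question.

```r
dd <- data.frame(product = c("a", "b", "c"),
                 min = c(1, 5, 3),
                 max = c(1, 7, 7))

# One small data frame per row: the product repeated alongside min:max
pieces <- lapply(seq_len(nrow(dd)), function(i) {
  data.frame(product = dd$product[i], mm = dd$min[i]:dd$max[i])
})

# Stack the pieces into one long table
res <- do.call(rbind, pieces)
res
```

This loop is easy to follow but slow on large inputs, because it creates one data frame per row.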

A faster option, suggested by @David Arenburg, is to use seq.int(min, max, 1L) instead of min:max:

 setDT(dd)[, list(mm = seq.int(min, max, 1L)), by = product]
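If you would rather not add a package, a vectorised base R sketch is possible with rep() and sequence(). This is an alternative not taken from the answer above, and it assumes R >= 4.0.0, which is when sequence() gained its from argument. rep() repeats each product once per value in its range, and sequence(n, from = min) builds all the ranges in a single call.

```r
dd <- data.frame(product = c("a", "b", "c"),
                 min = c(1, 5, 3),
                 max = c(1, 7, 7))

n <- dd$max - dd$min + 1                 # number of values per product
res <- data.frame(
  product = rep(dd$product, times = n),  # each product repeated n times
  mm = sequence(n, from = dd$min)        # concatenated min..max ranges (R >= 4.0.0)
)
res
```

Unlike the grouped approach, this one does not depend on product being unique. Every row is expanded independently and the original row order is kept.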

Benchmarks

library(stringi)     # stri_rand_strings()
library(data.table)  # copy(), as.data.table()
set.seed(24)
product <- unique(stri_rand_strings(1e5,4))
min1 <- sample(1:10, length(product), replace=TRUE)
max1 <- sample(11:15, length(product), replace=TRUE)
dd <- data.frame(product, min1, max1)
dd2 <- copy(dd)

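josilber <- function() {
  # repeat each product once per value in its range, then concatenate the ranges
  data.frame(product = rep(dd$product, dd$max1 - dd$min1 + 1),
             mm = unlist(mapply(seq, dd$min1, dd$max1)))
}

akrun <- function() {
  as.data.table(dd2)[, list(mm = seq.int(min1, max1, 1L)), by = product]
}

Ananda <- function() {
  # split min1/max1 by product, build each range, stack into values/ind columns
  stack(lapply(split(dd[-1], dd[1]),
               function(x) seq(x[[1]], x[[2]])))
}

jiber <- function() {
  res <- by(dd[, -1], dd[, 1], function(x) seq(x$min1, x$max1))
  res <- as.data.frame(unlist(res))
  # recover the product from the row names (product name plus a running index);
  # note this also strips any digits that are part of the product names,
  # which stri_rand_strings() can generate
  data.frame(product = gsub("[0-9]", "", rownames(res)), mm = res[, 1])
}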

system.time(akrun())
#   user  system elapsed 
# 0.129   0.001   0.129 
system.time(josilber())
#  user  system elapsed 
# 0.762   0.002   0.764 

system.time(Ananda())
#   user  system elapsed 
# 45.449   0.191  45.636 

system.time(jiber())
#  user  system elapsed 
# 48.013   8.218  56.291 

library(microbenchmark)
microbenchmark(josilber(), akrun(), times=20L, unit='relative')
#Unit: relative
#     expr     min       lq     mean   median       uq      max neval cld
#josilber() 6.39757 6.713236 5.570836 5.901037 5.603639 3.970663    20  b
#   akrun() 1.00000 1.000000 1.000000 1.000000 1.000000 1.000000    20  a 
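Timings only compare like with like if the functions return the same rows. Here is a small base R check on the question's toy data, stored as toy so it does not overwrite the benchmark's dd. It compares the josilber-style and Ananda-style expressions. Note that stack() names its columns values and ind. Also, split() orders groups by sorted product name, so the two outputs line up row for row here only because the toy products are already in sorted order.

```r
toy <- data.frame(product = c("a", "b", "c"),
                  min1 = c(1, 5, 3),
                  max1 = c(1, 7, 7))

# josilber-style: repeat each product by its range length, concatenate the ranges
res_rep <- data.frame(product = rep(toy$product, toy$max1 - toy$min1 + 1),
                      mm = unlist(mapply(seq, toy$min1, toy$max1)))

# Ananda-style: split by product, build each range, stack() into values/ind
res_stack <- stack(lapply(split(toy[-1], toy[1]),
                          function(x) seq(x[[1]], x[[2]])))

same_values  <- all(res_rep$mm == res_stack$values)
same_product <- all(as.character(res_rep$product) == as.character(res_stack$ind))
c(same_values, same_product)
```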

...