r - Identify consecutive sequences based on a given variable

Question

Welcome To Ask or Share your Answers For Others

r - Identify consecutive sequences based on a given variable

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

r - Identify consecutive sequences based on a given variable

I am literally stuck on this. The df1 has the following variables:

serial = Group of people
id1 = the person from the group (eg. 12 (serial) 1 (id1) =group 12 person 1; 12 2 = group 12 person 2, etc. )
'Day'when the first (or start) recording was made.

The days consist of equal number of observations (eg.95)

        day1 (Monday)  =  day11-day196 
        day2 (Tuesday) = day21-day296     
        day3 (Wednesday) =  day31-day396   
        day4 (Thursday) =  day41-day496   
        day5 (Friday) = day51-day596      
        day6 (Saturday) = day61-day696   
        day7 (Sunday) =  day71-day796

Example of df1

serial id1  Day     day1 day2 day3 day4 day5 day6 day7
12      1   Monday    2    1    2    1    1    3    1
123     1   Tuesday   0    3    0    3    3    0    3
10      1   Wednesday 0    3    3    3    3    3    3

I would like to identify the consecutive records (there is no gap between the daily records) and the total amount of the records.

The starting day for consecutive recordings is the 'Day` variable. For example a consecutive record would be serial 12. Recording started on Monday and there are records (at leas one from 95 variable) during the week. During the week (7 x 95 variable) there were made 11 records

A non-consecutive record would be id 123 as the there is a gap day on day3 and day6. Record started on Tuesday and there is a gap on Wednesday and Saturday.

Finally I would like to record the duration of the consecutive recording.

Sample output:

 serial  id1   Duration Occurance        Days
12       1      11        7        day1 day2 day3 day4 day5 day6 day7
123      1      12        0        0
10       1      18        5        day3 day4 day5 day6 day7

Sample data

structure(list(serial = c(12, 123, 10), id1 = c(1, 1, 1), Day = structure(1:3, .Label = c("Monday",
"Tuesday", "Wednesday"), class = "factor"), day1 = c(2, 0, 0),
day2 = c(1, 3, 3), day3 = c(2, 0, 3), day4 = c(1, 3, 3),
day5 = c(1, 3, 3), day6 = c(3, 0, 3), day7 = c(1, 3, 3)), row.names = c(NA,
3L), class = "data.frame")

Similar post R - identify consecutive sequences

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:56:01+0000

We can use rleid from data.table to get the 'Occurance' correct

library(data.table)
wkdays <- c("Monday", "Tuesday", "Wednesday", "Thursday", 
"Friday", "Saturday", "Sunday")

out1 <-  do.call(rbind, Map(function(x, y) {
              i1 <- match(y, wkdays): length(x)
              i2 <- x[i1] != 0
              i3 <- all(i2)
              grp1 <- rleid(i2)
              Days <- if(i3) tapply(names(x)[i1][i2], grp1[i2], FUN = paste, collapse= ' ') else ''
             Occurance <- if(i3) length(grp1[i2]) else 0
             data.frame(Occurance, Days)
            }, asplit(df[-(1:3)], 1), df$Day))

 out1$Duration <- rowSums(df1[startsWith(names(df1), 'day')])
 out1
 # Occurance                               Days Duration
 #1         7 day1 day2 day3 day4 day5 day6 day7       11
 #2         0                                          12
 #3         5           day3 day4 day5 day6 day7       18

Categories

r - Identify consecutive sequences based on a given variable

r - Identify consecutive sequences based on a given variable

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags