Background
I need to replace the NA's in my data frame using different methods depending on the nature of each NA. My data frame comes from a study with repeated measures, where some of the NA's result from subjects dropping out, while others result from intermittent missing measurements, defined as one or a sequence of several missing measurements followed by a measured value.
I will refer to intermittent missing measurements as intermittent NA's.
Problem
I am having trouble testing whether an NA is the result of intermittent missing measurements, and choosing which functions to use to replace these NA's. Ideally I would replace intermittent NA's with last observation carried forward (the na.locf method). Dropout NA's, however, need to be replaced with the baseline value or the last observed value, whichever is greater.
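A minimal sketch of both rules in base R, assuming each subject's values are already sorted by visit and that visit 1 (the baseline) is always observed. The names `locf` and `impute_visits` are hypothetical helpers, not from any package; `locf` mimics the carry-forward behaviour of `zoo::na.locf` without the dependency. An NA is treated as intermittent when at least one measured value follows it, and as a dropout NA otherwise:

```r
# Last observation carried forward in base R (same idea as zoo::na.locf,
# but leading NAs stay NA instead of being dropped).
locf <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))  # index of last observed value so far
  idx[idx == 0] <- NA                      # nothing observed yet
  x[idx]
}

# Hypothetical helper: impute one subject's values, ordered by visit.
# Assumes value[1] is the observed baseline.
impute_visits <- function(value) {
  # Number of observed values at or after each position
  n_after <- rev(cumsum(!is.na(rev(value))))
  intermittent <- is.na(value) & n_after > 0   # a measured value follows
  dropout      <- is.na(value) & n_after == 0  # only NAs from here on

  out <- value
  out[intermittent] <- locf(value)[intermittent]
  if (any(dropout)) {
    last_obs <- value[max(which(!is.na(value)))]
    out[dropout] <- max(value[1], last_obs)    # baseline or last value, whichever is greater
  }
  out
}

# Example 4 from the question
impute_visits(c(40, NA, NA, 42, 16, 19, NA, 38, NA, NA))
```

If the zoo package is available, `zoo::na.locf(value, na.rm = FALSE)` could be used in place of `locf`; `na.rm = FALSE` keeps any leading NAs so the result has the same length as the input.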
Examples
Example 1
Here is a clean example of NA's that I want treated as intermittent NA's and imputed with na.locf:
data.frame(visit=c(1,2,3,4,5,6,7,8,9,10),value=c(34,NA,NA,15,16,19,NA,12,23,31))
And how I want the end result to be:
data.frame(visit=c(1,2,3,4,5,6,7,8,9,10),value=c(34,34,34,15,16,19,19,12,23,31))
Example 2
Here is a clean example of NA's (dropout NA's) that I want imputed with the last non-NA observation or the baseline value (visit 1), whichever is greater:
data.frame(visit=c(1,2,3,4,5,6,7,8,9,10),value=c(34,22,18,15,16,19,NA,NA,NA,NA))
And how I want the end result to be:
data.frame(visit=c(1,2,3,4,5,6,7,8,9,10),value=c(34,22,18,15,16,19,34,34,34,34))
Example 3
Here is a more complex example mixing NA's that need different imputations. In this case the last non-NA observation before the dropout NA's is greater than the baseline observation (visit 1):
data.frame(visit=c(1,2,3,4,5,6,7,8,9,10),value=c(34,NA,NA,42,16,19,NA,38,NA,NA))
The result I need:
data.frame(visit=c(1,2,3,4,5,6,7,8,9,10),value=c(34,34,34,42,16,19,19,38,38,38))
Example 4
Another complex example, this time where the baseline observation (visit 1) is greater than the last non-NA value before the dropout NA's:
data.frame(visit=c(1,2,3,4,5,6,7,8,9,10),value=c(40,NA,NA,42,16,19,NA,38,NA,NA))
The result I need:
data.frame(visit=c(1,2,3,4,5,6,7,8,9,10),value=c(40,40,40,42,16,19,19,38,40,40))
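Each example above shows a single subject, but in the full repeated-measures data frame the rule has to be applied within each subject. A sketch assuming a hypothetical subject column `id`, with the four examples stacked as subjects 1 to 4, using base R's `ave()` to apply a compact helper (hypothetical `impute_visits`) per subject. It again assumes the baseline is observed for every subject; dropout NAs are filled with `pmax()` of the baseline and the carried-forward last value:

```r
# Hypothetical compact helper for one subject's values, sorted by visit.
# Assumes value[1] (the baseline) is observed.
impute_visits <- function(value) {
  # LOCF: index of the last observed value at or before each visit
  filled  <- value[cummax(seq_along(value) * !is.na(value))]
  # Dropout NAs: no measured value at or after this visit
  dropout <- rev(cumsum(!is.na(rev(value)))) == 0
  # Dropout NAs take max(baseline, last observed value)
  filled[dropout] <- pmax(value[1], filled[dropout])
  filled
}

# The four examples stacked as hypothetical subjects 1 to 4
df <- data.frame(
  id    = rep(1:4, each = 10),
  visit = rep(1:10, times = 4),
  value = c(34, NA, NA, 15, 16, 19, NA, 12, 23, 31,
            34, 22, 18, 15, 16, 19, NA, NA, NA, NA,
            34, NA, NA, 42, 16, 19, NA, 38, NA, NA,
            40, NA, NA, 42, 16, 19, NA, 38, NA, NA)
)

df <- df[order(df$id, df$visit), ]                       # visits in order within subject
df$imputed <- ave(df$value, df$id, FUN = impute_visits)  # apply the helper per subject
df
```

Sorting before `ave()` matters because both the carry-forward and the dropout test depend on visit order.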
What I have tried
Following a suggestion from @Gregor (after I said this would solve my problem), I can test for intermittent NA's with:
mutate(df, intermittent = is.na(value) & !is.na(lead(value)))
But this does not help me impute all intermittent NA's. In particular, for a run of intermittent NA's such as (NA1, NA2, NA3, 14), this test returns TRUE only for NA3, because NA3 is the only NA directly followed by a measured value.
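A base-R sketch of why the `lead()` test misses the earlier NA's in a run, and a run-aware alternative: instead of checking only the next visit, count the measured values at or after each visit. An NA with at least one later measurement is intermittent; an NA with none is a dropout NA. The data are Example 3, and the variable names are illustrative. The same expression should also work inside dplyr's `mutate()` after `group_by()` on a subject identifier.

```r
value <- c(34, NA, NA, 42, 16, 19, NA, 38, NA, NA)  # Example 3

# Base-R equivalent of the lead()-based test: flags an NA only when the
# very next visit is observed, so in the run NA, NA, 42 only the second NA is TRUE
next_value <- c(value[-1], NA)
lead_flag  <- is.na(value) & !is.na(next_value)

# Run-aware test: count observed values at or after each visit
n_obs_after  <- rev(cumsum(!is.na(rev(value))))
intermittent <- is.na(value) & n_obs_after > 0   # some later visit is measured
dropout      <- is.na(value) & n_obs_after == 0  # no later visit is measured

data.frame(visit = 1:10, value, lead_flag, intermittent, dropout)
```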