Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
719 views
in Technique[技术] by (71.8m points)

r - How to fill NAs with LOCF by factors in data frame, split by country

I have the following data frame (simplified) with the country variable as a factor and the value variable has missing values:

country value
AUT     NA
AUT     5
AUT     NA
AUT     NA
GER     NA
GER     NA
GER     7
GER     NA
GER     NA

The following generates the above data frame:

data <- data.frame(country=c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"), value=c(NA, 5, NA, NA, NA, NA, 7, NA, NA))

Now, I would like to replace the NA values in each country subset using the method last observation carried forward (LOCF). I know the command na.locf in the zoo package. data <- na.locf(data) would give me the following data frame:

country value
AUT     NA
AUT     5
AUT     5
AUT     5
GER     5
GER     5
GER     7
GER     7
GER     7

However, the function should only be used on the individual subsets split by the country. The following is the output I would need:

country value
AUT     NA
AUT     5
AUT     5
AUT     5
GER     NA
GER     NA
GER     7
GER     7
GER     7

I can't think of an easy way to implement it. Before starting with for-loops, I was wondering if anyone has any idea as to how to solve this.

Many thanks!!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

A modern version of the ddply solution is to use the package dplyr:

library(dplyr)
DF %>%
  group_by(county) %>% 
  mutate(value = na.locf(value, na.rm = F))      

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...