You want to use the is.na test function:
df$value[is.na(df$value)] <- median(df$value, na.rm=TRUE)
This says: for every element where df$value is NA, replace it with the right-hand side. You need the na.rm=TRUE piece, or else the median function will return NA.
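To see why na.rm=TRUE matters, here is a small sketch on a toy vector. The data frame and its values are invented for illustration; only the column name value matches the code above.

```r
# Toy data, invented for illustration
df <- data.frame(value = c(1, NA, 3, 10, NA))

median(df$value)                # NA, because the missing values propagate
median(df$value, na.rm = TRUE)  # 3, the median of 1, 3 and 10

# Replace only the missing entries with the median of the observed ones
df$value[is.na(df$value)] <- median(df$value, na.rm = TRUE)
df$value                        # 1 3 3 10 3
```

The right-hand side is computed before the assignment happens, so the median is taken over the original non-missing values.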
To do this month by month, there are many choices, but I think plyr has the simplest syntax:
library(plyr)
ddply(df,
.(months),
transform,
value=ifelse(is.na(value), median(value, na.rm=TRUE), value))
You can also use data.table. This is an especially good choice if your data is large:
library(data.table)
DT <- data.table(df)
setkey(DT, months)
DT[,value := ifelse(is.na(value), median(value, na.rm=TRUE), value), by=months]
There are many other ways, but these are two!
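If you would rather not load a package, one of those other ways is base R's ave function. This is only a sketch on made-up data, assuming the same column names (months and value) as above; it is not meant as a replacement for either approach shown.

```r
# Toy data, invented for illustration; 'months' is the grouping column
df <- data.frame(
  months = c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb"),
  value  = c(1, NA, 3, 10, NA, 20)
)

# ave() applies FUN within each level of 'months' and returns a vector
# aligned with the original row order
df$value <- ave(df$value, df$months, FUN = function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)  # per-month median of observed values
  x
})

df$value  # 1 2 3 10 15 20
```

Note that a month whose values are all NA has no median to fall back on, so its entries stay NA with this approach.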