Suppose you have a dataframe with 9 columns. You want to remove cases which have all NAs in columns 5:9. It's not at all relevant if there are NAs in columns 1:4.
So far I have found functions that allow you to remove rows that have NAs in any of the columns 5:9, but I specifically need to remove only those that have all NAs in columns 5:9.
I wrote my own function to do this, but since I have 300k+ rows, it's very slow. I was wondering is there a more efficient way? This is my code:
remove.select.na<-function(x, cols){
nrm<-vector("numeric")
for (i in 1:nrow(x)){
if (sum(is.na(x[i,cols]))<length(cols)){
nrm<-c(nrm,i)
}
#Console output to track the progress
cat('
',paste0('Checking row ',i,' of ',nrow(x),' (', format(round(i/nrow(x)*100,2), nsmall = 2),'%).'))
flush.console()
}
x<-x[nrm,]
rm(nrm)
return(x)
}
where x is the dataframe and cols is a vector containing names of the columns that should be checked for NAs.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…