r - Remove rows which have all NAs in certain columns

Question

Welcome To Ask or Share your Answers For Others

r - Remove rows which have all NAs in certain columns

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

r - Remove rows which have all NAs in certain columns

Suppose you have a dataframe with 9 columns. You want to remove cases which have all NAs in columns 5:9. It's not at all relevant if there are NAs in columns 1:4.

So far I have found functions that allow you to remove rows that have NAs in any of the columns 5:9, but I specifically need to remove only those that have all NAs in columns 5:9.

I wrote my own function to do this, but since I have 300k+ rows, it's very slow. I was wondering is there a more efficient way? This is my code:

remove.select.na<-function(x, cols){
  nrm<-vector("numeric")
  for (i in 1:nrow(x)){
    if (sum(is.na(x[i,cols]))<length(cols)){
      nrm<-c(nrm,i)
    }
    #Console output to track the progress
    cat('
',paste0('Checking row ',i,' of ',nrow(x),' (', format(round(i/nrow(x)*100,2), nsmall = 2),'%).'))
    flush.console()
  }
  x<-x[nrm,]
  rm(nrm)
  return(x)
}

where x is the dataframe and cols is a vector containing names of the columns that should be checked for NAs.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:40:18+0000

This a one-liner to remove the rows with NA in all columns between 5 and 9. By combining rowSums() with is.na() it is easy to check whether all entries in these 5 columns are NA:

x <- x[rowSums(is.na(x[,5:9]))!=5,]

Categories

r - Remove rows which have all NAs in certain columns

r - Remove rows which have all NAs in certain columns

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags