Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Remove the columns whose colSums equal 0

I have a matrix whose elements are 0, 1, 2, or NA.
I want to remove the columns whose column sums are 0 or NA, and create a new matrix from the remaining columns (those with nonzero column sums). (I think I have to compute the column sums with na.rm = TRUE and then remove the columns whose sum is 0, because with na.rm = FALSE all of my column sums come out as NA.)

This is what my matrix looks like:

mat[1:6,1:6]

1:11059017  1:11088817  1:11090640  1:11099385  1:1109967  1:111144756
         0           0           0           0         NA            0
         0           0           0           0          0           NA
         1          NA           2           0         NA            0
         0           0           0           1          0            2
         2           0           0           0          0            0
         0           0          NA           0          0            0

summat <- colSums(mat, na.rm = TRUE)

head(summat)

1:11059017  1:11088817  1:11090640  1:11099385  1:1109967  1:111144756
         3           0           2           1          0            2

The 2nd and 5th columns have a column sum of 0, so I should remove them from mat and keep the rest of the columns in another matrix.

My output should look like this:

mat-nonzero

1:11059017  1:11090640  1:11099385  1:111144756
         0           0           0            0
         0           0           0           NA
         1           2           0            0
         0           0           1            2
         2           0           0            0
         0          NA           0            0

Would you please let me know how I can do that?

data:

structure(c(0L, 0L, 1L, 0L, 2L, 0L, 0L, 0L, NA, 0L, 0L, 0L, 0L, 
0L, 2L, 0L, 0L, NA, 0L, 0L, 0L, 1L, 0L, 0L, NA, 0L, NA, 0L, 0L, 
0L, 0L, NA, 0L, 2L, 0L, 0L), .Dim = c(6L, 6L), .Dimnames = list(
    NULL, c("X1.11059017", "X1.11088817", "X1.11090640", "X1.11099385", 
    "X1.1109967", "X1.111144756")))

Thanks


1 Reply


Work out which ones have colSums != 0:

i <- (colSums(mat, na.rm = TRUE) != 0) # TRUE if the column sum is not 0, FALSE otherwise

Then you can either select or drop them e.g.

matnonzero <- mat[, i] # all the non-zero columns
matzeros <- mat[, !i]  # all the zero columns
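
As a minimal sketch of the steps above, here they are run on the data posted in the question. The names keep, mat_nonzero and mat_zero are only illustrative, and drop = FALSE is an addition of mine (not part of the answer) so that the result stays a matrix even if only one column is selected.

```r
# Data copied from the question
mat <- structure(c(0L, 0L, 1L, 0L, 2L, 0L, 0L, 0L, NA, 0L, 0L, 0L, 0L,
0L, 2L, 0L, 0L, NA, 0L, 0L, 0L, 1L, 0L, 0L, NA, 0L, NA, 0L, 0L,
0L, 0L, NA, 0L, 2L, 0L, 0L), .Dim = c(6L, 6L), .Dimnames = list(
    NULL, c("X1.11059017", "X1.11088817", "X1.11090640", "X1.11099385",
    "X1.1109967", "X1.111144756")))

# One logical per column: TRUE when the NA-ignoring column sum is nonzero
keep <- colSums(mat, na.rm = TRUE) != 0

# drop = FALSE keeps the matrix shape even when a single column is selected
mat_nonzero <- mat[, keep, drop = FALSE]   # columns with a nonzero sum
mat_zero    <- mat[, !keep, drop = FALSE]  # columns whose sum is 0

colnames(mat_nonzero)
colnames(mat_zero)
```

With this data the 2nd and 5th columns end up in mat_zero, which matches the output the question asks for.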

Update in response to a comment (are there ways to do it without colSums?): yes, there are, but in my opinion colSums is one of the more elegant and efficient ways.

You could do something like:

apply(is.na(mat) | mat == 0, 2, all)

which will return TRUE for each column that is all-NA/0, so that

mat[, !apply(is.na(mat) | mat == 0, 2, all)]

will return all the non-zero columns.
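
One thing worth keeping in mind: the two tests agree for the question's data, which only contains 0, 1, 2 and NA, but they are not the same test in general. The sketch below uses a small made-up matrix m (hypothetical, not from the question) in which a column has values that cancel out, so its sum is 0 even though it is not all zero/NA.

```r
# Hypothetical matrix: column "a" sums to 0 but is not all zero/NA
m <- cbind(a = c(1, -1), b = c(0, NA), c = c(0, 2))

# Test 1: keep columns whose NA-ignoring sum is nonzero
by_sum <- colSums(m, na.rm = TRUE) != 0

# Test 2: keep columns that are not entirely NA/0
by_all <- !apply(is.na(m) | m == 0, 2, all)

by_sum  # column "a" is flagged for removal here ...
by_all  # ... but kept here
```

If the matrix can contain negative values, the apply version (or an equivalent such as colSums(is.na(m) | m == 0) == nrow(m)) is the one that matches "all-NA/0".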

However, colSums is faster than apply:

system.time( replicate(1000, mat[, !apply(is.na(mat) | mat == 0, 2, all)]) )
#   user  system elapsed 
#  0.068   0.000   0.069 
system.time( replicate(1000, mat[, colSums(mat, na.rm=T) != 0]))
#   user  system elapsed 
#  0.012   0.000   0.013 

I'm sure there are many other ways to do it too.


Update again, as the OP keeps adding to their question in the comments. The new question is: remove all columns that:

  • contain a 0 or an NA, or
  • have the same value in every row.

The mechanics are unchanged: you just come up with a boolean (TRUE or FALSE) for each column that decides whether to keep it or not.

Just as you drop a column when all of its values are NA or 0, for your second condition you could write (e.g.) length(unique(column)) == 1, or all(diff(column) == 0), or many other equivalent ways (note that the diff version can return NA when the column contains an NA).

So to combine them, remember that apply(X, 2, FUN) will apply the function FUN to every column of X.

So you could do:

i <- apply(mat,
      2,
      function (column) {
          # TRUE if the column contains any NA or 0,
          # or if every value in the column is the same
          any(is.na(column) | column == 0) ||
          length(unique(column)) == 1
      })

which returns TRUE if the column has any NAs or 0s, or if the entire column has only one unique value. So this is TRUE if we should discard that column. Then you subset your matrix just as before, i.e.

mat[, !i]
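
A small self-contained sketch of the combined rule follows. The matrix m is made up for illustration (in the question's data every column contains a 0 or an NA, so everything would be discarded), and the names discard and kept are hypothetical.

```r
# Hypothetical matrix: only column "a" should survive
m <- cbind(a = c(1, 2, 1),   # no 0/NA, not constant -> keep
           b = c(1, 0, 2),   # contains a 0          -> discard
           c = c(2, 2, 2),   # constant column       -> discard
           d = c(1, NA, 2))  # contains an NA        -> discard

# TRUE for each column that should be discarded
discard <- apply(m, 2, function(column) {
    any(is.na(column) | column == 0) || length(unique(column)) == 1
})

# drop = FALSE keeps the result a matrix even with one column left
kept <- m[, !discard, drop = FALSE]
colnames(kept)
```

Without drop = FALSE, selecting a single column would return a plain vector rather than a one-column matrix.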

If you wish to add further conditions different to the ones you have already asked for, think them through and give it a try yourself, and if you still can't, ask a new question rather than modifying this one again.

