Work out which columns have colSums != 0:
i <- (colSums(mat, na.rm=TRUE) != 0) # TRUE if the column sum is not 0, FALSE otherwise
Then you can either select or drop them, e.g.:
matnonzero <- mat[, i] # all the non-zero columns
matzeros <- mat[, !i] # all the zero columns
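As a quick illustration, here is a minimal sketch on a made-up matrix (the values and column names are assumptions, not taken from the question). It also adds drop = FALSE, which is not in the snippet above, so the result stays a matrix even if only one column is selected.

```r
# Toy matrix (made-up values): columns b and d contain only 0s and/or NAs
mat <- cbind(a = c(1, 2, 3),
             b = c(0, 0, 0),
             c = c(0, 5, NA),
             d = c(NA, 0, NA))

# Column sums with NAs ignored: a = 6, b = 0, c = 5, d = 0
i <- colSums(mat, na.rm = TRUE) != 0

matnonzero <- mat[, i, drop = FALSE]   # columns a and c
matzeros   <- mat[, !i, drop = FALSE]  # columns b and d
```

One caveat: a column whose non-zero values cancel out (say 1 and -1) also has a sum of 0, so this test treats it as a zero column.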
Update in response to a comment (are there ways to do it without colSums?):
IMO, yes, there are, but colSums is one of the more elegant/efficient ways.
You could do something like:
apply(is.na(mat) | mat == 0, 2, all)
which will return TRUE for each column that is all-NA/0, so that
mat[, !apply(is.na(mat) | mat == 0, 2, all)]
will return all the non-zero columns.
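To make the difference between the two tests concrete, here is a small sketch on a made-up matrix (the values and the column names a, z and pm are assumptions). The column pm is not all zero, but its values sum to 0.

```r
# Toy matrix (made-up values); column "pm" sums to 0 but is not all zero
mat <- cbind(a  = c(1, 2, 3),
             z  = c(0, NA, 0),
             pm = c(1, -1, 0))

# TRUE for columns where every entry is NA or 0
allzero <- apply(is.na(mat) | mat == 0, 2, all)   # a FALSE, z TRUE, pm FALSE

# The colSums test flags "pm" as zero as well, because 1 + (-1) + 0 == 0
nonzero_sum <- colSums(mat, na.rm = TRUE) != 0    # a TRUE, z FALSE, pm FALSE

mat[, !allzero, drop = FALSE]                     # keeps columns a and pm
```

So the two approaches agree whenever all values share the same sign, and differ only when non-zero values cancel.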
However, colSums is faster than apply:
system.time( replicate(1000, mat[, !apply(is.na(mat) | mat == 0, 2, all)]) )
# user system elapsed
# 0.068 0.000 0.069
system.time( replicate(1000, mat[, colSums(mat, na.rm=TRUE) != 0]))
# user system elapsed
# 0.012 0.000 0.013
I'm sure there are many other ways to do it too.
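One such alternative, sketched here on a made-up matrix (values and names are assumptions), is to count the entries in each column that are neither NA nor 0. It stays vectorised like colSums, but is not fooled by values that cancel out.

```r
# Toy matrix (made-up values)
mat <- cbind(a  = c(1, 2, 3),
             z  = c(0, NA, 0),
             pm = c(1, -1, 0))

# Number of entries per column that are neither NA nor 0, then test for > 0
keep <- colSums(!is.na(mat) & mat != 0) > 0   # a TRUE, z FALSE, pm TRUE

mat[, keep, drop = FALSE]                     # keeps columns a and pm
```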
Update again, as the OP keeps adding to their question in the comments.
The new question is: remove all columns that:
- contain a 0 or an NA
- have the same value in every row.
The mechanics are unchanged - you just come up with a boolean (true or false) for each column deciding whether to keep it or not.
For example, just as you drop a column when all of its values are is.na or == 0, for your second condition you could write length(unique({column})) == 1, or all(diff({column}) == 0), or many other equivalent ways.
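A small sketch of those two checks on made-up vectors (x, y and w are hypothetical names standing in for a column):

```r
x <- c(4, 4, 4)   # constant
y <- c(4, 5, 4)   # not constant

length(unique(x)) == 1   # TRUE
length(unique(y)) == 1   # FALSE

all(diff(x) == 0)        # TRUE
all(diff(y) == 0)        # FALSE

# The two checks differ when NAs are present:
w <- c(4, NA, 4)
length(unique(w)) == 1   # FALSE, because NA counts as a second value
all(diff(w) == 0)        # NA, because every difference involves the NA
```

If NA columns are already being dropped by the first condition, this difference does not matter in practice.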
So to combine them, remember that apply(X, 2, FUN) will apply the function FUN to every column of X.
So you could do:
i <- apply(mat,
           2,
           function (col) {
             # TRUE if the column contains any NA or 0, or holds a single unique value
             any(is.na(col) | col == 0) ||
               length(unique(col)) == 1
           })
which returns TRUE if the column has any NAs or 0s, or if the entire column has only one unique value. So this is TRUE if we should discard that column. Then you subset your matrix just as before, i.e.
mat[, !i]
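Putting it together on a made-up matrix (the values and column names are assumptions chosen so that each condition is triggered once), a minimal end-to-end sketch might look like this:

```r
# Toy matrix (made-up values): one column per case
mat <- cbind(ok     = c(1, 2, 3),   # no NA, no 0, not constant
             zero   = c(1, 0, 3),   # contains a 0
             has_na = c(1, NA, 3),  # contains an NA
             const  = c(7, 7, 7))   # same value in every row

# TRUE for every column that should be discarded
drop_col <- apply(mat, 2, function(col) {
  any(is.na(col) | col == 0) || length(unique(col)) == 1
})

# drop = FALSE keeps the result a matrix even with a single column left
result <- mat[, !drop_col, drop = FALSE]   # only column "ok" remains
```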
If you wish to add further conditions different to the ones you have already asked for, think them through and give it a try yourself, and if you still can't, ask a new question rather than modifying this one again.