Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
167 views
in Technique[技术] by (71.8m points)

maximal number of identical elements between any two columns of a matrix in R

I just was wondering if there was an easy way to compute the maximal number of identical elements between any two columns of a matrix in R.

For example, I have a matrix

test <- replicate(10, sample((0:3), 10, replace = TRUE))

test

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    3    0    1    0    2    2    1    0    2     0
 [2,]    1    1    3    2    0    2    3    0    2     2
 [3,]    2    3    0    0    1    2    0    3    0     2
 [4,]    2    2    1    1    2    0    0    1    1     0
 [5,]    2    0    1    2    0    1    1    1    0     0
 [6,]    1    0    1    3    2    3    3    1    3     2
 [7,]    0    1    3    2    1    0    1    2    1     1
 [8,]    0    3    1    3    0    2    3    1    1     1
 [9,]    2    3    1    3    0    1    0    1    3     2
[10,]    3    2    1    0    2    1    3    2    3     1

To compare column 1 and 2 I use

table(test[,1] == test[,2])

FALSE  TRUE 
    8     2 

So there are two identical elements between these two columns.

I could now repeat this for all pairs of columns using two nested for loops and then find the maximum number of TRUE calls but this does not look nice. Can anyone think of a better way?

Cheers,

Maik

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

It is always interesting to see a reasonable answer being voted down. Though I don't like this minus score, I would keep my answer. Voter, what do you think?


Let's first get some reproducible toy data:

set.seed(0); x <- replicate(10, sample((0:3), 10, replace = TRUE))
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    3    0    3    1    1    2    1    3    3     0
# [2,]    1    0    3    1    3    1    3    1    1     0
# [3,]    1    0    0    2    2    3    1    3    2     0
# [4,]    2    2    2    1    3    1    1    1    1     2
# [5,]    3    1    0    0    2    0    1    1    1     3
# [6,]    0    3    1    3    2    0    2    1    3     3
# [7,]    3    1    1    2    3    0    1    3    0     3
# [8,]    3    2    0    3    0    1    1    3    2     1
# [9,]    2    3    1    0    1    2    3    1    0     1
#[10,]    2    1    3    2    2    2    0    3    0     3

For any input matrix x, you can use:

y <- unlist(lapply(seq_len(ncol(x)-1L),
                   function(i) colSums(x[, (i+1):ncol(x), drop = FALSE] == x[, i])))
# [1] 1 2 3 2 4 1 4 2 3 3 1 0 0 3 1 3 5 1 3 1 2 4 1 4 3 4 2 3 5 1 1 3 2 1 2 2 3 3
#[39] 1 2 3 1 4 3 1
max(y)
# [1] 5

The comment by @David is doing essentially the same thing but way slower:

y <- combn(ncol(x), 2, FUN = function(u) sum(x[, u[1]] == x[, u[2]]))
# [1] 1 2 3 2 4 1 4 2 3 3 1 0 0 3 1 3 5 1 3 1 2 4 1 4 3 4 2 3 5 1 1 3 2 1 2 2 3 3
#[39] 1 2 3 1 4 3 1
max(y)
# [1] 5

Benchmarking

We generate a 10 * 1000 matrix for experiment:

set.seed(0); x <- replicate(1e+3, sample((0:3), 10, replace = TRUE))
system.time(unlist(lapply(seq_len(ncol(x)-1L), function(i) colSums(x[, (i+1):ncol(x), drop = FALSE] == x[, i]))))
#   user  system elapsed 
#  0.176   0.032   0.207 
system.time(combn(ncol(x), 2, FUN = function(u) sum(x[, u[1]] == x[, u[2]])))
#   user  system elapsed 
#  4.692   0.008   4.708 

Something like a distance matrix?

With this idea, you could also generate a "distance" matrix for number of non-equal elements between all columns (just replace the == with !=):

y <- unlist(lapply(seq_len(ncol(x)-1L),
                   function(i) colSums(x[, (i+1):ncol(x), drop = FALSE] != x[, i])))
z <- matrix(0L, ncol(x), ncol(x))
z[lower.tri(z)] <- y
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    0    0    0    0    0    0    0    0    0     0
# [2,]    9    0    0    0    0    0    0    0    0     0
# [3,]    8    7    0    0    0    0    0    0    0     0
# [4,]    7    9    9    0    0    0    0    0    0     0
# [5,]    8   10    7    7    0    0    0    0    0     0
# [6,]    6   10    9    6    9    0    0    0    0     0
# [7,]    9    7    8    8    7    8    0    0    0     0
# [8,]    6    9    6    7    8    7    8    0    0     0
# [9,]    8    7    9    5    9    7    7    6    0     0
#[10,]    7    5    6    9    8    9    9    7    9     0

Note that only lower triangular matrix is computed due to symmetry. Diagonal are all zeros (or course).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...