It is always interesting to see a reasonable answer being voted down. Though I don't like this minus score, I would keep my answer. Voter, what do you think?
Let's first get some reproducible toy data:
set.seed(0); x <- replicate(10, sample((0:3), 10, replace = TRUE))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 3 0 3 1 1 2 1 3 3 0
# [2,] 1 0 3 1 3 1 3 1 1 0
# [3,] 1 0 0 2 2 3 1 3 2 0
# [4,] 2 2 2 1 3 1 1 1 1 2
# [5,] 3 1 0 0 2 0 1 1 1 3
# [6,] 0 3 1 3 2 0 2 1 3 3
# [7,] 3 1 1 2 3 0 1 3 0 3
# [8,] 3 2 0 3 0 1 1 3 2 1
# [9,] 2 3 1 0 1 2 3 1 0 1
#[10,] 2 1 3 2 2 2 0 3 0 3
For any input matrix x, you can use:
y <- unlist(lapply(seq_len(ncol(x)-1L),
                   function(i) colSums(x[, (i+1):ncol(x), drop = FALSE] == x[, i])))
# [1] 1 2 3 2 4 1 4 2 3 3 1 0 0 3 1 3 5 1 3 1 2 4 1 4 3 4 2 3 5 1 1 3 2 1 2 2 3 3
#[39] 1 2 3 1 4 3 1
max(y)
# [1] 5
The comment by @David does essentially the same thing, but much more slowly:
y <- combn(ncol(x), 2, FUN = function(u) sum(x[, u[1]] == x[, u[2]]))
# [1] 1 2 3 2 4 1 4 2 3 3 1 0 0 3 1 3 5 1 3 1 2 4 1 4 3 4 2 3 5 1 1 3 2 1 2 2 3 3
#[39] 1 2 3 1 4 3 1
max(y)
# [1] 5
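To see why the two versions agree, here is a small sketch on a hand-made 3 x 3 matrix (m is made-up data, not from the question). The vectorised version relies on R recycling the vector m[, i] down each column of the sub-matrix, which works because its length equals the number of rows.

```r
# Made-up matrix, small enough to check by eye.
m <- cbind(c(1, 2, 3), c(1, 0, 3), c(0, 2, 3))

# Vectorised: compare column i with all later columns in one call.
# m[, i] is recycled column by column because length(m[, i]) == nrow(m).
y1 <- unlist(lapply(seq_len(ncol(m) - 1L),
                    function(i) colSums(m[, (i + 1):ncol(m), drop = FALSE] == m[, i])))

# Pairwise: one call of FUN per pair of columns.
y2 <- combn(ncol(m), 2, FUN = function(u) sum(m[, u[1]] == m[, u[2]]))

y1
# [1] 2 2 1
all(y1 == y2)
# [1] TRUE
```

Both vectors list the pairs in the same order: (1,2), (1,3), (2,3).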
Benchmarking
We generate a 10 x 1000 matrix for the experiment:
set.seed(0); x <- replicate(1e+3, sample((0:3), 10, replace = TRUE))
system.time(unlist(lapply(seq_len(ncol(x)-1L), function(i) colSums(x[, (i+1):ncol(x), drop = FALSE] == x[, i]))))
# user system elapsed
# 0.176 0.032 0.207
system.time(combn(ncol(x), 2, FUN = function(u) sum(x[, u[1]] == x[, u[2]])))
# user system elapsed
# 4.692 0.008 4.708
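A plausible reason for the gap (my reading, not something I profiled) is the number of R-level function calls: combn() calls FUN once per pair of columns, while the lapply() version calls its function once per column and lets colSums() do the rest in compiled code. A quick count for this matrix size:

```r
n <- 1000                  # number of columns in the benchmark matrix

calls_combn  <- choose(n, 2)   # one FUN call per pair of columns
calls_lapply <- n - 1          # one function call per column except the last

calls_combn
# [1] 499500
calls_lapply
# [1] 999
```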
Something like a distance matrix?
With this idea, you could also generate a "distance" matrix holding the number of non-equal elements between every pair of columns (just replace == with !=):
y <- unlist(lapply(seq_len(ncol(x)-1L),
                   function(i) colSums(x[, (i+1):ncol(x), drop = FALSE] != x[, i])))
z <- matrix(0L, ncol(x), ncol(x))
z[lower.tri(z)] <- y
z
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 0 0 0 0 0 0 0 0 0 0
# [2,] 9 0 0 0 0 0 0 0 0 0
# [3,] 8 7 0 0 0 0 0 0 0 0
# [4,] 7 9 9 0 0 0 0 0 0 0
# [5,] 8 10 7 7 0 0 0 0 0 0
# [6,] 6 10 9 6 9 0 0 0 0 0
# [7,] 9 7 8 8 7 8 0 0 0 0
# [8,] 6 9 6 7 8 7 8 0 0 0
# [9,] 8 7 9 5 9 7 7 6 0 0
#[10,] 7 5 6 9 8 9 9 7 9 0
Note that only the lower triangle is computed, because the matrix is symmetric. The diagonal entries are all zero (of course).
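If you need the full symmetric matrix or a dist object, the lower triangle is enough to build either. A minimal sketch on made-up data (m, d, full and dz are names I picked for illustration):

```r
# Made-up 3-column matrix; columns 1 and 3 differ in exactly one position.
m <- cbind(c(1, 2, 3), c(0, 0, 0), c(1, 2, 0))
n <- ncol(m)

# Mismatch counts for the lower triangle, same approach as above.
d <- unlist(lapply(seq_len(n - 1L),
                   function(i) colSums(m[, (i + 1):n, drop = FALSE] != m[, i])))
z <- matrix(0, n, n)
z[lower.tri(z)] <- d

# Mirror the lower triangle; the diagonal is zero so nothing is double counted.
full <- z + t(z)

# as.dist() only reads the lower triangle, so z can be passed as is.
dz <- as.dist(z)
```

Here full[1, 3] and full[3, 1] are both 1, and dz can be handed to functions that expect a dist object, such as hclust().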