I am hoping to efficiently compute a co-occurrence matrix from the co-occurrences between two different variables within a group, ideally without a complex loop that iterates through all possible combinations.
Given that my dataframe looks as follows:
df = data.frame(group = c(1,1,1,2,2,2),var1 = c(1,2,4,2,2,4),var2 = c(4,1,2,1,3,2))
> df
  group var1 var2
1     1    1    4
2     1    2    1
3     1    4    2
4     2    2    1
5     2    2    3
6     2    4    2
I would like to turn this into a co-occurrence matrix in which the rows represent var1 and the columns represent var2.
EDIT: For those unfamiliar with co-occurrences, I am interested in pairs of values that occur together in a group. For example, the combination of "2" and "1" happens once in group 1 and once more in group 2, which gives 2 co-occurrences. In my example I put the combinations next to each other, but they could occur anywhere within the group.
It should look like the following:
> cooc
  1 2 3 4
1 0 2 0 1
2 2 0 1 2
3 0 1 0 0
4 1 2 0 0
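For reference, the numbers in the matrix above can be reproduced by tallying each row's (var1, var2) pair over a shared set of levels and then adding the transpose, so that (a, b) and (b, a) count as the same pair. This is only a sketch of that one reading: the names lev and tab are made up for illustration, and group is not used at all, so it may not cover the general case where the paired values sit in different rows of a group.

```r
df <- data.frame(group = c(1, 1, 1, 2, 2, 2),
                 var1  = c(1, 2, 4, 2, 2, 4),
                 var2  = c(4, 1, 2, 1, 3, 2))

# Shared levels so rows and columns cover the same values (assumption:
# both variables draw from one common set of values)
lev <- sort(unique(c(df$var1, df$var2)))

# Count each (var1, var2) pair as it appears row by row
tab <- unclass(table(factor(df$var1, levels = lev),
                     factor(df$var2, levels = lev)))

# Treat (a, b) and (b, a) as the same pair, then clear the diagonal
cooc <- tab + t(tab)
diag(cooc) <- 0
cooc
```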
I have done this before for co-occurrences of a single variable within a group by using the xtabs function, but I am not sure how to apply it to multiple columns. For example, if I were interested in the co-occurrences of var1 within the different groups, I would do the following:
> td = xtabs(~group + var1,data = df)
> cooc = crossprod(td,td)
> diag(cooc) = 0
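Written out as a self-contained script with the example data, the single-variable version looks roughly like this. The name cooc_var1 is just for illustration, and the comments are my reading of what each step does.

```r
df <- data.frame(group = c(1, 1, 1, 2, 2, 2),
                 var1  = c(1, 2, 4, 2, 2, 4),
                 var2  = c(4, 1, 2, 1, 3, 2))

# Contingency table: one row per group, one column per var1 value,
# each cell holding how often that value appears in that group
td <- xtabs(~ group + var1, data = df)

# t(td) %*% td: for every pair of var1 values, sum over groups the
# product of their per-group counts
cooc_var1 <- crossprod(td, td)

# The diagonal pairs each value with itself, so zero it out
diag(cooc_var1) <- 0
cooc_var1
```

Note that the entries are products of per-group counts, so a value that is repeated within a group is counted more than once: with this data the pair (2, 4) comes out as 3, because group 2 contains var1 = 2 twice.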