I have a data frame that looks like this (this is just a subset; the actual dataset has 2724098 rows):
> head(dat)
chr start end enhancer motif
chr10 238000 238600 9_EnhA1 GATA6
chr10 238000 238600 9_EnhA1 GATA4
chr10 238000 238600 9_EnhA1 SRF
chr10 238000 238600 9_EnhA1 MEF2A
chr10 375200 375400 9_EnhA1 GATA6
chr10 375200 375400 9_EnhA1 GATA4
chr10 440400 441000 9_EnhA1 GATA6
chr10 440400 441000 9_EnhA1 GATA4
chr10 440400 441000 9_EnhA1 SRF
chr10 440400 441000 9_EnhA1 MEF2A
chr10 441600 442000 9_EnhA1 SRF
chr10 441600 442000 9_EnhA1 MEF2A
I was able to transform my dataset to this format where groups of chr, start, end and enhancer represent a single ID:
> dat
id motif
1 GATA6
1 GATA4
1 SRF
1 MEF2A
2 GATA6
2 GATA4
3 GATA6
3 GATA4
3 SRF
3 MEF2A
4 SRF
4 MEF2A
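For reference, one way such an id column could be derived is sketched below. This is a minimal sketch in base R and not necessarily the transformation actually used; the `key` variable and the `_` separator are assumptions made for illustration, and the toy data is copied from the subset shown above.

```r
# Toy data copied from the subset shown above
dat <- data.frame(
  chr      = rep("chr10", 12),
  start    = c(rep(238000, 4), rep(375200, 2), rep(440400, 4), rep(441600, 2)),
  end      = c(rep(238600, 4), rep(375400, 2), rep(441000, 4), rep(442000, 2)),
  enhancer = rep("9_EnhA1", 12),
  motif    = c("GATA6", "GATA4", "SRF", "MEF2A", "GATA6", "GATA4",
               "GATA6", "GATA4", "SRF", "MEF2A", "SRF", "MEF2A"),
  stringsAsFactors = FALSE
)

# Build one key per (chr, start, end, enhancer) group; "key" is a hypothetical helper name
key <- paste(dat$chr, dat$start, dat$end, dat$enhancer, sep = "_")

# Number the groups in order of first appearance
dat$id <- match(key, unique(key))

dat[, c("id", "motif")]
```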
I want to count, for every possible pair of motifs, the number of ids in which both motifs occur. So I want an output table like:
motif1 motif2 count
GATA6 GATA4 3
GATA6 SRF 2
GATA6 MEF2A 2
... and so on for each pair of motifs
In the actual dataset, there are 1716 unique motifs and 83509 unique ids.
Any suggestions on how to proceed?
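One possible direction, given as a minimal sketch rather than a definitive answer: build a sparse id-by-motif incidence matrix and take its cross-product, so that entry [a, b] is the number of ids containing both motif a and motif b. The sketch assumes the Matrix package is available (it normally ships with R) and that each motif should be counted at most once per id; `count_motif_pairs` is a hypothetical helper name, and the toy data is copied from the example above.

```r
library(Matrix)  # assumption: the recommended Matrix package is installed

# Toy data copied from the id/motif example above
dat <- data.frame(
  id    = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4),
  motif = c("GATA6", "GATA4", "SRF", "MEF2A", "GATA6", "GATA4",
            "GATA6", "GATA4", "SRF", "MEF2A", "SRF", "MEF2A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count the ids in which each unordered motif pair co-occurs
count_motif_pairs <- function(dat) {
  dat    <- unique(dat[, c("id", "motif")])   # count each motif once per id
  ids    <- factor(dat$id)
  motifs <- factor(dat$motif, levels = unique(dat$motif))

  # Sparse id x motif incidence matrix: 1 where the id contains the motif
  inc <- sparseMatrix(i = as.integer(ids), j = as.integer(motifs), x = 1,
                      dims = c(nlevels(ids), nlevels(motifs)),
                      dimnames = list(levels(ids), levels(motifs)))

  # t(inc) %*% inc: a motif x motif matrix of co-occurrence counts
  co <- as.matrix(crossprod(inc))

  # Keep each unordered pair once (upper triangle) and drop pairs that never co-occur
  idx <- which(upper.tri(co) & co > 0, arr.ind = TRUE)
  data.frame(motif1 = rownames(co)[idx[, "row"]],
             motif2 = colnames(co)[idx[, "col"]],
             count  = co[idx],
             stringsAsFactors = FALSE)
}

pairs <- count_motif_pairs(dat)
pairs
```

With 1716 motifs the dense motif-by-motif result has only about 2.9 million cells, so converting it with `as.matrix` should be cheap; it is the 83509 x 1716 incidence matrix that benefits from staying sparse.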