If you have a set of ranges, such as the following simple example...
[
[12, 25], #1
[14, 27], #2
[15, 22], #3
[17, 21], #4
[20, 65], #5
[62, 70], #6
[64, 80] #7
]
... how do you compute the maximally intersecting subset (not sure quite how to phrase it, but I mean "the subset of ranges which intersects and has the highest cardinality") and determine the degree of intersection (cardinality of ranges in that subset)?
Logically I can work it out, and might be able to translate that to a naive algorithm. Going down the list, we see that 1-5 intersect, and 5-7 intersect, and that #5 intersects both sets.
The result I want is simply the subset, since that gives me the information about the cardinality, and I can easily compute the intersection of the set as long as they all intersect. In the above example, it would be [[14, 27],[15, 22],[12, 25],[17, 21],[20, 65]]
.
Off the top of my head, I might try converting each range to a graph node, connecting the ones which are intersecting, and finding the largest fully-connected graph.
I was also thinking iteratively to start at the beginning, continue building up a list of intersecting ranges with a running intersection on each to check against—until you hit an element which doesn't intersect, then start a new list. Continue checking each item against the existing intersections. However I'm not sure this is complete.
I could take a stab at implementing something (lang is ruby FWIW), but I would love to hear how others might solve this problem, and what the most efficient and elegant way might be.
Update:
I believe this is a specific case of the Maximum clique problem, which is NP-hard and thus actually difficult. Suggestions for approximations/real-world use would be most appreciated!
See also: http://en.wikipedia.org/wiki/Maximum_clique / Find all complete sub-graphs within a graph
Update 2
Found a nice proof of this problem's NP-hardness and NP-completeness here: http://www.cs.bris.ac.uk/~popa/ipl.pdf
Looks like this is the end of the line then. Sorry folks! I'll work with a good-enough greedy approximation. Thanks.
As said in the answers I don't think that paper describes this problem... we probably have more information here based on the ranges.
See Question&Answers more detail:
os