Algorithm for a Solution
You're looking for a topological sort algorithm:
from collections import defaultdict
def topological_sort(dependency_pairs):
'Sort values subject to dependency constraints'
num_heads = defaultdict(int) # num arrows pointing in
tails = defaultdict(list) # list of arrows going out
for h, t in dependency_pairs:
num_heads[t] += 1
tails[h].append(t)
ordered = [h for h in tails if h not in num_heads]
for h in ordered:
for t in tails[h]:
num_heads[t] -= 1
if not num_heads[t]:
ordered.append(t)
cyclic = [n for n, heads in num_heads.iteritems() if heads]
return ordered, cyclic
if __name__ == '__main__':
connections = [(3, 7), (6, 5), (4, 6), (5, 3), (7, 8), (1, 2), (2, 1)]
print topological_sort(connections)
Here is the output for your sample data:
([4, 6, 5, 3, 7, 8], [1, 2])
The runtime is linearly proportional to the number of edges (dependency pairs).
HOW IT WORKS
The algorithm is organized around a lookup table called num_heads that keeps a count the number of predecessors (incoming arrows). Consider an example with the following connections: a->h b->g c->f c->h d->i e->d f->b f->g h->d h->e i->b
, the counts are:
node number of incoming edges
---- ------------------------
a 0
b 2
c 0
d 2
e 1
f 1
g 2
h 2
i 1
The algorithm works by "visting" nodes with no predecessors. For example, nodes a
and c
have no incoming edges, so they are visited first.
Visiting means that the nodes are output and removed from the graph. When a node is visited, we loop over its successors and decrement their incoming count by one.
For example, in visiting node a
, we go to its successor h
to decrement its incoming count by one (so that h 2
becomes h 1
.
Likewise, when visiting node c
, we loop over its successors f
and h
, decrementing their counts by one (so that f 1
becomes f 0
and h 1
becomes h 0
).
The nodes f
and h
no longer have incoming edges, so we repeat the process of outputting them and removing them from the graph until all the nodes have been visited. In the example, the visitation order (the topological sort is):
a c f h e d i b g
If num_heads ever arrives at a state when there are no nodes without incoming edges, then it means there is a cycle that cannot be topologically sorted and the algorithm exits to show the requested results.