Topological Sort

Reading: CLR 22.4

Absent-minded professor Budimlić has a problem when getting ready to go to work in the morning: he sometimes dresses out of order: he’ll put his shoes on before putting the socks on, so he’ll have to take the shoes off, put the socks on and than the shoes back on. There’s also shirt, tie, belt, shorts, pants, watch and jacket that have to be put in a certain order. Can you help him?

The order between different parts of clothing forms a graph: shorts before pants means there’s an edge between shorts and pants.

Let’s do a DFS of this graph. The discovery (d) and finishing times (f) might be:

Shorts: (1,10), pants(2,9), belt(3,6), jacket(4,5) ,shoes (7,8), socks (11,12), tie(13,14), watch(15,16), shirt(17,18).

If we order the nodes based on the decreasing finishing time of the DFS, we get: shirt, watch, tie, socks, shorts, pants, shoes, belt, jacket.

This is called topological sort. It works only on directed acyclic graphs. If the graph is cyclic, no topological order exists.

Topological order is a linear order of vertices such that if there’s an edge (u,v), vertex u appears before v in the order.

TOPOLOGICAL-SORT(G)

call DFS(G) to compute f[v] for each vertex v in G
as each vertex v is finished, and f[v] computed, put v on the front of a linked list
return the linked list of vertices

Lemma 1. A directed graph G is acyclic if and only if a DFS of G yields no back edges.

Proof:

=> : If there is a back edge (u,v) then v is an anchestor of u in the DF forest, so there is a path from v to u, which with the (u,v) edge forms a cycle.

<=: Suppose G has a cycle. Let v be the first vertex on the cycle that is discovered in the DFS. Let (u,v) be the incoming edge in the cycle. Since u is not yet discovered, there is a white-path from v to u at this point. According to the white-path theorem (see the DFS notes), u will be v’s descendent in the DF forest. Thus, (u,v) will become a back edge.

Theorem 2. TOPOLOGICAL-SORT(G) produces a topological sort of a directed acyclic graph G.

Proof: It is enough to show that for every edge (u,v) in G, f[u] > f[v]. Let’s look at the time d[u]: at this point, v cannot be gray, since then (u,v) would be a back edge, and G cyclic. v therefore has to be black or white. If v is white, than v is a descendent of u in the DF forest, and by parenthesis theorem f[u] > f[v]. If v is black, that means that v is already finished, so f[v] < d[u] < f[u].

Strongly conncected components

Reading: CLRS 22.5

Many algorithms need a decomposition of a directed graph into strongly connected components. A strongly connected component is a maximal subset G’ of a directed graph G, such that there is a path in G’ from every vertex to every other vertex in G’. For example, many compiler algorithms decompose the programs into strongly connected components (loops) and optimize the components independently.

Here’s an example of a graph, with strongly connected components indicated. The nodes also have their d[v] and f[v] times indicated from a DFS.

Please note that graphs G=(V,E) and G^T=(V,E^T) have identical SCC. How do you make a G^T? If the representation is an adjacency-matrix, just compute the transpose of M. That’s O(|V|²) time (It can be O(1) time if you don’t need G anymore – just change the meaning of the x and y dimensions). If the representation is adjacency lists, then for every vertex v, go through the v’s adjacency list and put v on the adjacency list in G^T of every vertex that is on v’s adjacency list in G. That’s O(E+V) time.

A component graph G^SCC is a graph in which all the SCC are collapsed into single nodes, and edges between SCC’s are maintained. For the graph above, the component graph looks like this:

We compute SCC by doing two DF traversals: first on G, and the second on G^T, but in order of decreasing f[v] computed by first DFS.

STRONGLY-CONNECTED-COMPONENTS(G)

call DFS(G) to compute f[v] for each vertex v
compute G^T
call DFS(G^T), but in the main loop of DFS consider the vertices in order of decreasing f[v] computed in line 1
output the vertices of each tree in DF forest computed in line 3 as a separate SCC

To show that the SCC algorithm computes SCC’s, we need to make some key observations:

Lemma 3: G^SCC is a DAG.

Proof: If there’s a path from C1 to C2 (meaning there is a path from u1 in C1 to u2 in C2), there cannot be a path from C2 to C1 (a path from v2 in C2 to v1 in C1) , since that would imply that there is a path from u2 to v2, a path from v2 to v1 and a path from v1 to u1, meaning that there is a path from u2 to u1, contradicting the premise that C1 and C2 are distinct SCC’s.

Since G^SCC is a DAG, it can be topologically sorted. If we do a DFS on the topologically sorted G^SCC in reverse order of the topologically sorted SCC’s, then all the edges in G^SCC will be cross edges, meaning that each SCC will be a tree in the second DFS.

We define f(C), where C is a subset of V as max(f[v], v is in C).

There are a couple of observations that formalize this intuition:

Lemma 4: Let C and C’ be distinct SCC’s in directed graph G=(V,E). Suppose there is an edge (u,v) in E, where u is in C and v is in C’. Then f(C) > f(C’).

Corollary 5: Suppose that there is an edge (u,v) in E^T, where u is in C and v is in C’. Then f(C) < f(C’).

Latest corollary gives an insight of why the algorithms works. We start the second DFS with the SCC C that has the highest f(C). By Corollary 5, there are no edges going from C to any other strongly connected components. All vertices in C are reachable from the start, and all are on white paths, so they will all be on the DF tree. Thus, the first DF tree we produce will be the first SCC.

When we start the DFS on a strongly connected component C later in the algorithm, f(C) > f(C’) for all other C’ that have yet to be visited (that’s how we choose roots for the second DFS). All the vertices in the C are on a white path from the root, so they will all be on a DF tree. Also, all the edges coming out of C will be the edges to the SCC’s that are already visited (by Corollary 5), which means that no vertices other than the ones in C will be on that tree. So, the DF tree created at that point IS an SCC.

The roots for the DF trees in the second DFS in our example graph will be b, c, g and h, in that order, and the discovered DF trees will be bae, cd, gf and h, which are the SCC’s.