Introduction to Graphs

What is a Graph?

An intuitive notion: a bunch of dots connected by lines:

Figure 1.

Unfortunately, that is too unformal for our purposes. We need a more “mathematical” notation of a graph.

Definition 1. Graph G is a pair (V, E) of sets V and E, where V is a set of vertices (nodes) in the graph, and

E is a set of pairs (a, b), where a, b Î V.

In our example, G = ({1,2,3,4}{(1,2),(2,3),(1,4),(2,4)}) …………..(1)

In general, graphs are directed, meaning that (a,b)ÎE and (b,a)ÎE are not equivalent statements. A directed graph represented by our sets V and E on line (1) actually looks like this:

Figure 2.

A significant subset of graphs are undirected graphs, where (a,b)ÎE if and only if (b,a)ÎE. Usually, these graphs are drawn as on Figure 1 (without arrows) and represented as sets where only one of the pairs (a,b) and (b,a) are included in the set if there is an edge between a and b.

Sometimes the edges in a graph have weights assotiated with them. These are weighted graphs (both directed and undirected).

Examples of graphs: map of Houston (directed), map of Texas (undirected), electronic circuit (weighted undirected), computer program (directed) e.t.c.

Graph representation

In computer programs, graphs are usually represented in two ways: as a collection of adjacency lists or as an adjacency matrix.

Adjacency lists

This is a compact way of representing sparse graphs – those graphs where |E| << |V|². The graph is represented with an array of |V| lists. Each vertex v has a list of the vertices that are adjacent to v in G. Adjacency list representation of our graph from Figure 1 looks like this:

Figure 3.

An adjacency list representation of the directed graph from Figure 2 looks like this:

Figure 4.

The sum of lenghts of all the adjacency lists for an undirected graph is 2|E|, since for every edge (a,b), there will be a list element “b” on the a’s list and a list element “a” on the b’s list.

The sum of lenghts of all the adjacency lists for a directed graph is |E|, since for every edge(a,b), there will only be a list element “b” on the a’s list.

Weigthed graphs are easy to represent with adjacency lists: if the weight of the edge (a,b) is W, just store W along with the “b” element on the a’s list.

Adjacency matrix

Sometimes, it is advantageous to represent graphs with an adjacency matrix. If the graph is dense (that is, |E| is close to |V|²), adjacency matrix might be more memory efficient. Moreover, if your graph algorithm needs a fast check whether an edge (a,b) is in the graph, adjacency matrix is preferred.

Adjacency matrix M is a |V| ´ |V| matrix, where the dimensions represent vertices (numbered in some arbitrary manner), and the elements are

M(i,j) = 1 if (i,j)ÎE, 0 otherwise.

An adjacency matrix of the graph from Figure 1. is:

	1	2	3	4
1	0	1	0	1
2	1	0	1	1
3	0	1	0	0
4	1	1	0	0

Adjacency matrix of the directed graph from Figure 2 is:

	1	2	3	4
1	0	1	0	1
2	0	0	1	1
3	0	0	0	0
4	0	0	0	0

The size of the adjacency matrix is always |V|², regardless of the size of |E|. If |E| is close to |V|², adjacency matrix can be much more efficient, since only one bit is needed for every vertice pair, as opossed to one pointer (usually 4 bytes) for every edge.

Weighted graphs are also easy to represent with adjacency matrices. Just store the weight in the matrix instead of only 0’s and 1’s.

Breadth-first Search

Problem: For a given graph G, and a specified vertice s in the graph, find all the vertices v that are reachable from s, and determine the shortest path in G from s to v.

Breadth-first search does this by constructing a breadth-first tree - a tree whose root is s, and the path on that tree from s to v is the shortest path in G from s to v.

During the execution, BFS algorithm assigns a color to every vertex: white if the vertex has not been reached yet, gray if the vertex is in the set of vertices currently being processed (a BFS frontier), and black if the vertice and ALL of its neibhors have been processed.

BFS also computes d[v] for every vertex v, which is the shortest distance from s to v in G.

It also computes p[v] for every vertex v, which is the predecessor of v in the breadth-first tree.

BFS algorithm

1 for each vertex v in V

2 color[v] = white

3 d[v] = INFINITY

4 p[v] = NULL

5 color[s] = gray

6 d[s] = 0

7 Queue.clear()

8 Queue.put(s)

9 while (!Queue.empty())

10 u = Queue.get()

11 for each v adjacent to u

12 if (color[v] == white)

13 color[v] = gray

14 d[v] = d[u] + 1

15 p[v] = u

16 Queue.put(v)

17 color[u] = black

An example of a breadth-first search.

Complexity of BFS

Each vertex is never whitened, so the test at line 12 ensures each vertex is enqueued exactly once, thus dequeued exactly once. Total queue operations are O(V).

Adjacency lists are scanned only when the vertex is dequeued, thus each adjacency list is scanned exactly once. Total time for scanning adjacency lists is therefore O(E).

Initialization is O(V). Total running time of the algorithm is O(E+V).

Correctness of BFS

Definition 1. b(s,v) is the minimum number of edges in any path from vertex s to vertex v. If there is no path from s to v, b(s,v)=INFINITY. B(s,v) is the shortest-path distance.

Lemma 1. Let G = (V, E) be a graph, and sÎV a vertex. The, for any edge (u,v)ÎE:

b(s,v) <= b(s,u)+1

Proof. If u is reachable from s, so is v. The shortest path from s to v cannot be more than the shortest path from s to u plus the edge from u to v, thus the inequality holds. If u is not reachable from s, then b(s,u) = INFINITY, so the inequality holds.

Lemma 2. Upon termination, the BFS algorithm computes d[v] for every vertex, and d[v] >= b(s,v)

Proof. By induction on the number i of enqueue operations. For i =1 (after s is enqueued), d[s] = [0] = b(s,s), and d[v] = INFINITY >= b(s,v) for all v <> s.

For i = n, let us look at a white vertex v discovered at that step from vertex u. By induction, d[u] >= b(s,u). Since d[v] = d[u]+1 >= b(s,u) + 1 >= b(s,v). QED.

Lemma 3. At all times during the execution of BFS, the queue contains vertices (v1, v2, … vr) such that d[v1] <= d[v2] <= d[v3] … <= d[vr] AND d[vr] <= d[v1] + 1.

Proof. By induction on the number i of queue operations. For i = 1, queue only has s, so the hypothesis holds.

For i = n:

After dequeuing v1: since d[vr]<=d[v1]+1 and d[v1] <= d[v2], then d[vr] <= d[v2]+1, so the hypothesis holds.

After enqueuing v_r+1: d[v_r+1] = d[v1] + 1 (line 14) >= d[vr]. Also, d[v_r+1] = d[v1] + 1 <= d[v2] + 1, since d[v1]<=d[v2]. Since v2 is the new head of the queue, the hypothesis holds.

Corollary 4. If vertices u and v are enqueued during execution of BFS, and u is enqueued before v, then d[u] <= d[v].

Theorem 5. Given G=(V,E) and source vertex s, BFS algorithm discovers every vertex v reachable from s, and upon termination, d[v] = b(s,v). Moreover, for any vertex v reachable from s, one of the shortest paths from s to v is a path from s to p[v], followed by edge (p[v],v).

Proof. By contradiction. Assume some vertex gets assigned d[v] <> b(s,v). Let v be such a vertex with minimum b(s,v). By lemma 2, d[v]>=b(s,v), so it must be that d[v]>b(s,v). Vertex v must be reachable from s, for if it’s not, b(s,v) = INFINITY >=d[v]. Let u be the vertex on a shortest path from s to v, then it must be b(s,v) = b(s,u)+1 = d[u] + 1 (Since d[u] = b(s,u), otherwise we would have chosen u in our proof). Thus:

d[v] > d[u] + 1 …….. (1)

Let’s prove that this can never happen. Let’s look at the time when BFS dequeues u. At that point, v is either white, black or gray.

If v is white, line 14 sets d[v] = d[u] + 1

If v is black, than it was already removed from the queue, and by Corollary 4, d[v]<=d[u].

If v is gray, it was painted gray when some other vertex w was dequeued, which was earlier than now (when we are dequeuing u), so d[v] = d[w] + 1 <= d[u] + 1 (Corollary 4).

So, d[v] = b(s,v) for all v in V. All reachable vertices must be discovered, otherwise they will have d = INFINITY. If p[v] = u, then d[v] = d[u] + 1, so one of the shortest paths from s to v can be obtained by taking a shortest path from s to u and then taking the edge (u,v).