Disjoint Sets

 

Sometimes we want to partition a group of N distinct objects into a collection of disjoint sets. Important operations on disjoint sets are finding which set a given element belongs to, and creating a union of two sets (why is intersection not very important for disjoint sets?).

 

A set will be identified by an object representative. That’s one of the objects that belongs to the set. Sometimes can be arbitrary, sometimes it has to be a certain (i.e. smallest or biggest) object from the set. The only condition is that it has to be the same object every time we ask about it on the given set.

 

Important operations:

 

MAKE-SET(x) – make a set out of object x. x shouldn’t be in any other set. X will become representative of the newly created set.

UNION(x,y) – make a set that is a union of sets to which x and y belong. After union, sets of x and y are destroyed.

FIND-SET(x) – find the representative of the set to which x belongs.

 

An example of disjoint set application

 

A very simple algorithm to find connected components on an undirected graph. We will discuss finding strongly connected components on a directed graph, a somewhat harder problem, next week.

After we are done constructing the connected components, testing whether two elements are in the same component involves just finding their coresponding representatives.

 

CONNECTED-COMPONENTS(G)
1.    for each vertex v in V
2.        MAKE-SET(v)
3.    for each edge (u,v) in E
4.        if FIND-SET(u) != FIND-SET(v)
5.            UNION(u,v)
 
SAME-COMPONENT(u,v)
1.    return (FIND-SET(u) == FIND-SET(v))
 

Linked-list implementation

 

A very simple disjoint set implementation can be done using linked lists.

 

1st approach: Just make a list of all the elements. The last element is the representative. MAKE-SET takes O(1) time, so does the UNION. FIND-SET takes O(n) amortized time.

 

2nd approach: Make a list of all the elements. The first element is the representative, every element has a pointer back to the representative. MAKE-SET takes O(1), so does the FIND-SET. UNION now takes O(n) amortized time because we have to update the representative pointers in the list we are appending, and because we can come up with a scenario where we always append a longer list onto a short one

 

3rd approach: Same as the 2nd approach, but we keep a length of the list with the list, and always append a shorter list to a longer one. Now an amortized cost of a sequence of m MAKE-SET, UNION and FIND-SET operations, n of which are MAKE-SET operations, is O(m + n log n).

 

Proof: Look at how many times an objects representative pointer can be updated. Since every time the object is in a shorter list, the resulting list is at least of length 2k, where k was the length of the shorter list. You can start with 1 element on the list, and update its representative pointer at most log n times, where n is the total number of elements.

 

Disjoint-set forests

 

If we represent sets as trees where the root is the representative and every node points to its parent while the root points to itself, and apply a couple of heuristics, we can get asymptotically faster disjoint-set implementation.

 

MAKE-SET is simple. It just creates a tree with one node. FIND-SET starts at the element and traverses the tree upwards until it gets to the root, which it returns as the representative. UNION just links the root of one tree to the onther one.

 

Union by rank.

 

When performing UNION, if we make the root of the larger tree be the root of the resulting tree (similar to the 3rd approach for linked-list implementation), we can save some time. We won’t actually compute the size of the tree (as in the number of elements), but the rank, which is an upper bound on the height of the tree. In union-by-rank, the tree with a smaller rank is made to point to the root of the tree with a higher rank.

 

Path compression

 

When doing FIND-SET, we can speed-up any subsequent queries on the find path by compressing the path: we make each node on the path to the root point directly to the root. Any subsequent queries on any of the nodes on the path will be done in just one step.

 

 

MAKE-SET(x)
1.    p[x] = x
2.    rank[x] = 0
 
UNION(x,y)
1.    LINK(FIND-SET(x), FIND-SET(y))
 
LINK(x,y)
1.    if (rank[x] > rank[y])
2.       p[y] = x
3.    else 
4.       p[x] = y
5.       if (rank[x]==rank[y])
6.          rank[y] = rank[y]+1
 
FIND-SET(x)
1.    if (x != p[x])
2.       p[x] = FIND-SET(p[x])
3.    return p[x]
 

When we use either union by rank or path compression, the running time is improved. When we use both heuristics, the running time is further improved to O(ma(n)) where m is the total number of operations, and n is the number of MAKE-SET operations.  a(n) is very slowly growing function, which has a value of at most 4 for any concievable value of n.