COS 226 Lecture 17: Graph algorithms
%ps /lecture 17 def

1 GRAPH: a set of OBJECTS with CONNECTIONS

Interesting and useful abstraction
  Study of graph algorithms: challenging branch of computer science
  Study of mathematical properties of graphs: challenging branch of discrete mathematics
Hundreds of interesting graph algorithms known
Important applications abound
  transportation systems
  scheduling
  circuit simulation
  software systems
  web search
  computer vision
  computational biology
  ...
-----
Glossary of terms

Vertex
Edge
Graph
Dense
Sparse
Path
Cycle, Tour
Tree
Spanning tree
Connected
Connected component
Undirected
Digraph
Weighted
Network
%% 0
%ps 1.4 1.4 scale 130 260 translate
%include figs/16graph/ps/graph.ps
%%%
-----
Graph examples
%% 12
%ps 1.4 1.2 scale -80 50 translate
%include figs/16graph/ps/metro.ps
%%%
%% 8
%ps .7 .7 scale 300 100 translate
%include figs/16graph/ps/web.ps
%%%
-----
More graph examples

1 CONCRETE models: direct representations
  Ex: Transportation network: cities connected by roads
  Ex: Electric circuit: devices connected by wires
1 Warning: geometric intuition may mislead
  Ex: Airline fares (triangle inequality might not hold)
1 ABSTRACT models: represent other abstractions
  Ex: Scheduling: tasks connected by precedence constraints
  Ex: Programming system: functions that call one another
  Ex: CFG (context-free grammar): symbols related by productions
  Ex: Game graphs: vertices are board positions; edges are moves
-----
Representing graphs

Graphs are abstract mathematical objects
ADT implementations require specific representations

2 AS USUAL
  many different representations possible
  efficiency depends on matching algs to representations

Standard issues apply
  space vs. time
  array vs. linked list
  integers vs. reals
  symbol tables
  duplicate vertices or edges?
  mix of ADT operations
-----
Representing graphs

VERTEX NAMES (A B C D E F G H)
  progs use integers between 0 and V-1
  convert via implicit or explicit symbol table

Two drawings that represent the same graph
%% 6.5
%ps 1.4 1.4 scale 0 0 translate
%include figs/16graph/ps/graph.ps
%%%

SET OF EDGES representation
  A-B A-G A-C L-M J-M J-L J-K E-D F-D H-I F-E A-F G-E
/lines lines 5 add def
-----
Adjacency matrix representation

V-by-V array gives constant-time edge existence test

2 VERTEX-INDEXED ARRAY: one entry for each vertex
2 ADJACENCY MATRIX: vertex-indexed array of vertex-indexed arrays
  entry 1 in (i, j) AND (j, i) iff edge i-j in graph
  (diagonal entries are set to 1 here by convention)
--
  A B C D E F G H I J K L M
A 1 1 1 0 0 1 1 0 0 0 0 0 0
B 1 1 0 0 0 0 0 0 0 0 0 0 0
C 1 0 1 0 0 0 0 0 0 0 0 0 0
D 0 0 0 1 1 1 0 0 0 0 0 0 0
E 0 0 0 1 1 1 1 0 0 0 0 0 0
F 1 0 0 1 1 1 0 0 0 0 0 0 0
G 1 0 0 0 1 0 1 0 0 0 0 0 0
H 0 0 0 0 0 0 0 1 1 0 0 0 0
I 0 0 0 0 0 0 0 1 1 0 0 0 0
J 0 0 0 0 0 0 0 0 0 1 1 1 1
K 0 0 0 0 0 0 0 0 0 1 1 0 0
L 0 0 0 0 0 0 0 0 0 1 0 1 1
M 0 0 0 0 0 0 0 0 0 1 0 1 1
---
/lines lines 3 sub def
-----
Adjacency lists representation

Array of lists takes space proportional to number of vertices plus edges

2 ADJACENCY LISTS representation
  A: F C B G
  B: A
  C: A
  D: F E
  E: G F D
  F: A E D
  G: E A
  H: I
  I: H
  J: K L M
  K: J
  L: J M
  M: J L

TWO representations of each edge for UNDIRECTED graphs
-----
Graph ADT

Standard mechanism to separate clients from implementations
  (plus simple typedef for edges)

GRAPH.h:
--
. typedef struct { int v; int w; } Edge;
. Edge EDGE(int, int);
.
. typedef struct graph *Graph;
. Graph GRAPHinit(int);
. void GRAPHinsertE(Graph, Edge);
---

Typical client program
  calls GRAPHinit to create data structures
  uses Graph handle as arg to graph-processing ADT functions
  calls GRAPHinsertE to build graph by adding edges
  calls ADT function to do graph processing
Ex: GRAPHcc computes connected components.
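A minimal sketch of how the GRAPH.h interface above might be backed by an adjacency matrix, with a client-style helper that builds the 13-vertex sample graph from its set of edges. The struct layout, `GRAPHedge`, and `sample_graph` are hypothetical names introduced only for illustration; they are not part of GRAPH.h. Unlike the matrix shown on the slide, this sketch leaves the diagonal at 0.

```c
#include <stdlib.h>

/* Edge type and constructor, following the declarations in GRAPH.h. */
typedef struct { int v; int w; } Edge;
Edge EDGE(int v, int w) { Edge e; e.v = v; e.w = w; return e; }

/* Assumed matrix-backed representation: adj[v][w] is 1 iff edge v-w exists. */
typedef struct graph *Graph;
struct graph { int V; int E; int **adj; };

Graph GRAPHinit(int V)
{
    int i;
    Graph G = malloc(sizeof *G);
    G->V = V; G->E = 0;
    G->adj = malloc(V * sizeof *G->adj);
    for (i = 0; i < V; i++)
        G->adj[i] = calloc(V, sizeof **G->adj);  /* row of V zeros */
    return G;
}

void GRAPHinsertE(Graph G, Edge e)
{
    if (G->adj[e.v][e.w] == 0) G->E++;  /* count each distinct edge once */
    G->adj[e.v][e.w] = 1;               /* undirected: set both (v,w) ... */
    G->adj[e.w][e.v] = 1;               /* ... and (w,v) */
}

/* Hypothetical helper: constant-time edge existence test. */
int GRAPHedge(Graph G, int v, int w) { return G->adj[v][w]; }

/* Client-style helper: build the sample graph, mapping A..M to 0..12. */
Graph sample_graph(void)
{
    static const char *edges[] = { "AB", "AG", "AC", "LM", "JM", "JL", "JK",
                                   "ED", "FD", "HI", "FE", "AF", "GE" };
    Graph G = GRAPHinit(13);
    int i;
    for (i = 0; i < 13; i++)
        GRAPHinsertE(G, EDGE(edges[i][0] - 'A', edges[i][1] - 'A'));
    return G;
}
```

A client would call `sample_graph()` once and then pass the returned handle to graph-processing functions; the letter-to-index subtraction plays the role of the implicit symbol table.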
/lines lines 4 add def
-----
Adjacency lists Graph ADT implementation
--
#include <stdlib.h>
#include "GRAPH.h"
typedef struct node *link;
struct node { int v; link next; };
struct graph { int V; int E; link *adj; };
/* allocate a list node holding vertex v */
link NEW(int v, link next)
  { link x = malloc(sizeof *x);
    x->v = v; x->next = next;
    return x;
  }
Graph GRAPHinit(int V)
  { int v;
    Graph G = malloc(sizeof *G);
    G->V = V; G->E = 0;
    G->adj = malloc(V*sizeof(link));
    for (v = 0; v < V; v++) G->adj[v] = NULL;
    return G;
  }
/* undirected: put w on v's list and v on w's list */
void GRAPHinsertE(Graph G, Edge e)
  { int v = e.v, w = e.w;
    G->adj[v] = NEW(w, G->adj[v]);
    G->adj[w] = NEW(v, G->adj[w]);
    G->E++;
  }
---
/lines lines 6 sub def
-----
Summary of basic costs

E edges, V vertices

Space requirements:
  Adjacency lists: V+E
  Adjacency matrix: V^2
  Set of edges: E (+V)

Choice of representation affects algorithm efficiency
  even for simple primitives

Ex: Is there an edge from A to B?
  lists: O(E)   matrix: O(1)
Ex: Is there an edge from A to anywhere?
  lists: O(1)   matrix: O(V)
/lines lines 1 add def
-----
Basic graph problems (short list)

PATHS
  Is there a path from A to B?
CYCLES
  Does the graph contain a cycle?
CONNECTIVITY (SPANNING TREE)
  Is there a way to connect all the vertices?
BICONNECTIVITY
  Is there a vertex whose removal will disconnect the graph?
PLANARITY
  Is there a way to draw the graph without edges crossing?
SHORTEST (LONGEST) PATH
  What is the shortest (longest) way from A to B?
MINIMAL SPANNING TREE
  What is the best way to connect the vertices?
HAMILTON TOUR
  Is there a cycle that uses each vertex exactly once?
ISOMORPHISM
  Do two given adj matrices represent the same graph?
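Going back to the cost summary above, here is a minimal sketch of the two primitive tests on each representation. All function names (`build_lists`, `lists_has_edge`, and so on) are hypothetical, and the matrix is assumed to be a V-by-V array of 0/1 values stored row by row. The list scan takes time proportional to the length of A's list, which is within the O(E) bound quoted in the summary.

```c
#include <stdlib.h>

typedef struct node *link;
struct node { int v; link next; };

static link NEW(int v, link next)
{
    link x = malloc(sizeof *x);
    x->v = v; x->next = next;
    return x;
}

/* Build a vertex-indexed array of lists from E undirected edges. */
link *build_lists(int V, int edges[][2], int E)
{
    int i;
    link *adj = malloc(V * sizeof *adj);
    for (i = 0; i < V; i++) adj[i] = NULL;
    for (i = 0; i < E; i++) {
        int v = edges[i][0], w = edges[i][1];
        adj[v] = NEW(w, adj[v]);   /* two list nodes per undirected edge */
        adj[w] = NEW(v, adj[w]);
    }
    return adj;
}

/* Edge v-w? Lists: must scan v's list. */
int lists_has_edge(link *adj, int v, int w)
{
    link t;
    for (t = adj[v]; t != NULL; t = t->next)
        if (t->v == w) return 1;
    return 0;
}

/* Any edge at v? Lists: constant time, just test for an empty list. */
int lists_has_any_edge(link *adj, int v) { return adj[v] != NULL; }

/* Edge v-w? Matrix: constant time, one array access. */
int matrix_has_edge(const int *m, int V, int v, int w) { return m[v*V + w]; }

/* Any edge at v? Matrix: must scan row v (diagonal ignored). */
int matrix_has_any_edge(const int *m, int V, int v)
{
    int w;
    for (w = 0; w < V; w++)
        if (w != v && m[v*V + w]) return 1;
    return 0;
}
```

The asymmetry is the point: each representation makes one of the two questions cheap and the other a scan.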
/lines lines 1 sub def
-----
Traversing graphs

Goal: VISIT every vertex in the graph

2 Depth-first search (DFS)

To VISIT a node k
  mark it
  (recursively) VISIT all unmarked vertices adjacent to k
To TRAVERSE a graph
  initialize all nodes to be unmarked
  VISIT each unmarked node
%ps 8 8 52 590 redbox

Solves some simple graph problems
  connectivity
  cycles
Basis for solving some difficult graph problems
  biconnectivity
  planarity
-----
Traversing a graph's components

Needed for any implementation of VISIT
  UNLESS graph is known to be connected

IF visit(k) marks all nodes connected to k
THEN traverse(G) marks all of G's nodes
--
int mark[maxV];  /* 0: unvisited; maxV (bound on V) defined elsewhere */
int cnt = 0;     /* number of vertices visited so far */
void visit(Graph G, int k);
void traverse(Graph G)
  { int k;
    for (k = 0; k < G->V; k++) mark[k] = 0;
    for (k = 0; k < G->V; k++)
      if (mark[k] == 0) visit(G, k);
  }
---
-----
DFS implementation

Adjacency matrix
--
void visit(Graph G, int k)
  { int t;
    mark[k] = ++cnt;
    for (t = 0; t < G->V; t++)
      if (G->adj[k][t] != 0)
        if (mark[t] == 0) visit(G, t);
  }
---
Adjacency lists
--
void visit(Graph G, int k)
  { link t;
    mark[k] = ++cnt;
    for (t = G->adj[k]; t != NULL; t = t->next)
      if (mark[t->v] == 0) visit(G, t->v);
  }
---
/lines lines 2 add def
-----
DFS example (adjacency lists)

visit A
  visit F (first on A's list)
    check A on F's list (been there)
    visit E (second on F's list)
      visit G (first on E's list)
        check E on G's list (been there)
        check A on G's list (been there)
      check F on E's list (been there)
      visit D (third on E's list)
        check F on D's list (been there)
        check E on D's list (been there)
    check D on F's list (done that)
  visit C (second on A's list)
  visit B (third on A's list)
  check G on A's list (done that)
...

"been there": currently working on it
"done that": totally finished dealing with it
/lines lines 3 add def
-----
DFS tree (adjacency lists)

Tree structure captures dynamics of DFS

1 TREE links
  first encounter: recursive call
  second encounter: been there
1 BACK links
  first encounter: been there
  second encounter: done that

A: F C B G
B: A
C: A
D: F E
E: G F D
F: A E D
G: E A
H: I
I: H
J: K L M
K: J
L: J M
M: J L
%% 0
%ps 1.7 1.7 scale 30 100 translate
%include figs/16graph/ps/dfs.lists.tiny.ps
%%%
%ps linesreset
-----
Connected components ADT function

Is there a path from s to t?

1 UNION-FIND (lecture 1)
  query: O(log* V)
  preprocessing: O(E log* V)
  space: O(V)
1 DFS
  query: O(1)
  preprocessing: O(V + E)
  space: O(V)

UF advantage: can intermix query and edge insertion
DFS advantage: can give client the path
  change arg to pass EDGE taken to visit the vertex
  maintain parent-link representation of DFS tree
  [see text]
/lines lines 4 add def
-----
Connected-components ADT functions (DFS)

1 GRAPHcc: preprocessing (DFS)
1 GRAPHconnect: query
1 cc: vertex-indexed array in graph representation
--
/* assumes struct graph has an added field: int *cc; */
void dfsRcc(Graph G, int v, int id)
  { link t;
    G->cc[v] = id;
    for (t = G->adj[v]; t != NULL; t = t->next)
      if (G->cc[t->v] == -1) dfsRcc(G, t->v, id);
  }
/* label each vertex with its component id; return number of components */
int GRAPHcc(Graph G)
  { int v, id = 0;
    G->cc = malloc(G->V * sizeof(int));
    for (v = 0; v < G->V; v++)
      G->cc[v] = -1;
    for (v = 0; v < G->V; v++)
      if (G->cc[v] == -1) dfsRcc(G, v, id++);
    return id;
  }
int GRAPHconnect(Graph G, int s, int t)
  { return G->cc[s] == G->cc[t]; }
---
/lines lines 4 sub def
-----
Graph-search overview

DFS is one of a family of graph-search functions
  all visit all nodes and edges
  strategy to use dictated by problem at hand

GENERALIZED GRAPH SEARCH
To TRAVERSE a graph
  initialize all nodes to be unmarked
  put some vertex on a generalized queue (GQ)
  while the GQ is nonempty
    remove a vertex and mark it
    put all unmarked adjacent vertices on the GQ
%ps 9 7 52 700 redbox

ISSUE: duplicate vertices on queue
  ignore the new one or
  forget the old one?
-----
Stack-based graph traversal

Use explicit stack instead of recursive calls
--
void visit(Graph G, int k)
  { link t;
    STACKpush(k);
    while (!STACKempty())
      { k = STACKpop(); mark[k] = ++cnt;
        for (t = G->adj[k]; t != NULL; t = t->next)
          if (mark[t->v] == 0)
            { STACKpush(t->v); mark[t->v] = -1; }  /* -1: on stack */
      }
  }
---
-----
Stack-based traversal example (adjacency lists)

visit A
  push F, push C, push B, push G
visit G
  push E, been to A
visit E
  been to G, been to F, push D
visit D
  been to F, been to E
visit B
  been to A
visit C
  been to A
visit F
  been to A, done with E, done with D
-----
Stack-based search

1 NOT the same as recursive DFS. Why?

Algs differ in treatment of vertices that are
adjacent to partially visited vertices
  DFS: visits such a vertex
  stack-based: avoids it
    (it is on the stack and will get visited later)

Nonrecursive DFS: PUSH next node on adj list
  equivalent to disallowing duplicate vertices on stack

No particular reason to use stack
  other ADTs work as well (stay tuned)

2 GRAPH SEARCH: generalized-queue--based traversal
-----
Graphs and mazes
%% 8
%ps 1.3 1.3 scale 40 0 translate
%include figs/16graph/ps/pac.maze.ps
%%%
vertices: intersections
edges: hallways

1 DFS
  mark ENTRY and EXIT halls at each vertex
  leave by ENTRY when no unmarked halls

Stack-based?
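To check the claim that the stack-based traversal is NOT the same as recursive DFS, here is a minimal sketch that runs both on the A..G component of the sample graph, with adjacency lists in the order given on the earlier slides. The fixed-size array representation and the names `recursive_order` and `stack_order` are assumptions made to keep the example self-contained; they are not the lecture's ADT.

```c
#include <string.h>

#define NV 7

/* Adjacency lists for A..G (0..6), each terminated by -1. */
static const int adj[NV][5] = {
    { 5, 2, 1, 6, -1 },  /* A: F C B G */
    { 0, -1 },           /* B: A */
    { 0, -1 },           /* C: A */
    { 5, 4, -1 },        /* D: F E */
    { 6, 5, 3, -1 },     /* E: G F D */
    { 0, 4, 3, -1 },     /* F: A E D */
    { 4, 0, -1 }         /* G: E A */
};

static int mark[NV];        /* 0: unseen, -1: on stack, >0: visit number */
static int cnt;
static char order[NV + 1];  /* vertex letters in visit order */

static void reset(void)
{
    memset(mark, 0, sizeof mark);
    memset(order, 0, sizeof order);
    cnt = 0;
}

static void dfs_rec(int k)
{
    int i;
    mark[k] = ++cnt;
    order[cnt - 1] = (char)('A' + k);
    for (i = 0; adj[k][i] != -1; i++)
        if (mark[adj[k][i]] == 0) dfs_rec(adj[k][i]);
}

/* Visit order of recursive DFS from start. */
const char *recursive_order(int start)
{
    reset();
    dfs_rec(start);
    return order;
}

/* Visit order of the stack-based traversal from start. */
const char *stack_order(int start)
{
    int stack[NV], top = 0, i, k;   /* each vertex is pushed at most once */
    reset();
    stack[top++] = start;
    mark[start] = -1;
    while (top > 0) {
        k = stack[--top];
        mark[k] = ++cnt;
        order[cnt - 1] = (char)('A' + k);
        for (i = 0; adj[k][i] != -1; i++)
            if (mark[adj[k][i]] == 0) {
                stack[top++] = adj[k][i];
                mark[adj[k][i]] = -1;   /* on stack: do not push again */
            }
    }
    return order;
}
```

Starting from A, the recursive version follows the trace on the DFS example slide, while the stack-based version follows the stack-based traversal example; F is visited second in one and last in the other. Both functions return the same static buffer, so a caller should use or copy one result before requesting the other.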
-----
Breadth-first search (BFS)
%% 4
%ps 1.6 1.6 scale 20 0 translate
%include figs/16graph/ps/bfs.tiny.ps
%%%

Put unvisited nodes on a QUEUE, not a stack
--
void visit(Graph G, int k)
  { link t;
    QUEUEput(k);
    while (!QUEUEempty())
      { k = QUEUEget(); mark[k] = ++cnt;
        for (t = G->adj[k]; t != NULL; t = t->next)
          if (mark[t->v] == 0)
            { QUEUEput(t->v); mark[t->v] = -1; }  /* -1: on queue */
      }
  }
---
/lines lines 3 add def
-----
BFS vs DFS example

Depth-first search
%% 8
%ps 1.4 1.4 scale 30 0 translate
%include figs/16graph/ps/dfs.medium.ps
%%%
Breadth-first search
%% 8
%ps 1.4 1.4 scale 30 0 translate
%include figs/16graph/ps/bfs.medium.ps
%%%
-----
DFS example
%% 19
%ps 1.3 1.3 scale -18 0 translate
%include figs/16graph/ps/web4.dfs.0.ps
%%%
Search order depends on graph representation
-----
DFS example (continued)

Same graph, different order of edges on adj lists
%% 20
%ps 1.3 1.3 scale -24 0 translate
%include figs/16graph/ps/web4.dfs.1.ps
%%%
-----
DFS example (continued)

Same graph, random choice of edges on adj lists
%% 20
%ps 1.3 1.3 scale -24 0 translate
%include figs/16graph/ps/web4.dfs.r.ps
%%%
-----
Graph search and path problems

1 Problem: PATHS
  Is there a path from A to B?
1 Solution: DFS, BFS, any graph search

1 Problem: SHORTEST PATH
  Find a shortest path (fewest edges) from A to B.
1 Solution: BFS

1 Problem: EULER TOUR (existence)
  Is there a cycle that uses each EDGE exactly once?
1 Solution: Yes, if and only if the graph is connected
  and degrees of all vertices are even

1 Problem: EULER TOUR
  Find a cycle that uses all the graph's edges.
1 Solution: interesting exercise [see text]

1 Problem: HAMILTON TOUR
  Is there a cycle that uses each VERTEX exactly once?
1 Solution: ?? (NP-complete)
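For the SHORTEST PATH problem above, a minimal sketch of BFS that records distances (in edges) and parent links on the 13-vertex sample graph. The array representation and the name `bfs_distance` are assumptions made for a self-contained example; the lecture's version would use the Graph ADT and a QUEUE ADT instead.

```c
#define NV 13

/* Adjacency lists for A..M (0..12), each terminated by -1. */
static const int adj[NV][5] = {
    { 5, 2, 1, 6, -1 },  /* A: F C B G */
    { 0, -1 },           /* B: A */
    { 0, -1 },           /* C: A */
    { 5, 4, -1 },        /* D: F E */
    { 6, 5, 3, -1 },     /* E: G F D */
    { 0, 4, 3, -1 },     /* F: A E D */
    { 4, 0, -1 },        /* G: E A */
    { 8, -1 },           /* H: I */
    { 7, -1 },           /* I: H */
    { 10, 11, 12, -1 },  /* J: K L M */
    { 9, -1 },           /* K: J */
    { 9, 12, -1 },       /* L: J M */
    { 9, 11, -1 }        /* M: J L */
};

/* BFS from s. Fills parent[] with BFS-tree links (-1 for s and for
   unreachable vertices). Returns the number of edges on a shortest
   path from s to t, or -1 if t is not reachable from s. */
int bfs_distance(int s, int t, int parent[NV])
{
    int dist[NV], queue[NV], head = 0, tail = 0, i, k;
    for (i = 0; i < NV; i++) { dist[i] = -1; parent[i] = -1; }
    dist[s] = 0;
    queue[tail++] = s;
    while (head < tail) {
        k = queue[head++];                 /* FIFO: closest vertices first */
        for (i = 0; adj[k][i] != -1; i++) {
            int w = adj[k][i];
            if (dist[w] == -1) {           /* first time w is seen */
                dist[w] = dist[k] + 1;
                parent[w] = k;
                queue[tail++] = w;         /* each vertex enqueued once */
            }
        }
    }
    return dist[t];
}
```

Following `parent[]` from t back to s gives one shortest path in reverse; for example, from A to D the links lead D, F, A. A vertex in another component, such as H when starting from A, is reported as unreachable.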