COS 226 Lecture 2: Elementary sorting algorithms
%ps /lecture 2 def
%% 1.8
%ps 1.6 1.6 scale 0 0 translate
%include figs/ps/insertiondots.ps
%%%
%% 1.8
%ps 1.6 1.6 scale 0 0 translate
%include figs/ps/selectiondots.ps
%%%
%% 1.8
%ps 1.6 1.6 scale 0 0 translate
%include figs/ps/bubbledots.ps
%%%
%% 1.8
%ps 1.6 1.6 scale 0 0 translate
%include figs/ps/shelldots.ps
%%%
-----
Why study elementary algorithms?
Easy to code
Fastest for small files
Context for developing ground rules
Fastest in some special situations
May not be so elementary
-----
Ground rules
FILES of RECORDS containing KEYS
File fits in memory
Use abstract comparison, exchange
--
typedef int Item;
#define less(A, B) ((A) < (B))
#define exch(A, B) { Item t = A; A = B; B = t; }
---
Macros or subroutines?
  Macros: low cost, simple
  Subroutines: more general
-----
Selection sort example
%% 14
%ps 2.8 2.8 scale 20 0 translate
%include figs/ps/selection.ps
%%%
-----
Selection sort implementation
--
void selection(Item a[], int l, int r)
  { int i, j;
    for (i = l; i < r; i++)
      { int min = i;   /* index of smallest item in a[i..r] */
        for (j = i+1; j <= r; j++)
          if (less(a[j], a[min])) min = j;
        exch(a[i], a[min]);
      }
  }
---
-----
Insertion sort example
%% 14
%ps 2.8 2.8 scale 20 0 translate
%include figs/ps/insertion.ps
%%%
-----
Insertion sort implementation
--
void insertion(Item a[], int l, int r)
  { int i, j;
    for (i = l+1; i <= r; i++)
      { Item v = a[i]; j = i;
        /* shift larger items one place right, then drop v in */
        while (j > l && less(v, a[j-1]))
          { a[j] = a[j-1]; j--; }
        a[j] = v;
      }
  }
---
-----
Bubble sort example
%% 14.5
%ps 2.8 2.8 scale 20 5 translate
%include figs/ps/bubble.ps
%%%
-----
Bubble sort implementation
--
/* exchange A and B if they are out of order */
#define compexch(A, B) if (less(B, A)) exch(A, B)
void bubble(Item a[], int l, int r)
  { int i, j;
    for (i = l; i < r; i++)
      for (j = r; j > i; j--)
        compexch(a[j-1], a[j]);
  }
---
Improvements:
  add a test to exit if no exchanges
  go back and forth
-----
Properties of elementary sorts
All: quadratic running time
Selection sort
  comparisons: N-1 + N-2 + ... + 2 + 1 = N(N-1)/2, about N^2/2
  exchanges: N
Insertion sort (average case)
  comparisons: (N-1 + N-2 + ... + 2 + 1)/2, about N^2/4
  exchanges: about N^2/4
Bubble sort
  comparisons: N-1 + N-2 + ... + 2 + 1 = N(N-1)/2, about N^2/2
  exchanges: about N^2/2 (worst case)
-----
Special situations
Large records, small keys
  selection sort linear in amount of data
  N records of M words each (1-word keys)
  comparison cost N^2/2
  exchange cost NM
  if N is about equal to M
    costs and amount of data are both about M^2
    LINEAR sort
Files nearly in order
  bubble and insertion sort can be linear
  (even quicksort can be quadratic)
-----
Pointer sort
Sort large records by swapping *references* to the records,
not the records themselves
--
 1   9   Fox      1  --- [associated info] ---
 2   6   Quilici  1  --- ... ---
 3   8   Chen     2  --- ... ---
 4   3   Furia    3  --- ... ---
 5   1   Kanaga   3  --- ... ---
 6   4   Andrews  3  --- ... ---
 7  10   Rohde    3  --- ... ---
 8   5   Battle   4  --- ... ---
 9   2   Aaron    4  --- ... ---
10   7   Gazsi    4  --- ... ---
---
Trivial to implement: change abstract comparison
-----
Pointer sort implementations
Array indices
--
typedef int Item;
#define less(A, B) (data[A].key < data[B].key)
#define exch(A, B) { Item t = A; A = B; B = t; }
---
True pointers
--
typedef dataType* Item;
#define less(A, B) ((A)->key < (B)->key)
#define exch(A, B) { Item t = A; A = B; B = t; }
---
-----
Stable sorting for two-key records
Sort on the first key, then on the second
--
Aaron    4      Fox      1
Andrews  3      Quilici  1
Battle   4      Chen     2
Chen     2      Furia    3
Fox      1      Kanaga   3
Furia    3      Andrews  3
Gazsi    4      Rohde    3
Kanaga   3      Battle   4
Quilici  1      Aaron    4
Rohde    3      Gazsi    4
---
2 Invalid assumption: second sort preserves first sort
-----
Stable sort
File stays sorted on first key where equal on second
--
Aaron    4      Fox      1
Andrews  3      Quilici  1
Battle   4      Chen     2
Chen     2      Andrews  3
Fox      1      Furia    3
Furia    3      Kanaga   3
Gazsi    4      Rohde    3
Kanaga   3      Aaron    4
Quilici  1      Battle   4
Rohde    3      Gazsi    4
---
Which of the elementary methods are stable?
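One way to explore the stability question is to run two of the elementary sorts on the ten name/key records from the tables above, starting from name order and sorting on the integer key only. The sketch below is illustrative, not part of the lecture code: the `Rec` type and the functions `insertion_by_key`, `selection_by_key`, and `sorted_and_stable` are hypothetical names introduced here, and the sorts compare `key` fields directly instead of going through the `less`/`exch` macros.

```c
#include <string.h>

/* Hypothetical record type for illustration: a name (first key)
   and a small integer (second key), as in the tables above. */
typedef struct { const char *name; int key; } Rec;

/* Insertion sort on the integer key only. A record moves left only
   past records with strictly larger keys, so records with equal
   keys keep their original relative order. */
void insertion_by_key(Rec a[], int l, int r)
{
    int i, j;
    for (i = l + 1; i <= r; i++) {
        Rec v = a[i];
        j = i;
        while (j > l && v.key < a[j-1].key) { a[j] = a[j-1]; j--; }
        a[j] = v;
    }
}

/* Selection sort on the integer key only. The long-distance
   exchange can carry a record past others with an equal key. */
void selection_by_key(Rec a[], int l, int r)
{
    int i, j;
    for (i = l; i < r; i++) {
        int min = i;
        for (j = i + 1; j <= r; j++)
            if (a[j].key < a[min].key) min = j;
        { Rec t = a[i]; a[i] = a[min]; a[min] = t; }
    }
}

/* Returns 1 if a[l..r] is in key order and every run of equal
   keys is still in name order; returns 0 otherwise. */
int sorted_and_stable(const Rec a[], int l, int r)
{
    int i;
    for (i = l + 1; i <= r; i++) {
        if (a[i-1].key > a[i].key) return 0;
        if (a[i-1].key == a[i].key &&
            strcmp(a[i-1].name, a[i].name) > 0) return 0;
    }
    return 1;
}
```

On the ten records above, given in name order, `insertion_by_key` leaves each group of equal keys in name order, while `selection_by_key` moves Andrews behind Furia and Kanaga in the key-3 group.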
-----
4-sorting
Divide into 4 subfiles
  every 4th element starting at the 1st
  every 4th element starting at the 2nd
  every 4th element starting at the 3rd
  every 4th element starting at the 4th
%% 13
%ps 2.8 2.8 scale 20 10 translate
%include figs/ps/shellexampleAa.ps
%%%
-----
Interleaved 4-sorting
Use insertion sort with an "increment" of 4
%% 10
%ps 2.8 2.8 scale 20 10 translate
%include figs/ps/shellexampleAb.ps
%%%
-----
4-sorting implementation
--
h = 4;
for (i = l+h; i <= r; i++)
  { Item v = a[i]; j = i;
    while (j >= l+h && less(v, a[j-h]))
      { a[j] = a[j-h]; j -= h; }
    a[j] = v;
  }
---
-----
Shellsort
Use a decreasing sequence of increments
Each pass makes the next easier
1 provided increments are properly chosen
poor choice: happens to everyone
good choice: lots have been studied
best choice: research challenge (still)
-----
Shellsort example
%% 23
%ps 2.4 2.4 scale -16 10 translate
%include figs/ps/shellexampleB.ps
%%%
-----
Shellsort implementation
--
void shellsort(Item a[], int l, int r)
  { int i, j, k;
    int incs[16] = { 1391376, 463792, 198768, 86961, 33936,
      13776, 4592, 1968, 861, 336, 112, 48, 21, 7, 3, 1 };
    for (k = 0; k < 16; k++)
      { int h = incs[k];
        /* h-sort a[l..r]; bound is l+h so j-h never drops below l */
        for (i = l+h; i <= r; i++)
          { Item v = a[i]; j = i;
            while (j >= l+h && less(v, a[j-h]))
              { a[j] = a[j-h]; j -= h; }
            a[j] = v;
          }
      }
  }
---
-----
Shellsort summary
Need a sort routine, fast? Use Shellsort!
  not much code
  best method for small and medium files
  still OK even for giant files
How do we know what increments to use?
  plenty of proven winners to use
  easiest: 1, 4, 13, 40, 121, 364, 1093, ...
-----
Relatively prime increment sequences
When we h-sort a file that is k-sorted, it stays k-sorted
(Know an easy proof? SEND MAIL)
Only 18N comparisons are needed to 1-sort
a file that is 4-sorted and 13-sorted
Elements to the left of x that could be greater:
%% 3
%ps 2 2 scale 0 0 translate
%include figs/ps/shellFrob.ps
%%%
x
-----
Shellsort theory
In general, if h and k are relatively prime:
1 (h-1)(k-1)N comparisons (at most) to 1-sort a file that is h-sorted and k-sorted
1 (h-1)(k-1)N/g comparisons (at most) to g-sort a file that is h-sorted and k-sorted
2 Big increments (small files): h(N/h)^2 = N^2/h
2 Small increments, use theorem: h^2N/h = Nh
Tradeoff
best bounds: N^(3/2) total
Similar methods (harder proofs) give N^(4/3), N^(5/4), N^(6/5), ...
/lines 22 def
-----
More increment sequences
On the other hand, common divisors are good:
N comparisons to 1-sort a file that is 2-sorted and 3-sorted
N comparisons to 2-sort a file that is 4-sorted and 6-sorted
N comparisons to 3-sort a file that is 6-sorted and 9-sorted
. 1
. 2 3
. 4 6 9
. 8 12 18 27
. 16 24 36 54 81
. 32 48 72 108 162 243
. 64 96 144 216 324 486 729
Total time: N (log N)^2
Too many increments for real sizes
  start with bigger numbers than 2 and 3
  throw in some primes
Have a better idea for an increment sequence?
SEND MAIL if it beats 1 3 7 21 48 112 336 ...
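The claim that h-sorting a k-sorted file leaves it k-sorted can be checked by experiment. The sketch below pulls the h-sort pass out into its own function and adds a checker. The names `h_sort`, `is_h_sorted`, and `shellsort_3h1` are hypothetical, introduced for illustration only, and the `Item`/`less` definitions are repeated so the block stands alone. The last function is a sketch of Shellsort driven by the "easiest" sequence 1, 4, 13, 40, ..., computed on the fly instead of read from a table.

```c
typedef int Item;
#define less(A, B) ((A) < (B))

/* h-sort a[l..r]: insertion sort with an increment of h. */
void h_sort(Item a[], int l, int r, int h)
{
    int i, j;
    for (i = l + h; i <= r; i++) {
        Item v = a[i];
        j = i;
        while (j >= l + h && less(v, a[j-h])) { a[j] = a[j-h]; j -= h; }
        a[j] = v;
    }
}

/* Returns 1 if no a[i] is less than a[i-h] for i in l+h..r,
   that is, if a[l..r] is h-sorted; returns 0 otherwise. */
int is_h_sorted(const Item a[], int l, int r, int h)
{
    int i;
    for (i = l + h; i <= r; i++)
        if (less(a[i], a[i-h])) return 0;
    return 1;
}

/* Shellsort using increments of the form h = 3h+1
   (1, 4, 13, 40, ...), applied from largest to smallest. */
void shellsort_3h1(Item a[], int l, int r)
{
    int h;
    /* Grow h along the sequence until it exceeds (r-l)/9. */
    for (h = 1; h <= (r - l) / 9; h = 3*h + 1)
        ;
    /* Integer division by 3 steps back down: 40, 13, 4, 1, 0. */
    for ( ; h > 0; h /= 3)
        h_sort(a, l, r, h);
}
```

A typical experiment is to 7-sort an array, then 3-sort it, and confirm with `is_h_sorted` that it is still 7-sorted afterward.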