3.5 Advanced Sorting
This section under major construction.
Sorting algorithms and priority queues are critical in a broad variety of applications.
A prime reason why sorting is so useful is that it is much easier to search for an element in a sorted list than an unsorted one. Given someone's first and last name, it is easy to find their phone number using a phone book since the entries are sorted by last name. A digital music player organizes song files by artist name or song title; a search engine displays search results in descending order of importance; a spreadsheet displays columns sorted by a particular field; an RSS reader sorts news by date and time; a matrix-processing package sorts the real eigenvalues of a symmetric matrix in descending order. Other tasks are also made easier once a file is in sorted sorted: look up an entry in the index in the back of this book, find the median element, the closest pair, statistical outliers, remove duplicates in a mailing list, or detect double registrations in a voting precinct.
Sorting also arises as a critical subroutine in many applications that appear to have nothing to do with sorting at all including: data compression (see the Burrows-Wheeler programming assignment), computer graphics (convex hull, closest pair), computational biology (longest common substring discussed below), supply chain management (schedule jobs to minimize weighted sum of completion times), combinatorial optimization (Kruskal's algorithm), social choice and voting (Kendall's tau distance),
Historically, sorting was most important for commercial applications, but sorting also plays a major role in the scientific computing infrastructure. NASA and the fluids mechanics community use sorting to study problems in rarefied flow; these collision detection problems are especially challenging since they involve ten of billions of particles and can only be solved on supercomputers in parallel. Similar sorting techniques are used in some fast N-body simulation codes. Another important scientific application of sorting is for load balancing the processors of a parallel supercomputers. Scientists rely on clever sorting algorithm to perform load-balancing on such systems.
Load balancing.
Suppose that you have N identical processors and M job to complete. Task j requires tj seconds of processing time. Your goal is the schedule all of the tasks so that the time at which the last job completes is as early as possible. List scheduling = consider jobs in arbitrary order and assign job j to machine that is least busy. The LPT rule (longest processing time first) is to consider the jobs in descending order of time, and assign a job to the machine that becomes available first. Although this algorithm does not necessarily yield an optimal schedule (It is NP-hard so we do not expect a practical solution. See Section xyz), the resulting solution is guaranteed to be within 33% of the best possible. To implement this algorithm, we first sort the jobs. Then we maintain a priority queue of N elements, corresponding to each processor, where the priority is the sum of the processing times. At each step, we delete the processor with the minimum priority, add the next job to the processor, and re-insert that processor into the priority queue. Program Processor.java implements this strategy.Comparable. Date.java implements a simple date ADT using month, day, and year. It implements the Comparable interface using the natural ordering on dates.
Comparator. What happens when we want a data type to have more than one ordering, depending on the application? For example, we might have a Person data type which we might want to sort by last name, age, bank account, height, or IQ. In such situations, you can define a class or subclass that implements the Comparator interface.
It is easy to modify our sorting routines to work with comparators. Program Insertion.java is a revised version of insertion sort that works with comparators.
interface Comparator<Key> { int compare(Key x, Key y); }
Transaction. Data type for business transactions (date, account number, name). Want to be able to sort by either of the three fields. Program Transaction.java represents a transaction using an id (long), name (String) and date (Date).
Program Customer.java is a data type for customers.
Java uses a similar strategy in its String library. The compareTo function is case-sensitive. If you want to sort with a case-insensitive comparison function, you can use String.CASE_INSENSITIVE_ORDER. This is a Comparator that use the compareToIgnoreCase method.
System sort. Java provides a sysem sort for sorting primitive types and objects that implement the Comparable interface. It relies on two classic sorting algorithms - quicksort and mergesort - which we will learn about in Sections X and Y. Program SystemSort.java illustratese how to use it. The algorithm for sorting strings (and other objects) is a tuned version of mergesort. It is guaranteed to be stable and run in time proportional to N log N. Program SystemSort.java illustrates how to use it. The algorithm for primitive types is a variant of 3-way quicksort developed by Bentley and McIlroy. It is extremely efficient for most inputs that arise in practice, including inputs that are already sorted. However, using a clever techinique described by M. D. McIlroy in A Killer Adversary for Quicksort, it is possible to construct pathological inputs that make the system sort run in quadratic time. Even worse, it is blows the function call stack. To see Java's sorting library break, here are some killer inputs of varying sizes: 10,000, 20,000, 50,000, 100,000, 250,000, 500,000, and 1,000,000. You can test them out using the program IntegerSort.java which takes a command line input N, reads in N integers from standard input, and sorts them using the system sort.
Java implements its sorting libraries twice, once using Comparable objects and once for dealing with Comparator. For example, you can sort an array s of strings using Java's case insensitive comparator function via Arrays.sort(s, String.CASE_INSENSITIVE_ORDER). We illustrate below how to implement bubblesort using a Comparator, but limit our future discussions to Comparable objects, noting that it is straightforward to add this additional functionality.
Java priority queue library implementation.
Program SystemPQ.java implements our min priority queue interface using the PriorityQueue library in Java 1.5.
Stability.
In-place.
In-place = O(1) extra space. In-situ = O(log N) extra space.
Comparison of sorting algorithms.
Sorting summary.
Computational complexity.
N log N lower bound under decision tree model of computation. Applies to randomized algorithms.
Molecular Dynamics Simulation of Hard Spheres
Simulate the motion of N colliding particles according to the laws of elastic collision using event-driven simulation. Such simulations are widely used in molecular dynamics (MD) to understand and predict properties of physical systems at the particle level. This includes the motion of molecules in a gas, the dynamics of chemical reactions, atomic diffusion, sphere packing, the stability of the rings around Saturn, the phase transitions of cerium and cesium, one-dimensional self-gravitating systems, and front propagation. The same techniques apply to other domains that involve physical modeling of particle systems including computer graphics, computer games, and robotics. We will revisit some of these issues again in Chapter 7.
Hard sphere model. The hard sphere model (billiard ball model) is an idealized model of the motion of atoms or molecules in a container. We focus on the two-dimensional version called the hard disc model. The salient properties of this model are listed below.
- N particles in motion, confined in the unit box.
- Particle i has known position (rxi, ryi), velocity (vxi, vyi), mass mi, and radius σi.
- Particles interact via elastic collisions with each other and with the reflecting boundary.
- No other forces are exerted. Thus, particles travel in straight lines at constant speed between collisions.
Simulation. There are two natural approaches to simulating the system of particles.
- Time-driven simulation. Discretize time into quanta of size dt. Update the position of each particle after every dt units of time and check for overlaps. If there is an overlap, roll back the clock to the time of the collision, update the velocities of the colliding particles, and continue the simulation. This approach is simple, but it suffers from two significant drawbacks. First, we must perform N2 overlap checks per time quantum. Second, we may miss collisions if dt is too large and colliding particles fail to overlap when we are looking. To ensure a reasonably accurate simulation, we must choose dt to be very small, and this slows down the simulation.
- Event-driven simulation.
With event-driven simulation we focus only on those times at which
interesting events occur.
In the hard disc model, all particles travel in straight line
trajectories at constant speeds between collisions.
Thus, our main challenge is to determine the ordered sequence of
particle collisions.
We address this challenge by maintaining a priority queue of
future events, ordered by time.
At any given time, the priority queue contains all future
collisions that would occur, assuming each particle
moves in a straight line trajectory forever.
As particles collide and change direction, some of the events
scheduled on the priority queue become "stale" or "invalidated",
and no longer correspond to physical collisions.
We adopt a lazy strategy, leaving such invalidated collision on the priority
queue, waiting to identify and discard them as they are deleted.
The main event-driven simulation loop works as follows:
- Delete the impending event, i.e., the one with the minimum priority t.
- If the event corresponds to an invalidated collision, discard it. The event is invalid if one of the particles has participated in a collision since the time the event was inserted onto the priority queue.
- If the event corresponds to a physical collision between particles
i and j:
- Advance all particles to time t along a straight line trajectory.
- Update the velocities of the two colliding particles i and j according to the laws of elastic collision.
- Determine all future collisions that would occur involving either i or j, assuming all particles move in straight line trajectories from time t onwards. Insert these events onto the priority queue.
- If the event corresponds to a physical collision between particles i and a wall, do the analogous thing for particle i.
This event-driven approach results in a more robust, accurate, and efficient simulation than the time-driven one.
Collision prediction. We review the formulas that specify if and when a particle will collide with the boundary or with another particle, assuming all particles travel in straight-line trajectories.
- Collision between particle and a wall.
Given the position (rx, ry), velocity
(vx, vy), and radius σ of
a particle at time t, we wish to determine if and when it
will collide with a vertical or horizontal wall.
Since the coordinates are between 0 and 1, a particle comes into contact with a horizontal wall at time t + Δt if the quantity ry + Δt vy equals either σ or (1 - σ). Solving for Δt yields:
An analogous equation predicts the time of collision with a vertical wall.
- Collision between two particles.
Given the positions and velocities of two particles i and j
at time t,
we wish to determine if and when they will collide with each other.
Let (rxi' , ryi' ) and (rxj' , ryj' ) denote the positions of particles i and j at the moment of contact, say t + Δt. When the particles collide, their centers are separated by a distance of σ = σi + σj. In other words:
σ2 = (rxi' - rxj' )2 + (ryi' - ryj' )2
During the time prior to the collision, the particles move in straight-line trajectories. Thus,rxi' = rxi + Δt vxi, ry i' = ryi + Δt vyi
Substituting these four equations into the previous one, solving the resulting quadratic equation for Δt, selecting the physically relevant root, and simplifying, we obtain an expression for Δt in terms of the known positions, velocities, and radii.
rxj' = rxj + Δt vxj, ry j' = ryj + Δt vyj
If either Δv ⋅ Δr ≥ 0 or d < 0, then the quadratic equation has no solution for Δt > 0; otherwise we are guaranteed that Δt ≥ 0.
Collision resolution. In this section we present the physical formulas that specify the behavior of a particle after an elastic collision with a reflecting boundary or with another particle. For simplicity, we ignore multi-particle collisions. There are three equations governing the elastic collision between a pair of hard discs: (i) conservation of linear momentum, (ii) conservation of kinetic energy, and (iii) upon collision, the normal force acts perpendicular to the surface at the collision point. Physics-ly inclined students are encouraged to derive the equations from first principles; the rest of you may keep reading.
- Between particle and wall. If a particle with velocity (vx, vy) collides with a wall perpendicular to x-axis, then the new velocity is (-vx, vy); if it collides with a wall perpendicular to the y-axis, then the new velocity is (vx, -vy).
- Between two particles.
When two hard discs collide, the normal force acts along
the line connecting their centers (assuming no friction or spin).
The impulse (Jx, Jy) due to the normal force in the
x and y directions of a perfectly elastic collision
at the moment of contact is:
and where mi and mj are the masses of particles i and j, and σ, Δx, Δy and Δ v ⋅ Δr are defined as above. Once we know the impulse, we can apply Newton's second law (in momentum form) to compute the velocities immediately after the collision.
vxi' = vxi + Jx / mi, vxj' = vxj - Jx / mj
vyi' = vyi + Jy / mi, vyj' = vyj - Jy / mj
Particle collision simulation in Java. Our implementation involves the following data types: MinPQ.java, Particle.java, Event.java, and CollisionSystem.
- Particle data type.
Represents particles moving in the unit box.
- public Particle(...): constructor.
- public double dtX(): return the duration of time until the invoking particle collides with a vertical wall, assuming it follows a straight-line trajectory. If the particle never collides with a vertical wall, return positive infinity.
- public double dtY(): return the duration of time until the invoking particle collides with a horizontal wall, assuming it follows a straight-line trajectory. If the particle never collides with a horizontal wall, return positive infinity.
- public double dt(Particle b): return the duration of time until the invoking particle collides with particle b, assuming both follow straight-line trajectories. If the two particles never collide, return a negative value.
- public void bounceX(): update the invoking particle to simulate it bouncing off a vertical wall.
- public void bounceY(): update the invoking particle to simulate it bouncing off a horizontal wall.
- public void bounce(Particle b): update both particles to simulate them bouncing off each other.
- public int getCollisionCount(): return the total number of collisions involving this particle.
- Event data type.
Data type that represents collision events. There are
four different types of events: a collision with a vertical wall,
a collision with a horizontal wall, a collision between two particles,
and a redraw event. This would be a fine opportunity to experiment
with OOP and polymorphism. We propose the following simpler
(but somewhat less elegant approach).
- public Event(double t, Particle a, Particle b): Create a new event representing a collision between particles a and b at time t. If neither a nor b is null, then it represents a pairwise collision between a and b; if both a and b are null, it represents a redraw event; if only b is null, it represents a collision between a and a vertical wall; if only a is null, it represents a collision between b and a horizontal wall.
- public double time(): return the time associated with the event.
- public Particle a(): return the first particle, possibly null.
- public Particle b(): return the second particle, possibly null.
- public int compareTo(Event e2): compare the time associated with this event e1 and e2. Return a positive number (greater), negative number (less), or zero (equal) accordingly.
- public boolean isValid(): return true if the event is still valid, and false if there was a supervening event involving either particle.
In order to implement isValid, the event data type should store the collision counts of the relevant particles at the time the event was created. The event corresponds to a physical collision if the current collision counts of the particles are still the same as when the event was created, i.e., no intervening collisions.
Data files. We use the following data format. The first line contains the number of particles N. Each of the remaining N lines consists of 6 real numbers (position, velocity, mass, and radius) followed by three integers (red, green, and blue values of color). You may assume that all of the position coordinates are between 0 and 1, and the color values are between 0 and 255. Also, you may assume that none of the particles intersect each other or the walls.
Here are some sample data files.
N rx ry vx vy mass radius r g b rx ry vx vy mass radius r g b rx ry vx vy mass radius r g b rx ry vx vy mass radius r g b
- One particle in motion.
- Two particles in head on collision.
- Two particles, one at rest, colliding at an angle.
- Some good examples for testing and debugging.
- One red particle in motion, N blue particles at rest.
- N particles on a lattice with random initial directions (but same speed) so that the total kinetic energy is consistent with some fixed temperature T, and total linear momentum = 0. Have a different data set for different values of T.
- Diffusion I: assign N very tiny particles of the same size near the center of the container with random velocities.
- Diffusion II: N blue particles on left, N green particles on right assigned velocities so that they are thermalized (e.g., leave partition between them and save positions and velocities after a while). Watch them mix. Calculate average diffusion rate?
- N big particles so there isn't much room to move without hitting something.
- Einstein's explanation of Brownian motion: 1 big red particle in the center, N smaller blue particles.
Q + A
Q. Why doesn't the Java library use a randomized version of quicksort?
A. Good question. At the very least, the library should cutoff to some guaranteed N log N algorithm if it "realizes" it is in trouble. Programmers may want their libraries to be deterministic for debugging. But the library only uses quicksort for primitive types when stability is not an issue, so the programmer probably couldn't observe the randomness, except in the running time.
Q. Why does the Java library use mergesort for sorting objects and quicksort for sorting primitive types?
A. With objects, stability might be desirable or expected. With primitive types, stability is irrelevant since there is no associated data.
Q. What's the best way to create comparators for non-English alphabets, e.g., for accents, umlauts, and pre-composed character like ch in Spanish?
A. Use Java's Collator library. For example in UNICODE, Rico occurs lexicographically before Réal, but in French, Réal occurs first.
import java.util.Arrays; import java.text.Collator; ... Arrays.sort(words, Collator.getInstance(Locale.FRENCH));
Q. Any specialized data structures for dealing with colliding particle type simulations?
A. Google for kinetic heaps or kinetic data structures. (Guibas)
Exercises
- Suppose that you have a file that is sorted, except for two or three items. Which of the sorting method (insertion sort, selection sort, mergesort, quicksort) would you use and why?
-
Consider the following implementation of the compareTo
method for String. How does the third line help
with efficiency?
public int compareTo(String t) { String s = this; if (s == t) return 0; int n = Math.min(s.length(), t.length()); for (int i = 0; i < n; i++) { if (s.charAt(i) < t.charAt(i)) return -1; else if (s.charAt(i) > t.charAt(i)) return +1; } return s.length() - t.length(); }Answer: avoid directly comparing all n characters if s and t are references to the same string.
-
What is the flaw with the following implementation of the
Comparable interface for account balances.
public class Customer implements Comparable<Customer> { private String name; private double balance; public int compareTo(Customer b) { Customer a = this; if (a.balance < b.balance - 0.005) return -1; if (a.balance > b.balance + 0.005) return +1; return 0; } }Answer: violates the Comparable contract. It is possible that a.compareTo(b) and b.compareTo(c) are both 0, but a.compareTo(c) is positive or negative.
-
What is the flaw with the following implementation of the
Comparable interface for 32-bit int
account balances?
public class Customer implements Comparable<Customer> { private String name; private int balance; public int compareTo(Customer b) { Customer a = this; return a.balance - b.balance; } }Answer: the subtraction based comparator is an idiom that dates bake to Unix. However, it can fail since the difference between two 32-bit integers may not fit in a 32-bit int. Unless you know that the integers lie in a small range, it's safer to use
public int compareTo(Customer b) { Customer a = this; if (a.balance < b.balance) return -1; if (a.balance > b.balance) return +1; return 0; }Given n time stamps when file is requested from web server, find largest interval of time at which no file arrives. Solution: sort by time stamp. Scan sorted list to identify maximum gap.
- Case insensitive order.
Write a program to read in a sequence of strings and
sort them in ascending order, ignoring case.
String[] a = new String[N]; for (int i = 0; i < N. i++) { a[i] = StdIn.readString(); } Arrays.sort(a, String.CASE_INSENSITIVE_ORDER); - Grades. Write a program Grade.java to represent a data type for grades (A, B+, etc.). It should implement the Comparable interface using the natural ordering on grades by gpa.
- Students. Write an ADT Student.java that represents a student in a college course. Each student should have a login (String), a section number (integer), and a grade (Grade). Use comparators to support sorting by any of the three fields as with the Transaction ADT.
- Sort by reverse domain. Write a program Domain.java to read in a list of domain names from standard input, and print the reverse domains in sorted order. For example, the reverse domain of cs.princeton.edu is edu.princeton.cs. This is useful for web log analysis. Hint: use s.split("\\.") to split the string s into tokens, delimited by dots. Then write an appropriate compareTo method.
- Software version numbers. Create an ADT Version that represents a software version number, such as 115.1.1, 115.10.1, 115.10.2. Implement the Comparable interface so that 115.1.1 is less than 115.10.1, and so forth.
- Case insensitive comparator.
Implement your own version of the
comparator String.CASE_INSENSITIVE_ORDER.
public class CaseInsensitive implements Comparator<String> { public int compare(String a, String b) { return a.compareToIgnoreCase(b); } } - Descending order order string comparator.
Implement a comparator that sorts string
in descending order instead of ascending order.
Alternatively, you can use Collections.reverseOrder(). It returns a Comparator that imposes the reverse of the natural ordering of objects that implement the Comparable iterface.public class Descending implements Comparator<String> { public int compare(String a, String b) { return -a.compareToIgnoreCase(b); } } - Spam campaign. To initiate an illegal spam campaign, you have a list of email addresses from various domains (the part of the email address that follows the @ symbol). To better forge the return addresses, you want to send the email from another user at the same domain. For example, you might want to forge an email from wayne@princeton.edu to rs@princeton.edu. How would you process the email list to make this an efficient task?
- California runoff election for governor.
In order to thwart bias against candidates whose names appear toward the end
of the alphabet, California sorted the candidates appearing on a ballot
in the following order:
R W Q O J M V A H B S G Z X N T C I E K U P D Y F L
Write a program California.java that sorts strings according to this ordering. Try it on this list of candidates. Assume that each string is comprised solely of uppercase letters. Hint: Use a Comparator.
- Given an array of N integers, the 2-sum problem is to find a pair of integers whose sum is closest to zero. Describe an O(N log N) algorithm for the problem. Solution: sort by absolute value - the best pair is now adjacent.
- Given an array of N integers, the 3-sum problem is to determine find three integers whose sum is closest to zero. Design an algorithm that uses O(N2 log N) time and O(N) space. Hint: sort and binary search.
- Design an algorithm for the 3-sum problem that runs in O(N2) time and O(N) space. Hint: sort and clever search. Remark: quadratic lower bound in restricted decision tree model where each decision is based on the sign of an arbitrary linear combination of 3 or fewer inputs. (3-SUM hard.)
- For some fixed linear equation in 3 variables (say with integer coefficients), given N numbers, do any 3 of them satisfy the equation? Design a quadratic algorithm for the problem.
- Kendall's tau distance. Given two permutations, Kendall's tau distance is the number of pairs out of position. Write a program KendallTau.java that computes the the Kendall tau distance between two permutations of size N in O(N log N) time. Useful in top-k lists, social choice and voting theorey, comparing genes using expression profiles, and ranking search engine results.
- Antipodal points. Given N points on a circle, centered at the origin, design an algorithm that determines whether there are two points that are antipodal, i.e., the line connecting the two points goes through the origin. Your algorithm should run in time proportional to N log N.
- Antipodal points. Repeat the previous question, but assume the points are given in clockwise order. Your algorithm should run in time proportional to N.
- Idle time.
Suppose that a parallel machine processes n jobs. Job j is
processed from sj to tj.
Given the list of start and finish times, find the largest
interval where the machine is idle.
Find the largest interval where the machine is non-idle.
Simple polygon. Given N points in the plane, draw a simple polygon with the N points as vertices. Hint: find the point p with the smallest y coordinate, breaking ties with the smallest x coordinate. Connect the points in increasing order of the polar angle they make with p.
- Particle collision game. Implement the game Particles: you control a red ball with a mouse, trying to avoid colliding with blue balls that behave according to the laws of elastic collision. After a certain interval of time, introduce a new blue ball at random.
- Brownian motion. In 1827, the botanist Robert Brown observed the motion of wildflower pollen grains immersed in water using a microscope. He observed that the pollen grains were in a random motion, following what would become known as Brownian motion. This phenomenon was discussed, but no convincing explanation was provided until Einstein provided a mathematical one in 1905. Einstein's explanation: the motion of the pollen grain particles was caused by millions of tiny molecules colliding with the larger particles. He gave detailed formulas describing the behavior of tiny particles suspended in a liquid at a given temperature. Einstein's explanation was intended to help justify the existence of atoms and molecules and could be used to estimate the size of the molecules. Einstein's theory of Brownian motion enables engineers to compute the diffusion constants of alloys by observing the motion of individual particles. Here's a flash demo of Einstein's explanation of Brownian motion from here.
- Free path and free time. Free path = distance a particle travels between collisions. Plot histogram. Mean free path = average free path length over all particles. As temperature increases, mean free path increases (holding pressure constant). Compute free time length = time elapsed before a particle collides with another particle or wall.
- Collision frequency. Number of collisions per second.
- Root mean-square velocity. Root mean-square velocity / mean free path = collision frequency. Root mean square velocity = sqrt(3RT/M) where molar gas constant R = 8.314 J / mol K, T = temperature (e.g., 298.15 K), M = molecular mass (e.g., 0.0319998 kg for oxygen).
- Maxwell-Boltzmann distribution. Distribution of velocity of particles in hard sphere model obey Maxwell-Boltzmann distribution (assuming system has thermalized and particles are sufficiently heavy that we can discount quantum-mechanical effects). Distribution shape depends on temperature. Velocity of each component has distribution proportional to exp(-mv_x^2 / 2kT). Magnitude of velocity in d dimensions has distribution proportional to v^(d-1) exp(-mv^2 / 2kT). Used in statistical mechanics because it is unwieldy to simulate on the order of 10^23 particles. Reason: velocity in x, y, and z directions are normal (if all particles have same mass and radius). In 2d, Rayleigh instead of Maxwell-Boltzmann.
- Pressure. Main thermodynamic property of interest = mean pressure. Pressure = force per unit area exerted against container by colliding molecules. In 2d, pressure = average force per unit length on the wall of the container.
- Temperature. Plot temperature over time (should be constant) = 1/N sum(mv^2) / (d k), where d = dimension = 2, k = Boltzmann's constant.
- Diffusion. Molecules travel very quickly (faster than a speeding jet) but diffuse slowly because they collide with other molecules, thereby changing their direction. Two vessels connected by a pipe containing two different types of particles. Measure fraction of particles of each type in each vessel as a function of time.
- Time reversibility. Change all velocities and run system backwards. Neglecting roundoff error, the system will return to its original state!
- Maxwell's demon. Maxwell's demon is a thought experiment conjured up by James Clerk Maxwell in 1871 to contradict the second law of thermodynamics. Vertical wall in middle with molecule size trap door, N particles on left half and N on right half, particle can only go through trap door if demon lets it through. Demon lets through faster than average particles from left to right, and slower than average particles from right to left. Can use redistribution of energy to run a heat engine by allowing heat to flow from left to right. (Doesn't violate law because demon must interact with the physical world to observe the molecules. Demon must store information about which side of the trap door the molecule is on. Demon eventually runs out of storage space and must begin erasing previous accumulated information to make room for new information. This erasing of information increases the entropy, requiring kT ln 2 units of work.)
- Metric space. Extend to support spheres and hyperplanes in an arbitray metric space.
- Cell method. Useful optimization: divide region into rectangular cells. Ensure that particles can only collide with particles in one of 9 adjacent cells in any time quantum. Reduces number of binary collision events that must be calculated. Downside: must monitor particles as they move from cell to cell.
- Multi-particle collisions. Handle multi-particle collisions. Such collisions are important when simulating the break in a game of billiards.
- Sorting parallel arrays. When sorting parallel arrays, it is useful to have a version of sorting routine that returns a permutation, say index[], of the indices in sorted order. Add a method indirectSort to Insertion.java that takes an array of Comparable objects a[] as input, but instead of rearranging the elements of a[], returns an integer array index[] so that a[index[0]] through a[index[N-1]] are the elements in ascending order.
- Inplace permutation. Write a program Permutation.java that includes functions that take an array and a permutation (or inverse permutation) and rearranges the elements in the array according to the permutation (or inverse permutation). Do it in-place: use only a constant amount of extra memory.
- List files. Write a program Files.java that takes the name of a directory as a command line input and prints out all of the files in the current directory, sorted by filename. Hint: use the File ADT.
- List files. Write comparators for the type File to order by increasing/decreasing order of file size, ascending/descending order of filename, and ascending/descending order of last modification date. Use these comparators in a program LS that takes a command line argument and list the files in the current directory According to that order, e.g., "-t" to sort by timestamp. Support multiple flags to break ties. Be sure to use a stable sort.
- Ticket ranges. Given a list of ticket seats of the form A1, A2, A11, A10, B7, B9, B8, B3, find the largest non-empty block of adjacent seats, e.g., A3-A9.
- Smith's rule. The following problem arises in supply chain management. You have a bunch of jobs to schedule on a single machine. (Give example.) Job j requires p[j] units of processing time. Job j has a positive weight w[j] which represents its relative importance - think of it as the inventory cost of storing the raw materials for job j for 1 unit of time. If job j finishes being processed at time t, then it costs t * w[j] dollars. The goal is to sequence the jobs so as to minimize the sum of the weighted completion times of each job. Write a program SmithsRule.java that reads in a command line parameter N and a list of N jobs specified by their processing time p[j] and their weight w[j], and output an optimal sequence in which to process their jobs. Hint: Use Smith's rule: schedule the jobs in order of their ratio of processing time to weight. This greedy rule turns out to be optimal.
- Contiguous sum.
Given a list of real numbers and a target value V, find a contiguous
block (of any length) whose sum is as close to K as possible.
Brute force: compute the sum of each contiguous block by brute force. This takes O(N^3) time.
Partial sums: compute all partial sums s[i] = a[0] + a[1] + ... + a[i] so that contiguous blocks have a sum of the form s[j] - s[i]. This takes O(N^2) time.
Sort and binary search: form the partial sums as above and then sort them in ascending order. For each i, binary search for the s[j] that is as close to s[i] as possible. This takes O(N log N) time.
- Rhyming words.
For your poetry class, you would like to tabulate a list of
rhyming words. A crude way to accomplish this task is as follows:
- Read in a dictionary of words into an array of strings.
- Reverse the letters in each word, e.g., confound becomes dnuofnoc.
- Sort the resulting array of words.
- Reverse the letters in each word back to their original state.
Now the word confound will be next to words like astound and compound. Write a program Rhymer.java that reads in a sequence of words from standard input and prints them out in the order specified above.
Now repeat, but use a customized Comparator that sorts lexicographically from right-to-left.
Exercises
- Give an O(N log N) algorithm for computing the median of a sequence of N integers. Hint: sort first.
- Give an O(N log N) algorithm for computing the mode (value that occurs most frequently) of a sequence of N integers.
- Given a sequence of N real numbers, find the pair of integers that are closest in value. Give a O(N log N) algorithm.
- Given a sequence of N real numbers, find the pair of integers that are farthest apart in value. Give a O(N) algorithm.
- Given a sorted list of N integers and a target integer x, determine in O(N) time whether there are any two that sum to exactly x.
- Given two sorted lists of size N1 and N2, find the median of all elements in O(log N) time where N = N1 + N2.
- Give a O(N) algorithm for sorting a sequence of N integers that are between 0 and 99.
- Give a O(N) algorithm for computing the median of a sequence of N integers that are between 0 to 99.
- Give N points in the plane with integer coordinates, describe how to enumerate all points that are included two or more times in O(N log N) time. Solution: Sort using x-coordinate as primary key and y-coordinate as secondary key. Now, all duplicates are consecutive.
- A sequence of N consecutive insert into heap operations takes O(N) comparisons. Explain why its not the case that a sequence of N consecutive delete-max operations takes O(N) comparisons. Solution: would violate Theta(N log N) sorting lower bound.
- An article on Slashdot announces a new comparison-based priority queue that performs insert and delete-max in O(sqrt(log n)). Explain why this is impossible. Solution: would imply a O(n sqrt(log n)) comparison-based sorting algorithm.
- Suppose you have a sequence of N elements, with at most log N distinct ones. Describe how to sort them in O(N log log N) time.
- Element distinctness. Given N elements x1, ..., xn from a totally ordered universe, is there a pair i ≠ j such that xi = xj? Theta(N log N) lower bound in algebraic decision tree model. O(N log N) easy by sorting. Solution: Given N distinct values. Let ai be ith smallest element. In any comparison based algorithm for the problem, we must compare ai against ai-1; otherwise the algorithm could not distinguish between the case ai-1 < ai and ai-1 = ai. Consider two different permutations of the N elements. There is an element ai where ai-1 precedes ai in the first permutation but ai-1 precedes ai in the second. Since every comparison based algorithm must compare ai and ai-1, the algorithm runs differently on the two permutations. Thus, there are at leats N! leaves. reference
- Element distinctness. Given N elements, prove that determining whether any two repeated takes at least N log N comparisons. Assume that you can only access the elemnts throught the less() function. Solution: suppose the elements are all distinct and a_1 < a_2 < ... < a_N. Any correct algorithm for element distinctness must compare a_i with a_i+1 for each i < N. If not, then the algorithm would produce the same output if we changed a_i+1 to a_i (but this would change the answer from no duplicates to has duplicates). The set of comparisons the algorithm uses forms a DAG. Find the total order (in linear time) and this gives the sorted order. Thus, the algorithm can be used for sorting distinct elements and the sorting lower bound applies.
- Coin distinctness. Given N coins and a balance scale, determine if the coins all weigh different amounts. Prove that any algorithm that solves the problem has complexity at least N log N. Hint: you may use the fact that given N real numbers, element distinctness requires at least N log N comparisons in the linear decision tree model (where you can make comparisons between arbitrary linear combinations of the N elements, e.g, x1 + 2x2 < x3).
- Existence symbol table. Suppose you had a comparison-based algorithm that supports insert and exists in O(1) comparisons per operation. Explain why this is impossible in this model of computation. Solution: to solve element distinctness in O(N) time, for each of the N elments, check whether it's already in the data structure (if so, then this is a duplicate); if not, insert it.
- Lower bound for closest pair. Theta(N log N) bound for closest pair. Solution: could solve element distinctness problem in sub-lineararithmic time if had a sub-lineararithmic algorithm by finding closest pair and checking if they are equal.
- Lower bound for mode. Theta(N log N) bound for mode. Solution: could solve element distinctness problem by finding mode and checking whether it is more than one.
- Majority. Give a sequence of N elements, design an O(N) algorithm to find the majority (element that appears more than N/2 times) or determine that none exists. Solution: if a and b are two elements and a != b, then remove both of them; majority still remains. Use N-1 comparisons to find candidate for majority; use N-1 comparisons to check if candidate really is a majority. [Could solve with linear time median finding, but this is much simpler.]
- Set equality or set inclusion. Given two sets of elements S and T, does S = T or is S a subset of T?
- Set disjointness. Given two sets S and T, does S intersect T equal the empty set? Give an O(N log N) algorithm and a matching lower bound.
- Set disjointness. Given two sets of integers S and T, each in ascending order, does S intersect T equal the empty set? Prove an Omega(N log N) lower bound or given an O(N) algorithm.
- Lower bound for merging two lists. Show any comparison-based algorithm for merging two lists of size N requires at least 2N-1 comparisons in the worst case. Solution: lower bound holds even if all elements are distinct. If the ith and (i+1)st smallest elements are in different lists, then they must be compared.
- Lower bound for binary search. log(N+1) comparisons required. Solution: lower bound holds even if all elements are distinct....
- Euclidean TSP and MST lower bounds. Given an Omega(N log N) for the Euclidean TSP or Euclidean MST (requires algebraic decision trees to make sense since you want to allow a reasonable algorithm to compute distances). Solution: choose N points on a line. Optimal tour must visit the points in ascending order.
- Nearly sorted.
Given an array of N elements, each which is at most
k positions from its target position, devise an algorithm that
sorts in O(N log k) time.
Solution 1: divide the file into N/k pieces of size k, and sort each piece in O(k log k) time, say using mergesort. Note that this preserves the property that no element is more than k elements out of position. Now, merge each block of k elements with the block to its left.
Solution 2: insert the first k elements into a binary heap. Insert the next element from the array into the heap, and delete the minimum element from the heap. Repeat.
- Sorting with duplicates. There are at least N! / (n1!) (n2!) ... (nk!) leaves; thus lower bound is log(N! / (n1!) (n2!) ... (nk!)). Give matching upper bound??? partial reference. Mergesort where you remove duplicates as soon as they are discovered. How many of the ni copies of element i remain after j merge passes?
- Smallest and second smallest. Describe how to find both the smallest and second smallest element using at most N + log N comparisons. Solution: divide the elements into pairs and compare two elements in each pair. Recur with the N/2 winners from each pair. After N-1 comparisons, we have the minimum element. Note that each element is compared with at most log N other elements. In particular, the smallest element is compared with at most log N other elements, and one of these must be the second smallest. If you keep track of all the comparisons, you can remember the log N involving the smallest element and compare these.
- Smallest and largest. Describe how to find both the smallest and largest elements using at most ceil(3n/2) - 2 comparisons. Answer: divide the elements into pairs and compare two elements in each pair. Put the smallest elements in A and the largest in B. If n is odd, put element n in both A and B. This requires floor(n/2) comparisons. Now directly compute the minimum in A (ceil(n/2) - 1 comparisons) and the maximum in B (ceil(N/2) - 1) comparisons. [In fact, this is best possible.]
- Counting inversions. Prove a Theta(N log N) lower bound. This is an open research question.
- Nuts and bolts. Prove a Theta(N log N) lower bound for the nuts and bolts problem in the quicksort section. [A matching worst case upper bound of O(N log N) is known, but it is remarkably complicated.]
- Sorting a linked list. Given a singly linked list of N elements, how could you sort it in guaranteed O(N log N) time, stably, and with O(1) extra space?