COS 126 Assignment 8: Pit Mining

COS 126

Pit Mining

Programming Assignment 8

Due: Wednesday, 11:59pm

Dilbert the Digger wants to mine ore (or treasure) so as to maximize his net profits. The mining region is divided into blocks, and for simplicity we assume a one dimensional layout. From a preliminary geological survey, Dilbert estimates that removing block i will produce a net profit (value of ore minus processing cost) of a_i million dollars. Because of environmental restrictions, Dilbert is required to mine for ore in only one contiguous set of blocks. Which blocks should he choose? In other words, given a sequence of N integers (possibly negative), determine the maximum possible sum of any consecutive sequence. The problem is easy if all the numbers are positive: choose the entire sequence. The difficulty is when there are negative integers: should you take a negative integer in the hope that nearby positive integers will compensate for it? For the input below, the maximum possible sum is 104, achieved by taking blocks 2 through 6. Note that the sum will always be at least 0, since Dilbert can always choose not to dig at all.

block	0	1	2	3	4	5	6	7	8	9	10
net profit	12	-34	40	6	-10	56	12	-1	-15	10	4

Applications: In real world applications, mine operators need to solve a three dimensional version of the problem. Moreover, there are further geological restrictions, e.g., you can't uncover a region 100 feet underground without first uncovering some of its surrounding regions. The two-dimensional version of this problem also arises in image processing, where the goal is to find the "maximum likelihood estimate" of a certain kind of pattern.

Part 1: brute force. Your first step is to design a brute force algorithm to solve the problem. Your program should read the integers using scanf(). You will probably find it convenient to store them in an array a[]. Now, for every possible pair of integers p and q such that 0 <= p < q < N, compute the sum a[p] + ... + a[q] from scratch. Output the maximum value over all possible pairs. Also, output a pair of indices p and q that achieve this value. (If all the values are negative, your program should output the value 0 and set both indices to -1.) Call your program brute.c, and test it out on various inputs supplied on the checklist page.

File format and testing. The file format consists of integers separated by whitespace. Use the Unix command /usr/bin/time to determine how long your program takes to process each input. Determine the theoretical complexity of the brute force algorithm as a function of the input size N. Justify your answer in the readme file. Use the profiling features of lcc to determine how many times each instruction is actually executed, and compare your theoretical estimates with the observed performance.

Part 2: a better algorithm. The brute force algorithm will not be useful on large inputs. Can you think of a way to improve it? There are many ways to reduce the amount of computation. Perhaps the simplest is to observe that you are wastefully recomputing the sum a[p]+...+a[q] from scratch each time. Observe that a[p]+...+a[q] and a[p]+...+a[q+1] are extraordinarily similar. In fact, you only need one extra addition, not (q - p), to go from the former to the latter. This observation will dramatically improve your code. Make this modification.

Call your program better.c, and test it out as in Part 1. Now, you should be able to solve much larger problems than before. What is the complexity of your algorithm as a function of N?

Part 3: a divide-and-conquer algorithm. The algorithm in Part 2 is far superior to that in Part 1. In this part, you will design an even higher performance algorithm, using the divide-and-conquer paradigm. The basic idea is as follows: to solve a problem of size N, divide the problem up into two problems of size roughly N/2; solve each subproblem (recursively, of course!); then figure out a way to combine their solution to determine a solution to the original problem. After dividing up the original problem at the middle block m, you will have two smaller subsequences, say b and c.

Recursively find the maximum contiguous sum within subsequences b and c. Call these subsequences b_max and c_max.

b_max

c_max

If the maximum contiguous sum in the original problem falls entirely in b or entirely in c, then you have solved the problem. The remaining case is if the maximum contiguous sum contains the single block m (and maybe also some adjacent blocks). Denote the maximum sequence containing m by d_max. Thus, the divide-and-conquer scheme is: compute b_max and c_max recursively; compute d_max directly; then return the best of the three.

d_max

It is possible to compute d_max in linear time by computing (i) the largest sequence whose right endpoint is m-1 and (ii) the largest sequence whose left endpoint is m+1. You will want to use a method similar to Part 2 to do (i) and (ii). An appropriate gluing together of (i) and (ii) with m yields d_max.

This essentially describes the complete algorithm, modulo a few details, like the base case. This code will be more subtle than the previous two, so be sure to carefully plan out your program. Call your program conquer.c, and test it out as in Part 1. Now, you should be able to solve some really huge problems! What is the complexity of your algorithm as a function of N? Justify your answer.

Challenge for the bored. Design an even faster algorithm. Either use your cleverness and develop an alternate strategy, or perform code tuning on the divide-and-conquer algorithm. For example, recursive programs can usually be sped up by not taking the recursion all the way down to the lowest level, e.g., in mergesort it is advantageous to switch over to insertion sort when the number of elements is less than 16 (instead of 1 in the pure mergesort algorithm). Or you can unrecursify your program - in this case, eliminating instructions from the innermost loop usually produces the best results.

What to submit. Submit a readme file along with the programs from Parts 1, 2, and 3. Use the given filenames, brute.c, better.c, and conquer.c. Be sure to provide answers to the questions from the checklist page in your readme file.

This assignment was developed by Robert Sedgewick and Kevin Wayne, and is an adaption of an assignment by Jon Bentley.