COS 226 Programming Assignment

Bin packing

princetontelecomm.com has just hired you as a consultant to help them reduce costs in their customer service department. They digitally record customer service phone calls and store all the sound files on portable disks at the end of each week. Since your marginal cost is proportional to the total number of disks used, your task is to assign the sound files to disks using as few disks as possible. Admirably, you recall from COS 126 that this problem is NP-hard; thus it seems hopeless to find an efficient algorithm for finding the optimal packing. Instead, your goal is to design a heuristic that runs fast and produce high quality solutions.

We formulate the bin packing problem as follows:  given a set of N file sizes between 0 and 1,000,000 KB (1 GB), find a way to assign them to a minimal number of disks, each of capacity 1 GB. An intuitive heuristic immediately springs to mind. The worst-fit heuristic is to consider the file sizes in the order they are presented:  if the sound file won't fit on any disk, create a new disk; otherwise place the file on a disk that has the most remaining space. For example, this algorithm would put the sizes 700,000, 800,000, 200,000, 150,000, 150,000 onto three disks: {700,000, 200,000}, {800,000, 150,000}, {150,000}.

Your task is to implement this heuristic.

Perspective.   The bin packing problem is a fundamental problem for minimizing the consumption of a scarce resource, usually space. Applications include: packing the data for Internet phone calls into ATM packets, optimizing file storage on removable media, assigning commercials to station breaks, allocating computer memory, and minimizing the number of identical processors needed to process a set of tasks by a given deadline. In the latter example, the scarce resource is time instead of space. Cloth, paper, and sheet metal manufacturers use a two-dimensional version of the problem to optimally cut pieces of material (according to customer demand) from standard sized rectangular sheets. Shipping companies use a three-dimensional version to optimally pack containers or trucks.

Item data type.   To use the priority queue, you need to specify and implement an appropriate Item data type to represent the disks. In addition to a list of the file sizes, this data type should contain a key field for use with a priority queue ADT. The files item.h and item.c provide a sample interface and implementation of the Item data type for ints. One of your challenges is to rewrite these two files to be useful in your file storing applications. This will require some thought (and perhaps a few iterations on the problem).

Priority queue ADTs.   You need to develop an efficient data structure that supports all of the basic operations needed to implement that heuristic. You are required to implement the heuristic using the client-implementation-interface paradigm.

Create an interface pq.h that specifies which operations your data structure will support. You will certainly need insert and delete the maximum, so a priority queue seems like a judicious starting point. Create a file pq.c that implements these operations and a client program worstfit.c that uses these operations to perform the worst-fit heuristic. As a starting point, you may use this priority queue implementation and interface. They are adaptions of the heap code in Sedgewick Program 9.2. You are free to add functionality to the interface and implementation to support the client.

Input and output. Your client program will read in the set of file sizes (guaranteed to be between zero and a million) from standard input. Your program should output the number of disks used by the heuristic and the sum of all file sizes divided by 1 million (a lower bound on the number of disks required). If the number of files is less than 100, you should also print out the disks in decreasing order of remaining space. For each disk, print out the amount of remaining space and the file sizes that it includes (in the order that they were inserted). For debugging, the output below also includes a unique id number (optional) for each disk that represents the order in which the disk was created.

% worstfit < input20.txt
file sizes sum = 6.58 GB
total disks    = 8
     5 325754: 347661 326585 
     0 227744: 420713 351543 
     7 224009: 383972 392019 
     4 190811: 324387 484802 
     6 142051: 340190 263485 254274 
     3 116563: 347560 204065 331812 
     2 109806: 396295 230973 262926 
     1  82266: 311011 286309 320414 

So in the above example, the total size of the stored files (the input) is 6.58 GB, the total number of disks that were consumed by the heuristic is 8. Disk #5 (the 6th disk that was created) has 325754 kb of free memory, and files of sizes 347661 and 326586 kb are stored on it. Disk #0 (the first disk that was created) has 227744 kb of free memory, and files of sizes 420713 and 351543 kb are stored on it....Disk #1 (the 2nd disk that was created) has 82266 kb of free space, and files of sizes 311011, 286309 and 320414 kb are stored on it.

Analysis.   Run your heuristic on a variety of inputs, with N = 100, 1,000, 10,000, 100,000, and 1,000,000. Do a few trials for each value of N using random weights between zero and one million, and make intelligent comments and guesses about the effectiveness of the heuristic.