COS 109: Problem Set 2

Wed Sep 9 10:20:00 EDT 2020

Due midnight, Thursday Sept 17 Eastern time (that is, late Thursday evening)

Collaboration policy for COS 109: Working together to really understand the material in problem sets and labs is encouraged, but once you have things figured out, you must part company and compose your written answers independently. That helps you to be sure that you understand the material, and it obviates questions of whether the collaboration was too close.

You must list any other class members with whom you collaborated.

Problem set answers need not be long, merely clear enough that we can understand what you have done, though for computational problems, show enough of your work that we can see where your answer came from. There is no need to repeat the question. Thanks.

And a reminder from the first problem set: don't forget that if the data you start with is approximate, the results cannot be precise.

The world is full of big numbers, some of which mean something, but most people have no understanding or intuitive grasp of how big such numbers are, nor are they comfortable doing arithmetic on them. This problem set requires you to work with some big numbers. In these problems, and throughout the course, try to use exponents like 1015 rather than words like "quadrillion" or long decimal numbers like 100000000000000; exponents are easier to read and compute with, so you are less likely to make arithmetic errors. It's also easier for us to grade answers if you use scientific notation -- we don't have to count long strings of zeros.

1. Big Data, Big Numbers

In the following, pay attention to units: we will follow the standard convention that B is for bytes and b is for bits. So, 1 Gb is one gigabit and 1 GB is one gigabyte. 1 Gbps represents a transfer rate of 1 gigabit per second, and 1 GBps is one gigabyte per second, which is 8 times faster. You should also ignore overhead and delays; this set of questions is entirely about how long it takes to transfer information.

(a) Companies like Amazon, Google and Facebook have multiple data centers that store replicated copies of their information. If a company builds a new data center, it must clone the data from an existing data center. Suppose that Google has 50 petabytes (PB) of Maps data. Approximately how many hours would it take to send the Google Maps data to a new data center over a 100 gigabyte per second data link?

(b) If instead you were to store the 50 petabytes on 256 GB SD cards like the ones in the picture below and drive them in one or more trucks from California to New York, what is the transfer rate in GBps? Make your own estimate of driving time from San Francisco to New York City first, then check it with Google Maps.

(c) How many tractor-trailers like the one below would be needed to hold the SD cards? Each card is roughly 1 inch by 1 inch by 1/10 inch.

(d) If an SD card like the one above costs $40, how much would Google have to spend to buy enough of them?

(e) Alternatively, you could store the information on DVDs. Each DVD can hold about 5 GB of data. If you were to store the information on DVDs, about how many would be needed?

(f) Suppose a DVD is 1 mm thick. If you were to stack the DVDs vertically, about how many meters tall would the stack be?

2. Powers of 2, powers of 10

The comedian Steven Wright says "I have a map of the United States...actual size. It says, 'Scale: 1 mile = 1 mile.' I spent last summer folding it."

(a) Suppose for simplicity that the map is 4000 km by 4000 km. How many times would you have to fold it in half to make it into a 1 meter by 1 meter square? (Ignore the practical reality that you can't fold a piece of paper more than a few times -- this is a gedanken experiment.)

(b) If the original map is made of standard paper, 0.1 mm thick, approximately how thick will the folded map be?

(c) Is the estimate of 0.1 mm much too high, much too low, or about right, and why? (You can make your own estimate by observation near printers.)

3. Bits, Bytes and Binary

"There are only 10 types of people in the world: those who can read binary and those who can't."

© Bill Amend, Foxtrot

This problem is a combination of estimation (how many cars are there in the US?) and converting decimal numbers into binary. If you are not yet comfortable with bits, bytes and binary numbers, check out the practice problem generator. It's a couple of short Python examples that you can run directly, and can modify yourself if you make a copy of the notebook; you should probably do that. Check out Chapter 7 of UDW for more info on Colab notebooks.

Get some practice on how many bits and bytes it takes to represent numbers of various sizes, then answer these questions.

(a) An electronic car door key and its car must share some unique identification number so that my key won't unlock your car and vice versa.

    (i) Roughly how many bits would be needed to provide each car in the United States with a unique identification number. Briefly explain your reasoning.

    (ii) Exactly how many bytes would this number require? [Note: there are no fractional bytes; it's always an integral number.]

    (iii) Suppose we include Canadian cars. How many bytes would now be required?

(b) While we're talking about digits:

    (i) What range of numbers can you represent with the fingers and thumbs of two hands if you treat each finger and thumb as a binary digit?

    (ii) What is the range if you include toes as well as fingers. (Ignore the fact that unless you are unusually dexterous, you can't move your toes independently.)


Please use this Word template for your answers. It would be a great help if you could type your answers so I can read them, but if you prefer to write by hand, that's OK as long as you are very neat. (That is, not like me!).

Submit your problem set in PDF format by uploading a file called pset2.pdf to You can submit as many times as you like; we will only look at the last one.