Due midnight, Wednesday Sep 20.
Collaboration policy for COS 109:
Working together to really understand the material in problem sets and
labs is encouraged, but once you have things figured out, you must part
company and compose your written answers independently. That helps
you to be sure that you understand the material, and it obviates questions
of whether the collaboration was too close.
You must list any other class members with whom you collaborated.
Do not use Google or other search engines for these questions. Your task is to work from what you know and can reason about, not what you can look up. Similarly, don't use ChatGPT or similar generative AI programs. Empirically, they are pretty bad at answering these kinds of questions, as we will discuss some day.
Problem set answers need not be long, merely clear enough that we can understand what you have done, though for computational problems, show enough of your work that we can see where your answer came from. There is no need to repeat the question. Thanks.
And a reminder from the first problem set: don't forget that if the
data you start with is approximate, the results cannot be precise.
The world is full of big numbers, some of which mean something, but most people have no intuitive grasp of how big such numbers are, nor are they comfortable doing arithmetic on them. The purpose of this problem set is to get you to work with some big numbers. In these problems, and in the course in general, it's much better to use exponents like 1015 rather than words like "quadrillion" or long decimal numbers like 100000000000000; exponents are far easier to read and compute with, so you are less likely to make arithmetic errors. It's also easier for us to grade answers if you use scientific notation -- we won't have to count long strings of zeros either.
In the following, pay attention to units: we will follow the standard convention that B is for bytes and b is for bits. So, 1 Gb is one gigabit and 1 GB is one gigabyte; 1 Gbps represents a transfer rate of 1 gigabit per second. You should also ignore overhead and delays; this set of questions is entirely about how long it takes to transfer information.
(a) Companies like Google and Facebook have multiple data centers -- big buildings full of computers -- that store replicated copies of their information. If a company builds a new data center, it must clone the data from an existing data center. Suppose that Google has 100 petabytes (100 PB) of Google Maps data. Approximately how many hours would it take to send the Google Maps data to a new data center over a 100 gigabyte per second (100 GB/sec) data link? (That's probably 1,000 to 10,000 times faster than your home Internet connection.)
(b) If you were to store the 100 petabytes on 1 TB micro SD cards, how many cards would that be?
(c) If you were to ship the SD cards from California to New York in a single truck, what is the approximate transfer rate in GB/sec? Make your own estimate of driving time from San Francisco to New York City first, then check it with mapping software if you like.
(d) Is a single truck enough? How many U-Haul trucks like the one below would be needed to hold the SD cards, if each card is 1cm by 1cm by 1mm?
(e) If a homing pigeon takes two hours to carry a 1 TB SD card
from Princeton to New York, what is its approximate throughput in MB/sec?
(This is not a new idea; check out this famous Internet standard on
In The Social Atom (2007), author Mark Buchanan says "Take a piece of whisper-thin paper, say 0.1 mm thick. Now suppose you fold it in half twenty-five times in a row, doubling its thickness each time. How thick will it be? Almost everyone asked such a question will grossly underestimate the result." Right now, make your own estimate, which you should be able to do in your head or with paper and pencil (no calculator!) on the basis of what we've done in class. When you do the rest of this exercise you can assess whether you are better than "almost everyone."
(a) If Buchanan is correct about the thickness of a piece of paper, about how many meters thick will the folded paper be?
(b) Is Buchanan's estimate of 0.1 mm much too high, much too low, or
about right, and why? (You can probably make your own estimate by
observation near printers.)
"There are only 10 types of people in the world: those who can read binary and those who can't."
This problem is a combination of estimation (how many cars are there in the US?) and converting decimal numbers into binary. If you are not yet comfortable with bits, bytes and binary numbers, check out the practice problem program. It's a couple of short Python examples that you can run directly, and can modify yourself if you make a copy of the notebook; you should probably do that. Check out Chapter 7 of UDW for more info on Colab notebooks.
Get some practice on how many bits and bytes it takes to represent numbers of various sizes, then answer these questions.
(a) An electronic car door key and its car must share some unique
identification number so that my key won't unlock your car and vice
(i) Roughly how many bits would be needed to provide each car in the United States with a unique identification number. Briefly explain your reasoning.
(ii) Exactly how many bytes would this number require? [Note: there are no fractional bytes; it's always an integral number.]
(iii) Suppose we include Canadian cars. How many bytes would now be required?
(b) While we're talking about digits:
(i) What range of numbers can you represent with the fingers and thumbs of two hands if you treat each finger and thumb as a binary digit?
(ii) What is the range if you include toes as well as fingers. (Ignore the fact that unless you are unusually dexterous, you can't move your toes independently.)
Either way, convert your answers to PDF and submit them by uploading a file called pset2.pdf to the "Pset2" assignment in Gradescope. When submitting on Gradescope, you will be asked to label each question with the page of the PDF your answer appears on (feel free to reach out if this causes any issues). You can submit as many times as you like; we'll only look at the last one.