Lab 8: Introduction to Spreadsheets
Sun Nov 25 12:24:41 EST 2001
By the end of this lab, you will be able to:
 Use Excel to do simple calculations
 Use shortcuts to extend data and formulas over ranges
 Use builtin functions for summing, counting, averaging, etc.
 Graph data
 Perform whatif experiments on real data
 Recognize some common statistical deceptions.
The original "killer app" for personal computers was an Apple II program
called VisiCalc, written in 1978 by Dan Bricklin and Bob Frankston.
VisiCalc made it possible to use a computer for the kind of analyses
that generations of business people had previously done by hand with paper "spreadsheets":
rows and columns of related numbers that could be used to
organize data and assess alternatives in a systematic way.
Spreadsheets are an essential tool,
along with a word processor and a program for making glitzy presentations.
In terms of market share, Microsoft's Excel is the
de facto standard, with Lotus 123 (the lineal descendant of VisiCalc)
and Corel's Quattro Pro distant second and third choices.
All of these programs share a common computational model and visual appearance,
however, and although we will
use Excel in this lab, whatever you see will transfer in spirit, though not in
exact detail, to the others.
Excel is enormously powerful and complicated, so we will
investigate only a tiny fraction of its features. As you work
through the specific instructions in this lab, take time to leave the
official track and experiment on your own with anything that looks interesting.
There's little risk in this:
Excel's Undo and Redo feature let you
back out of something that went wrong, or repeat the steps that got you
someplace interesting. Undo and Redo are the small curved arrows near
the middle of this screenshot:
If you want to know more about Excel, there are many hundreds of
books, some good, and many web pages.
The party line is found at
Microsoft's Excel site,
while
John Walkenbach's spreadsheet page
provides independent material from the author of several good books.
Part 1: Cells and Formulas
Part 2: Ranges and Functions
Part 3: Importing and Graphing Data
Part 4: How to Lie with Statistics (1)
Part 5: How to Lie with Statistics (2)
Part 6: Submitting your work
Part 1: Cells and Formulas
 Starting Excel
 Basic Concepts
 Formulas
 The Rule of 72
Starting Excel
If there's an Excel shortcut on the desktop, click that; otherwise,
find it with Explorer in
"C:  Program Files  Microsoft Office  Office"; there will be a
file called Excel.exe that you can click on.
Or use "Start  Find files  Excel.exe".
When things settle down, it should look approximately like this:
Take some time now to explore the menus, toolbars, and the like.
Basic Concepts
The basic organizational unit in Excel is the (work)sheet, which
consists of an array of
rows numbered 1 through 65536 (where does that number come from?) and
columns labeled A, B, C, ..., through IV (where does that label come from?).
Near the bottom of the Excel window you will see a tab labeled Sheet1,
which is Excel's default name for the first sheet.
A workbook is a collection of
one or more sheets, a useful way to group
related data sets in a single file but keep them cleanly separated.
The default workbook name is Book1, which appears in the title bar of the window.
You're going to put the results of various parts of this lab
into multiple sheets, so pay attention to where things are going.
The individual elements of a spreadsheet are called cells.
A cell is identified by its row and column name;
thus the cell in the upper left
corner (highlighted when you started Excel) is named A1.
The highlighted cell is called the active cell.
Cells can contain numbers (the most common case) or text, and their values
can be set by what you type into them, loaded from files, or
computed by a formula that derives a value from the values of other cells.
Time to do some experimenting.
 Type some numbers into cells: put 1 in A1, 2 in A2, etc., down to 5 in A5.
 If you start at A1, then push Enter after each value,
Excel will advance the active cell to the next row automatically.
 Experiment with correcting typing errors, going back to change a previous value, and so on.
You can fix mistakes by retyping, or you can edit in the small editing window
just above the column labels:
You can change the format of cell data by "Format  Cells",
and then select "Number" to set the numeric display, "Alignment"
to control centering, "Font" to set size and color,
and so on. The most common reason to adjust cell format is to cause data
to be treated as numeric and displayed with the same
number of significant digits, or to define
the format for data that represent dates.
You can instead put text in cells to serve as headings for columns
or rows; text can be set in various sizes and fonts as well.
You can use commands under "Edit" to clear cell contents entirely.
In all of these, you can select a single cell, a group of cells,
one or more entire rows, or one or more entire columns; the
action applies to the selected range.
To insert an additional row or column, use the "Insert" menu.
Formulas
Each cell can have attached to it a formula that computes the value
displayed in the cell, usually from values in other cells. Then when
cell values change, formulas are reevaluated, and updated values are displayed.
A formula is typed into a cell just like data except that the first character
must be an equals sign =.
To experiment with this:
 Make sure cells A1 through A5 contain the digits 1 through 5.
 Go to cell A7 and type =A1+A2+A3+A4+A5 and push Enter.
(You can type in lower case if you prefer; Excel will capitalize automatically.)
If this doesn't cause cell A7 to display the value 15, check your typing.
 Now experiment with changing values in cells A1 through A5,
verifying each time that the sum in A7 is properly updated.
Note that when you select A7 later, the cell value is displayed in
the cell itself, but the formula is displayed in the formula area just
above the sheet.
A further experiment:
 Type the formulas =A1*A1 in cell B1, =A2*A2 in cell B2,
and so on through B5. Verify that the computed values are correct.
 Place the corresponding summation formula in B7 and verify that it
produces the right answer.
It takes a lot of typing to enter data values and formulas this way,
so Excel provides some convenient shortcuts.
 Clear all the cells that you have used so far.
 Put 1 in A1 and 2 in A2.
 Select A1 and A2.
 Place the mouse on the lower right corner of A2; the cursor should
change to a plus sign.
 Drag the plus sign down the column to row 30 or so and release it.
The column should fill with the integers from 1 to 30: Excel has made a
(very good) guess about how you want the sequence 1, 2, ... extended and
has done it for you.
 Type the formula =A1*A1 in cell B1.
 Using the same technique, extend the series to cell B30 and
verify that the values are right.
Again, Excel has extended a sequence, but observe carefully that it has extended
the formulas, not the values.
Finally,
 Type the formula =2^A1 in cell C1.
 Extend this series down to cell C30.
Questions that you will have to answer:
 What happened at cell C27?
 Suppose you start with the value 2 in cell D1.
What different (not the same as in C2) formula could you put in D2 and then extend to D30
that would provide the same sequence of values in D1:D30 as in C1:C30?
The Rule of 72
There's a handy rule of thumb that is used to make quick estimates
of how fast interest or investments compound, called "the rule of 72."
If you invest a hundred dollars now at 10% per annum (assuming you could do that well!),
at the end of a year it will pay ten dollars of interest.
That gives you $110; if you invest that for the second year at the same
10%, at the end of two years, you'll have $121. Eventually, after about
7 or 8 years, you'll have $200; your money will have doubled.
The rule of 72 tells you how to estimate the doubling time
approximately: divide 72 by the interest rate, and that's the
number of periods. So at 10% per year, the doubling time is about 7.2 years.
If the interest rate is only 6%, the doubling time is about 12 years.
Or consider Moore's Law, that the capacity of semiconductor chips
doubles every 18 months. By the Rule of 72, this says that capacity
is compounding at about 72/18 = 4% per month. (How much is that per week?)
The approximation isn't perfect, but it's pretty reasonable
for the kinds of percentages found in daily life for home mortgage
interest rates, bond interest, car payments, and so on.
Your task is to make a spreadsheet that shows how good the approximation
is for a variety of interest rates.
 Clear the contents of Sheet1.
 Create a table with a heading row B1:L1 that contains the rate in percent
from 5 to 15.
 Put the year number in A3:A23, from 0 to 20 inclusive.
 Into the range B3:L23 put the compounded values, beginning with year 0.
The formula for compound interest is value = (1+rate)^years.
 Set the background color of the first cell in each column where the initial value
has doubled to yellow.
Try to do this with as little typing and as much use of Excel's
extension feature as possible.
Your table should look something like this when done:
The rule shows further doublings as well; for instance,
note that at 8%, one doubling takes 9 years, and the next takes
another 9 years  that is, 18 years doubles twice, or a factor of four.
Check some of the other entries for similar consistency.
In this lab, you will be using a new sheet for each part,
each with its own name. For this part,
 Doubleclick on the tab that says Sheet1.
 Type the name Rule72 in its place.
 Insert a comment (not a cell value) in cell A1 that
answers the two questions at the end of the previous section.
Part 2: Ranges and Functions
 Ranges
 Functions
 Inserting Rows and Columns
Ranges
So far we have talked about groups of cells by explicitly naming them
in a summation like =A1+A2+A3, or
implicitly by letting Excel extend a series for us.
It's also possible to specify a rectangular array of cells
in terms of the cells at the upper left and lower right corners.
Such an array is called a range, and is usually written
with the names of the two cells separated by a colon.
For example, A1:A10 describes a
column range 10 cells high but only one cell wide.
The range B2:K2 represents
a row of 10 cells starting at B2,
and A1:J10 represents an array of 100 cells, 10 by 10,
that starts in the upper left corner.
As a special case,
A2:A2 is a range that consists of a single cell,
and that can be abbreviated to just the familiar A2.
Excel provides more complicated ranges, but for the most part, simple
rectangular arrays are all we need.
It is also possible to name a range, which is easier to understand and
refer to in a big spreadsheet; we won't be using that facility here.
Functions
The range notation gives us a way to specify an arbitrarily large
group of cells, and thus write out computations much more compactly
and clearly.
For instance, it's impractical to type a formula like
=A1+A2+A3+... if there are more than a few terms in the
summation; a range is a lot easier.
Excel provides a great number of mathematical functions that
perform operations over a range of cells.
The simplest of these is sum, which adds up the numbers in a range:
the formula =SUM(range) produces the sum
of the values in the cells in the specified range.
 Using the methods of the previous page, put the numbers 1..10 in
cells A1 through A10.
 Put the formula =SUM(A1:A10) in cell A12.
 Verify that the answer is correct.
Among the other useful functions are average, product, max (which
computes the maximum value in a range), min, and
count, which counts the number of nonblank cells in the range.
 Using the data in the range A1:C5, put formulas for sum, average, max, min, and count
of each row in cells E1 through E5.
 Put numbers in some of the cells in the range
A1 through C5 and observe how the values of the formulas are updated.
Be sure you understand what is happening.
Inserting Rows and Columns
What happens if you need another row or column, because
your data set has expanded?
If you insert a row or a column within a range,
Excel is pretty clever about guessing what you mean, and will
extend the formula for you.
But if you add a row or column at one end of the range, Excel
isn't sure what you had in mind, and doesn't change the formula.
Verify this behavior:
 Put the numbers 1..10 in cells A1 through A10.
 Put the formula =SUM(A1:A10) in cell A12.
 Now insert a new row 10 and put the value 100 in it. Note
that the sum is now displayed in A13 instead of A12.
Does the value in A13 change? Does the formula in A13 change?
 Continue the experiment by inserting a new row before row 1,
and a new row after the last row of data. Check what happens to the
data and the formula.
There's nothing to save for this part of the lab, but be sure that
you understand how these functions work, since you will need some
of them in later parts.
Part 3: Importing and Graphing Data
 Importing Data From Files
 Graphing Data
Importing Data From Files
It's all well and good to create synthetic data to play with, but in
the real world, one usually works with real data. In this section,
you will get to experiment with data from the
Registrar's web site,
which contains, among other things, Princeton enrollments for many years.
The first step is to import a file of data extracted from one of
these pages; we've cleaned it up a bit for you. The file
7099.txt contains registration data for 1970 through 1999 as
ordinary text. It begins like this:
PRINCETON UNIVERSITY OPENING ENROLLMENT
19701999
Academic Undergraduate Undergraduate Graduate Graduate Total
Year Males Females Males Females Enrollment
197071 3247 391 1282 243 5163
197172 3202 751 1196 272 5421
197273 3107 975 1200 319 5601
...
 Import this file into your spreadsheet
and edit it so it looks exactly like this:
This is most easily done by
 In Netscape, select the part of 7099.txt that you want to copy.
 Copy it using the Netscape "Edit  Copy" menu.
 In Excel, go to Sheet2.
 Paste the data into cell A1.
This will put all the data into a single cell, cell A1,
which is not where you want it;
there seems to be no way to convince Excel to paste
the data into multiple cells directly.
So the next step is to convert the data
into the range A1:F30:
 Under Data, select "Text to Columns..."
to invoke the socalled Text to Columns wizard.
 Use this to get the data into the right form.
(Most of its choices are correct for this data set,
though you may need to experiment with column widths.)
Note that there are no headings in our display;
do not include headings in yours either.
The next step is to insert a new column D that contains the
total undergraduate population for each year.
 Insert a new column D.
 Fill in the cell values in column D with the total undergrad population
Use Excel's mechanisms for extending series
for the corresponding year.
 Use the max function to put the maximum value of columns
B, C and D in B32, C32, and D32 respectively.
Since you will be adding other data to the spreadsheet,
rename Sheet2 to Grads:
 Doubleclick the tab labeled Sheet2.
 Type the name Grads in its place.
Graphing Data
The next step is to draw a graph of some of this data.
Excel will let you display data in a lot of different ways,
some sensible and some definitely not. We want to see two
graphs here. One is a plain vanilla graph that shows male
and female undergraduate enrollments and the total in a
simple way, like this:
 Use the chart wizard
("Insert  Chart" or this icon
)
to create a graph that looks approximately
like the one above, and place it on sheet Grads.
 Use the chart wizard to create another graph that is as different
from the previous one as you can manage while still displaying
the same information in a form that can potentially be understood.
Place it on sheet Grads, near the other chart.
 Make the two charts approximately the same size and together about
the same height as the data, so everything can be seen at once.
Part 4: How to Lie with Statistics (1)
GeeWhiz Graphs
Last year the Newark StarLedger included these two graphs, showing the dramatic decline
of the Dow Jones and NASDAQ stock indexes over the period August 31 to September 29, 2000.
A quick glance at these graphs shows that both markets tumbled
badly, and by about the same amount.
Or did they? These are examples of what Darrell Huff, in the wonderful book
How to Lie with Statistics,
calls the GeeWhiz Graph, a form of statistical chicanery
that is all too common in newspapers and magazines.
Geewhiz graphs are deceptive because they use the entire chart area to
give the impression of a big change, when in fact not much
at all is really happening.
Consider the upper graph, for example.
Although it looks like the Dow has gone down by about two thirds,
it only went from 11,200 down to 10,650, which is about a 5 percent decline.
The deception is compounded when the NASDAQ graph is shown adjacent.
Again the drop appears to be enormous, but in fact it's from
4200 to 3670, which is about 13 percent. So the NASDAQ lost
more than the Dow by quite a bit, but neither loss is anywhere near as
big as the graphs imply.
The heart of the deception is plotting a graph that doesn't include
the full range of data. Each graph should be plotted with the
Y axis beginning at zero; that would give a much more accurate sense
of the magnitude of the change. And then if each is plotted with
a comparable upper bound, somewhat above the largest data value,
that makes it possible to compare the two graphs
in a meaningful way.
Your task is to produce two sensible graphs that permit such a fair comparison,
like this one for the Dow:
The files
dow.txt and nasdaq.txt
contain the raw data from which these graphs were produced.
Use those files to produce two graphs of about the same size (2 or 3 inches on a side) but where
the Dow scale goes from 0 to 12,000 and the NASDAQ scale from 0 to 4,500;
plot them side by side, along with the data values in three columns
(date, Dow, NASDAQ).
Note that the data in these text files is in the wrong order;
use Excel's sorting capabilities (e.g., Data  Sort...)
to get it into the right order.
 Go to Sheet 3 and rename it Lies
 Load the data from dow.txt into A1:B22 of sheet Lies.
Cut and paste is easiest.
 Draw a graph approximately like the one above, but with the Y origin set to zero.
 Load the data from nasdaq.txt into C1:C22 of sheet Lies.
 Draw a graph like the one above, with Y origin set to zero.
 Position the graphs side by side and near the 3 data columns.
Part 5: How to Lie with Statistics (2)
It's always nice when others recognize one's true greatness, as
US News and World Report
did in September 2000 and again in September 2001. But how
are these rankings really determined? And just how much do they
really mean?
US News and World Report
explains
their methodology. They collect data on 16 factors for each school,
weight the factors
according to how important they seem, and then sort the results.
For example, academic
reputation accounts for 25% of the score, and alumni giving rate counts for 5%;
Princeton ties with four other top schools on the former and wins
big on the latter. On the other hand, if only SAT scores mattered,
Princeton would be well behind Caltech, Harvard, MIT, and even Yale and Stanford.
There are two problems with these ranking schemes:
the data itself is suspect, and the weighting factors are arbitrary.
In this lab, we'll accept the data values,
however flaky they might be, and focus
on the weighting factors.
The file
usnwr.xls contains some carefully fiddled data and a set of weights,
loosely based on the original, that preserve the same ordering;
many factors have been unceremoniously dropped, so don't read too much into this.
Here is the display:
The first row shows the factors, and the second row gives weights that sum to 1
(cell I2). The range I4:I14 shows the computed scores.
The formula box shows
the formula being used to compute I4; subsequent rows have the same formula
except for cell references. A couple of factors are scaled (SAT) or complemented
(acceptance rate); for instance a low acceptance rate is deemed better than
a high acceptance rate.
Your task is to find several sets of nonnegative weights that will rearrange the schools in
various ways. Note that the weights in B2:H2 must sum to 1.
 Download file usnwr.xls from
the browser and open it in Excel. (It will appear in a new workbook.)
 Select and Copy all its cells.
 Go to Book1.
 Insert a new Sheet, and rename it Rank.
 Select cell A1 and Paste.
Now you can begin experimenting to find interesting weighting factors.
 Find a set of weights that drops Harvard as far down as you can
while keeping Princeton in first place.
Copy the weights into B17:H17.
 In the interests of fairness, find a set of weights that drops Princeton
as far down as you can
while placing Harvard first.
Copy them into B18:H18.
 Find a set of weights that puts Chicago as high up as you can.
Copy them into B19:H19.
You should also experiment with Excel's sorting capabilities here;
at the end, you should be able to sort the schools by any combination
of factors, for example, by decreasing reputation score and
within that by increasing acceptance rate.
Part 6: Submitting your Work
 Be sure everything is saved on your network drive
 Submit your spreadsheet by email
At this point you should have a workbook Book1 with four sheets:
Rule72, Grads, Lies, Rank. Check through them to make sure they
look right.

First, before you do anything else, be sure to save
your work as
lab8.xls on your network drive.
This is your backup in case something goes wrong with the submission.
 Mail a copy of lab8.xls to yourself, as another backup.
Make sure that it arrives OK, that the attachment is about the
right size, etc.
 Paranoids can save a copy on a floppy disk.
When you are absolutely sure that you have all the individual sheets
in lab8.xls and saved it on your network drive,
As usual:
 The subject of the message should be "Lab 8  Your Name"
 Make sure that you are logged in as yourself when you send mail
If you've completed the lab, transferred your work to your Unix account,
and sent your email to cs109@princeton.edu
or cs111@princeton.edu, you're all done.