2.2   Libraries and Clients


This section is under construction.


Each program that you have written consists of Java code that resides in a single .java file. For large programs, keeping all the code in a single file is restrictive and unnecessary. Fortunately, it is very easy in Java to refer to a method in one file that is defined in another. This ability has two important consequences on our style of programming:

Using static methods in other programs.

To refer to a static method in one class that is defined in another, we use the same mechanism that you have been using to invoke methods like StdOut.printf(): Keep both classes in the same directory in your computer. To call a method, prepend its class name and a period separator. For example, consider Gaussian.java. The definition of one of its methods requires the square root function. For purposes of illustration, suppose that we wish to use the sqrt() implementation from Newton.java. All that we need to do is to keep Gaussian.java in the same directory as Newton.java and prepend the class name when calling sqrt(). If we want to use the standard Java implementation, we call Math.sqrt() if we want to use our own implementation, we call Newton.sqrt().

Moreover, any other class in that directory can make use of the methods defined in Gaussian, by calling Gaussian.phi() or Gaussian.Phi(). For example, we might wish to have a simple program SATmyYear.java that takes a value x from the command line and prints (Φ(x) - 1019)/209, so that we do not need to type in the mean and standard deviation each time we want to know the percentage scoring less than a given value for a certain year. The files Gaussian.java, Newton.java, and SATmyYear.java implement Java classes that interact with one another: SATmyYear calls a method in Gaussian, which calls a method that calls a method in Newton.

Modular programming diagram
We describe several details about the process. Client, API, and Implementation

Libraries.

We refer to a class that comprises methods for use by other programs as a library. We focus in this section on the idea that we can build user-defined libraries, which are nothing more than classes that each contain a set of related methods for use by other programs. No Java library can contain all the methods that we might need for a given computation, so this ability is a crucial step in addressing complex programming applications.

This same mechanism is effective for user-defined libraries. The API allows any client to use the library without having to examine the code in the implementation. The guiding principle in API design is to provide to clients the methods they need and no others. For example, the Gaussian distribution functions are so important in applications that it is worthwhile to articulate this API:

Gaussian API

Random numbers.

StdRandom.java is a library for generating random numbers from various distributions.
Standard random API

Standard statistics.

Now we consider a library for a set of statistical calculations that arise in all sorts of applications in science and engineering. Such a library is useful, for example, when we perform a series of scientific experiments that yield measurements of a quantity. The average value of those measurements, otherwise known as the mean μ, is an estimate of the value of the quantity. The formula for the sample variance σ^2 is given below. The sample standard deviation σ isthe square root of the sample variance.
Mean and variance
The minimum and maximum values are also of interest, as are the median (the value which is smaller than and larger than half the values) and other order statistics such as the kth smallest value for some given k. StdStats.java implements the following API:
Standard Statistics API
The implementation of the median(), select(), and sort() methods require more advanced algorithms than the others for efficient implementations, so we defer these implementations until Section 4.2.

The main() test client for StdStats puts the command-line arguments into an array and calls each of the methods min(), mean(), stddev(), and max() to print out the minimum, mean, standard deviation (in parentheses) and maximum. In this case, a more extensive test of the calculations is called for (see Exercise 2.2.5). Typically, as we debug or test new methods in the library, we adjust the unit testing code accordingly.

Modular programming.

The library implementations that we have considered so far are each interesting in their own right, but in the present context, we are interested in them as building blocks for a first example of modular programming. Instead of writing a new program that is self-contained in its own file to address every task that we face, we break up each programming task into smaller, more manageable subtasks, then implement and independently debug programs (which we refer to as modules) that address each subtask. Good libraries facilitate modular programming by allowing us to define and providing solutions to important subtasks for future clients. Whenever you can clearly separate tasks within a program, you should do so.

For example, CouponExperiment.java runs experiments to estimate the value of the quantity of interest in the coupon collector problem. It is a relatively small program because it is a client of several of the modules that we have considered in this chapter (Coupon.java, StdStats.java, and StdRandom.java, Harmonic.java). The experimental results that we get are in agreement with mathematical analysis, which says that the expected number of coupons collected before all N values are found should be about N times the Nth Harmonic number (1 + 1/2 + 1/3 + ... + 1/N) and the standard deviation should be about N π / sqrt(6). Indeed, checking experimental results for coupon collection against mathematical analysis is well-known as the coupon collector test for the quality of a random number generator. The scientific conclusion that we can draw from this experiment is that our random number generator passes the coupon collector test: we could not distinguish the numbers it produces from uniformly random numbers.

We emphasize modular programming throughout this book because it has many important advantages, including the following:

To describe the relationship among modules in a modular program, we often draw a dependency graph, where we connect two class names with an arrow labeled with the name of a method if the first class contains a call on the method and the second class contains the definition of the method. Such diagrams become increasingly useful as the number of modules grows and relationships among them become more complicated.
Dependency graph

If you encounter an old program (or a new program written by an old programmer!) you are likely to find one huge module - a long sequence of statements, stretching to several pages or more, where any statement can refer to any variable throughout the program. Variables whose scope extends to a whole program are known as global variables. We avoid global variables in modular programs, but their use is common in lower-level and older programming languages. Huge modules that use global variables are extremely difficult to understand, maintain, and debug.

Coin flipping and the Gaussian distribution.

Program Flip.java uses an array to simulate a version of the Gambler's ruin experiment we considered in Section 1.3. In each trial, the program flips N fair coins and counts the number of times i of the N coins are heads. The program animates the histogram using standard graphics. According to the Central Limit Theorem, the resulting histogram is approximately normally distributed with mean N/2 and variance N/4. Below are sample outputs when N = 16, 32, and 64.

Histogram of 16 coin flips Histogram of 32 coin flips Histogram of 64 coin flips

Chaos game.

Program Sierpinski.java plays the chaos game on a triangle: Start at one of the vertices of an equilateral triangle. Then pick one of the three vertices at random and move halfway from the current point to that vertex. Draw each point that you visit. What pattern emerges? The images below represent snapshots of the chaos game after 1,000, 10,000, and 100,000 steps. You might recognize the figure as the Sierpinski triangle.

chaos game with 1,000 points      chaos game with 10,000 points      chaos game with 100,000 points

Q + A

Q. I tried to use StdRandom, but get the following error message. What's wrong?

Exception in thread "main" java.lang.NoClassDefFoundError: StdRandom.

A. You need to download StdRandom.java into the directory containing your client, or use your operating system's classpath mechanism.

Q. Is there a keyword that identifies a class as a library?

A. No, any set of public methods will do. There is a bit of a conceptual leap in this viewpoint, because it is one thing to sit down to create a .java file that you will compile and run (perhaps run again sometime later with different data), quite another thing to create a .java file that you will rely on much later in the future, and still another thing to create a .java file for someone else to use in the future. You need to develop some libraries for your own use before engaging in this sort of activity, which is the province of experienced systems programmers.

Q. How do I develop a new version of a library that I have been using for a while?

A. With care. Any change to the API might break any client program, so it is best to work in a separate directory. But then you are working with a copy of the code. If you are changing a library that has a lot of clients, you can appreciate the problems faced by companies putting out new versions of their software. If you just want to add a few methods to a library, go ahead: that is usually not too dangerous.

Q. How do I know that an implementation behaves properly? Why not automatically check that it satisfies the API?

A. We use informal specifications because writing a detailed specification is not much different than writing a program. Moreover, a fundamental tenet of theoretical computer science says that doing so does not even solve the basic problem because there is no way in general to check that two different programs perform the same computation.

Exercises

  1. Add to Gaussian.java an overloaded method implementation phi(x, mu, sigma) that computes the Gaussian distribution with a given mean μ and standard deviation σ, based on the formula φ(x, μ, σ) = φ((x - μ)/σ)/σ. Also include an overloaded implementation of the associated cumulative distribution function Phi(x, mu, sigma), based on the formula Φ(x, μ, σ) = Φ((x - μ)/σ).
  2. Write a static method library Hyperbolic.java that implements the hyperbolic trigonometric functions based on the definitions sinh(x) = (e^x - e^-x)/2 and cosh(x) = (x^x + e^-x)/2, with tanh(x), coth(x), sech(x), csch(x) defined in a manner analogous to standard trigonometric functions.
  3. Write a test client for both StdStats and StdRandom that checks that all of the methods in both libraries operate as expected. Take a command-line parameter N, generate N random numbers using each of the methods in StdRandom, and print out their statistics. Extra credit : Defend the results that you get by comparing them to those that are to be expected from mathematical analysis.
  4. Develop a client that does stress testing for StdRandom. Pay particular attention to discrete(). Do the values sum to 1?
  5. Develop a client that does stress testing for StdStats.
  6. Implement the bars() method for PlotArray.
  7. Write a method that takes double values ymin and ymax with ymin strictly less than ymax and an array double a[] as parameters and uses the StdStats library to linearly scale the values in a[] so that they are all between ymin and ymax.
  8. Add to StdRandom a static method maxwellBoltzmann() that returns a random value drawn from a Maxwell-Boltzmann distribution with parameter σ. To produce such a value, return the square root of the sum of the squares of three Gaussian random variables with mean 0 and standard deviation σ. The speeds of molecules in an ideal gas have a Maxwell-Boltzmann distribution.
  9. Draw dependency graphs for Bernoulli.java and IFS.java.
  10. Modify Bernoulli.java to animate the bar graph, replotting it after each experiment, so that you can watch it converge to the normal distribution. Then add a command-line argument and an overloaded binomial() implementation to allow you to specify the probability p that a biased coin comes up heads, and run experiments to get a feeling for the distribution corresponding to a biased coin. Be sure to try values of p that are close to 0 and close to 1.
  11. Write a library Matrix.java that implements the following API.
  12. Write a program MarkovSquaring.java that is a version of Markov.java that uses matrix squaring. Use StdArrayIO.java and Matrix.java to modularize your code.
  13. Develop a complete implementation of StdArrayIO.java.

Creative Exercises

  1. Array plot library. Develop your own version of PlotArray.java. Be creative! Try to make a plotting library that you think that you will use for some application in the future.
  2. Sicherman dice. Suppose that you have two six-sided dice, one with faces labeled 1, 3, 4, 5, 6, and 8; and the other with faces labeled 1, 2, 2, 3, 3, and 4. Compare the probabilities of occurrence of each of the values of the sum of the dice with those for a standard pair of dice. Use StdRandom, StdStats, and StdPlot.

    Answer: dice with these properties are called Sicherman dice: they produce sums with the same frequency as regular dice (2 with probability 1/36, 3 with probability 2/36, and so on).

  3. Craps. Here are the rules for a pass bet in the game of craps: Roll two 6-sided dice, and let x be their sum.
    • if x is 7 or 11, you win
    • if x is 2, 3, or 12, you lose
    otherwise, repeatedly roll two the dice until their sum is either x or 7
    • if their sum is x, you win
    • if their sum is 7, you lose
    Write a modular program to estimate the probability of winning a pass bet. Modify your program to handle loaded dice, where the probability of a die landing on 1 is taken from the command line, the probability of landing on 6 is 1/6 minus that probability, and 2-5 are assumed equally likely. Hint: Use StdRandom.discrete().
  4. Tukey plot. Suppose that the standard input stream is a sequence of pairs of numbers where the first number in each pair is an int and the second a double value. Write a program that takes an integer N from the command line and, assuming that all the int values on the input stream are between 0 and N-1, uses StdDraw to make a Tukey plot. For each i, draw a vertical line with x-coordinate i whose endpoints are determined by the statistics of all the double values associated with i in the standard input, running from the y value such that 10 per cent of the values are lower to the y value such that 10 per cent of the values are higher. Then draw a thin rectangle centered on the line that runs from one standard deviation below the mean to one standard deviation above the mean. Hint: Use StdStats.
  5. IFS. Run IFS.java to learn the pattern that emerges for the input tree.txt. Experiment with other inputs to IFS to create a pattern of your own design like this one, the Sierpinski triangle, or the Barnsley fern.
  6. Dynamic histogram. Suppose that the standard input stream is a sequence of double values. Write a program that takes an integer N and two double values l and r from the command line and uses StdPlot to plot a histogram of the count of the numbers in the standard input stream that fall in each of the N intervals defined by dividing (l , r) into N equal sized intervals. Use your program to add code to your solution to exercise 2.2.3 to plot a histogram of the distribution of the numbers produced by each method, taking N from the command line.
  7. Voting machines. Develop a StdRandom client (with appropriate static methods of its own) to study the following problem: Suppose that in a population of 100 million voters, 51% vote for candidate A and 49% vote for candidate B. However, the voting machines are prone to make

Web Exercises

  1. Sample standard deviation. The sample standard deviation of a sequence of N observations is defined similar to the standard deviation except that we divide by N-1 instead of N. Add a method sampleStddev() that computes this quantity.
  2. Maxwell-Boltzmann distribution. Generate 3 Gaussian random variables z1, z2, and z3. Let z = sqrt(z1*z1 + z2*z2 + z3*z3). Modify Flip.java to plot a histogram of values of z. In statistical physics the resulting distribution is known as the Maxwell-Boltzmann distribution.
  3. Barnsley fern. Write a program Barnsley.java that takes a command line argument N and plots a sequence of N points according to the following rules. Set (x, y) = (0.5, 0). Then update (x, y) to one of the following four quantities according to the probabilities given.

    PROBABILITY NEW X NEW Y
    2% 0.5 0.27y
    15% -0.139x + 0.263y + 0.57 0.246x + 0.224y - 0.036
    13% 0.170x - 0.215y + 0.408 0.222x + 0.176y + 0.0893
    70% 0.781x + 0.034y + 0.1075 -0.032x + 0.739y + 0.27


    The pictures below show the results after 500, 1000, and 10,000 iterations.

    Barnsley fern Barnsley fern Barnsley fern


  4. Iterated function system. An Iterated function system (IFS) is a general way to produce fractals like the Sierpinski triangle and the Barnsley Fern. Experiment with different affine transformations and produce some interesting pictures. Program IFS.java is a data-driven version of Chaos.java. You can run it on the inputs sierpinski.txt or fern.txt or tree.txt. For example, try the following to get trees.

    PROB NEW X NEW Y
    10% 0.05 0.6y
    10% -0.05x -0.5y + .5
    20% 0.46x - 0.15y 0.39x + 0.38y + 0.3
    20% 0.47x - 0.15y 0.17x + 0.42y + .55
    20% 0.43x + 0.28y -0.25x + 0.45y + .5
    20% 0.42x + 0.26y -0.35x + 0.31y + 0.35

    The Black-Scholes model predicts that the asset price at time t will be S' = S exp { (rt - 0.5*sigma^2*t + sigma ε sqrt(t) }, where epsilon is a standard Gaussian random variable. Can use Monte Carlo simulate to estimate. To estimate the value of the option at time T, compute max(S' - X, 0) and take mean over many trials of epsilon. The value of the option today is e^-rT * mean. European put = max(X - S', 0). Reuse function. Name your program BlackScholes.java. See Exercise 2.1.30 for an exact formula for this case.

  5. Simulation. Application: some kind of simulation which uses StdRandom and StdStats to flip coins and analyze mean/variance. [Ex: physics, financial based on Black-Scholes hedge simulation. Simulation needed to price options whose payoff depends on the price path, not just the price at the maturity time T. Ex: Asian average price call = max(0, S_bar - X) where S_bar is the average price of the asset from time 0 to T. Lookback option = max(0, S(T) - min_t S_t). Idea: discretize time into N periods.] another reference Break up simulation into various pieces encapsulated as functions.
  6. Flaming fractals. Implement a generalization of IFS to produce fractal flames like Water Lilies by Roger Johnston. Flaming fractals differ from classic IFS by using nonlinear update functions (sinusoidal, spherical, swirl, horseshoe), using a log-density display to color pixels according to how many times they result in the process, and incorporating color based on which rule was applied to get to that point.
  7. Random point on a sphere. Use StdRandom.gaussian() to generate a random point on the surface of a sphere or hypersphere using the following method: generate N random values from the gaussian distribution, x[0], ..., x[N-1]. Then (x[0]/scale, ..., x[N-1]/scale) is a random point on the N-dimensional sphere, where scale = sqrt(x[0]^2 + ... + x[N-1]^2).