Generating Cryptographically Strong Random Numbers

Cos496 - Security - Final Project

Evan Greenberg and Philip Nikolov


Introduction

This project is modeled after SGI's LavaRand.  It is an attempt to implement a cryptographically secure pseudo-random number generator (PRNG) through video and audio input.  The program currently runs on an SGI machine, but can be adapted with a minimum of effort to work on other platforms as well.  The user selects a method of gathering entropy (in our case, either a snapshot from an attached video camera or a segment of audio (about 10 seconds)), and this is then hashed (using SHA-1) and used as the seed for a PRNG (such as the counter based generator in RSAREF).  These numbers are, obviously, as random as the images which are fed into the camera or microphone.  Since the exact positioning of the user's body or exact tone of the user's voice (combined with the low quality and high noise level of current input devices) is almost impossible to duplicate, this seems a good source of randomness.  To test the quality of the pseudo-random numbers generated, we use a suite of sophisticated statistical tests, DIEHARD.  The result of these tests indicated that our PRNG could use some improvement, but was a significant improvement over other commercially available random number generators (Java's java.security.SecureRandom and C's rand()).


Installation

The sources are available as a tarball here.  These should compile more or less out-of-the-box for SGI machines running IRIX 6.x (although they assume the existence of the utilities /usr/sbin/vidtomem, /usr/sbin/videoin, and /usr/sbin/recordaifc).  The files included are:

Compilation

Type make

Running

The executable, randGen, takes one argument N and produces a file "rand_data" which contains N randomly generated bytes for one seed.  It runs the script do_capture to grab a snapshot of chaotic input to determine this seed. (The included scripts video_capture and audio_capture provide a model of such a script.  The script should take one argument, which is the name of the temporary file to be created and filled with data by the time the script exits.)  A simple hack of the source file rng.C should suffice to modify the performance of the program to meet most needs.

Please note that the video_capture script requires the user to close the window it opens before the seed is generated.  To remove this feature, simply comment out the line which runs videoin in the video_capture script.


Generating a Snapshot of Chaos

Creation of a random seed requires first obtaining a handle on some chaotic (or unpredictable) phenomenon.  In our implementation, this is either a snapshot taken from a quickcam or a segment of audio recorded from an attached microphone.  The theory behind both of these is that there is sufficient "random" noise added by the environment, the low quality of these input devices, and the varying position of objects in the room to produce a chaotic system.  The bits extracted from the input device are then run through a 13-way SHA-1 hash, to produce a 255 byte seed.

How do we grab the data from these input devices without losing all cross-platform portability?  The answer is to call out to a script (called do_capture) and allow this script to actually generate the data and place it in a runtime specified location (a temp file).  Thus, adding a different chaotic bitstream requires only that a new script be written.  We use two different scripts, one for audio and one for video, and so the do_capture script is in fact a symbolic link to the script which does the actual work.

The generation of a seed in this manner need not lack a user interface.  The script video_capture, for instance, displays a window on the screen which shows the user what the camera is capturing, and waits for this window to be closed before actually taking the snapshot.  This may be useful in reminding the user to open the lens of the camera, and is a simple example of how a UI might be implemented.


Turning a Seed into a Sequence of Pseudo-Random Numbers

Every pseudo-random number generator (PRNG) contains some state, say N bits. At each step, some transformation function leads from one state to another. Some bits related to the new state are output. PRNG computation is deterministic, but the hope is that to someone unaware of the internal state, the output bits will appear random. Clearly, after at most 2^N transitions the finite state machine of the PRNG will enter a state already encountered. From then on, ouput will repeat. Also, obviously we don't want to start the PRNG in the same state every time. Otherwise each time we want a fresh random stream, the bits will be the same. This is why the starting state must somehow be selected from outside the PRNG. The process of selecting a starting state is called seeding. Seeding is normally done with "truly random" data, i.e. somehting that could not be predicted by anyone.
There is a number of fast PRNGs with good statistical properties for scientific use. However, there are few satisfactory results regarding cryptographically strong random number generation. In particular, there seems to be just one PRNG that is provably secure under certain intractability assumptions. This is the Blum-Blum-Shub generator that JavaRand uses. However, this generator is slow and difficult to get right, because of various configuration constraints. An excellent paper about cryptographically strong PRNGS can be found here. The paper by Scheneir discusses the PRNG that comes with the crypto toolkit called RSAREF. Except for some pretty academic (i.e highly theoretical) attacks, the PRNG looks reasonably suited for cryptographic use. The RSAREF PRNG consists of the following:

Description of Testing

The "industry standard" for testing the statistical properties of PRNGs is called DIEHARD. George Marsaglia, the creaator of this software, is asking that each user downloaded it individually so that he can keep track of the usage. To do a good job of testing randomness, Diehard requires about 11 MB of binary data and it runs a total of 15 various test. We do not pretend to understand much about how these tests work. Basically, some random process is made up, and expected frequencies of various values are computed. Then, the sample data is used to simulate that random process. A Chi-Squared goodness of fit is performed to see how the experimental values match up with those theoretically expected. Our data did not do well on one of the fifteen tests, called "Minimum Distance Test". For a comparison, consider that the standard C library rand() function failed 7 of the tests.


Analysis of Weaknesses

There are, unfortunately, a variety of attacks on the seed generation module.  These all come from within the computer the seed is being generated on, however.  If we allow the adversary to have super-user priviliges on the machine which generates the random number, then all is lost, since this system role could easily capture the same chaotic data which the seed depends upon.  Additionally, it would be trivial for the super-user to modify the programs which the scripts depend upon for getting their data from the devices.  So let us assume that the adversary does not have super-user access to the computer.  Depending on the exact permissions allowed on the host computer, there are additional attacks from normal users.  For instance, if another user could grab a copy of the temporary file created by the scripts, she would have access to the same random data as the seed generator.  Additionally, an adversary with physical access to the machine could unplug the quickcam or microphone and cause mischief in this manner.  Our response to these potential security holes is to assert that malicious users should not have access to machines which generate seeds.  This, we assert, is a better set of requirements for producing a working implementation.  Users without significant access to the machine at the time that the random seed is generated should not be a risk to the security of the see generation, however.

Let us summarize the trust model of our PRNG. The initial state C[0] will depend on the seed value. If our seeding technique is good, an adversary will not be able to guess which of the 2^128 possible starting states we are using. From then on, we are relying on the one-wayness of the hash MD5 to keep our internal state secret. Quote from Scneier: "While there have been interesting cryptanalytic results on MD5 in the last several years, none of them offer an obvious way to attack the RSAREF PRNG." The PRNG seems cryptographically secure in the sense that observing many random ouptut bits up to now does not tell an attacker how to predict the bits to come. Even if the state of the PRNG is suspected compromised, it is trivial to just add in more random seed bytes until the attacker loses track of the state.