Generating Cryptographically Strong Random Numbers
Cos496 - Security - Final Project
Evan Greenberg and Philip Nikolov
Introduction
This project is modeled after SGI's LavaRand.
It is an attempt to implement a cryptographically secure pseudo-random
number generator (PRNG) through video and audio input. The program
currently runs on an SGI machine, but can be adapted with a minimum of
effort to work on other platforms as well. The user selects a method
of gathering entropy (in our case, either a snapshot from an attached video
camera or a segment of audio (about 10 seconds)), and this is then hashed
(using SHA-1) and used as the seed for a PRNG (such as the counter based
generator in RSAREF).
These numbers are, obviously, as random as the images which are fed into
the camera or microphone. Since the exact positioning of the user's
body or exact tone of the user's voice (combined with the low quality and
high noise level of current input devices) is almost impossible to duplicate,
this seems a good source of randomness. To test the quality of the
pseudo-random numbers generated, we use a suite of sophisticated statistical
tests, DIEHARD.
The result of these tests indicated that our PRNG could use some improvement,
but was a significant improvement over other commercially available random
number generators (Java's java.security.SecureRandom and C's rand()).
Installation
The sources are available as a tarball here.
These should compile more or less out-of-the-box for SGI machines running
IRIX 6.x (although they assume the existence of the utilities /usr/sbin/vidtomem,
/usr/sbin/videoin,
and /usr/sbin/recordaifc). The files included are:
Compilation
Type make
Running
The executable, randGen, takes one argument N and produces a file
"rand_data" which contains N randomly generated bytes for one seed.
It runs the script do_capture to grab a snapshot of chaotic input
to determine this seed. (The included scripts video_capture and
audio_capture
provide a model of such a script. The script should take one argument,
which is the name of the temporary file to be created and filled with data
by the time the script exits.) A simple hack of the source file rng.C
should suffice to modify the performance of the program to meet most needs.
Please note that the video_capture script requires the user to
close the window it opens before the seed is generated. To remove
this feature, simply comment out the line which runs videoin in
the video_capture script.
Generating a Snapshot of Chaos
Creation of a random seed requires first obtaining a handle on some chaotic
(or unpredictable) phenomenon. In our implementation, this is either
a snapshot taken from a quickcam or a segment of audio recorded from an
attached microphone. The theory behind both of these is that there
is sufficient "random" noise added by the environment, the low quality
of these input devices, and the varying position of objects in the room
to produce a chaotic system. The bits extracted from the input device
are then run through a 13-way SHA-1 hash, to produce a 255 byte seed.
How do we grab the data from these input devices without losing all
cross-platform portability? The answer is to call out to a script
(called do_capture) and allow this script to actually generate the
data and place it in a runtime specified location (a temp file).
Thus, adding a different chaotic bitstream requires only that a new script
be written. We use two different scripts, one for audio and one for
video, and so the do_capture script is in fact a symbolic link to
the script which does the actual work.
The generation of a seed in this manner need not lack a user interface.
The script video_capture, for instance, displays a window on the
screen which shows the user what the camera is capturing, and waits for
this window to be closed before actually taking the snapshot. This
may be useful in reminding the user to open the lens of the camera, and
is a simple example of how a UI might be implemented.
Turning a Seed into a Sequence of Pseudo-Random Numbers
Every pseudo-random number generator (PRNG) contains some state, say N
bits. At each step, some transformation function leads from one state to
another. Some bits related to the new state are output. PRNG computation
is deterministic, but the hope is that to someone unaware of the internal
state, the output bits will appear random. Clearly, after at most 2^N transitions
the finite state machine of the PRNG will enter a state already encountered.
From then on, ouput will repeat. Also, obviously we don't want to start
the PRNG in the same state every time. Otherwise each time we want a fresh
random stream, the bits will be the same. This is why the starting state
must somehow be selected from outside the PRNG. The process of selecting
a starting state is called seeding. Seeding is normally done with "truly
random" data, i.e. somehting that could not be predicted by anyone.
There is a number of fast PRNGs with good statistical properties for
scientific use. However, there are few satisfactory results regarding cryptographically
strong random number generation. In particular, there seems to be just
one PRNG that is provably secure under certain intractability assumptions.
This is the Blum-Blum-Shub generator that JavaRand uses. However, this
generator is slow and difficult to get right, because of various configuration
constraints. An excellent paper about cryptographically strong PRNGS can
be found
here.
The paper by Scheneir discusses the PRNG that comes with the crypto toolkit
called RSAREF. Except for some pretty academic (i.e highly theoretical)
attacks, the PRNG looks reasonably suited for cryptographic use. The RSAREF
PRNG consists of the following:
-
The state is kept in a 128 bit counter C[i]
-
Seeding is processed as C[i+1] = C[i] + MD5(X) mod 2^128 where X is the
random seed given
-
The output at each stage is out[i] = MD5(C[i]) mod 2^128
The transition function is C[i+1] = C[i] + 1 mod 2^128
Description of Testing
The "industry standard" for testing the statistical properties of PRNGs
is called
DIEHARD.
George Marsaglia, the creaator of this software, is asking that each user
downloaded it individually so that he can keep track of the usage. To do
a good job of testing randomness, Diehard requires about 11 MB of binary
data and it runs a total of 15 various test. We do not pretend to understand
much about how these tests work. Basically, some random process is made
up, and expected frequencies of various values are computed. Then, the
sample data is used to simulate that random process. A Chi-Squared goodness
of fit is performed to see how the experimental values match up with those
theoretically expected. Our data did not do well on one of the fifteen
tests, called "Minimum Distance Test". For a comparison, consider that
the standard C library rand() function failed 7 of the tests.
Analysis of Weaknesses
There are, unfortunately, a variety of attacks on the seed generation module.
These all come from within the computer the seed is being generated on,
however. If we allow the adversary to have super-user priviliges
on the machine which generates the random number, then all is lost, since
this system role could easily capture the same chaotic data which the seed
depends upon. Additionally, it would be trivial for the super-user
to modify the programs which the scripts depend upon for getting their
data from the devices. So let us assume that the adversary does not
have super-user access to the computer. Depending on the exact permissions
allowed on the host computer, there are additional attacks from normal
users. For instance, if another user could grab a copy of the temporary
file created by the scripts, she would have access to the same random data
as the seed generator. Additionally, an adversary with physical access
to the machine could unplug the quickcam or microphone and cause mischief
in this manner. Our response to these potential security holes is
to assert that malicious users should not have access to machines which
generate seeds. This, we assert, is a better set of requirements
for producing a working implementation. Users without significant
access to the machine at the time that the random seed is generated should
not be a risk to the security of the see generation, however.
Let us summarize the trust model of our PRNG. The initial state C[0]
will depend on the seed value. If our seeding technique is good, an adversary
will not be able to guess which of the 2^128 possible starting states we
are using. From then on, we are relying on the one-wayness of the hash
MD5 to keep our internal state secret. Quote from Scneier: "While there
have been interesting cryptanalytic results on MD5 in the last several
years, none of them offer an obvious way to attack the RSAREF PRNG." The
PRNG seems cryptographically secure in the sense that observing many random
ouptut bits up to now does not tell an attacker how to predict the bits
to come. Even if the state of the PRNG is suspected compromised, it is
trivial to just add in more random seed bytes until the attacker loses
track of the state.