Lab 8: Sound

By the end of this lab, you will have learned:

how sound is represented and processed in a computer
the implications of different sampling rates for sound files
how to use a sophisticated sound manipulation program (GoldWave)

You will be asked to experiment with sound that originates on an audio CD. If you have a favorite CD, bring it to the lab. Otherwise, you will be forced to impose on your friends or suffer the instructor's antediluvian tastes.

Use a good sound system if possible; the built-in speakers in laptops are usually too poor to reveal the differences you will be listening for. Earphones are usually a good alternative.

In this lab we explore the manipulation of sound -- the other half of audio-visual media (we did graphics in an earlier lab). Sound is an integral part of many web pages; with the appropriate tools, you can turn your web site into a multimedia experience.

In this lab we also examine how the amount of data stored affects the quality of a sound, comparing the different levels of sound quality available and the amount of storage each one requires.

We'll do all of this with a first-rate program called GoldWave. If you do not have GoldWave on your machine, you can download it from the GoldWave web site; a local copy is also available. GoldWave is shareware; if you decide to keep it, you must pay for it.

Part 1: How sound is represented in a computer
Part 2: Sound manipulation with GoldWave
Part 3: Sound representations
Part 4: Special effects
Part 5: Finishing up

A heads-up on things that might cause problems:

If you use the version of GoldWave that is already installed on the cluster machines, you probably will not be able to add the MP3 encoder lame_enc.dll to the Windows directory. Instead, download GoldWave yourself (onto the Desktop is fine), download lame_enc.dll into the same directory, and use that copy.
There are many compression algorithms, with mysterious names. To find the ones likely to give the most compression without trying them all, bear in mind that mono < stereo, 8 bits < 16 bits, 8 kHz < 44.1 kHz, and so on.
When you save a file from GoldWave, naming it xxx.mp3 does not change its type; you must also select the file type.

Part 1: How sound is represented in a computer

Sound is a varying air pressure wave, produced by all manner of sources, that is sensed by our ears. In an analog sound system, the pressure wave is captured by some kind of transducer (a microphone, most often) that produces an electrical voltage or current that varies proportionally to the sound pressure. This electrical signal can then be transmitted by telephone, or broadcast by radio, or preserved on magnetic tape (old audio cassettes) or used in other ways. En route, the sound might be processed to change its character in some way, for example to reduce noise or squeeze out unwanted or unneeded frequencies.

At the other end, the electrical signal is used to recreate the sound by vibrating some mechanical surface in a loudspeaker or an earphone, reproducing the original pressure wave (with varying degrees of fidelity) so that we can hear the sound.

The higher-pitched the sound, the more rapid the vibration. A pure tone is a regular sine wave like this:

while a more complicated sound is a composite of such waves, and might look like this:

This is reflected in the analog mechanisms used to preserve the sound. For example, in old vinyl LP technology, a record has a long spiral groove whose shape is a representation of the sound waves it records. When the record is played, a fine needle follows the changes in the groove and creates an electric signal proportional to them; when amplified, this drives a loudspeaker.

Today, almost all sound systems are "digital", in the sense that the electrical signal from a transducer like a microphone is converted into a sequence of numeric values, proportional to the strength of the signal, and those numeric values are stored, transmitted, processed, etc., before ultimately being converted back into an electrical signal and then to sound. Thus, for example, an audio CD contains about an hour's worth of sampled voltage values: 44,100 samples per second in each of two stereo channels. Each sample is 16 bits (2 bytes), representing one of 65,536 possible voltage values; multiplying this out (44,100 × 2 channels × 2 bytes × 3,600 seconds) gives about 635 MB for an hour, or roughly 10 MB per minute. The WAV (wave) file format uses this same basic representation.
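The storage arithmetic for uncompressed (PCM) sound is simply a product of sample rate, bytes per sample, channel count, and duration. Here is a minimal Python sketch. The function name pcm_bytes is our own, and the 8 kHz, 8-bit mono case is included only as an illustrative low-quality format for comparison:

```python
def pcm_bytes(sample_rate, bits_per_sample, channels, seconds):
    """Size in bytes of uncompressed PCM sample data (what a WAV file holds,
    ignoring its small header)."""
    return sample_rate * (bits_per_sample // 8) * channels * seconds

cd_hour = pcm_bytes(44_100, 16, 2, 3600)
cd_minute = pcm_bytes(44_100, 16, 2, 60)
phone_minute = pcm_bytes(8_000, 8, 1, 60)   # illustrative low-quality format

print(f"CD audio, 1 hour:   {cd_hour:,} bytes")          # 635,040,000 bytes
print(f"CD audio, 1 minute: {cd_minute:,} bytes")        # 10,584,000 bytes
print(f"8 kHz 8-bit mono, 1 minute: {phone_minute:,} bytes")  # 480,000 bytes
```

Even before any compression, the 8 kHz, 8-bit mono version is about 22 times smaller than CD audio. That gap is why mono, 8 bits, and 8 kHz are the settings to reach for when you are hunting for the smallest encoding.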

The conversion from continuous analog electrical signals to discrete digital numeric values and back again presents a number of issues that we will explore in this lab:

How often do we sample? Sampling more often means that we can track the changes in a rapidly-varying signal more accurately. If we don't sample often enough, we miss meaningful changes in the signal.
How accurately do we sample? More precise signal measurements mean more accurate representation of the signal wave.
How much space do we need to store the sound? More frequent and more accurate samples mean more data, which can add up very quickly indeed.
How much bandwidth (information carrying capacity) do we need to transmit the sound? It takes more communications capacity to preserve all the low and high frequencies that might be needed (for hi-fi sound, perhaps) or might be unnecessary (telephone quality speech, for instance).
How do we convert back from numbers to an analog waveform? To reproduce the original waveform with reasonable fidelity, we must handle this direction carefully as well.
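A short Python sketch can illustrate the first two questions: how often to sample, and how accurately. It assumes an idealized converter. Tones are computed exactly and then rounded to evenly spaced levels, which is a simplification of real analog-to-digital hardware:

```python
import math

def sample_tone(freq_hz, sample_rate, count):
    """Sample a unit-amplitude sine tone at the given rate."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(count)]

def quantize(x, bits):
    """Round a value in [-1, 1] to the nearest of 2**bits evenly spaced levels
    (a simplified model of an analog-to-digital converter)."""
    step = 2.0 / (2 ** bits - 1)          # spacing between adjacent levels
    return round((x + 1.0) / step) * step - 1.0

# Sampling too slowly (aliasing): at 8,000 samples per second, a 5 kHz tone
# produces the same samples as a 3 kHz tone with its sign flipped, so the
# two cannot be told apart after sampling.
high = sample_tone(5000, 8000, 16)
alias = sample_tone(3000, 8000, 16)
print(all(abs(h + a) < 1e-9 for h, a in zip(high, alias)))   # True

# Sampling more accurately: more bits per sample means a smaller
# worst-case rounding error.
tone = sample_tone(440, 44_100, 1000)
for bits in (4, 8, 16):
    err = max(abs(x - quantize(x, bits)) for x in tone)
    print(bits, "bits -> max error", err)
```

The aliasing check follows a general rule: a given sample rate can only represent frequencies below half that rate (the Nyquist limit). This is why CD audio, which aims to cover human hearing up to about 20 kHz, samples at 44,100 Hz.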

Many of the sound systems we see are differentiated mainly in how they choose answers to these questions. This lab will explore some of the tradeoffs.

Analog processing is always lossy -- information leaks away and noise creeps in as analog sounds are processed and transmitted -- and the kinds of transformation of analog information that can be performed are quite limited. By contrast, once a set of numeric values has been obtained that capture a desired sound, those values can be stored, copied, transmitted, and processed in a rich variety of ways, without losing anything of the original. Thus for most purposes, digital sound is preferred.

One common digital process is compression: by taking advantage of how human hearing works, it is possible to compress sound data a great deal without having a perceptible effect on its quality. This fact is at the heart of music formats like MP3, which are typically 10 times smaller than the equivalent uncompressed sound. We'll do some experiments with MP3 in this lab.

Analogously, we can take advantage of properties of human speech to compress telephone speech a great deal; this is used in cellphones. But the compression techniques that work for speech do not work as well for music, as we'll try to demonstrate too.

Other digital transformations include speeding up or slowing down sound without changing its pitch, removing noise and other artifacts, and mixing sounds from multiple sources into a single one. We'll do a little of this in this lab.
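Of these transformations, mixing is the simplest to show. Here is a minimal sketch that assumes both inputs are lists of 16-bit samples at the same sample rate. The equal 0.5 gains and the hard clipping are illustrative choices, not a description of how GoldWave mixes:

```python
def mix(a, b, gain_a=0.5, gain_b=0.5):
    """Mix two 16-bit sample sequences by weighted addition, clipping the
    result to the 16-bit range. The shorter input is padded with silence."""
    length = max(len(a), len(b))
    a = list(a) + [0] * (length - len(a))
    b = list(b) + [0] * (length - len(b))
    out = []
    for x, y in zip(a, b):
        s = round(gain_a * x + gain_b * y)
        out.append(max(-32768, min(32767, s)))   # keep within 16-bit limits
    return out

print(mix([1000, -2000, 30000], [3000, 2000]))   # [2000, 0, 15000]
```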

Finally, it's possible to add carefully controlled redundancy to digital information that makes it possible to detect and even correct some kinds of errors; this is used extensively in digital sound, especially in audio CDs and in cellphones.
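Real systems use sophisticated codes (audio CDs use Cross-Interleaved Reed-Solomon Coding). As a much simpler stand-in, here is a triple repetition code in Python. Every bit is sent three times, and a majority vote over each group of three corrects any single flipped bit. Tripling the data is far more overhead than practical codes use, but the sketch shows how added redundancy makes an error recoverable:

```python
def encode(bits):
    """Triple repetition code: send every bit three times."""
    return [b for b in bits for _ in range(3)]

def decode(coded):
    """Majority vote over each group of three; corrects any single
    flipped bit within a group."""
    return [1 if sum(coded[i:i + 3]) >= 2 else 0 for i in range(0, len(coded), 3)]

data = [1, 0, 1, 1]
sent = encode(data)
received = sent.copy()
received[4] ^= 1                   # a transmission error flips one bit
print(decode(received) == data)    # True
```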

There's a lot on the web about sound. If you want to do some further reading, here is one clear description of how sound works, out of many.

MIDI

MIDI ("Musical Instrument Digital Interface") is a widely used representation for instrumental music that does not store sound waves at all. Instead, it stores a digital representation of the notes to be played, including which note, which instrument, and what duration. The resulting form is very compact and very flexible for many purposes: it's easy to transpose into a different key, play on different instruments, generate musical notation from it, and the like.
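The following sketch shows why note data is compact and easy to transform. The note record here is a hypothetical simplification. A real MIDI file stores timed "note on" and "note off" events; a note-on message is a status byte plus a note number and a velocity. The frequency formula uses the standard convention that note 69 is A4 = 440 Hz:

```python
# Hypothetical simplified note records: (MIDI note number, duration in beats).
melody = [(60, 1), (62, 1), (64, 1), (65, 2)]   # C4 D4 E4 F4

def transpose(notes, semitones):
    """Shift every note up (or down, if negative) by a number of semitones."""
    return [(pitch + semitones, beats) for pitch, beats in notes]

def frequency(note_number):
    """Equal-temperament frequency in Hz; MIDI note 69 is A4 = 440 Hz."""
    return 440.0 * 2 ** ((note_number - 69) / 12)

up_a_fifth = transpose(melody, 7)
print(up_a_fifth)                # [(67, 1), (69, 1), (71, 1), (72, 2)]
print(round(frequency(60), 2))   # 261.63
```

Transposing only adds a constant to each note number. No sound waves have to be recomputed until a synthesizer plays the notes.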

The device that is going to produce the sounds has to synthesize the sounds from an internal definition of what they sound like. For example, a low-end synthesizer will typically have 64 "voices", representing the 64 different instrumental voices it can approximate. Depending on the device, the fidelity to a real instrument might be anywhere from very good to very bad. Typically, piano sounds are pretty good; the human voice is not, save as a sort of characterless choral effect.

MIDI is used in all kinds of synthesizers; pop bands are very fond of the kinds of sounds it produces. We won't do anything with MIDI in this lab, but it's worth knowing about. If you're interested in creating MIDI, there are software packages like Cakewalk and Noteworthy.

Streaming Audio

Many Internet audio sources do not provide file download, but "stream" the sound to a player on your machine; RealPlayer is probably the most common, but there are many others. The idea is that you don't need giant files on your machine: the sound arrives as it is needed and does not occupy space on your computer. The other reason many sources like streaming media (audio or video) is that in theory it prevents you from making your own copy. Of course that's a vain hope; programs like TotalRecorder will let you make copies if you like (but read the warning below about fair use of copyrighted material).

Part 2: Sound manipulation with GoldWave

GoldWave is an elaborate sound-processing program that handles a variety of sound formats, with Wave (.wav) and MP3 (.mp3) the most common. You should find GoldWave under "Cluster Applications". The program has two windows, the main program window:

and the device controls:

The device controls window also shows the frequency spectrum as sound is being played; right-click the title bar to bring up a Properties menu to change this display.

The main window displays the waveforms from sound files you are working with and gives you tools to edit them. The device controls window is for playing sounds, recording, and the like. The display above shows 33 seconds of the Aria from Bach's Goldberg Variations, played on a piano, in WAV format, in a MIDI version, and in a very low bit rate encoding suitable for cell phones. Note the file sizes (the MIDI version has the whole piece, more like 33 minutes), then listen to all three to get a sense of the tradeoffs between space and quality.

Take a few minutes to become familiar with the main window in GoldWave. Notice that when you drag the mouse across the sound wave (which should be a flat line right now), it highlights the region you dragged over. This is how you select the parts of a recording to apply special effects to. The Undo button undoes the effect of a menu choice, but you can only go back one step. You can cut and paste highlighted segments of your sound file just as you would in a text file, which lets you rearrange any sound file.

Part 3: Sound representations

In this part you will compress each of five sound files, first with MP3 and then with the best compression algorithm you can find, and report on what you discover.

Here are five WAV files, each roughly the same length:

zero.wav 10 seconds of total silence
sine1k.wav 10 seconds of a pure 1 kHz tone
rand.wav 10 seconds of random noise ("white noise")
finehour.wav 13 seconds of a famous speech
pachel.wav 13 seconds of a famous composition
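If you want to see how synthetic test files like the first three could be produced, here is a Python sketch using the standard wave module. It assumes 16-bit mono at 44,100 Hz. The lab's own files may use different settings (stereo, for example), so the sizes will not match them exactly. The speech and music recordings obviously cannot be synthesized this way:

```python
import io
import math
import random
import struct
import wave

RATE = 44_100      # CD sample rate (assumed)
SECONDS = 10

def make_wav(samples, rate=RATE):
    """Return the bytes of a 16-bit mono WAV file holding the given samples
    (floats in [-1, 1])."""
    ints = [round(s * 32767) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 2 bytes = 16 bits per sample
        w.setframerate(rate)
        w.writeframes(struct.pack(f"<{len(ints)}h", *ints))  # little-endian
    return buf.getvalue()

n = RATE * SECONDS
zero = make_wav([0.0] * n)                                           # silence
sine = make_wav([math.sin(2 * math.pi * 1000 * i / RATE) for i in range(n)])  # 1 kHz
noise = make_wav([random.uniform(-1, 1) for _ in range(n)])          # white noise

print(len(zero), len(sine), len(noise))   # three identical sizes
```

An uncompressed WAV file's size depends only on its length, sample rate, sample size, and channel count. Silence, a pure tone, and white noise therefore take exactly the same space, and the differences only appear once you compress them. To save one of these, write its bytes to disk, for example open("zero.wav", "wb").write(zero).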

If you prefer, in place of pachel.wav you may use a short excerpt from any music that you like that originates at CD audio quality (i.e., WAV format). GoldWave will extract audio from a CD: use Tools / CD Audio Extraction and set the From and To times to give yourself about 10-30 seconds. You can also use an excerpt from aria.wav.

Load these sound files into GoldWave (select Open from the File menu). Familiarize yourself with the Device Controls, then experiment. Play each file and verify that it is what it says. Using Paintshop Pro, make an image like this one that illustrates how the waveforms of random and sine waves differ, and that shows the frequency spectrum as it is playing the random noise file. (The image below shows the spectrum for the sine wave, showing the energy concentrated around 1 kHz.)
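A spectrum display measures how much of each frequency a signal contains. Below is a minimal Python sketch that evaluates the discrete Fourier transform definition at a handful of chosen frequencies. Real spectrum displays typically use a fast Fourier transform over all frequencies at once. The 8,000 Hz rate and 0.1-second window are arbitrary choices to keep the sketch fast:

```python
import cmath
import math
import random

def magnitude_at(samples, rate, freq_hz):
    """Magnitude of a single discrete Fourier transform bin: how much of
    the given frequency the signal contains."""
    total = sum(s * cmath.exp(-2j * math.pi * freq_hz * n / rate)
                for n, s in enumerate(samples))
    return abs(total)

RATE = 8000
N = 800                                   # 0.1 s of sound; bins 10 Hz apart
sine = [math.sin(2 * math.pi * 1000 * n / RATE) for n in range(N)]
noise = [random.uniform(-1, 1) for _ in range(N)]

freqs = range(100, 4001, 100)
peak = max(freqs, key=lambda f: magnitude_at(sine, RATE, f))
print(peak)                                                    # 1000
print([round(magnitude_at(noise, RATE, f)) for f in (500, 1000, 2000, 3000)])
```

For the sine wave, one frequency dominates. For the noise, the magnitudes are all of similar size, which corresponds to the broad, roughly flat spectrum you should see while rand.wav plays.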

GoldWave can convert a file into any of a large number of other formats; you can explore the available conversions with Save As.... Be careful not to overwrite your original file when the output format has the same extension as the original. Also be careful not to replace the original file with the new format later: after a Save As, GoldWave by default continues working with the converted version.

MP3 is the compressed format used for most music on the Internet (the conversion is the "ripping" process that you're probably familiar with as a precursor to burning a CD). An MP3 file is usually about 1/10 the size of the corresponding .wav file. Does your experience bear out this rule of thumb? Note that there are several variants of MP3; pick the one closest to the original .wav encoding, probably 44,100 Hz stereo at 160 kbps.
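The 1/10 rule of thumb follows from bit rates. Uncompressed CD audio runs at 44,100 × 16 × 2 = 1,411,200 bits per second, while a constant-bit-rate MP3 uses a fixed rate such as 160 kbps. Here is a rough Python sketch that ignores file headers, ID3 tags, and variable-bit-rate encodings:

```python
def pcm_kbps(sample_rate=44_100, bits=16, channels=2):
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate * bits * channels / 1000

wav_kbps = pcm_kbps()                    # CD quality: 1411.2 kbps
for mp3_kbps in (128, 160, 192, 320):
    ratio = wav_kbps / mp3_kbps
    print(f"{mp3_kbps} kbps MP3 is about 1/{ratio:.1f} of the WAV size")
```

At 128 kbps the ratio is about 11, but at 160 kbps it is closer to 9. Measured ratios somewhat below 10 are therefore to be expected for that setting.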

Create MP3 versions of these five files, using the same names except with extension .mp3.
Compute the ratio of WAV size / MP3 size, using the precise sizes found by right-clicking on the file name in Explorer, selecting Properties, and using the Size value (not the "Size on disk").
For each file, find the encoding among those offered by GoldWave that makes the resulting file as small as possible. You can eliminate many candidates right away if they use a high bit rate, stereo, etc. This part is about size, not fidelity.
For each of these compressions, report exactly which compression algorithm you used (with all its parameters), how big the resulting file is, and what the ratio of WAV size / compressed size is.
Put the five maximally compressed files in your public_html directory. Remember what you did, since it will be part of the report at the end.
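If you'd rather not compute the ratios by hand, a small Python sketch can read the exact byte counts with os.path.getsize, which returns a file's actual length (what Explorer calls "Size", not "Size on disk"). The file layout is a hypothetical assumption: each .mp3 sits next to its .wav in the current folder, so adjust the names and paths as needed:

```python
import os

def compression_ratio(wav_size, compressed_size):
    """WAV size / compressed size, both in bytes."""
    return wav_size / compressed_size

def ratio_for_files(wav_path, compressed_path):
    """Compression ratio computed from the exact sizes of two files."""
    return compression_ratio(os.path.getsize(wav_path),
                             os.path.getsize(compressed_path))

if __name__ == "__main__":
    # Hypothetical layout: name.wav and name.mp3 in the current folder.
    for name in ("zero", "sine1k", "rand", "finehour", "pachel"):
        wav, mp3 = name + ".wav", name + ".mp3"
        if os.path.exists(wav) and os.path.exists(mp3):
            print(f"{name}: {ratio_for_files(wav, mp3):.1f}")
```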

MP3 conversion may not work if your version of GoldWave does not have an MP3 converter installed (which often seems to be true in the cluster systems). GoldWave will offer to help you install one. You can do this yourself: save the file lame_enc.dll in the folder C:\WINNT. Ask a TA for help if necessary. If you have a different ripper, it's fine to use that instead.

Part 4: Special effects

The last section of the lab explores some ways of getting special effects.

First, use the Effects menu to apply some interesting combination of special effects to your chosen music file. You can use pachel.wav or aria.wav instead if you prefer, and you can include any speech files you like; mix and match freely.

Create a music and/or speech file with special effects.
Save it as effects.mp3 (in MP3 format, to save space) in your public_html folder.
In the file lab8.html, tell us what effects you used and how, as spelled out in Part 5.

Here are some notes on the Effects menu:

Playback Rate: this determines the quality of the sound you will hear back. The higher the number, the better (but the more storage space it will take should you choose to save it).
Transpose: allows you to take your recording and make it start on different notes, thereby making it sound brighter or darker.
Doppler: changes the pitch of the selection. It presents you with a graph, where you can drag the sound line to indicate the points at which you would like the pitch to be high and low.
Reverse: makes the selection go backwards
Silence: eliminates any sound from the selected area
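A few lines of Python can sketch what Reverse and Silence do to a selection. The sketch treats a sound as a list of samples and a selection as a start/end index range. It illustrates the idea only and is not GoldWave's code:

```python
def reverse(samples, start, end):
    """Play the selected region samples[start:end] backwards."""
    return samples[:start] + samples[start:end][::-1] + samples[end:]

def silence(samples, start, end):
    """Replace the selected region with zero-valued (silent) samples."""
    return samples[:start] + [0] * (end - start) + samples[end:]

clip = [1, 2, 3, 4, 5, 6]
print(reverse(clip, 1, 4))   # [1, 4, 3, 2, 5, 6]
print(silence(clip, 1, 4))   # [1, 0, 0, 0, 5, 6]
```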

Second, use the Tools / Expression evaluator menu to create the most interesting totally synthetic sound file that you can manage. It should be no more than about 15-20 seconds, and should be stored in MP3 format again. Our sine wave file was created with

	sin(2*pi*f*t)

with f set to 1000 and the length set to 10 seconds. Use the Help button to get started.
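The expression evaluator computes one value per sample from a formula in t, the time in seconds. The Python sketch below is a rough equivalent in which a Python function stands in for the GoldWave expression. The syntax and the render helper are our own, not GoldWave's. The sketch produces 16-bit mono WAV bytes, which you would still need to convert to MP3:

```python
import io
import math
import struct
import wave

RATE = 44_100

def render(expression, seconds, rate=RATE):
    """Evaluate expression(t) once per sample (t = time in seconds) and
    return 16-bit mono WAV bytes. Values are clipped to [-1, 1]."""
    samples = []
    for n in range(int(seconds * rate)):
        value = max(-1.0, min(1.0, expression(n / rate)))
        samples.append(round(value * 32767))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()

# The lab's sine wave: sin(2*pi*f*t) with f = 1000, for 10 seconds.
pure = render(lambda t: math.sin(2 * math.pi * 1000 * t), 10)

# An illustrative variation: a 440 Hz tone whose volume swells and fades
# three times per second (tremolo).
tremolo = render(lambda t: math.sin(2 * math.pi * 440 * t)
                 * (0.5 + 0.5 * math.sin(2 * math.pi * 3 * t)), 5)
```

Multiplying a tone by a slowly varying sine, as in the tremolo example, is one easy way to make a synthetic sound more interesting.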

Create a sound file with special effects computed by the expression evaluator.
Save it as expression.mp3 (in MP3 format, to save space) in public_html.
In lab8.html, tell us what expressions you used and how.

Part 5: Finishing up

Place each of the sound files that we asked you to save in your public_html directory and create a new HTML file (not your home page) with links to these files. Call this file lab8.html. Here is a template for lab8.html that must be used to organize this information. You can download a copy by right-clicking on the link.

Also include in lab8.html some text explaining what you have done.

These are the files to which you should have links and the explanations you should include in lab8.html:

Use the template lab8.html.
Five .mp3 files, with the compression ratio of each.
Five .wav files compressed as much as possible, with the compression ratio and an explanation of what you did with each.
A special effects file effects.mp3 that does something interesting with speech and/or music files, with an explanation of what you did.
A special effects file expression.mp3 generated by the expression evaluator tool, with an explanation of what you did.

Note that we want the .mp3 files and the even more compressed files. The .wav files are generally too big, so you don't need to save them. But make sure you've saved everything we've asked for, and make sure the files are readable: we can't grade what we can't read.

When you're done, send email to cos111@cs.princeton.edu with subject "Lab 8 -- Your name".

If you've completed the lab, sent your email to cos111@cs.princeton.edu and transferred your work to your Unix account, then you're done.

And since this is the last lab of the year, you really are done. We hope you enjoyed the labs.