Assignment 4: Lab work

Make some music using Linear Predictive Coding.

LPC works best with speech, and at lower sampling rates; I recommend 22k mono.  The speech should also be clean and unaccompanied by distracting sounds.
If you can, record your own voice.  Do it in a quiet room and then use the cassette deck in the lab to transfer the sound.

You can locate lots of speech on the internet.  Check out http://www.geek-girl.com/audioclips.html.  You really need clean, clear speech, however, and this is harder to find.



1) Analysis

Once you have a 22k mono speech sample (call it speech.snd), the first thing you need to do is approximate the pitch range of the sample.  Male voices generally lie between 80 and 250 Hz, and female voices generally between 120 and 350 Hz.  You can try these numbers, but you may need to adjust them for a given voice.  If your guesses are bad you will hear the pitch change suddenly, or jump an octave in the wrong places (like an adolescent whose voice is changing).  Getting a good pitch analysis is one of the most important steps towards getting good synthesis.  Once you have these numbers in hand, just say

makelpc   speech.snd    lowpitch    highpitch   inputskip    duration    npoles

makelpc is a shell script in /u/paul/m325/bin.  It does a pitch analysis of the sound, an lpc analysis, stabilizes the lpc analysis, merges the pitch and lpc analyses, and does a flat resynthesis of the sound.  You will end up with files called speech.snd.pch, speech.snd.lpc and speech.snd.lpc.snd, which are, respectively, the pitch analysis, the lpc/pitch analysis and the synthesis test soundfile.  If the result sounds crummy, try changing the lowpitch and/or highpitch values.  A program called lpcplot will allow you to look at the data set.

24 poles is generally a good number for a 22k file, but it is sometimes worth experimenting with a few more or less.
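
For example, for a typical male voice you might say something like this (the pitch bounds are only a first guess, and I am assuming here that inputskip and duration are given in seconds, so adjust all of these for your own sample):

makelpc   speech.snd   80   250   0   5   24

Then listen to speech.snd.lpc.snd; if it sounds crummy, rerun makelpc with different lowpitch and/or highpitch values.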



2) Synthesis

Now that you've got a good data set you can start to play with it.

The Cmix program which does the synthesis is called lpcplay.

Here is a sample minc script (/u/paul/m325/bin/samplelpc.minc)

===========================================
/* sample lpc minc script */

numpoles = 24

output("test.snd")

nframes = dataset("f.snd.lpc",numpoles)   /* open data set, second arg is npoles */

lpcstuff(thresh=.04, randamp=.10, unvoicedrate=0, rise=0, decay=0)
                              /* threshold is usually between .001
                                 and .04, randamp about .05 to .15 */
fps = sr(1)/125               /* define frame rate */
frame1 = 1                    /* first frame */
frame2 = nframes-4            /* final frame, rounded down for safety */
timescale = 1                 /* amount of stretching */
warp = 0                      /* formant warp, -1 to +1 */
transp = -.01                 /* transposition in semitones: .01 = 1 semitone,
                                 -.12 = -12 semitones, etc.
                                 absolute value must be < 1 */
start = 0
dur=timescale*(frame2-frame1+1)/fps    /* scale dur according to number of frames */
amp=1

/*  p0=start, p1=dur, p2=transp (or 8ve.pch), p3=frame1, p4=frame2, p5=amp, p6=warp  */

lpcplay(start,dur,transp, frame1,frame2,amp,warp)
==============================================
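
If lpcplay behaves like other standalone Cmix programs, you would then run the score by feeding the script to it on standard input, something along these lines (this is an assumption, so check the setup in the lab):

lpcplay < samplelpc.minc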

If you would like to use a soundfile as the excitation rather than the built-in buzz/rand combination:

1) add an input() statement before the output() statement
2) use lpcin() and specify an inputskip in place of the transp argument.
e.g.
lpcin(start,dur,inskip,frame1,frame2,amp,warp)
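
For example, here is a minimal sketch of that variant, modeled on the sample script above.  The excitation soundfile name (excite.snd) is just a placeholder, and I am assuming inskip means the point in the input soundfile, in seconds, at which to start reading:

===========================================
/* lpc resynthesis driven by an input soundfile (sketch) */

numpoles = 24

input("excite.snd")           /* excitation soundfile -- placeholder name */
output("test.snd")

nframes = dataset("speech.snd.lpc",numpoles)

lpcstuff(thresh=.04, randamp=.10, unvoicedrate=0, rise=0, decay=0)

fps = sr(1)/125               /* frame rate, as in the sample script */
frame1 = 1
frame2 = nframes-4
warp = 0
inskip = 0                    /* skip into the excitation soundfile */
start = 0
dur = (frame2-frame1+1)/fps
amp = 1

lpcin(start,dur,inskip,frame1,frame2,amp,warp)
==============================================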


3) Fancier features

If you specify a value for transp which is > 1, it will be interpreted as a pitch, in 8ve.pc form.  Then the pitch contours of the voice will be centered around that pitch, using some weighted average formula I figured out once, and forgot.  You can also expand or flatten the contours of the voice by using the setdev(scale) command before calling lpcplay().  A large value for scale (> 70) will expand the contours, while a small value (.001) will return a flat pitch.  (setdev() will tell you what the actual deviation for the speech sample is.)
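
For example, to center the voice around middle C with an almost flat contour you might say something like this (I am taking 8.00 to be middle C in 8ve.pc notation, and the setdev value is just an illustrative extreme):

setdev(.001)                                            /* nearly flat pitch contour */
lpcplay(start, dur, 8.00, frame1, frame2, amp, warp)    /* p2 > 1, so it is read as a pitch */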

You can also draw your own contours.  Pfields 9 and up contain pairs of [frame#, transposition] arguments which will draw a curve.  (Specify 0 for p7 and p8.)  Thus

lpcplay(start,dur,transp, frame1,frame2,amp,warp,0,0,  frame1,-.12,  frame1+200,.12,  frame2,.11)

will create a contour in which the voice starts out an octave lower, goes up an octave after 200 frames and settles down to an 11 semitone transposition at the end.


4) Doctoring your analysis

The key to good synthesis is a good pitch analysis, and voiced/unvoiced values which are consistent.  A program called lpcplot will allow you to doctor the details of an lpc data set, changing, smoothing and replacing values for pitches and voiced/unvoiced flags.  All changes are made as soon as you enter them.  There is no undo, so make a copy of your lpc data set before you begin.  Simply type lpcplot to get instructions.
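
For example, before you start editing you might make a backup copy from the shell (the .bak name is just a suggestion) and then start the editor:

cp speech.snd.lpc speech.snd.lpc.bak
lpcplot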


5) Recording soundfiles
 

There is a cassette deck attached to the rightmost machine in the 2nd row in the lab, called nervi.

The easiest way to record sound is to use the program capture.  Invoke it from the command line.

When you get the capture window, click on the icon in the lower left-hand corner and drag to the microphone.  It will then turn into an audio capture window.
Next go to the Actions menu and open the Settings panel.  Here set the filename you want to write to and the number of channels (I recommend mono).  You
can also set the input sampling rate to 22k (or you can convert it to 22k later on).  It is recommended that you write to the /tmp directory of the
computer you are using and then copy the file to an appropriate place.  Then go to the Tools menu and open the Audio panel.  Here set the input to
line (if you are using the cassette deck) or microphone (if you are using the SGI mic).  I recommend recording some clean speech on a good cassette in a
quiet room and using the cassette deck in the lab to play the sounds.

Now just hit record on the sound window and start up the cassette deck.

In case nervi is being used, the cassette deck has a 10-foot cord, and it can therefore be attached to neighboring machines.  The machine to the left of nervi is
running IRIX 6.5, however, so the sequence is a bit different.  I recommend using a command-line program called sfrecord.  Then, simply go to the audio panel
and click in the 'Analog In' window.  Having done that, go to the Selected menu and choose the input source.

All of the machines seem to have microphones attached to them.  You can probably get something reasonable from these.  Just set the analog input to
microphone and follow the other steps as described.