PVC

INTRODUCTION

INSTALLATION
UNPACKING
SETTING THE C++ COMMENT STYLE
SETTING THE DESTINATION DIRECTORY
COMPILING THE BINARIES
UNIX COMMAND-LINE FORMAT
INFORMATION PAGE
SETTING FLAG VALUES
CONTROLLING PARAMETERS WITH FUNCTIONS
RUNNING COMMANDS WITH SHELL SCRIPTS
ROUTINES: SHORT DESCRIPTIONS
PLAINPV
TWARP
NOISEFILTER
COMPANDER
BANDAMP
HARMONIZER
CHORDMAPPER
FILTER
FREQRESPONSE
CHORDRESPONSEMAKER
PVANALYSIS
TVFILTER
CONVOLVER
RING
RINGFILTER
RINGTVFILTER
FILTDEVIATOR
ENVELOPE
RESHAPE
TERMS AND COMMON FEATURES
OVERLAP/ADD METHOD VS. OSCILLATOR BANK METHOD AND RESYNTHESIS THRESHOLDS
SOURCE
MULTIPLE CHANNELS
PLAYBACK DURING PROCESSING
INPUT SOUND FILE
OUTPUT SOUND FILE
OUTPUT STATISTICS
ANALYSIS FILES
DECIBELS
LOW/HI SHELF EQUALIZATION
WARP INDEX
PITCH TRANSPOSITION
FREQUENCY SHIFT
ENVELOPE RESPONSE TIME
RING DECAY TIME
FFT SIZE
WINDOW SIZE
FRAMES PER SECOND
TIME EXPANSION/CONTRACTION
BEGIN/END TIMES
GAIN
FILTERING: SOURCE SIGNAL LEVEL
TRANSPOSITION/SHIFT APPLICATION FLAG
FILTER TYPES: PASS OR REJECT
RESPONSE FUNCTION SMOOTHING
ANALYSIS DATA ACCESS MODE
CONVOLVER PANPOT
FREQUENCY RESPONSE ACCUMULATION METHOD
RING ROUTINES: FILTER PLACEMENT
COMPRESSION AND EXPANSION

USING THE SHELL SCRIPTS

SAMPLE SCRIPT: S.PLAINPV
GEN FUNCTION CONTROL OF PARAMETERS
SAMPLE OUTPUT: S.PLAINPV

INTRODUCTION

PVC is a collection of phase vocoder signal processing routines and accompanying shell scripts for use in the transformation and manipulation of sounds. It is written in C and designed to be used in a UNIX environment. It has come about as a result of my path of education and research into phase vocoder technology. It follows in the spirit of the work by Eric Lyon, out of which PVC is built, and Chris Penrose, whose particular DSP research springs from the coding and tutorial work of F.R. Moore and Mark Dolson. Moore's book, Elements of Computer Music, published by Prentice Hall, is therefore a great resource for making sense of the phase vocoder engine behind these routines, which I am unable to go into here. I have, however, attempted to offer some explanation of the parametric controls I have set up, which you will find below.

The routines of this release reflect my need for tools which can perform different spectral resynthesis tasks. These range from the simple to the experimental, and continue to grow as my skill and curiosity increase. Most can be viewed in terms of traditional additive or subtractive synthesis tasks, coming about as a result of my interest in the potential of the phase vocoder to perform these tasks with greater finesse and control. Some are more idiosyncratic as a result of speculative ideas about sound manipulation, and require experiment and patience, if not explanation which may not yet be available. All require a good ear tuned towards sound and idea, as none of these routines is automatic, although many hold great potential for the diligent.

This first, 1.0 release contains only those routines which I think are stable, useful and moderately transparent. There are other, more experimental routines still proving themselves which I hope will appear subsequently. Someday I will deal with AIFF headers, and with time, approach a better interface structure, but for now, this is it.

Paul Koonce
koonce@music.princeton.edu

INSTALLATION

The PVC package contains both my PVC routines and the CMUSIC gen functions written by F.R. Moore, included here in a separate directory. Moore's standalone functions are useful tools for creating files to control dynamic parameters. The gen functions included are cspline, gen0, gen1, gen2, gen3, gen4, gen5, gen6, and genraw. Without arguments, the commands produce a one-line summary. A detailed explanation of each can be found in Moore's Elements of Computer Music.

You can compile and install the PVC and gen function routines separately or together following unpacking and attention to a few small issues.

1) Unpack:

First move PVC.tar.gz to the directory of your choice. Unzip it with gunzip.

gunzip PVC.tar.gz

Then, unarchive it with tar.

tar xvf PVC.tar

This will produce a PVC directory in which you will find several other directories. Before you can compile the routines, you must address the compiler comment style and destination directory.

2) Set C++ Comment style:

While the code is written in C, the comments are often in C++ style (i.e. //), so you will need a C compiler capable of accommodating this. The C compiler on the SGI is set up to handle the C++ comments with the -Xcpluscomm flag. If your compiler does the same, you will be fine. Otherwise you will need to do the following.

Change to the PVC source code directory /PVC/PVC_SRC and open the Makefile in a text editor. In it the line

CFLAGS = -g -O -Xcpluscomm

specifies flags for the compiler, the last of which enables the C++ comment style. Change it to conform to the flag syntax of your compiler.

3) Set Destination Directory:

Next you will need to set the destination directory in the Makefile located in the /PVC directory. This is the master makefile for both the PVC and gen routines. Change the directory specified by:

DESTDIR = /here/there/everywhere?

to the directory in which the routines should be installed.

4) Compile:

To compile and install both the PVC and gen function routines, type:

make

which if successful should be followed with:

make install

To compile and install only the PVC routines, type the following.

make PVC

make install

And to compile and install only the gen routines, type the following.

make GEN

make install

In all cases the make install moves the compiled routines from the /PVC/bin directory to your specified destination directory. If the directory is in your .cshrc path, and you have sourced your shell file, as in:

source .cshrc

you should be able to type any of the routines and see their flag information page. Try typing:

plainpv

for example.

UNIX COMMAND-LINE FORMAT

The routines are UNIX command-line routines of the form:

routine [flags] input_soundfile output_soundfile

At present the only soundfile format accepted (both input and output) is NeXT/Sun 16-bit short files. All processing is done in floats. Control is provided through flags which allow you to specify parameter constants or files. Wherever possible, parameters are initialized to default values.

INFORMATION PAGE

Information about any routine can be seen by typing the name of the routine without any arguments/files. Typing:

plainpv

produces the following information about plainpv.

plainpv:  generic phase vocoder with dynamic controls  
plainpv   [flags] [input file (16-bit shorts)] [output file (optional)]
            (values in brackets denote defaults)
        N:      FFT length (must be a power of 2) [1024]
        M:      window size in samples (must be a power of 2) [2*FFT]
                    (0 will automatically set window to 2*FFT size or larger)
        D:      analysis frames per second [200]
        I:      time expansion/contraction factor  [1.] 
                  (duration = duration * factor, 1. = original time) 
        P:      pitch transposition in semitones (func) [0]
        a:      frequency shift factor 
                    (bin frequency adder, before -P )(func) [0.] 
        b:      begin time in seconds  [0.] 
        e:      end time in seconds (0. = end of file) [0.] 
        C:      resynthesis channel (1 -> ?) (0 = all) [0] 
             SHELF EQ:(post transpose/shift)
        H:      SHELF EQ: Low shelf gain in dB [0.] 
        X:      SHELF EQ: High shelf gain in dB [0.] 
        m:      SHELF EQ: Low shelf frequency in Hz [200.] 
        R:      SHELF EQ: High shelf frequency in Hz [2000.] 
        W:      warp index for reshaping magnitude response (func) [0.] 
                    values > 0 close down response, < 0 open it up
        A:      gain in decibels (func)[0.] 
        l:      envelope attack time  (func) [0.]
        L:      envelope release time   (func) [0.]
        T:      BRICKWALL FILTER TYPE: 0 = bandpass, not 0 = band reject [0]
        f:      frequency window: low boundary  
                    (before -P and -a) (in Hz) [0.] 
        F:      frequency window: high boundary 
                    (before -P and -a)(in Hz) [Nyquist frequency] 
        p:      amplitude reports print mode: 0 = off, 1 = on [0]
        i:      time interval between amplitude reports [.25]
        S:      TERMINAL DISPLAY: display option  [0]
                  (0 = off,  1 = phase data,  2 = amp data, 3 = both)
        n:      TERMINAL DISPLAY: number of frames  [0]
        u:      TERMINAL DISPLAY: low bin  [0]
        U:      TERMINAL DISPLAY: high bin 
                    (-1 = nyquist) [Nyquist frequency]
        t:      oscillator resynthesis threshold in decibels [ -96 ]

SETTING FLAG VALUES

If no output file is specified, the name pv.out.snd will be used in the local directory. The bracketed value at the end of each parameter is its default, which can be changed by specifying the flag preceded by a minus sign and followed by the new value, with no spaces on either side. For example, the following:

plainpv -N2048 inputfile outputfile

would change the FFT size to 2048. Some flags require files rather than constants. For these simply supply the full pathname of the needed file as in:

twarp -F/here/there/everywhere/analysis_file

which supplies twarp with the necessary analysis file.

CONTROLLING PARAMETERS WITH FUNCTIONS:

Parameters which have the word (func) on the info page just before the default as in:

W: warp index for reshaping magnitude response (func) [0.]

can be controlled dynamically. This is done by providing a full pathname function file in place of the constant, same as in the previous example. The file is assumed to be a headerless series of 32-bit floats representing how the parameter will evolve as a function of time. The function file can have any number of values as it is fitted to the specified duration, and linearly interpolated for continuity. Function files can be created with the CMUSIC gen routines provided with this package. (See INSTALLATION.)
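As a rough illustration of the function file format described above, the sketch below packs a list of values as headerless 32-bit floats and shows how a short function might be stretched over a duration by linear interpolation. The helper names (pack_function, unpack_function, value_at) are hypothetical and not part of PVC, the byte order is assumed to be that of the machine running the routines, and the interpolation is only one plausible reading of "fitted to the specified duration".

```python
import struct

def pack_function(values):
    """Pack numbers as headerless 32-bit floats (native byte order is an assumption)."""
    return struct.pack("=%df" % len(values), *values)

def unpack_function(data):
    """Read back a headerless stream of 32-bit floats."""
    count = len(data) // 4
    return list(struct.unpack("=%df" % count, data[:count * 4]))

def value_at(points, t, duration):
    """Linearly interpolate the function, stretched over `duration` seconds, at time t."""
    if len(points) == 1 or duration <= 0:
        return points[0]
    # Map time onto a fractional index into the function table.
    pos = min(max(t / duration, 0.0), 1.0) * (len(points) - 1)
    i = min(int(pos), len(points) - 2)
    frac = pos - i
    return points[i] + frac * (points[i + 1] - points[i])

def write_function_file(path, values):
    """Write the packed function to a file (path is chosen by the caller)."""
    with open(path, "wb") as f:
        f.write(pack_function(values))
```

A three-point function such as [0.0, 12.0, 0.0], written with write_function_file, could then be supplied by full pathname to any (func) flag in the same way as the analysis file example above. The gen routines remain the intended way to build such files.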

RUNNING THE COMMANDS WITH SHELL SCRIPTS:

While all routines can be run at the command line, they are most easily run using the shell scripts found in the SCRIPTS directory. These scripts are useful for saving and managing the parameters, and provide what is in many ways a poor-man's GUI. All scripts contain a top section for setting variables, and a bottom section where those variables are placed into the command-line flag structure and run. Some scripts perform short analysis routines before the main synthesis routine. The variables for both are set in the top section. Take note that shell script variable assignments do not allow for spaces. The numerous parameters, which in some routines run as high as 53, make these scripts a necessity. They will be your friend if you take care to leave the bottom part alone, and don't corrupt your variable names. Someday I will make a better way to interface with the routines; for now this is the way it is.

Run the scripts using sh as in the following.

sh S.plainpv

(See the explanation below about using shell scripts.)

ROUTINES: SHORT DESCRIPTIONS

Below is a listing of the routines contained in this release along with a description of what each does.

PLAINPV:

Plainpv is a basic phase vocoder with control of pitch transposition, frequency shift, time scale, and amplitude warp with output shelf equalization. It also has some nice controls for looking at the data produced by the phase vocoder. Run this routine with S.plainpv.

TWARP:

Twarp is like plainpv except that it works from an analysis file rather than a soundfile. This allows you to move forwards/backwards through time according to a time function file. Use pvanalysis through the script S.pvanalysis to make the analysis file, followed by the script S.twarp.

NOISEFILTER:

Noisefilter filters out a frequency response. You must provide a frequency response representing the spectrum you wish to remove. Use freqresponse to construct an average or peak frequency response of a short noise section which will be used as the input file to noisefilter. The entire analysis/filtering process can be run using the script S.noisefilter.

COMPANDER:

Compander is a classic compressor/expander. What is different here is the use of the peaks response file. The peaks response file is a frequency response which specifies the amplitude which will function as the 0 dB reference point for each frequency. Each frequency is companded relative to this reference. The entire analysis/companding process can be run using the script S.compander.

BANDAMP:

Bandamp is an amplitude windowing routine. Like compander, it uses a response file to orient where 0 dB lies for each frequency. Using this reference it gives you a window of amplitudes. While bandamp can be used to select only the stronger amplitudes to produce a result similar to noise filtering or expansion, its real use is for zeroing in on the weaker amplitudes by windowing out the stronger. Setting a window range of -20 to -96 will do this. Wispy violin notes windowed this way will be reduced to their noise in a kind of unvoiced mode. Bandamp is difficult to make sound good, but effective when it does. The entire analysis/windowing process can be run using the script S.bandamp.

HARMONIZER, CHORDMAPPER, AND INHARMONATOR:

These routines all allow for a kind of additive synthesis remapping of phase vocoder data according to some model. Each requires an ASCII data file specifying how phase vocoder information will be replicated or mapped. This mapping is constant for the run of the routine.

HARMONIZER:

Harmonizer works much like a commercial harmonizer in that it allows you to create harmony against the source by adding a transposed copy of it. Here the concept is extended by allowing for multiple harmonizations, each taken from a different band of frequencies and output with separate gain. Run this using the script S.harmonizer.

CHORDMAPPER:

Chordmapper lets you specify how harmonically related phase vocoder bins will be replicated or mapped to produce chords. The data file specifies how the harmonic structure of any number of tones will be defined, limited and then mapped. This routine is useful for building up chords from single tones or even pulling out tones from textures where a little grit doesn't bother you. Run this using the script S.chordmapper.

FILTER:

Filter is a very useful routine for filtering a sound by a frequency response arrived at through either synthesis or analysis. Filtering is achieved by first creating the frequency response with a selected routine, followed by filtering with filter. The frequency response can be created synthetically using chordresponsemaker which models a spectrum as a collection of harmonic tones, or with freqresponse which analyzes a sound file segment and constructs a response representing the peak or average amplitudes. Once made, the magnitudes of the FFT response are multiplied against the time varying magnitudes of the input sound's FFT. In addition, filter allows control of the source/filter mix, response shape warp, and response transposition/shift, making this a very useful tool for quickly manipulating the spectral characteristics of a sound according to your synthetic or analytic goals. The synthetic form can be run with the script S.filter_with_chord_synthesis, and the analytic form with script S.filter_with_analysis. The analytic form is a remarkable tool for bringing the color of one sound into the realm of another, particularly so with the controls which allow you to manipulate the presence, EQ, character, and position of the response.

FREQRESPONSE:

Freqresponse is a routine used by several others to prepare an average or peak spectrum for use in filtering, compression or limiting. The response can be normalized or not, depending on the needs of the routine which will use the response.

CHORDRESPONSEMAKER:

Chordresponsemaker is a routine used by several others to create synthetic frequency responses. The response can be normalized or not, depending on the needs of the routine which will use the response.

PVANALYSIS:

Pvanalysis is the time varying form of freqresponse in that it performs and saves a phase vocoder analysis for subsequent use by other routines. The routines which require pvanalysis files are twarp, convolver, tvfilter, and ringtvfilter. Run this using the script S.pvanalysis.

TVFILTER:

Tvfilter is the time-varying (tv) form of filter. Tvfilter uses a pvanalysis file to change the magnitudes of the input sound file. As it is with filter, tvfilter multiplies the magnitudes of the analysis FFT against the magnitudes of the input sound's FFT, while preserving the frequency/phase characteristics of the input sound. Preserving the phase of the input sound file results in a cross-synthesis which sounds like the input sound file with the shadow of the analysis file characteristic imposed upon it. Like filter, tvfilter offers a variety of controls for manipulating the filter characteristic including the source/filter mix, response shape warp, and response transposition/shift. The use of a phase vocoder analysis to represent the filter characteristic also makes possible temporal control of the filter file as found with twarp. Run this using the script S.twarp.

CONVOLVER:

Convolver is set up and controlled the same as tvfilter. Its processing is slightly different in that the FFTs of the analysis file and input sound file are multiplied in Cartesian (real/imaginary) form. Unlike tvfilter, which produces a shadowlike cross-synthetic intersection, shadowing the analysis file characteristic onto the input sound file, convolver creates a true spectral intersection, allowing only that which is common to both to pass. The effect is a sound which is somewhat garbled as it outputs the more intermittently common spectral components of the two. The form of the multiplication in convolver does not allow some of the filter transposition controls associated with tvfilter. There is, however, a convolution panpot which offers control of the mix between the convolution and source sounds. Run this using the script S.convolver.
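The difference between the two kinds of multiplication can be sketched for a single FFT bin. This is only an illustration of the arithmetic described above, not the PVC source: the function names are hypothetical, and each bin is represented as a Python complex number.

```python
import cmath

def magnitude_filter(source_bin, analysis_bin):
    """tvfilter-style product: magnitudes multiply, the source phase is kept."""
    mag = abs(source_bin) * abs(analysis_bin)
    return cmath.rect(mag, cmath.phase(source_bin))

def complex_multiply(source_bin, analysis_bin):
    """convolver-style product in Cartesian form: magnitudes multiply, phases add."""
    return source_bin * analysis_bin

source = cmath.rect(0.5, 0.3)     # magnitude 0.5, phase 0.3 radians
analysis = cmath.rect(0.8, 1.0)   # magnitude 0.8, phase 1.0 radians
print(magnitude_filter(source, analysis))
print(complex_multiply(source, analysis))
```

In both cases the output magnitude is the product of the two magnitudes; what differs is that the Cartesian product also carries the phase of the analysis file into the result.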

RING:

Ring uses the phase vocoder structure to create an all-pass resonator, effectively allowing the time varying components of a sound to be resonated wherever they may be. The sound here is much like comb filter feedback resonance. The difference is that the spectral resonance is not created through the selection of a collection of comb filter frequencies, but by the input sound itself in a kind of self resonance. Ring is a nice way of increasing the resonant pitch characteristics of a sound, although it has its weaknesses. Ring works best with larger FFT sizes as it is attempting to synthesize or accentuate the more pitched/harmonic characteristics of the sound which larger FFTs, with their increased frequency resolution, handle better. In addition, there is a threshold for preventing the noise features of a sound from being resonated, and an EQ which can be positioned to filter either the source input to the feedback loop, or the feedback return where it has the effect of inducing variable rates of decay depending on how the EQ is set. Run this using the script S.ring.

RINGFILTER:

Ringfilter marries the routine filter with ring by allowing a filter frequency response to be imposed on the resonance. Ringfilter begins to look more like multiple-delay, comb filter resonance as the static filter selects which frequencies will ring. What is unique here is that the spectral characteristic can come from analysis, allowing a sound to be resonated by the average spectral characteristic of another sound. Like the EQ in ring, the filter in ringfilter can be positioned to either filter the source input to the feedback loop, or the feedback return where it will have the effect of introducing the filter characteristic more slowly through the resulting variable rates of decay. Run ringfilter with S.ringfilter_with_chord_synthesis to create a synthetic filter, and with S.ringfilter_with_analysis for an analyzed filter response.

RINGTVFILTER:

Ringtvfilter is to ringfilter what tvfilter is to filter; that is, it makes the filter in ringfilter time-varying. This is a sophisticated idea: the time-varying filtering of a time-varying sound. Ringtvfilter requires some thought and finesse in order to separate and articulate the evolutions of the source, resonance and filter. Like tvfilter, ringtvfilter requires an analysis file. Run this routine using S.ringtvfilter.

FILTDEVIATOR:

The idea behind filtdeviator is to use a frequency response function to not only filter a sound (as with filter), but to create a topology of frequency deviation working in correlation with the filter. Consequently, filtdeviator is filter with added parameters for specifying how the filter frequency response function will be mapped into the deviation of frequency. The added parameters set the base and peak deviation for how the response will be mapped into both pitch transposition and frequency shift, and how the function will be warped within the range set by these limits. There is also a master (0-1) deviation control for globally controlling the deviation. All the controls of filtdeviator allow you to dynamically vary the presence and effect of amplitude filtering and frequency deviation, making filtdeviator an interesting routine for exploring the way filters can be used to impede/transform the resonant signature of a sound. The result is always somewhere between the frequency deviating effects of phase shifters and the floppy resonant behavior of slide whistles. The scripts to run filtdeviator, S.filtdeviator_with_chord_synthesis and S.filtdeviator_with_analysis, are designed with frequency response synthesis/analysis sections like those for filter and ringfilter.

ENVELOPE:

Envelope is a routine for tracking the amplitude envelope of a sound. Output can be ASCII, floats or a NeXT soundfile. Selecting floats will produce a file suitable for control of a parameter.

RESHAPE:

Reshape is a routine for transforming function streams to meet the needs of different parameters.

TERMS AND COMMON FEATURES

Listed and explained below are various terms, parameters, or ways of doing things which are common to many of the routines.

OVERLAP/ADD VS. OSCILLATOR BANK METHODS AND RESYNTHESIS THRESHOLDS:

The phase vocoder resynthesizes the signal using one of two methods, depending on the type of changes made to the FFT. If the changes are only to the magnitudes (amplitudes), then the faster overlap/add method is used. If, however, changes in frequency are made, then the FFT integrity is compromised, necessitating use of the oscillator bank method in which each bin is synthesized as a sine wave. This method is slower, although a resynthesis threshold is available which can be used to increase the computation speed by turning off bins whose amplitude falls below the threshold. Thresholds of -60 dB or lower are appropriate.
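The effect of a resynthesis threshold can be sketched as follows. This is an assumption about the general idea rather than the PVC code: the function names are hypothetical, magnitudes are taken to be linear amplitudes with 1.0 equal to 0 dB, and the default of -96 follows the -t default shown on the plainpv information page.

```python
def db_to_amp(db):
    """Convert decibels to linear amplitude (0 dB = 1.0)."""
    return 10.0 ** (db / 20.0)

def active_bins(magnitudes, threshold_db=-96.0):
    """Return the indices of bins loud enough to be worth synthesizing."""
    threshold = db_to_amp(threshold_db)
    return [i for i, m in enumerate(magnitudes) if m >= threshold]

frame = [0.5, 0.0005, 0.002, 0.0]
print(active_bins(frame, -60.0))
```

Only the bins returned would need an oscillator for that frame, which is where the saving in computation comes from.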

SOURCE:

The source sound is the original input sound. Some routines allow for the mix of the processed sound with the original source sound.

MULTIPLE CHANNELS:

All routines allow for the processing of either monophonic or multiple-channel input files. With multiple channels you can either select one channel and produce a monophonic output file, or process all the channels. Channels are numbered beginning with 1. Processing of multiple-channel files is done one channel at a time beginning with channel 1, with zeros written to channels which have yet to be processed. Processing one channel at a time requires less memory and allows you to audition the output sooner than if you did all channels at once.

INPUT SOUND FILE

The input sound file must be a NeXT/Sun 16-bit shorts file of one or more channels.

OUTPUT SOUND FILE

The output sound file is written as a NeXT/Sun 16-bit shorts file of one or more channels, one channel at a time beginning with the first channel. The first pass writes zeros in the channels yet to be processed.

PLAYBACK DURING PROCESSING:

The header is periodically updated to allow for playback of the file during processing.

OUTPUT STATISTICS:

Two flags are provided for controlling the output amplitude statistics; one turns the statistics on or off, and the other sets how often they will be reported. The statistics provide the peak output in amplitude and decibels at some regular interval. If the output is greater than an amplitude of 1. (0 dB), the output is clipped to a value of 1.0, and the statistics placed in clip mode in which reports are only of frames where clipping occurs. The peak amplitude, its time, and the number of clipped samples are reported at the end of processing.
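A minimal sketch of the kind of bookkeeping described above, assuming samples are floats with 1.0 as full scale. The function name and return layout are hypothetical; the actual report format of the routines is not reproduced here.

```python
import math

def clip_and_report(samples):
    """Clip samples to +/-1.0 and return (output, peak, peak in dB, clipped count)."""
    peak = 0.0
    clipped = 0
    out = []
    for s in samples:
        peak = max(peak, abs(s))
        if abs(s) > 1.0:
            clipped += 1
            s = math.copysign(1.0, s)  # keep the sign, limit the magnitude
        out.append(s)
    peak_db = 20.0 * math.log10(peak) if peak > 0.0 else float("-inf")
    return out, peak, peak_db, clipped

print(clip_and_report([0.5, -1.5, 2.0, 0.25]))
```

A peak above 0 dB in such a report is a sign that the gain should be lowered and the routine run again.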

FREQUENCY RESPONSE: OUTPUT TO TERMINAL

In many filtering or companding routines, a crude terminal print of the frequency response is available through a flag toggle. With a value of 1, this flag prints out the response to the terminal. Values greater than 1 employ the flag as a specification of the high cutoff frequency for the response report. For example:

filter -P2000

would print the response from 0 to 2000 Hz. 0 turns printing off.

ANALYSIS FILES:

Analysis files are binary, float-valued files written by pvanalysis, containing frames of FFT analysis data for one or more channels. Analysis file data is preceded by a header containing information about the analysis. Analysis files are larger than the sound files they represent, and increase in proportion to the FFT size used. As such, files can become very large, so it is advisable to only make them when needed unless you have disk space to spare.

DECIBELS:

Amplitude is always handled in decibel units. The largest magnitude representable by a 16-bit short integer is equated with an amplitude of 1.0, which is 0 dB. 0 dB functions as unity gain, and as the peak amplitude in matters of compression, expansion, and amplitude windowing. A change of +/- 6 dB represents approximately a doubling or halving of the amplitude. Increments of 10 dB are loosely associated with one change in dynamic level. 16-bit shorts allow for a 96 dB dynamic range.
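The standard amplitude/decibel relationship behind these figures can be checked with a few lines. The function names are illustrative only, and treating 32768 as full scale for a 16-bit sample is an assumption.

```python
import math

def amp_to_db(amp):
    """Linear amplitude to decibels (1.0 = 0 dB)."""
    return 20.0 * math.log10(amp)

def db_to_amp(db):
    """Decibels to linear amplitude."""
    return 10.0 ** (db / 20.0)

def short_to_amp(sample):
    """Map a 16-bit sample onto the -1..1 range (32768 as full scale is an assumption)."""
    return sample / 32768.0

print(amp_to_db(2.0))       # doubling the amplitude, close to +6 dB
print(db_to_amp(-6.0))      # close to half amplitude
print(amp_to_db(2 ** 16))   # 16 bits span roughly 96 dB
```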

LOW/HI SHELF EQUALIZATION:

Equalization has been provided at various points in routines to allow for the needed adjustment of spectra. The EQ consists of low and hi shelf segments, whose width is adjusted through control of the shelf breakpoint frequency. The region between the shelf segments is represented by a linear decibel gradient between the decibel levels of the two shelves. Some routines implement the EQ before pitch changes, others after. EQ placed before pitch changes (pre-transpose/shift) will cause the EQ to be transposed with the pitch changes, whereas afterwards (post-transpose/shift) will keep them fixed as shifts and transpositions occur.
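The shape of such an EQ curve can be sketched as a gain, in dB, for any given frequency. This is an interpretation rather than the PVC code: the function name is hypothetical, the gradient between the shelves is assumed to be linear in Hz, and the default breakpoints of 200 and 2000 Hz are taken from the plainpv information page.

```python
def shelf_gain_db(freq, low_gain_db, high_gain_db, low_freq=200.0, high_freq=2000.0):
    """Gain in dB at `freq` for a low/high shelf pair joined by a linear dB gradient."""
    if freq <= low_freq:
        return low_gain_db          # on the low shelf
    if freq >= high_freq:
        return high_gain_db         # on the high shelf
    # Between the breakpoints: interpolate in dB (linear in Hz is an assumption).
    frac = (freq - low_freq) / (high_freq - low_freq)
    return low_gain_db + frac * (high_gain_db - low_gain_db)

for f in (100.0, 1100.0, 5000.0):
    print(f, shelf_gain_db(f, -6.0, 6.0))
```

Moving the two breakpoint frequencies closer together steepens the gradient and widens the shelves.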

WARP INDEX:

Many of the routines employ the principle of warping in which a distribution of values or functional characteristic is transformed by an identity function. In these places an exponential function is employed to remap a 0-1 range of values into a new orientation which preserves the minima (0) and maxima (1) while bringing the distribution closer to either extreme as a result of the curvature of the exponential function selected. The curvature of the exponential function is selected through a warp index. Specifically, warp index w will reorient the input x through the function below (^ = exponentiation).

y = (1. - (e^(x * w))) / (1. - (e^w))

In this function, the warp index of 0 produces a linear function and an untransformed output. Positive values of increasing magnitudes produce curves of increasing concavity (increasing slope) which draw values towards the 0-valued minima, and reduce the function integral. Negative values do the opposite, drawing values towards the maxima of 1, increasing the integral.
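The warp function can be tried directly. The sketch below implements the formula given above; the only addition is an explicit check for a warp index of 0, where the formula as written would divide zero by zero and the linear case is returned instead. The function name is illustrative.

```python
import math

def warp(x, w):
    """Remap x in 0..1 with warp index w: y = (1 - e^(x*w)) / (1 - e^w)."""
    if abs(w) < 1e-9:
        return x  # limit of the formula as w approaches 0: linear, untransformed
    return (1.0 - math.exp(x * w)) / (1.0 - math.exp(w))

for w in (-3.0, 0.0, 3.0):
    print(w, [round(warp(x / 4.0, w), 3) for x in range(5)])
```

The endpoints 0 and 1 stay fixed for every warp index, while a midpoint such as 0.5 moves below 0.5 for positive indices and above it for negative ones.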

The practical uses of this mechanism are found in various places. One such place is the reshaping of frequency response distribution characteristics. In this, positive values cause the peaks to be accented while the weaker frequencies are expanded out by the pushing of their values towards 0. Negative values have the opposite effect, compressing the dynamic range of the response which then draws out the weaker noise components. Another case where warp applies is in the remapping of FFT amplitudes. In this, the successive FFT frames have their amplitudes remapped by the identity function, similarly expanding or compressing the dynamic range depending upon the warp specified.

PITCH TRANSPOSITION:

Pitch transposition represents the linear translation of a spectrum through multiplication of all its frequency components by a constant. This is classic transposition, here specified in semitones where 12 semitones equal an octave. Conversion is made to produce the appropriate frequency multiplier.

FREQUENCY SHIFT:

Frequency shift is the nonlinear translation of a spectrum through addition of a constant to all its frequency components. This is related to things like ring modulation in which spectra are compressed, stretched or shrunk by the nonlinear transpositional effect of addition upon a collection of frequency components. Use this to create small distortions of the harmonic integrity of a sound.
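The difference between transposition and shift is easiest to see on a small harmonic series. In this sketch the function names are illustrative and the shift amount is assumed to be in Hz.

```python
def transpose(freqs, semitones):
    """Multiply every frequency by the ratio for the given number of semitones."""
    ratio = 2.0 ** (semitones / 12.0)
    return [f * ratio for f in freqs]

def shift(freqs, hertz):
    """Add a constant to every frequency (units of Hz are an assumption)."""
    return [f + hertz for f in freqs]

harmonics = [100.0, 200.0, 300.0]
print(transpose(harmonics, 12.0))   # an octave up, harmonic ratios preserved
print(shift(harmonics, 50.0))       # ratios between components no longer 1:2:3
```

Transposition keeps the components in their original ratios, while the shifted components no longer form a harmonic series, which is the source of the distortion of harmonic integrity mentioned above.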

ENVELOPE RESPONSE TIME:

The rate at which amplitude changes are allowed to occur is a factor in how smooth the spectral evolution of a sound may be. To control this, many routines contain attack and decay response times which have been set up to indirectly manipulate the coefficients of the following filter:

y(n) = (1. - A) * x(n) + A * y(n - 1)

This is a lowpass filter which increasingly smooths out the sudden changes in a signal as the value of the coefficient, A, is increased. Its control is through the response time parameter which is the time in seconds it takes a signal shifting from one state to another to decay to -60 dB of its former state. Specified response times are transformed to create the necessary coefficients for the selected frame rate. The response time is separated into attack and decay which simply applies different coefficients to the smoothing depending upon whether the signal is increasing or decreasing in amplitude.
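One way a response time could be turned into a coefficient for a one-pole lowpass of this kind, where each output depends on the previous output, is sketched below. The derivation is an assumption, not taken from the PVC source: a fall of 60 dB is a factor of 0.001 in amplitude, so the coefficient is chosen so that it reaches 0.001 when multiplied by itself once per frame over the response time. The function names are hypothetical.

```python
def response_coefficient(response_time, frames_per_second):
    """Coefficient A such that A ** (frames in response_time) equals 0.001 (-60 dB)."""
    frames = response_time * frames_per_second
    if frames <= 0.0:
        return 0.0  # no smoothing: the output follows the input immediately
    return 0.001 ** (1.0 / frames)

def smooth(values, attack_time, release_time, frames_per_second):
    """Smooth a stream of per-frame amplitudes with separate attack/release times."""
    attack = response_coefficient(attack_time, frames_per_second)
    release = response_coefficient(release_time, frames_per_second)
    out = []
    y = 0.0
    for x in values:
        a = attack if x > y else release   # rising input uses the attack coefficient
        y = (1.0 - a) * x + a * y          # one-pole lowpass using the previous output
        out.append(y)
    return out

step = [1.0] + [0.0] * 100
print(smooth(step, 0.0, 0.5, 200.0)[-1])
```

With a release time of 0.5 seconds at 200 frames per second, the value 100 frames after the input drops to zero is the -60 dB point of the decay.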

RING DECAY TIME:

Decay time is an issue in the feedback of the ring routines. Like response time, it is the time it takes the signal to decay to -60 dB of its former state, or better, the time it takes the reverb to decay to -60 dB.

FFT SIZE:

The FFT size must be a power of 2. Larger FFT sizes resolve frequencies better but transient behavior more poorly. Choose your FFT size according to the sound you are working with. A size of 1024 or 2048 works well in most cases.
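The trade-off can be put in numbers with a short sketch. The sample rate of 44100 Hz is an assumption chosen for illustration, and the function names are hypothetical.

```python
def is_power_of_two(n):
    """True if n is a positive power of 2, as the FFT size must be."""
    return n > 0 and (n & (n - 1)) == 0

def bin_width_hz(sample_rate, fft_size):
    """Spacing between adjacent FFT bins in Hz."""
    return sample_rate / fft_size

def frame_duration_s(sample_rate, fft_size):
    """Length in seconds of the stretch of signal covered by one FFT."""
    return fft_size / sample_rate

for size in (256, 1024, 4096):
    print(size, bin_width_hz(44100.0, size), frame_duration_s(44100.0, size))
```

Each doubling of the FFT size halves the bin spacing and doubles the stretch of time a single frame covers, which is why larger sizes smear transients.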

WINDOW SIZE:

The window size is a less transparent parameter which, like the FFT size, must be a power of 2. Windows which are twice the size of the FFT work well. Larger window sizes may resolve frequencies better. Specifying 0 for the window size will automatically set the window to twice the FFT size.

FRAMES PER SECOND:

This controls how often the phase vocoder will perform an analysis on the signal. It is a translation of the classic decimation control which specifies how many samples to skip between analysis frames. More frames increase the time resolution but decrease speed. 200 frames per second is a good reference point. If you expand time you might want to increase this proportionately.

TIME EXPANSION/CONTRACTION:

Once the spectral modifications are made to the FFT analysis, an inverse FFT is invoked to produce the samples of a time-domain signal. The classic phase vocoder paradigm controls the number of samples through the interpolation value and its relation to the decimation. The arcane relationship of decimation and interpolation is here translated into the parameter of time expansion/contraction which scales time accordingly; with values greater than 1 expanding time, less than 1 contracting it.
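The relationship between these user-level parameters and the classic decimation and interpolation values might look like the sketch below. This is an assumption about the translation, not the PVC source: the function name is hypothetical, both values are counted in samples per frame, and a sample rate of 48000 Hz is used only because it divides evenly.

```python
def hop_sizes(sample_rate, frames_per_second, time_factor):
    """Return (decimation, interpolation) in samples for the given settings."""
    # Analysis hop: how many input samples are skipped between frames.
    decimation = max(1, round(sample_rate / frames_per_second))
    # Synthesis hop: how many output samples each frame contributes.
    interpolation = max(1, round(decimation * time_factor))
    return decimation, interpolation

print(hop_sizes(48000, 200, 1.0))   # equal hops, original duration
print(hop_sizes(48000, 200, 2.0))   # output hop doubled, time expanded
print(hop_sizes(48000, 200, 0.5))   # output hop halved, time contracted
```

Under this reading, the ratio of interpolation to decimation is the time expansion/contraction factor.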

BEGIN/END TIMES:

Processing may be performed on an entire file or a segment of it by specifying begin and end times. End times less than or equal to 0 default to the end of the input file.

GAIN:

Gain can be applied to the output and to other components. 0 dB represents unity gain, that is, no change. See decibels.

FILTERING: SOURCE SIGNAL LEVEL

The mix of source and filtered signals in the filter routines can be controlled by the source decibels floor. This value, taken from the -96 to 0 dB range, specifies the level of the source signal. The filtered signal level is (1 - source amplitude). Consequently, the source level functions as a floor above which lies the filtered signal. A source floor of 0 dB would neutralize filtering, since there would be no filter range above the floor; a floor of -96 dB would produce the full effect of the filter.
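The arithmetic behind the floor can be sketched as follows, assuming the usual amplitude conversion of 10^(dB/20). The helper name source_filter_mix is hypothetical; it prints the source level followed by the filtered level (1 - source amplitude).

```sh
#!/bin/sh
# Hypothetical helper (not part of PVC): convert a source floor in dB to
# the linear source level and the complementary filtered-signal level.
# Assumption: amplitude = 10^(dB / 20).
source_filter_mix() {
    awk -v db="$1" 'BEGIN {
        src = exp(db / 20 * log(10))      # 10^(dB/20)
        printf "%.6f %.6f\n", src, 1 - src
    }'
}

for floor in 0 -6 -20 -96; do
    echo "floor $floor dB -> source, filtered = $(source_filter_mix "$floor")"
done
```

At 0 dB the filtered level is 0, so the filter has no effect; at -96 dB the source level is negligible and the filter acts at essentially full strength.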

TRANSPOSITION/SHIFT APPLICATION FLAG

Filter routines which allow for transposition and frequency shifting of both the filter and source have a flag which specifies whether the source transposition/shift should be applied before or after filtering. If it is applied before, its trajectory evolves independently of the filter's transpositional trajectory. If it is applied after, then the source trajectory will be added to the filter trajectory, the net effect being that the filter moves in parallel with the source movements plus any which its own trajectory adds.

FILTER TYPES: PASS OR REJECT

Filters can be toggled to use frequency responses in pass or rejection mode. In pass mode, the greater the response's magnitudes, the more of the source will pass or be maintained. In rejection mode, the greater the response's magnitudes, the more of the source will be impeded or rejected from the spectrum. In rejection mode, the response is created by first converting to a decibel scale, inverting, and then converting back into amplitude. In time-varying filtering (tvfilter), rejection can be in mode 1, in which the response is inverted against a constant 0 dB peak, or in mode 2, in which the response is inverted against the frame's peak amp. Spectral warping is always applied after the response has been transformed by rejection.

RESPONSE FUNCTION SMOOTHING

Routines such as bandamp and compander, which use frequency response files to set amplitude reference points, have a control which allows the response to be smoothed. The smoothing is produced by replacing the magnitude of a frequency bin with an average taken from a band which has the bin's frequency as its center. The degree of smoothing is controlled through the bandwidth value, specified through the flag in octave units. Larger bandwidths produce greater degrees of smoothness; 0 turns smoothing off.

ANALYSIS DATA: ACCESS MODES

Routines such as twarp, convolver, tvfilter, and ringtvfilter, which use analysis data made with pvanalysis, all access the data the same way, using the time point, rate, and data window boundary parameters in either rate or explicit mode. In rate mode, the time point control initializes a time pointer, with the rate then controlling the speed of movement through the time of the file. The rate may be positive (forward in time) or negative (backwards in time) and may vary according to a function. Explicit mode uses only the constant or functionally controlled time point parameter to explicitly specify where in the time of the analysis file the data should be obtained. Both rate and explicit modes abide by the upper and lower data window boundaries which delimit the data range. When the time pointer moves beyond the specified upper or lower time boundary, it re-enters the window from the other end, making the window into a circular/modular structure. The boundaries can be controlled with functions as well, giving this mode an expressive dimension far surpassing the time expansion/contraction parameter.
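The circular behaviour of the data window can be sketched as a modular wrap of the time pointer. This is only an illustration of the idea: the helper name wrap_time is hypothetical, and the exact treatment of the boundaries inside PVC is an assumption.

```sh
#!/bin/sh
# Hypothetical helper (not part of PVC): wrap a time pointer into the
# data window [low, high) so that leaving one end re-enters at the other.
wrap_time() {
    awk -v t="$1" -v lo="$2" -v hi="$3" 'BEGIN {
        span = hi - lo
        if (span <= 0) { print lo; exit }   # degenerate window
        w = (t - lo) % span                 # awk % is a floating-point fmod
        if (w < 0) w += span                # pointer fell off the low end
        print lo + w
    }'
}

wrap_time 2.5 0 2     # ran past the upper boundary (positive rate)
wrap_time -0.5 0 2    # ran past the lower boundary (negative rate)
wrap_time 1.25 1 2    # already inside the window, left unchanged
```

In rate mode a wrap of this kind would be applied each frame after the pointer is advanced by the rate; in explicit mode it would be applied directly to the requested time point.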

CONVOLVER PANPOT

The convolver routine has a unique panpot mechanism for controlling the mix of sounds A or B with the convolution of A and B. The panpot is a crossfade mechanism with a -1 to 1 control range in which -1 corresponds to sound A, 1 to sound B, and 0 to the convolution of A and B. Values between these points produce degrees of crossfade mixing between sound A or B and the convolution. A trajectory from -1 to 1, for example, would crossfade from A to the convolution to B. Separate gain controls on A, B and the convolution make it possible to tune the continuity of this trajectory. The presence or spread of the convolution within this crossfade trajectory can also be tuned with the domain warp controls. The warp reshapes the movement through the crossfade range, allowing you to create a more gradual approach from A or B into the convolution center. This is achieved through a simple nonlinearizing of the crossfade domain in warp index style. Increasingly positive domain warp values, specified independently for each side, transform the linear trajectory towards the convolution into a decelerating one, causing the subtle mix area around 0 to be expanded.
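The unwarped crossfade can be sketched as three mixing weights derived from the panpot value. The linear crossfade law and the helper name panpot_gains are assumptions made for illustration; the domain warp and the separate gain controls are not modelled, and the routine itself may use a different curve.

```sh
#!/bin/sh
# Hypothetical helper (not part of PVC): weights for sound A, the
# convolution, and sound B at a panpot position between -1 and 1.
# Assumption: a plain linear crossfade with no domain warp applied.
panpot_gains() {
    awk -v p="$1" 'BEGIN {
        if (p < -1) p = -1                # clamp to the control range
        if (p > 1) p = 1
        a = (p < 0) ? -p : 0              # A fades in towards -1
        b = (p > 0) ? p : 0               # B fades in towards 1
        printf "%.2f %.2f %.2f\n", a, 1 - a - b, b
    }'
}

for pan in -1 -0.5 0 0.5 1; do
    echo "pan $pan -> A, convolution, B = $(panpot_gains "$pan")"
done
```

A positive domain warp would bend the A and B ramps so that they fall away more slowly near 0, widening the region in which the convolution is mixed with one of the sounds.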

FREQUENCY RESPONSE ACCUMULATION METHOD

Several routines which produce frequency responses by analytical or synthetic means have the option of accumulating the response by peak or average means. Whereas peak responses represent the threshold record of a sound or synthesis specification's highest values, average responses represent the most common characteristics.
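The difference between the two methods can be sketched on a small table of magnitudes, one analysis frame per line and one frequency bin per column. The helper name accumulate_response and the text input format are hypothetical; the PVC routines work on their own analysis data rather than on text.

```sh
#!/bin/sh
# Hypothetical helper (not part of PVC): accumulate a frequency response
# from frames of bin magnitudes read on standard input, one frame per
# line. The argument selects "peak" or "average" accumulation.
accumulate_response() {
    awk -v mode="$1" '
    {
        for (i = 1; i <= NF; i++) {
            m = $i + 0                            # force a numeric value
            if (mode == "peak") {
                if (NR == 1 || m > acc[i]) acc[i] = m
            } else {
                acc[i] += m
            }
        }
        if (NF > nbins) nbins = NF
    }
    END {
        for (i = 1; i <= nbins; i++) {
            v = (mode == "peak") ? acc[i] : acc[i] / NR
            printf "%s%g", (i > 1 ? " " : ""), v
        }
        print ""
    }'
}

frames='0.2 0.8
0.6 0.4'
echo "peak:    $(printf '%s\n' "$frames" | accumulate_response peak)"
echo "average: $(printf '%s\n' "$frames" | accumulate_response average)"
```

The peak response keeps the largest magnitude ever seen in each bin, so a single loud frame dominates it, while the average response is pulled towards whatever the bin does most of the time.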

RING ROUTINES: FILTER PLACEMENT

Ringfilter and ringtvfilter use frequency response functions to filter the reverb. In these, the filter can be placed on the source feeding forward or on the component feeding back. When placed so as to filter the input to the feedback, it simply filters the signal before it enters the feedback mechanism, imposing its characteristic on the feedback from the start. However, when placed so as to filter the feedback component, the spectral characteristic appears in the reverb gradually as the signal decays. In this mode, the time it takes the signal to decay into the response characteristic is controlled by the decay time.

COMPRESSION AND EXPANSION

Along with the routines which offer signal compression and expansion through spectral warping (see warp), are those which employ the more traditional method using thresholds and magnitudes of compression/expansion. While compander is the most obvious example, traditional compression can be found in other routines such as tvfilter. In all of these, the dynamic range is assumed to lie between 0 and -96 dB, with thresholds lying within this range. The degree of compression or expansion is expressed in decibels representing how much the signal lying beyond the threshold will be reduced. A value of -6 dB would produce 2 to 1 compression or expansion, depending upon where it was being used. Compander implements compression for each frequency bin separately rather than as a macro gain change. It does this by using a frequency response file, created with freqresponse, to establish a 0 dB point of reference, with every frequency bin then being compressed or expanded in relation to this reference point. Contained in this is not only the potential to compand frequencies individually, but to actually use a sound file or pink noise as a reference against which a sound is companded. When used this way, the compression threshold should be set around 0 dB followed by tasteful adjustment of the decibels of compression.

USING THE SHELL SCRIPTS

SAMPLE SCRIPT: S.plainpv

Below is a copy of the complete script for running plainpv, which you can examine to understand the basic structure of the shell script mechanism. It has a top section for variables and a bottom section which executes the plainpv command. Set the variables of the top section to the appropriate files and constants; do not use spaces. Then run the script by giving it as input to the shell, as in:

sh S.plainpv

The shell command will copy the value of the assigned variables into the respective flag positions and run the command as if you had typed it out at the prompt in a shell window. Each shell script is set up to print the command to the terminal just before running it. A sample of the output follows the script below.

GEN FUNCTION CONTROL OF PARAMETERS

Any parameter whose flag on the routine's information page has the word (func) after it can be controlled by a function file. The pitch transposition parameter in the following script has been set up to do this through the file /tmp/ptrans. The file is assumed to contain floating point values representing the trajectory which the pitch transposition should take. To make the file, a complete CMUSIC gen command line has been inserted into the script, that is:

gen4 -L1000 0 -3 0 1 3 > /tmp/ptrans ;

Because it is positioned in the script before the plainpv flags and command, it is executed before the plainpv routine. Creating gen routine function files within the script is a convenient way of keeping track of and storing the parameter control used in conjunction with particular processing ideas.

Lines in shells can be continued onto new lines with the backslash, which comes in handy with gen functions. The above, for example, could be entered as:

gen4 -L1000 \
0 -3 0 \
1 3 \
> /tmp/ptrans ;

which makes the function easier to read.

#******************************************************
#******************** PLAINPV ************************
#******************************************************
output_file=/S1/cm.mix.snd
input_file=/S1/mysound.snd

#******** BEGIN/END TIMES ***************************
# (-1 end time defaults to end of file)
begintime=0
endtime=-1

#*** ANALYSIS PARAMETERS *************************
FFT_length=1024
windowsize=0
frames_per_second=200
time_expansion_contraction_factor=1

#**** OUTPUT CHANNEL(S) ***************************
resynthesis_channel_1_to_max__0_all=0
# (channels are numbered from 1-maximum)
# (0 = all channels)
oscillator_resynthesis_threshold_in_dB=-80

#****** RESYNTHESIS PARAMETERS *******************
frequency_shift=0
gain_in_decibels=-0
pitch_transposition_in_semitones=/tmp/ptrans

#************ENVELOPE RESPONSE *********************
release_time_in_seconds=0
attack_time_in_seconds=0

#********** SPECTRUM WARPSHAPE ********************
spectrum_warpshape_index=-0

#*** BRICKWALL BAND OR REJECT FILTER WINDOW ****
# (-1 selects respective lowest or highest)
FILTER_TYPE__0_bandpass__1_bandreject=0
BRICKWALL_FILTER_window_low_frequency=0
BRICKWALL_FILTER_window_high_frequency=-1

#*************** LOW/HIGH SHELF EQ ********************
LOW_SHELF_EQ_gain_in_decibels=-96
LOW_SHELF_EQ_frequency=1000
HIGH_SHELF_EQ_gain_in_decibels=-0
HIGH_SHELF_EQ_frequency=1100

#******** TERMINAL DISPLAY ****************************
TERMINAL_DISPLAY_0_off__1_phase__2_magnitude__3_both=0
TERMINAL_DISPLAY_low_bin=10
TERMINAL_DISPLAY_high_bin=30
TERMINAL_DISPLAY_number_of_frames=1

#********** AMPLITUDE STATISTICS **********************
print_amplitude_statistics_0_no__1_yes=1
amplitude_statistics_time_interval=.25

gen4 -L1000 0 -3 0 1 3 > /tmp/ptrans ;

#====================================================
# COMMAND LINE SETUP -- OFFICE USE ONLY
# (DO NOT WRITE BELOW THIS LINE)
#====================================================
pvroutine=plainpv
PVFLAGS="\
-N$FFT_length \
-M$windowsize \
-D$frames_per_second \
-I$time_expansion_contraction_factor \
-a$frequency_shift \
-P$pitch_transposition_in_semitones \
-A$gain_in_decibels \
-C$resynthesis_channel_1_to_max__0_all \
-t$oscillator_resynthesis_threshold_in_dB \
-b$begintime \
-e$endtime \
-H$LOW_SHELF_EQ_gain_in_decibels \
-m$LOW_SHELF_EQ_frequency \
-X$HIGH_SHELF_EQ_gain_in_decibels \
-R$HIGH_SHELF_EQ_frequency \
-L$release_time_in_seconds \
-l$attack_time_in_seconds \
-W$spectrum_warpshape_index \
-T$FILTER_TYPE__0_bandpass__1_bandreject \
-f$BRICKWALL_FILTER_window_low_frequency \
-F$BRICKWALL_FILTER_window_high_frequency \
-S$TERMINAL_DISPLAY_0_off__1_phase__2_magnitude__3_both \
-u$TERMINAL_DISPLAY_low_bin \
-U$TERMINAL_DISPLAY_high_bin \
-n$TERMINAL_DISPLAY_number_of_frames \
-p$print_amplitude_statistics_0_no__1_yes \
-i$amplitude_statistics_time_interval \
"
echo "\n\n$pvroutine $PVFLAGS $input_file $output_file "
$pvroutine $PVFLAGS $input_file $output_file ;

SAMPLE OF OUTPUT FROM S.PLAINPV

Below is a sample of the output for S.plainpv with notes inserted in parentheses.

(Below is the command. The entire run could be regenerated by typing this in a shell window. Note the -P/tmp/ptrans segment which tells plainpv to use the /tmp/ptrans file to control the pitch transposition. )

plainpv -N1024 -M0 -D200 -I1 -a0 -P/tmp/ptrans -A-0 -C0 -t-80 -b0 -e-1 -H-96 -m1000 -X-0 -R1100 -L0 -l0 -W-0 -T0 -f0 -F-1 -S0 -u10 -U30 -n1 -p1 -i.25 /S1/typewriter.snd /S1/cm.mix.snd

(The following tells us that plainpv has found the /tmp/ptrans file and that it has 1000 values which it will fit to the duration of the run. )

/tmp/ptrans has 1000 values.

(The following consists of the parameter settings for this run of plainpv. )

---------------------------------------------------------------------
============================== PLAINPV ==============================
---------------------------------------------------------------------
========================== INPUT SOUNDFILE ==========================
INPUT FILE: FILENAME = /S1/typewriter.snd
INPUT FILE: SAMPLE RATE = 44100
INPUT FILE: NUMBER OF CHANNELS = 2
INPUT FILE: DURATION = 2.882517
INPUT FILE: BEGIN TIME = 0.000000
INPUT FILE: END TIME = 2.882517
/S1/cm.mix.snd DOES NOT YET EXIST. I'LL MAKE IT.
========================== OUTPUT SOUNDFILE =========================
OUTPUT FILE: FILENAME = /S1/cm.mix.snd
OUTPUT FILE: SAMPLE RATE = 44100
OUTPUT FILE: NUMBER OF CHANNELS = 2
OUTPUT FILE: DURATION = 2.882517
======================== ANALYSIS PARAMETERS ========================
FFT SIZE = 1024
FUNDAMENTAL ANALYSIS FREQUENCY = 43.066406
WINDOW SIZE = 2048
FRAMES/SECOND = 200
TIME EXPANSION/CONTRACTION FACTOR = 1
DECIMATION SAMPLES (samples between analysis frames) = 220
INTERPOLATION SAMPLES (samples between resynthesis frames) = 220
OSCILLATOR RESYNTHESIS THRESHOLD (in dB) = -80.000000
GAIN (in dB) = 0.000

(Parameters controlled by function files represent the parameter setting by showing the range, number of values, and average.)

PITCH TRANSPOSITION (in semitones) (range): -3.000 - 3.000
(1000 values, average = 0.000 )
FREQUENCY SHIFT (in Hz) = 0.000
ENVELOPE ATTACK TIME (in seconds) = 0.000
ENVELOPE RELEASE TIME (in seconds) = 0.000
FREQUENCY WINDOW: LOW BOUNDARY = 0.000000
FREQUENCY WINDOW: HIGH BOUNDARY = 22050.000000
*............. LOW/HIGH SHELF EQ............*
LOW SHELF FREQUENCY = 1000.000000
.......... LOW SHELF DECIBELS = -96.000000
HIGH SHELF FREQUENCY = 1100.000000
.......... HIGH SHELF DECIBELS = 0.000000
*...........................................*
INPUT SPECTRUM WARPSHAPE INDEX = 0.000

(From here on, the processing takes place, one channel at a time. The following shows the amplitude statistics, which appear if turned on.)

=====================================================================
ANALYSIS: CHANNEL = 1
........USING EQ.........
.....USING OSCILLATOR BANK RESYNTHESIS


*********************************************************************
**  PEAK AMPLITUDE STATISTICS **
*********************************************************************
     TIME          PEAKAMP      DECIBELS    (LAST DECIBELS PEAK)
*********************************************************************
(  0.00 -  0.25)    0.5995        -4.444      -4.444
(  0.25 -  0.50)    0.4522        -6.893
(  0.50 -  0.75)    0.7761        -2.201      -2.201
(  0.75 -  1.00)    0.4199        -7.538
(  1.00 -  1.25)    0.6147        -4.227
(  1.25 -  1.50)    0.7136        -2.931
(  1.50 -  1.75)    0.6714        -3.460
(  1.75 -  2.00)    0.5623        -5.001
(  2.00 -  2.25)    0.4029        -7.897
(  2.25 -  2.50)    0.2372       -12.497
(  2.50 -  2.75)    0.1751       -15.132

============= PEAK AMPLITUDE ========================================
CHANNEL       TIME          PEAKAMP    DECIBELS    (CLIPPED SAMPLES)
.....................................................................
1            0.738           0.7761      -2.201
*********************************************************************


=====================================================================
ANALYSIS: CHANNEL = 2
........USING EQ.........
*********************************************************************
**  PEAK AMPLITUDE STATISTICS **
*********************************************************************
     TIME          PEAKAMP      DECIBELS    (LAST DECIBELS PEAK)
*********************************************************************
(  0.00 -  0.25)    0.2371       -12.502     -12.502
(  0.25 -  0.50)    0.2595       -11.717     -11.717
(  0.50 -  0.75)    0.3922        -8.130      -8.130
(  0.75 -  1.00)    0.2259       -12.920
(  1.00 -  1.25)    0.5250        -5.597      -5.597
(  1.25 -  1.50)    0.4354        -7.223
(  1.50 -  1.75)    0.5230        -5.630
(  1.75 -  2.00)    0.3521        -9.067
(  2.00 -  2.25)    0.3022       -10.394
(  2.25 -  2.50)    0.1330       -17.523
(  2.50 -  2.75)    0.0840       -21.512

============= PEAK AMPLITUDE ========================================
CHANNEL       TIME          PEAKAMP    DECIBELS    (CLIPPED SAMPLES)
.....................................................................
2            1.088           0.5250      -5.597
*********************************************************************


=====================================================================

                 PEAK AMPLITUDES: ALL CHANNELS
---------------------------------------------------------------------
CHANNEL       TIME          PEAKAMP    DECIBELS    (CLIPPED SAMPLES)
.....................................................................
1            0.738           0.7761      -2.201
2            1.088           0.5250      -5.597
=====================================================================


PLAINPV: RESYNTHESIS COMPLETED