Manipulation, Analysis and Retrieval Systems for Audio Signals (Thesis)
Abstract:
Digital audio and especially music collections are becoming a major
part of the average computer user experience. Large digital audio
collections of sound effects are also used by the movie and animation
industry. Research areas that utilize large audio collections
include: Auditory Display, Bioacoustics, Computer Music, Forensics,
and Music Cognition.

In order to develop more sophisticated tools for interacting with
large digital audio collections, research in Computer Audition
algorithms and user interfaces is required. In this work, a series
of systems for manipulating, retrieving from, and analysing large
collections of audio signals will be described. The foundation of
these systems is the design of new and the application of existing
algorithms for automatic audio content analysis. The results of
the analysis are used to build novel 2D and 3D graphical user
interfaces for browsing and interacting with audio signals and
collections. The proposed systems are based on techniques from
the fields of Signal Processing, Pattern Recognition, Information
Retrieval, Visualization, and Human-Computer Interaction. All the
proposed algorithms and interfaces are integrated under MARSYAS,
a free software framework designed for rapid prototyping of
computer audition research. In most cases the proposed algorithms
have been evaluated and informed by conducting user studies.

New contributions of this work to the area of Computer Audition
include: a general multifeature audio texture segmentation
methodology, feature extraction from MP3 compressed data,
automatic beat detection and analysis based on the Discrete
Wavelet Transform, and musical genre classification combining
timbral, rhythmic and harmonic features. The novel graphical user
interfaces developed in this work include various tools for
browsing and visualizing large audio collections, such as the
Timbregram, TimbreSpace, GenreGram, and Enhanced Sound Editor.