Similarity Search with Multimodal Data

Report ID:
October 2011
Similarity search systems are designed to help people organize non-text
multimedia data and find valuable information in it. Multimedia data
intrinsically has multiple modalities (e.g., visual and audio features
from video clips), which can be exploited to build better search
systems. Traditionally, various integration techniques have been used
to aggregate multiple modalities. However, these techniques do not
scale well to large datasets. As multimedia data grows, it becomes a
challenge to build a search system that handles large-scale multimodal
data efficiently and provides users with the information they need.

The goal of this dissertation is to study how to effectively combine
multiple modalities to implement similarity search systems for large
datasets. I carried out this study through three similarity search
systems, each designed for a different application. Each system
combines multiple modalities to help users find desired information
quickly. With the VFerret system, I studied how to combine visual
features with audio features for effective personal video search. With
the Image Spam Detection System, I explored several aggregation methods
for integrating multiple image spam filters to detect image spam. With
my Product Navigation System, I studied how to combine text search with
image similarity search to help users find desired products. This
thesis also studies a rank-based model that helps system designers
construct more efficient large-scale multimodal similarity search
systems.
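To make "combining modalities" concrete, here is a minimal sketch of one common, generic approach: late fusion, where each modality gets its own distance and the final ranking uses a weighted sum of normalized per-modality distances. This is not presented as the method used in VFerret or the other systems; the function names, the per-query max normalization, and the weights are all hypothetical choices for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fused_distances(query, database, weights):
    """Weighted sum of per-modality distances (hypothetical late-fusion scheme).

    query:    dict modality -> feature vector for the query
    database: dict modality -> list of feature vectors, one per item
    weights:  dict modality -> weight (assumed, e.g. tuned on held-out data)
    """
    n_items = len(next(iter(database.values())))
    total = [0.0] * n_items
    for modality, w in weights.items():
        dists = [euclidean(query[modality], item) for item in database[modality]]
        # Scale each modality's distances into [0, 1] so that a modality with
        # larger raw feature values does not dominate the combined score.
        scale = max(dists) or 1.0
        for i, d in enumerate(dists):
            total[i] += w * d / scale
    return total


def top_k(query, database, weights, k):
    """Indices of the k items with the smallest fused distance."""
    scores = fused_distances(query, database, weights)
    return sorted(range(len(scores)), key=lambda i: scores[i])[:k]


# Usage: three video clips, each with a 2-D visual and a 1-D audio feature.
db = {"visual": [[0, 0], [1, 1], [5, 5]], "audio": [[0], [2], [9]]}
q = {"visual": [1, 1], "audio": [2]}
print(top_k(q, db, {"visual": 0.5, "audio": 0.5}, k=3))
```

With equal weights, clip 1 matches the query exactly in both modalities and ranks first. Changing the weights shifts how much each modality influences the ranking, which is the kind of knob that domain knowledge can help set.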

Although the general solution to using multimodal data in a similarity
search system is still unknown, this dissertation shows that it is
possible to substantially improve search accuracy and efficiency by
leveraging domain-specific knowledge of multimodal data. The VFerret
system improves search accuracy from an average precision of 0.66 to
0.79 by combining visual and audio features. By combining multiple
image filters intelligently, the Image Spam Detection System
significantly lowers the false positive rate from a previous result of
1% to 0.001% while maintaining comparable detection rates. My Product
Navigation System reduces the number of user clicks by 60% compared to
traditional systems through a new method of combining text search with
image similarity search. These results support further adoption and
study of multimodal data in similarity search system design.
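The VFerret result is reported as average precision. For readers unfamiliar with the metric, the sketch below shows the standard single-query definition: the mean of precision@k taken at each rank k where a relevant item appears, divided by the total number of relevant items. The abstract does not state exactly how its averages were computed (for example, how many queries were averaged), so this is only an illustration of the metric, and the function name is hypothetical.

```python
def average_precision(ranked_ids, relevant):
    """Average precision of one ranked result list for one query.

    ranked_ids: result identifiers in ranked order (best first)
    relevant:   identifiers judged relevant for this query
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k  # precision@k at this relevant hit
    # Relevant items that never appear in the list contribute zero.
    return precision_sum / len(relevant)


# Relevant items at ranks 1 and 3: (1/1 + 2/3) / 2 = 5/6.
print(average_precision(["a", "b", "c", "d"], {"a", "c"}))
```

Averaging this value over a set of queries gives mean average precision, which is a common way such summary figures are reported.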
