Heterogeneous Visual Codebook Integration via Consensus Clustering for Visual Categorization

Overview

Most recent category-level object and activity recognition systems work with visual words, \ie vector-quantized local descriptors. These visual vocabularies are usually built by using a local feature, such as SIFT, and a single clustering algorithm, such as $K$-means. However, very different clusterings algorithms are at our disposal, each of them discovering different structures in the data. In this paper, we explore how to combine these heterogeneous codebooks and introduce a novel approach for their integration via consensus clustering. Considering each visual vocabulary as one modal, we propose the Visual Word Aggregation (VWA) methodology, to learn a common codebook, where: the stability of the visual vocabulary construction process is increased, the size of the codebook is determined in an unsupervised integration, and more discriminative representations are obtained. With the aim of obtaining contextual visual words, we also incorporate the spatial neighboring relation between the local descriptors into the VWA process: the Contextual-VWA (C-VWA) approach. We integrate over-segmentation algorithms and spatial grids into the aggregation process to obtain a visual vocabulary that narrows the semantic gap between visual words and visual concepts. We show how the proposed codebooks perform in recognizing objects and scenes on very challenging datasets. Compared with unimodal visual codebook construction approaches, our multi-modal approach always achieves superior performances.

Downloads

Software

This section contains the Matlab and C codes implementing all the aggregation techniques described in the paper. Download - Last update (1-March-2013).

Paper

Heterogeneous Visual Codebook Integration via Consensus Clustering for Visual Categorization, R. J. López-Sastre, J. Renes-Olalla, P. Gil-Jiménez, S. Maldonado-Bascón, S. Lafuente-Arroyo. IEEE TCSVT, 2013.

Citing

If you make use of this software, please cite the following reference in any publication:

@ARTICLE{lopez2013tcsvt,
  author = {Lopez-Sastre, R.~J. and Renes-Olalla, J. and Gil-Jimenez, P. and Maldonado-Bascon, S. and Lafuente-Arroyo, S.},
  title = {Heterogeneous Visual Codebook Integration via Consensus Clustering for Visual Categorization},
  journal = {IEEE Transaction on Circuits and Systems for Video Technology},
  year = {2013},
  volume = {23},
  pages = {1358--1368}
}