research-article

Wavelet, active basis, and shape script: a tour in the sparse land

Authors:

Ying Nian Wu

MIR '10: Proceedings of the international conference on Multimedia information retrieval

Pages 201 - 210

https://doi.org/10.1145/1743384.1743420

Published: 29 March 2010

Abstract

Sparse coding is a key principle that underlies wavelet representation of natural images. In this paper, we explain that the effort of seeking a common wavelet sparse coding of images from the same object category leads to an active basis model, where the images share the same set of selected wavelet elements, which form a linear basis for representing the images. The selected wavelet elements are allowed to perturb their locations and orientations to account for shape deformations, so that the basis becomes active, and the active basis serves as a mathematical representation of a deformable template. We show that a recursive application of the strategy underlying the active basis model leads to a shape script model, which is a composition of shape motifs such as ellipsoids, parallel bars, angles, etc. These shape motifs are allowed to change their locations, orientations, scales and aspect ratios, and the shape motifs themselves are modeled by active bases. Compared to the active basis model, the shape script model is a sparser representation and therefore has stronger generalization power. It can also be considered another layer of sparse coding of the selected wavelet elements that themselves provide sparse coding of the image intensities.
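As a rough illustration of the idea in the abstract, the following toy sketch selects a small set of wavelet elements shared by several signals while letting each signal shift every selected element slightly. It is a one-dimensional simplification written for this page, loosely in the spirit of shared-sketch-style greedy selection, and not the author's implementation. The function names (`gabor_atom`, `shared_sketch`), the parameters (`max_shift`, `inhibit`), and the use of simple local suppression instead of a full residual update are all assumptions made for illustration.

```python
import numpy as np


def gabor_atom(length, center, width=3.0, freq=0.25):
    """Unit-norm 1-D Gabor-like atom: cosine carrier under a Gaussian envelope."""
    t = np.arange(length) - center
    atom = np.exp(-0.5 * (t / width) ** 2) * np.cos(2 * np.pi * freq * t)
    return atom / np.linalg.norm(atom)


def shared_sketch(signals, n_elements, max_shift=2, inhibit=9):
    """Greedily pick atoms shared by all signals (hypothetical toy version).

    Each signal may move a selected atom by up to `max_shift` samples, which
    is what makes the shared basis "active". Returns the nominal positions
    and an (n_elements, n_signals) array of per-signal perturbed positions.
    """
    n_signals, length = signals.shape
    dictionary = np.stack([gabor_atom(length, c) for c in range(length)])
    # responses[m, x] = squared inner product of signal m with the atom at x
    responses = (signals @ dictionary.T) ** 2
    nominal, perturbed = [], []
    for _ in range(n_elements):
        # Local max pooling: best response within +/- max_shift of each x
        padded = np.pad(responses, ((0, 0), (max_shift, max_shift)))
        windows = np.stack(
            [padded[:, d:d + length] for d in range(2 * max_shift + 1)]
        )
        pooled = windows.max(axis=0)
        shifts = windows.argmax(axis=0) - max_shift
        # Choose the nominal position with the largest pooled sum over signals
        x_star = int(pooled.sum(axis=0).argmax())
        positions = np.clip(x_star + shifts[:, x_star], 0, length - 1)
        nominal.append(x_star)
        perturbed.append(positions)
        # Simplification: suppress nearby responses instead of updating residuals
        for m, p in enumerate(positions):
            responses[m, max(0, p - inhibit):p + inhibit + 1] = 0.0
    return nominal, np.array(perturbed)


# Toy data: three signals built from two atoms at slightly perturbed positions
length = 64
true_positions = np.array([[13, 15, 17], [47, 43, 45]])
coefficients = np.array([[2.0, 1.8, 2.2], [1.0, 1.2, 0.9]])
signals = np.zeros((3, length))
for i in range(2):
    for m in range(3):
        signals[m] += coefficients[i, m] * gabor_atom(length, true_positions[i, m])

nominal, perturbed = shared_sketch(signals, n_elements=2)
print(nominal)
print(perturbed)
```

In this sketch the list `nominal` plays the role of the common template, while each row of `perturbed` records where the corresponding element actually landed in each signal. A faithful version would work with two-dimensional wavelets, perturb orientations as well as locations, and could be applied recursively to the selected elements, as the abstract describes for the shape script model.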

Cited By

R. Walha, F. Drira, F. Lebourgeois, C. Garcia, and A. Alimi. Sparse Coding with a Coupled Dictionary Learning Approach for Textual Image Super-resolution. In Proceedings of the 2014 22nd International Conference on Pattern Recognition, pages 4459--4464, 2014. Online publication date: 24 August 2014.
https://dl.acm.org/doi/10.1109/ICPR.2014.763

L. Shi and Y. Zhao. Batch Mode Sparse Active Learning. In Proceedings of the 2010 IEEE International Conference on Data Mining Workshops, pages 875--882, 2010. Online publication date: 13 December 2010.
https://dl.acm.org/doi/10.1109/ICDMW.2010.175

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Hierarchical representations
        Image representations

Published In

MIR '10: Proceedings of the international conference on Multimedia information retrieval

March 2010

600 pages

ISBN: 9781605588155

DOI: 10.1145/1743384

General Chairs: James Z. Wang (The Pennsylvania State University, USA), Nozha Boujemaa (INRIA, France)

Program Chairs: Nuria Oliver Ramirez (Telefonica Research, Spain), Apostol Natsev (IBM Research, USA)

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MIR '10: International Conference on Multimedia Information Retrieval

March 29 - 31, 2010

Philadelphia, Pennsylvania, USA

Sponsor: SIGMM
