One-Shot Learning of Object Categories

Published: 01 April 2006

Abstract

Learning visual models of object categories notoriously requires hundreds or thousands of training examples. We show that it is possible to learn much information about a category from just one, or a handful, of images. The key insight is that, rather than learning from scratch, one can take advantage of knowledge coming from previously learned categories, no matter how different these categories might be. We explore a Bayesian implementation of this idea. Object categories are represented by probabilistic models. Prior knowledge is represented as a probability density function on the parameters of these models. The posterior model for an object category is obtained by updating the prior in the light of one or more observations. We test a simple implementation of our algorithm on a database of 101 diverse object categories. We compare category models learned by an implementation of our Bayesian approach to models learned by Maximum Likelihood (ML) and Maximum A Posteriori (MAP) methods. We find that, on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.
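
To make the prior-to-posterior update described in the abstract concrete, here is a minimal numerical sketch in Python. It is not the paper's constellation model: the Gaussian category model, the feature dimensionality, and all variable names below are assumptions made purely for illustration. It builds a prior over a new category's parameters from previously learned categories, then compares the ML estimate with the MAP/posterior estimate after a single training example.

```python
# Minimal sketch of one-shot Bayesian learning with a prior borrowed from other
# categories. This is NOT the paper's constellation model: the Gaussian category
# model, dimensionality, and hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# "Previously learned categories": pretend each category is summarized by a mean
# feature vector; fit a Gaussian prior over a new category's mean from them.
known_category_means = rng.normal(loc=0.0, scale=2.0, size=(100, 5))
mu0 = known_category_means.mean(axis=0)    # prior mean (hyperparameter)
tau2 = known_category_means.var(axis=0)    # prior variance (hyperparameter)

# One-shot data: a single example (n = 1) from a new, unseen category.
true_mean = rng.normal(loc=0.0, scale=2.0, size=5)
sigma2 = 1.0                               # assumed known observation noise
n = 1
x = rng.normal(loc=true_mean, scale=np.sqrt(sigma2), size=(n, 5))
xbar = x.mean(axis=0)

# Maximum Likelihood: ignores the prior entirely (unstable when n is tiny).
mu_ml = xbar

# Posterior for a Gaussian mean with a Gaussian prior: a precision-weighted
# blend of the prior mean and the sample mean (here MAP = posterior mean).
post_var = 1.0 / (1.0 / tau2 + n / sigma2)
mu_post = post_var * (mu0 / tau2 + n * xbar / sigma2)

# The Bayesian predictive density also keeps the remaining parameter uncertainty.
pred_var = sigma2 + post_var

print("ML estimate error:    ", np.linalg.norm(mu_ml - true_mean))
print("Posterior-mean error: ", np.linalg.norm(mu_post - true_mean))
print("Predictive variance:  ", pred_var)
```

In this conjugate Gaussian toy case the MAP estimate coincides with the posterior mean; the fully Bayesian treatment additionally carries the posterior variance into the predictive density, which is what keeps the model informative when only a single example is available.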



Information

Published In

IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 28, Issue 4
April 2006
175 pages

Publisher

IEEE Computer Society

United States

Publication History

Published: 01 April 2006

Author Tags

  1. Recognition
  2. few images
  3. learning
  4. object categories
  5. priors
  6. unsupervised
  7. variational inference

Qualifiers

  • Research-article

Citations

Cited By

  • (2024) "Learning Task-Specific Embeddings for Few-Shot Classification via Local Weight Adaptation," Proceedings of the 2024 16th International Conference on Machine Learning and Computing, pp. 485-491, doi: 10.1145/3651671.3651746, online 2 Feb. 2024
  • (2024) "SRCPT: Spatial Reconstruction Contrastive Pretext Task for Improving Few-Shot Image Classification," Proceedings of the 2024 16th International Conference on Machine Learning and Computing, pp. 424-432, doi: 10.1145/3651671.3651701, online 2 Feb. 2024
  • (2024) "How to refactor this code? An exploratory study on developer-ChatGPT refactoring conversations," Proceedings of the 21st International Conference on Mining Software Repositories, pp. 202-206, doi: 10.1145/3643991.3645081, online 15 Apr. 2024
  • (2024) "mmSign: mmWave-based Few-Shot Online Handwritten Signature Verification," ACM Transactions on Sensor Networks, vol. 20, no. 4, pp. 1-31, doi: 10.1145/3605945, online 11 May 2024
  • (2024) "Fast and Robust Sparsity-Aware Block Diagonal Representation," IEEE Transactions on Signal Processing, vol. 72, pp. 305-320, doi: 10.1109/TSP.2023.3343565, online 1 Jan. 2024
  • (2024) "Attribute-Based Robotic Grasping With Data-Efficient Adaptation," IEEE Transactions on Robotics, vol. 40, pp. 1566-1579, doi: 10.1109/TRO.2024.3353484, online 12 Jan. 2024
  • (2024) "Property-Aware Relation Networks for Few-Shot Molecular Property Prediction," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 8, pp. 5413-5429, doi: 10.1109/TPAMI.2024.3368090, online 1 Aug. 2024
  • (2024) "Robust Meta-Representation Learning via Global Label Inference and Classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 4, pp. 1996-2010, doi: 10.1109/TPAMI.2023.3328184, online 1 Apr. 2024
  • (2024) "Diffusion Mechanism in Residual Neural Network: Theory and Applications," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 2, pp. 667-680, doi: 10.1109/TPAMI.2023.3272341, online 1 Feb. 2024
  • (2024) "Few-Shot Learning With a Strong Teacher," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 3, pp. 1425-1440, doi: 10.1109/TPAMI.2022.3160362, online 1 Mar. 2024
