Depth-assisted rectification for real-time object detection and pose estimation

João Paulo Silva do Monte Lima¹,²,
Francisco Paulo Magalhães Simões²,
Hideaki Uchiyama³,
Veronica Teichrieb² &
Eric Marchand³

Abstract

In recent years, RGB-D sensors have become readily available to general users. They provide both a color image and a depth image of the scene and, besides being used for object modeling, they can also offer important cues for real-time object detection and tracking. In this context, this paper investigates the use of consumer RGB-D sensors for object detection and pose estimation from natural features. Two methods based on depth-assisted rectification are proposed. Both use depth data to transform features extracted from the color image to a canonical view, yielding a representation invariant to rotation, scale and perspective distortions. One method is suitable for textured objects, either planar or non-planar, while the other focuses on texture-less planar objects. Qualitative and quantitative evaluations show that the proposed methods can obtain better results than some existing methods for object detection and pose estimation, especially when dealing with oblique poses.
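
The idea of transforming features to a canonical view can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a pinhole camera with known intrinsics (fx, fy, cx, cy), a locally planar patch, and three depth samples on that patch, and every function name below is hypothetical. The sketch estimates the patch normal from depth, builds the rotation that turns the patch fronto-parallel, and maps pixels through the corresponding pure-rotation homography K R K⁻¹.

```python
import math

def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with metric depth to a 3D point."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plane_normal(p0, p1, p2):
    """Unit normal of the plane through three points, oriented toward the camera."""
    n = cross(sub(p1, p0), sub(p2, p0))
    length = math.sqrt(dot(n, n))
    n = (n[0] / length, n[1] / length, n[2] / length)
    return n if n[2] < 0 else (-n[0], -n[1], -n[2])

def rotation_aligning(a, b):
    """Rotation matrix R with R a = b for unit vectors a and b (requires a != -b)."""
    vx, vy, vz = cross(a, b)
    c = dot(a, b)
    k = 1.0 / (1.0 + c)
    return [[c + k * vx * vx, k * vx * vy - vz, k * vx * vz + vy],
            [k * vx * vy + vz, c + k * vy * vy, k * vy * vz - vx],
            [k * vx * vz - vy, k * vy * vz + vx, c + k * vz * vz]]

def rectifying_rotation(samples, fx, fy, cx, cy):
    """Rotation that makes the patch fronto-parallel.

    samples: three (u, v, depth) tuples taken on the patch (an assumption of
    this sketch; a real system would fit the normal to many depth samples).
    """
    p0, p1, p2 = (backproject(u, v, d, fx, fy, cx, cy) for u, v, d in samples)
    normal = plane_normal(p0, p1, p2)
    # In the canonical view the patch normal points straight back at the camera.
    return rotation_aligning(normal, (0.0, 0.0, -1.0))

def warp_pixel(u, v, R, fx, fy, cx, cy):
    """Map a pixel through the pure-rotation homography K R K^-1."""
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    x, y, z = (dot(row, ray) for row in R)
    return (fx * x / z + cx, fy * y / z + cy)
```

After the warp, all points of the planar patch share the same depth in the virtual view, so a square on the patch projects to a square again and the perspective distortion is removed. Scale and in-plane rotation are not normalized here; that would need additional steps that this sketch leaves out.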

Acknowledgments

The authors would like to thank Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)/Institut National de Recherche en Informatique et en Automatique (INRIA)/Comisión Nacional de Investigación Científica y Tecnológica (CONICYT) STIC-AmSud project ARVS and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) (process 141705/2010-8) for partially funding this research.

Author information

Authors and Affiliations

Departamento de Estatística e Informática (DEINFO), Universidade Federal Rural de Pernambuco (UFRPE), Recife, PE, Brazil
João Paulo Silva do Monte Lima
Voxar Labs, Centro de Informática (CIn), Universidade Federal de Pernambuco (UFPE), Recife, PE, Brazil
João Paulo Silva do Monte Lima, Francisco Paulo Magalhães Simões & Veronica Teichrieb
INRIA Rennes Bretagne-Atlantique, Rennes, France
Hideaki Uchiyama & Eric Marchand

Corresponding author

Correspondence to João Paulo Silva do Monte Lima.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mpg 62920 KB)

Supplementary material 2 (mpg 42922 KB)

About this article

Cite this article

do Monte Lima, J.P.S., Simões, F.P.M., Uchiyama, H. et al. Depth-assisted rectification for real-time object detection and pose estimation. Machine Vision and Applications 27, 193–219 (2016). https://doi.org/10.1007/s00138-015-0740-8

Received: 26 June 2014
Revised: 28 August 2015
Accepted: 10 November 2015
Published: 12 December 2015
Issue Date: February 2016
DOI: https://doi.org/10.1007/s00138-015-0740-8
