Abstract
We present an approach to improving the performance of multi-class image segmentation systems based on a multilevel description of segments. The multi-class image segmentation system used in this paper delineates the segments in an image, describes each segment via a multilevel feature vector and passes the vectors to a multi-class object classifier. The focus of this paper is on the segment description stage. We first propose a robust, scale-invariant texture feature set, named directional differences (DDs). This feature set is designed by investigating the flaws of conventional texture features. The advantages of DDs are justified both analytically and experimentally. We have conducted several experiments on the performance of our multi-class image segmentation system to compare DDs with some well-known texture features. Experimental results show that DDs yield about 8 % higher classification accuracy. Feature reduction experiments also show that, in a combined feature space, DDs remain among the most effective features even for small feature vector sizes. To describe a segment fully, we introduce a multilevel strategy called different levels of feature extraction (DLFE) that enables the system to include semantic relations and contextual information in the features. This information is particularly effective for highly occluded objects. DLFE concatenates the features related to different views of every segment. Experimental results show that more than a 4 % improvement in multi-class image segmentation accuracy is achieved. Using the semantic information in the classifier stage adds another 2 % improvement to the accuracy of the system.
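To make the described pipeline concrete, the following is a minimal sketch, not the authors' implementation: the exact definition of the directional-difference features and of the DLFE levels is given in the paper body, not in the abstract, so the descriptor below (absolute grey-level differences pooled along four directions, and segment/context/whole-image levels) is an illustrative assumption.

```python
import numpy as np

def directional_differences(patch, n_bins=16):
    """Illustrative directional-difference texture descriptor (hypothetical form).
    Pools absolute grey-level differences along four directions into histograms."""
    patch = patch.astype(np.float32)
    diffs = [
        np.abs(patch[:, 1:] - patch[:, :-1]),      # horizontal neighbours
        np.abs(patch[1:, :] - patch[:-1, :]),      # vertical neighbours
        np.abs(patch[1:, 1:] - patch[:-1, :-1]),   # main diagonal
        np.abs(patch[1:, :-1] - patch[:-1, 1:]),   # anti-diagonal
    ]
    hists = [np.histogram(d, bins=n_bins, range=(0, 255), density=True)[0]
             for d in diffs]
    return np.concatenate(hists)

def dlfe_descriptor(image, segment_mask, context_scale=2.0):
    """Illustrative DLFE-style multilevel description: concatenate descriptors of the
    segment itself, of an enlarged context window around it, and of the whole image.
    Assumes a single-channel (grey-level) image."""
    ys, xs = np.nonzero(segment_mask)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    segment_view = image[y0:y1, x0:x1]

    # Enlarged bounding box to capture the surrounding context.
    h, w = y1 - y0, x1 - x0
    cy0 = max(0, int(y0 - (context_scale - 1) * h / 2))
    cy1 = min(image.shape[0], int(y1 + (context_scale - 1) * h / 2))
    cx0 = max(0, int(x0 - (context_scale - 1) * w / 2))
    cx1 = min(image.shape[1], int(x1 + (context_scale - 1) * w / 2))
    context_view = image[cy0:cy1, cx0:cx1]

    return np.concatenate([
        directional_differences(segment_view),
        directional_differences(context_view),
        directional_differences(image),
    ])
```

In the system described by the abstract, the concatenated multilevel vector produced for each segment would then be passed to the multi-class object classifier.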
Cite this article
Mostajabi, M., Gholampour, I. A robust multilevel segment description for multi-class object recognition. Machine Vision and Applications 26, 15–30 (2015). https://doi.org/10.1007/s00138-014-0642-1