Abstract
Scene attribute recognition aims to identify the attribute labels of a scene image from its scene representation, enabling a deeper semantic understanding of scenes. Over the past decades, numerous scene representation algorithms have been proposed, based either on feature engineering or on deep convolutional neural networks. Models that rely on only one kind of image feature still find it difficult to learn representations of multiple attributes from local image regions. Deep learning models can learn attribute representations directly from multi-label annotations, but building such multi-label models usually requires large amounts of training data. In this paper, we address the problem by modeling scene representations with multiple features and without deep learning. First, we introduce the linear mixing model (LMM) to model scene images, and then present a novel approach, referred to as mini-batch minimum simplex estimation (MMSE), for learning attribute-based scene representations from highly complex image data. Finally, we propose a two-stage multi-feature fusion method to further improve the feature representation for scene attribute recognition. The proposed method takes advantage of the fast convergence of nonnegative matrix factorization (NMF) schemes while using mini-batches to speed up computation on large-scale scene datasets. Experimental results on real scene images demonstrate that the proposed method outperforms several other advanced scene attribute recognition approaches.
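To make the modeling idea concrete: under a linear mixing model (LMM), a matrix of image features $X$ (one column per image or region) is approximated as $X \approx EA$, where, assuming the usual LMM constraints, the columns of $E$ are nonnegative "endmembers" and each column of $A$ holds nonnegative abundances that sum to one, so every sample lies inside the simplex spanned by the endmembers. The MMSE objective and update rules are not given in this section, so the sketch below is a hypothetical stand-in and not the authors' method. It alternates one projected-gradient step on the abundances (projected onto the probability simplex) and one on the endmembers per mini-batch, and it replaces an exact simplex-volume term with the endmember-spread surrogate $\frac{\lambda}{2}\|EP\|_F^2$, with $P = I - \frac{1}{k}\mathbf{1}\mathbf{1}^\top$. The function names, the step-size rule, the defaults (`batch_size`, `epochs`, `lam`) and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def project_columns_to_simplex(A):
    """Euclidean projection of every column of A onto {a : a >= 0, sum(a) = 1}.

    Standard sort-and-threshold simplex projection, applied column-wise.
    """
    k, n = A.shape
    U = -np.sort(-A, axis=0)                  # each column sorted in descending order
    css = np.cumsum(U, axis=0) - 1.0          # running sums minus the target sum of 1
    ranks = np.arange(1, k + 1)[:, None]
    active = U - css / ranks > 0              # row 0 is always True
    rho = k - 1 - np.argmax(active[::-1], axis=0)  # last active row in each column
    theta = css[rho, np.arange(n)] / (rho + 1)     # per-column threshold
    return np.maximum(A - theta, 0.0)


def minibatch_minvol_unmix(X, k, batch_size=64, epochs=50, lam=0.01, seed=0):
    """Hypothetical mini-batch minimum-volume NMF under the linear mixing model.

    Approximates X (d x n, nonnegative) as E @ A, with E (d x k) >= 0 and each
    column of A (k x n) on the probability simplex. The volume term is the
    spread surrogate 0.5 * lam * ||E P||_F^2; it is NOT the paper's MMSE objective.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    E = X[:, rng.choice(n, size=k, replace=False)].copy()  # init from data columns
    A = np.full((k, n), 1.0 / k)                            # uniform abundances
    P = np.eye(k) - np.ones((k, k)) / k                     # centering matrix
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb = X[:, idx]
            # Abundance step: one projected-gradient step with step size 1/L.
            L_a = np.linalg.norm(E.T @ E, 2) + 1e-12
            Ab = A[:, idx]
            Ab = project_columns_to_simplex(Ab - E.T @ (E @ Ab - Xb) / L_a)
            A[:, idx] = Ab
            # Endmember step: data-fit gradient plus volume-surrogate gradient,
            # followed by projection onto the nonnegative orthant.
            grad_E = (E @ Ab - Xb) @ Ab.T + lam * (E @ P)
            L_e = np.linalg.norm(Ab @ Ab.T, 2) + lam + 1e-12
            E = np.maximum(E - grad_E / L_e, 0.0)
    return E, A


# Usage on synthetic, noiseless LMM data (a stand-in for real scene features).
rng = np.random.default_rng(1)
E_true = rng.uniform(0.0, 1.0, size=(20, 3))
A_true = rng.dirichlet(np.ones(3), size=300).T
X = E_true @ A_true
E, A = minibatch_minvol_unmix(X, k=3)
rel_err = np.linalg.norm(X - E @ A) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Each mini-batch update costs time proportional to the batch size rather than to the whole dataset, which is the motivation the abstract gives for using mini-batches on large-scale scene datasets.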
Acknowledgements
This research is partially supported by the Beijing Natural Science Foundation (No. 4212025) and the National Natural Science Foundation of China (Nos. 61876018 and 61976017).
Ethics declarations
Conflict of Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zou, Z., Liu, W., Xing, W. et al. Minimum volume simplex-based scene representation and attribute recognition with feature fusion. Appl Intell 53, 8959–8977 (2023). https://doi.org/10.1007/s10489-022-03697-9
DOI: https://doi.org/10.1007/s10489-022-03697-9