A Sustainable Deep Learning Framework for Object Recognition Using Multi-Layers Deep Features Fusion and Selection
Figure 1. Proposed deep learning architecture for object classification.
Figure 2. Modified VGG-19 architecture for feature extraction.
Figure 3. Modified Inception V3 architecture for feature extraction.
Figure 4. Proposed system’s predicted labeled output for the Caltech-101 dataset.
Figure 5. Proposed system’s predicted labeled output for the Birds dataset.
Figure 6. Proposed system’s predicted labeled output for the Butterflies dataset.
Figure 7. Confusion matrix of the proposed selection method on the ESD classifier.
Figure 8. Confusion matrix for the Birds dataset using the proposed selection method on the ESD classifier.
Figure 9. Confusion matrix for the Butterflies dataset.
Figure 10. Confusion matrix of the CIFAR-100 dataset for the proposed P-Selection method.
Figure 11. Statistical analysis of the ESD classifier using all three methods, where (a) represents the M1 method, (b) the P-Fusion method, and (c) the P-Selection method.
Abstract
1. Introduction
2. Related Work
3. Challenges and Contributions
- The proposed framework uses two pre-trained deep learning architectures, namely VGG-19 and Inception V3, and applies transfer learning (TL) to retrain them on the selected datasets. The FC7 layer of VGG-19 and the average pooling layer of Inception V3 are utilized for feature extraction (a minimal, illustrative extraction sketch follows this list).
- A parallel maximum covariance (PMC) technique is proposed for the fusion of the two deep feature vectors.
- A Multi Logistic Regression controlled Entropy-Variances (MRcEV) method is employed to select robust features, with the Ensemble Subspace Discriminant (ESD) classifier used as the fitness function.
- A detailed statistical analysis of the proposed method is conducted and compared with recent techniques to examine the stability of the proposed architecture.
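To make the first contribution concrete, the sketch below shows how two such per-image deep feature vectors could be obtained in PyTorch/torchvision: a 4096-dimensional FC7 activation from VGG-19 and a 2048-dimensional average-pool activation from Inception v3. This is only an illustrative sketch under stated assumptions: it uses off-the-shelf ImageNet weights rather than the TL-retrained networks of Section 4.1, the helper `extract_features` is our own naming, and the PMC fusion and MRcEV selection steps of Sections 4.2 and 4.3 are not reproduced here.

```python
# Illustrative sketch (not the paper's exact pipeline): obtain a 4096-D FC7
# feature from VGG-19 and a 2048-D average-pool feature from Inception v3,
# using ImageNet-pretrained torchvision models instead of the TL-retrained ones.
import torch
import torch.nn as nn
from torchvision import models

# VGG-19 truncated after fc7 + relu7 (classifier indices 0..4) -> 4096-D output.
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
vgg.classifier = nn.Sequential(*list(vgg.classifier.children())[:5])
vgg.eval()

# Inception v3 with the final fc layer replaced by Identity, so the forward
# pass returns the flattened 2048-D global-average-pool (avg_pool) feature.
inc = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
inc.fc = nn.Identity()
inc.eval()

@torch.no_grad()
def extract_features(img_224: torch.Tensor, img_299: torch.Tensor):
    """img_224: (N, 3, 224, 224) for VGG-19; img_299: (N, 3, 299, 299) for Inception v3."""
    return vgg(img_224), inc(img_299)

# Dummy inputs, used only to confirm the dimensionalities listed in Appendix A.
f_vgg, f_inc = extract_features(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 299, 299))
print(f_vgg.shape, f_inc.shape)  # torch.Size([2, 4096]) torch.Size([2, 2048])
```

The two vectors produced this way are the inputs on which a fusion step such as the proposed PMC technique would operate.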
4. Materials and Methods
4.1. Deep Learning Features Extraction
4.2. Features Fusion
4.3. Feature Selection
5. Results
5.1. Caltech-101 Dataset Results
5.2. Birds Dataset Results
5.3. Butterflies Dataset
5.4. CIFAR-100 Dataset
5.5. Analysis and Comparison with Existing Techniques
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
Appendix A
Sr No. | Name | Type | Activation | Weights | Bias | Total Learnables |
---|---|---|---|---|---|---|
1 | Input | Image Input | 224 × 224 × 3 | - | - | - |
2 | conv1_1 | Convolution | 224 × 224 × 64 | 3 × 3 × 3 × 64 | 1 × 1 × 64 | 1792 |
3 | relu1_1 | ReLU | 224 × 224 × 64 | - | - | - |
4 | conv1_2 | Convolution | 224 × 224 × 64 | 3 × 3 × 64 × 64 | 1 × 1 × 64 | 36,928 |
5 | relu1_2 | ReLU | 224 × 224 × 64 | - | - | - |
6 | pool1 | Max Pooling | 112 × 112 × 64 | - | - | - |
7 | conv2_1 | Convolution | 112 × 112 × 128 | 3 × 3 × 64 × 128 | 1 × 1 × 128 | 73,856 |
8 | relu2_1 | ReLU | 112 × 112 × 128 | - | - | - |
9 | conv2_2 | Convolution | 112 × 112 × 128 | 3 × 3 × 128 × 128 | 1 × 1 × 128 | 147,584 |
10 | relu2_2 | ReLU | 112 × 112 × 128 | - | - | - |
11 | pool2 | Max Pooling | 56 × 56 × 128 | - | - | - |
12 | conv3_1 | Convolution | 56 × 56 × 256 | 3 × 3 × 128 × 256 | 1 × 1 × 256 | 295,168 |
13 | relu3_1 | ReLU | 56 × 56 × 256 | - | - | - |
14 | conv3_2 | Convolution | 56 × 56 × 256 | 3 × 3 × 256 × 256 | 1 × 1 × 256 | 590,080 |
15 | relu3_2 | ReLU | 56 × 56 × 256 | - | - | - |
16 | conv3_3 | Convolution | 56 × 56 × 256 | 3 × 3 × 256 × 256 | 1 × 1 × 256 | 590,080 |
17 | relu3_3 | ReLU | 56 × 56 × 256 | - | - | - |
18 | conv3_4 | Convolution | 56 × 56 × 256 | 3 × 3 × 256 × 256 | 1 × 1 × 256 | 590,080 |
19 | relu3_4 | ReLU | 56 × 56 × 256 | - | - | - |
20 | pool3 | Max Pooling | 28 × 28 × 256 | - | - | - |
21 | conv4_1 | Convolution | 28 × 28 × 512 | 3 × 3 × 256 × 512 | 1 × 1 × 512 | 1,180,160 |
22 | relu4_1 | ReLU | 28 × 28 × 512 | - | - | - |
23 | conv4_2 | Convolution | 28 × 28 × 512 | 3 × 3 × 512 × 512 | 1 × 1 × 512 | 2,359,808 |
24 | relu4_2 | ReLU | 28 × 28 × 512 | - | - | - |
25 | conv4_3 | Convolution | 28 × 28 × 512 | 3 × 3 × 512 × 512 | 1 × 1 × 512 | 2,359,808 |
26 | relu4_3 | ReLU | 28 × 28 × 512 | - | - | - |
27 | conv4_4 | Convolution | 28 × 28 × 512 | 3 × 3 × 512 × 512 | 1 × 1 × 512 | 2,359,808 |
28 | relu4_4 | ReLU | 28 × 28 × 512 | - | - | - |
29 | pool4 | Max Pooling | 14 × 14 × 512 | - | - | - |
30 | conv5_1 | Convolution | 14 × 14 × 512 | 3 × 3 × 512 × 512 | 1 × 1 × 512 | 2,359,808 |
31 | relu5_1 | ReLU | 14 × 14 × 512 | - | - | - |
32 | conv5_2 | Convolution | 14 × 14 × 512 | 3 × 3 × 512 × 512 | 1 × 1 × 512 | 2,359,808 |
33 | relu5_2 | ReLU | 14 × 14 × 512 | - | - | - |
34 | conv5_3 | Convolution | 14 × 14 × 512 | 3 × 3 × 512 × 512 | 1 × 1 × 512 | 2,359,808 |
35 | relu5_3 | ReLU | 14 × 14 × 512 | - | - | - |
36 | conv5_4 | Convolution | 14 × 14 × 512 | 3 × 3 × 512 × 512 | 1 × 1 × 512 | 2,359,808 |
37 | relu5_4 | ReLU | 14 × 14 × 512 | - | - | - |
38 | pool5 | Max Pooling | 7 × 7 × 512 | - | - | - |
39 | fc6 | Fully Connected | 1 × 1 × 4096 | 4096 × 25,088 | 4096 × 1 | 102,764,544 |
40 | relu6 | ReLU | 1 × 1 × 4096 | - | - | - |
41 | drop6 | Dropout | 1 × 1 × 4096 | - | - | - |
42 | fc7 | Fully Connected | 1 × 1 × 4096 | 4096 × 4096 | 4096 × 1 | 16,781,312 |
43 | relu7 | ReLU | 1 × 1 × 4096 | - | - | - |
44 | drop7 | Dropout | 1 × 1 × 4096 | - | - | - |
45 | fc8 | Fully Connected | 1 × 1 × 1000 | 1000 × 4096 | 1000 × 1 | 4,097,000 |
46 | Prob | Softmax | 1 × 1 × 1000 | - | - | - |
47 | Output | Classification Output | - | - | - | - |
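As a quick arithmetic check on the table above, summing the per-layer "Total Learnables" column reproduces the standard VGG-19 parameter count of 143,667,240 (about 143.7 M weights and biases). A minimal verification in Python:

```python
# Sum of the "Total Learnables" column of the VGG-19 table above.
conv_params = [
    1792, 36928,                          # conv block 1
    73856, 147584,                        # conv block 2
    295168, 590080, 590080, 590080,       # conv block 3
    1180160, 2359808, 2359808, 2359808,   # conv block 4
    2359808, 2359808, 2359808, 2359808,   # conv block 5
]
fc_params = [102764544, 16781312, 4097000]  # fc6, fc7, fc8

print(f"{sum(conv_params) + sum(fc_params):,}")  # 143,667,240
```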
S/N | Name | Type | Activation | Weights | Bias | Offset | Scale |
---|---|---|---|---|---|---|---|
1 | input_1 | Image Input | 299 × 299 × 3 | - | - | - | - |
2 | scaling | Scaling | 299 × 299 × 3 | - | - | - | - |
3 | conv2d_1 | Convolution | 149 × 149 × 32 | [3,3,3,32] | [1,1,32] | - | - |
4 | batch_normalization_1 | Batch Normalization | 149 × 149 × 32 | - | - | 1 × 1 × 32 | 1 × 1 × 32 |
5 | activation_1_relu | ReLU | 149 × 149 × 32 | - | - | - | - |
6 | conv2d_2 | Convolution | 147 × 147 × 32 | [3,3,32,32] | [1,1,32] | - | - |
7 | batch_normalization_2 | Batch Normalization | 147 × 147 × 32 | - | - | [1,1,32] | [1,1,32] |
8 | activation_2_relu | ReLU | 147 × 147 × 32 | - | - | - | - |
9 | conv2d_3 | Convolution | 147 × 147 × 64 | [3,3,32,64] | [1,1,64] | - | - |
10 | batch_normalization_3 | Batch Normalization | 147 × 147 × 64 | - | - | [1,1,64] | [1,1,64] |
11 | activation_3_relu | ReLU | 147 × 147 × 64 | - | - | - | - |
12 | max_pooling2d_1 | Max Pooling | 73 × 73 × 64 | - | - | - | - |
13 | conv2d_4 | Convolution | 73 × 73 × 80 | [1,1,64,80] | [1,1,80] | - | - |
14 | batch_normalization_4 | Batch Normalization | 73 × 73 × 80 | - | - | [1,1,80] | [1,1,80] |
15 | activation_4_relu | ReLU | 73 × 73 × 80 | - | - | - | - |
16 | conv2d_5 | Convolution | 71 × 71 × 192 | [3,3,80,192] | [1,1,192] | - | - |
17 | batch_normalization_5 | Batch Normalization | 71 × 71 × 192 | - | - | [1,1,192] | [1,1,192] |
18 | activation_5_relu | ReLU | 71 × 71 × 192 | - | - | - | - |
19 | max_pooling2d_2 | Max Pooling | 35 × 35 × 192 | - | - | - | - |
20 | conv2d_9 | Convolution | 35 × 35 × 64 | [1,1,192,64] | [1,1,64] | - | - |
21 | batch_normalization_9 | Batch Normalization | 35 × 35 × 64 | - | - | [1,1,64] | [1,1,64] |
22 | activation_9_relu | ReLU | 35 × 35 × 64 | - | - | - | - |
23 | conv2d_7 | Convolution | 35 × 35 × 48 | [1,1,192,48] | [1,1,48] | - | - |
24 | conv2d_10 | Convolution | 35 × 35 × 96 | [3,3,64,96] | [1,1,96] | - | - |
25 | batch_normalization_7 | Batch Normalization | 35 × 35 × 48 | - | - | [1,1,48] | [1,1,48] |
26 | batch_normalization_10 | Batch Normalization | 35 × 35 × 96 | - | - | [1,1,96] | [1,1,96] |
27 | activation_7_relu | ReLU | 35 × 35 × 48 | - | - | - | - |
28 | activation_10_relu | ReLU | 35 × 35 × 96 | - | - | - | - |
29 | average_pooling2d_1 | Avg Pooling | 35 × 35 × 192 | - | - | - | - |
30 | conv2d_6 | Convolution | 35 × 35 × 64 | [1,1,192,64] | [1,1,64] | - | - |
31 | conv2d_8 | Convolution | 35 × 35 × 64 | [5,5,48,64] | [1,1,64] | - | - |
32 | conv2d_11 | Convolution | 35 × 35 × 96 | [3,3,96,96] | [1,1,96] | - | - |
33 | conv2d_12 | Convolution | 35 × 35 × 32 | [1,1,192,32] | [1,1,32] | - | - |
34 | batch_normalization_6 | Batch Normalization | 35 × 35 × 64 | - | - | [1,1,64] | [1,1,64] |
35 | batch_normalization_8 | Batch Normalization | 35 × 35 × 64 | - | - | [1,1,64] | [1,1,64] |
36 | batch_normalization_11 | Batch Normalization | 35 × 35 × 96 | - | - | [1,1,96] | [1,1,96] |
37 | batch_normalization_12 | Batch Normalization | 35 × 35 × 32 | - | - | [1,1,32] | [1,1,32] |
38 | activation_6_relu | ReLU | 35 × 35 × 64 | - | - | - | - |
39 | activation_8_relu | ReLU | 35 × 35 × 64 | - | - | - | - |
40 | activation_11_relu | ReLU | 35 × 35 × 96 | - | - | - | - |
41 | activation_12_relu | ReLU | 35 × 35 × 32 | - | - | - | - |
42 | mixed0 | Depth Concat | 35 × 35 × 256 | - | - | - | - |
43 | conv2d_16 | Convolution | 35 × 35 × 64 | [1,1,256,64] | [1,1,64] | - | - |
44 | batch_normalization_16 | Batch Normalization | 35 × 35 × 64 | - | - | [1,1,64] | [1,1,64] |
45 | activation_16_relu | ReLU | 35 × 35 × 64 | - | - | - | - |
46 | conv2d_14 | Convolution | 35 × 35 × 48 | [1,1,256,48] | [1,1,48] | - | - |
47 | conv2d_17 | Convolution | 35 × 35 × 96 | [3,3,64,96] | [1,1,96] | - | - |
-- | -- | -- | -- | -- | -- | -- | -- |
307 | batch_normalization_94 | Batch Normalization | 8 × 8 × 192 | - | - | [1,1,192] | [1,1,192] |
308 | activation_86_relu | ReLU | 8 × 8 × 320 | - | - | - | - |
309 | mixed9_1 | Depth Concat | 8 × 8 × 768 | - | - | - | - |
310 | concatenate_2 | Depth Concat | 8 × 8 × 768 | - | - | - | - |
311 | activation_94_relu | ReLU | 8 × 8 × 192 | - | - | - | - |
312 | mixed10 | Depth Concat | 8 × 8 × 2048 | - | - | - | - |
313 | avg_pool | Avg Pooling | 1 × 1 × 2048 | - | - | - | - |
314 | predictions | Fully Connected | 1 × 1 × 1000 | 1000 × 2048 | 1000 × 1 | - | - |
315 | predictions_softmax | Softmax | 1 × 1 × 1000 | - | - | - | - |
316 | classification_layer_predictions | Classification Output | - | - | - | - | - |
References
- Ly, H.-B.; Le, T.-T.; Vu, H.-L.T.; Tran, V.Q.; Le, L.M.; Pham, B.T. Computational hybrid machine learning based prediction of shear capacity for steel fiber reinforced concrete beams. Sustainability 2020, 12, 2709.
- Cioffi, R.; Travaglioni, M.; Piscitelli, G.; Petrillo, A.; De Felice, F. Artificial intelligence and machine learning applications in smart production: Progress, trends, and directions. Sustainability 2020, 12, 492.
- Lin, F.; Zhang, D.; Huang, Y.; Wang, X.; Chen, X. Detection of corn and weed species by the combination of spectral, shape and textural features. Sustainability 2017, 9, 1335.
- Zhou, C.; Gu, Z.; Gao, Y.; Wang, J. An improved style transfer algorithm using feedforward neural network for real-time image conversion. Sustainability 2019, 11, 5673.
- Amini, M.H.; Arasteh, H.; Siano, P. Sustainable smart cities through the lens of complex interdependent infrastructures: Panorama and state-of-the-art. In Sustainable Interdependent Networks II; Springer: Berlin, Germany, 2019; pp. 45–68.
- Gupta, V.; Singh, J. Study and analysis of back-propagation approach in artificial neural network using HOG descriptor for real-time object classification. In Soft Computing: Theories and Applications; Springer: Berlin, Germany, 2019; pp. 45–52.
- Sharif, M.; Khan, M.A.; Rashid, M.; Yasmin, M.; Afza, F.; Tanik, U.J. Deep CNN and geometric features-based gastrointestinal tract diseases detection and classification from wireless capsule endoscopy images. J. Exp. Theor. Artif. Intell. 2019, 1–23.
- Rashid, M.; Khan, M.A.; Sharif, M.; Raza, M.; Sarfraz, M.M.; Afza, F. Object detection and classification: A joint selection and fusion strategy of deep convolutional neural network and SIFT point features. In Multimedia Tools and Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2018; pp. 1–27.
- Wang, S.; Li, W.; Wang, Y.; Jiang, Y.; Jiang, S.; Zhao, R. An improved difference of Gaussian filter in face recognition. J. Multimed. 2012, 7, 429–433.
- He, Q.; He, B.; Zhang, Y.; Fang, H. Multimedia based fast face recognition algorithm of speed up robust features. Multimed. Tools Appl. 2019, 78, 1–11.
- Suhas, M.; Swathi, B. Significance of Haralick features in bone tumor classification using support vector machine. In Engineering Vibration, Communication and Information Processing; Springer: Berlin, Germany, 2019; pp. 349–361.
- Khan, M.A.; Akram, T.; Sharif, M.; Saba, T.; Javed, K.; Lali, I.U.; Tanik, U.J.; Rehman, A. Construction of saliency map and hybrid set of features for efficient segmentation and classification of skin lesion. Microsc. Res. Tech. 2019, 82, 741–763.
- Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R. Intriguing properties of neural networks. arXiv 2013, arXiv:1312.6199.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
- Arshad, H.; Khan, M.A.; Sharif, M.I.; Yasmin, M.; Tavares, J.M.R.; Zhang, Y.D.; Satapathy, S.C. A multilevel paradigm for deep convolutional neural network features selection with an application to human gait recognition. Expert Syst. 2020, e12541.
- Majid, A.; Khan, M.A.; Yasmin, M.; Rehman, A.; Yousafzai, A.; Tariq, U. Classification of stomach infections: A paradigm of convolutional neural network along with classical features fusion and selection. Microsc. Res. Tech. 2020, 83, 562–576.
- Jiang, B.; Li, C.; Rijke, M.D.; Yao, X.; Chen, H. Probabilistic feature selection and classification vector machine. ACM Trans. Knowl. Discov. Data (TKDD) 2019, 13, 21.
- Xiao, X.; Qiang, Z.; Zhao, J.; Qiang, Y.; Wang, P.; Han, P. A feature extraction method for lung nodules based on a multichannel principal component analysis network (PCANet). Multimed. Tools Appl. 2019, 8, 1–19.
- Wen, J.; Fang, X.; Cui, J.; Fei, L.; Yan, K.; Chen, Y.; Xu, Y. Robust sparse linear discriminant analysis. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 390–403.
- Mwangi, B.; Tian, T.S.; Soares, J.C. A review of feature reduction techniques in neuroimaging. Neuroinformatics 2014, 12, 229–244.
- Khan, M.A.; Akram, T.; Sharif, M.; Shahzad, A.; Aurangzeb, K.; Alhussein, M.; Haider, S.I.; Altamrah, A. An implementation of normal distribution based segmentation and entropy controlled features selection for skin lesion detection and classification. BMC Cancer 2018, 18, 638.
- Afza, F.; Khan, M.A.; Sharif, M.; Rehman, A. Microscopic skin laceration segmentation and classification: A framework of statistical normal distribution and optimal feature selection. Microsc. Res. Tech. 2019, 82, 1471–1488.
- Gopalakrishnan, R.; Chua, Y.; Iyer, L.R. Classifying neuromorphic data using a deep learning framework for image classification. arXiv 2018, arXiv:1807.00578.
- Ryu, J.; Yang, M.-H.; Lim, J. DFT-based transformation invariant pooling layer for visual classification. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 84–99.
- Liu, Q.; Mukhopadhyay, S. Unsupervised learning using pretrained CNN and associative memory bank. arXiv 2018, arXiv:1805.01033.
- Li, Q.; Peng, Q.; Yan, C. Multiple VLAD encoding of CNNs for image classification. Comput. Sci. Eng. 2018, 20, 52–63.
- Liu, X.; Zhang, R.; Meng, Z.; Hong, R.; Liu, G. On fusing the latent deep CNN feature for image classification. World Wide Web 2019, 22, 423–436.
- Khan, H.A. DM-L based feature extraction and classifier ensemble for object recognition. J. Signal Inf. Process. 2018, 9, 92.
- Mahmood, A.; Bennamoun, M.; An, S.; Sohel, F. ResFeats: Residual network based features for image classification. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 1597–1601.
- Cengil, E.; Çınar, A.; Özbay, E. Image classification with Caffe deep learning framework. In Proceedings of the 2017 International Conference on Computer Science and Engineering (UBMK), Antalya, Turkey, 5–7 October 2017; pp. 440–444.
- Zhang, C.; Huang, Q.; Tian, Q. Contextual exemplar classifier-based image representation for classification. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 1691–1699.
- Hussain, N.; Khan, M.A.; Sharif, M.; Khan, S.A.; Albesher, A.A.; Saba, T.; Armaghan, A. A deep neural network and classical features based scheme for objects recognition: An application for machine inspection. Multimed. Tools Appl. 2020.
- Khan, M.A.; Akram, T.; Sharif, M.; Javed, M.Y.; Muhammad, N.; Yasmin, M. An implementation of optimized framework for action classification using multilayers neural network on selected fused features. Pattern Anal. Appl. 2019, 22, 1377–1397.
- Liaqat, A.; Khan, M.A.; Shah, J.H.; Sharif, M.; Yasmin, M.; Fernandes, S.L. Automated ulcer and bleeding classification from WCE images using multiple features fusion and selection. J. Mech. Med. Biol. 2018, 18, 1850038.
- Rauf, H.T.; Saleem, B.A.; Lali, M.I.U.; Khan, M.A.; Sharif, M.; Bukhari, S.A.C. A citrus fruits and leaves dataset for detection and classification of citrus diseases through machine learning. Data Brief 2019, 26, 104340.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Gomes, H.M.; Barddal, J.P.; Enembreck, F.; Bifet, A. A survey on ensemble learning for data stream classification. ACM Comput. Surv. (CSUR) 2017, 50, 1–36.
- Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; University of Toronto: Toronto, ON, Canada, 2009.
- Fei-Fei, L.; Fergus, R.; Perona, P. One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 594–611.
- Lazebnik, S.; Schmid, C.; Ponce, J. A maximum entropy framework for part-based texture and object recognition. In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV 2005), Beijing, China, 17–20 October 2005; pp. 832–838.
- Lazebnik, S.; Schmid, C.; Ponce, J. Semi-local affine parts for object recognition. In Proceedings of the British Machine Vision Conference (BMVC’04), Kingston, UK, 7–9 September 2004; pp. 779–788.
- Ma, B.; Li, X.; Xia, Y.; Zhang, Y. Autonomous deep learning: A genetic DCNN designer for image classification. Neurocomputing 2020, 379, 152–161.
- Alom, M.Z.; Hasan, M.; Yakopcic, C.; Taha, T.M.; Asari, V.K. Improved inception-residual convolutional neural network for object recognition. Neural Comput. Appl. 2018, 32, 1–15.
Image Database | Classes | Total Samples | Min–Max Samples per Class |
---|---|---|---|
Caltech-101 [41] | 101 | 9144 | 31~800 |
Birds [42] | 6 | 600 | 100~100 |
Butterflies [43] | 7 | 619 | 42~134 |
CIFAR-100 [40] | 100 | 50,000 (Training), 10,000 (Testing) | 100 |
Classifier | M1 | P-Fusion | P-Selection | Accuracy (%) | FNR (%) | Time (s) |
---|---|---|---|---|---|---|
ESD | ✓ | - | - | 79.0 | 21.0 | 180.00 |
- | ✓ | - | 90.8 | 9.2 | 93.70 | |
- | - | ✓ | 95.5 | 4.5 | 47.00 | |
ES-KNN | ✓ | - | - | 75.8 | 24.2 | 665.80 |
- | ✓ | - | 80.1 | 19.9 | 286.45 | |
- | - | ✓ | 85.3 | 14.7 | 191.27 | |
LDA | ✓ | - | - | 75.0 | 25.0 | 597.84 |
- | ✓ | - | 81.8 | 18.2 | 127.83 | |
- | - | ✓ | 94.4 | 5.5 | 106.57 | |
L-SVM | ✓ | - | - | 76.0 | 24.0 | 9723.70 |
- | ✓ | - | 88.0 | 12.0 | 3154.70 | |
- | - | ✓ | 91.6 | 8.6 | 2045.00 | |
Q-SVM | ✓ | - | - | 77.2 | 22.8 | 1896.00 |
- | ✓ | - | 87.6 | 12.4 | 1341.00 | |
- | - | ✓ | 92.0 | 8.0 | 753.57 | |
Cu-SVM | ✓ | - | - | 77.9 | 22.1 | 7493.00 |
- | ✓ | - | 87.7 | 12.3 | 3647.70 | |
- | - | ✓ | 92.3 | 7.7 | 1889.50 | |
F-KNN | ✓ | - | - | 75.7 | 24.3 | 152.06 |
- | ✓ | - | 84.9 | 15.1 | 96.96 | |
- | - | ✓ | 89.9 | 10.1 | 71.57 | |
M-KNN | ✓ | - | - | 74.8 | 25.2 | 57.95 |
- | ✓ | - | 84.5 | 15.5 | 47.44 | |
- | - | ✓ | 89.6 | 10.4 | 33.90 | |
W-KNN | ✓ | - | - | 76.8 | 23.2 | 228.19 |
- | ✓ | - | 85.7 | 14.3 | 187.50 | |
- | - | ✓ | 90.5 | 9.5 | 105.87 | |
Co-KNN | ✓ | - | - | 52.4 | 21.0 | 61.35 |
- | ✓ | - | 87.6 | 12.4 | 48.76 | |
- | - | ✓ | 92.8 | 7.2 | 23.83 |
Classifier | M1 | P-Fusion | P-Selection | Accuracy (%) | FNR (%) | Time (s) |
---|---|---|---|---|---|---|
ESD | ✓ | - | - | 99.0 | 15.5 | 85.09 |
- | ✓ | - | 99.5 | 1.0 | 68.31 | |
- | - | ✓ | 100.0 | 0.0 | 42.45 | |
E-S-KNN | ✓ | - | - | 96.7 | 3.3 | 45.09 |
- | ✓ | - | 97.6 | 2.4 | 38.31 | |
- | - | ✓ | 97.4 | 2.6 | 25.54 | |
LD | ✓ | - | - | 98.0 | 2.0 | 48.39 |
- | ✓ | - | 99.0 | 1.0 | 31.11 | |
- | - | ✓ | 100.0 | 0.0 | 23.92 | |
L-SVM | ✓ | - | - | 97.9 | 2.1 | 45.36 |
- | ✓ | - | 99.0 | 0.5 | 20.00 | |
- | - | ✓ | 100.0 | 0.0 | 17.66 | |
Q-SVM | ✓ | - | - | 84.5 | 1.0 | 51.03 |
- | ✓ | - | 99.3 | 0.7 | 24.06 | |
- | - | ✓ | 100.0 | 0.0 | 15.25 | |
Cub-SVM | ✓ | - | - | 99.0 | 1.0 | 54.59 |
- | ✓ | - | 99.5 | 0.5 | 43.32 | |
- | - | ✓ | 100.0 | 0.0 | 21.29 | |
F-KNN | ✓ | - | - | 96.2 | 3.8 | 41.47 |
- | ✓ | - | 97.4 | 2.6 | 19.58 | |
- | - | ✓ | 99.5 | 0.5 | 14.89 | |
M-KNN | ✓ | - | - | 97.6 | 2.4 | 32.30 |
- | ✓ | - | 98.8 | 1.2 | 17.31 | |
- | - | ✓ | 100.0 | 0.0 | 15.82 | |
W-KNN | ✓ | - | - | 97.9 | 2.1 | 23.96 |
- | ✓ | - | 99.3 | 0.7 | 13.10 | |
- | - | ✓ | 100.0 | 0.0 | 9.16 | |
Cos-KNN | ✓ | - | - | 95.7 | 4.3 | 31.08 |
- | ✓ | - | 99.0 | 1.0 | 22.00 | |
- | - | ✓ | 99.8 | 0.2 | 16.11 |
Classifier | M1 | P-Fusion | P-Selection | Accuracy (%) | FNR (%) | Time (s) |
---|---|---|---|---|---|---|
ESD | ✓ | - | - | 95.1 | 9.4 | 46.05 |
- | ✓ | - | 95.6 | 5.9 | 31.95 | |
- | - | ✓ | 98.0 | 2.0 | 19.53 | |
E-S-KNN | ✓ | - | - | 85.7 | 14.3 | 28.56 |
- | ✓ | - | 87.7 | 12.3 | 18.27 | |
- | - | ✓ | 88.7 | 11.3 | 13.08 | |
LD | ✓ | - | - | 70.9 | 29.1 | 48.44 |
- | ✓ | - | 94.1 | 4.6 | 22.42 | |
- | - | ✓ | 96.6 | 3.4 | 17.01 | |
L-SVM | ✓ | - | - | 91.6 | 8.4 | 40.02 |
- | ✓ | - | 94.6 | 5.4 | 29.65 | |
- | - | ✓ | 96.6 | 3.4 | 16.72 | |
Q-SVM | ✓ | - | - | 94.1 | 5.9 | 39.46 |
- | ✓ | - | 94.1 | 5.9 | 24.58 | |
- | - | ✓ | 96.6 | 3.4 | 18.80 | |
Cub-SVM | ✓ | - | - | 90.6 | 4.9 | 44.23 |
- | ✓ | - | 93.6 | 6.4 | 29.41 | |
- | - | ✓ | 97.0 | 3.0 | 21.51 | |
F-KNN | ✓ | - | - | 85.7 | 14.3 | 30.82 |
- | ✓ | - | 89.2 | 10.8 | 18.70 | |
- | - | ✓ | 94.1 | 5.9 | 13.79 | |
M-KNN | ✓ | - | - | 82.3 | 19.7 | 29.29 |
- | ✓ | - | 85.2 | 14.8 | 18.30 | |
- | - | ✓ | 92.1 | 7.9 | 10.83 | |
W-KNN | ✓ | - | - | 85.2 | 14.8 | 15.06 |
- | ✓ | - | 87.2 | 12.8 | 14.26 | |
- | - | ✓ | 94.6 | 5.4 | 10.12 | |
Cos-KNN | ✓ | - | - | 81.8 | 18.2 | 16.02 |
- | ✓ | - | 85.7 | 14.3 | 14.54 | |
- | - | ✓ | 94.1 | 5.9 | 10.55 |
Classifier | M1 | P-Fusion | P-Selection | Accuracy (%) | FNR (%) | Time (min) |
---|---|---|---|---|---|---|
ESD | ✓ | - | - | 51.34 | 48.66 | 608 |
- | ✓ | - | 63.97 | 36.03 | 524 | |
- | - | ✓ | 69.76 | 30.24 | 374 |
Classifier | M1 | P-Fusion | P-Selection | Accuracy (%) | FNR (%) | Time (min) |
---|---|---|---|---|---|---|
ESD | ✓ | - | - | 47.84 | 52.16 | 258 |
- | ✓ | - | 62.34 | 37.66 | 204 | |
- | - | ✓ | 68.80 | 31.2 | 111 |
Method | P-Fusion | P-Selection | Accuracy (%) | FNR (%) |
---|---|---|---|---|
AlexNet | ✓ | - | 86.70 | 13.30 |
AlexNet | - | ✓ | 90.24 | 9.76 |
Vgg16 | ✓ | - | 85.16 | 14.84 |
Vgg16 | - | ✓ | 89.24 | 10.76 |
ResNet50 | ✓ | - | 88.57 | 11.43 |
ResNet50 | - | ✓ | 92.36 | 7.64 |
ResNet101 | ✓ | - | 89.96 | 10.04 |
ResNet101 | - | ✓ | 92.83 | 7.17 |
Proposed | ✓ | - | 90.80 | 9.20 |
Proposed | - | ✓ | 95.50 | 4.50 |
Method | P-Fusion | P-Selection | Accuracy (%) | FNR (%) |
---|---|---|---|---|
AlexNet | ✓ | - | 61.29 | 38.71 |
AlexNet | - | ✓ | 65.82 | 34.18 |
Vgg16 | ✓ | - | 60.90 | 39.10 |
Vgg16 | - | ✓ | 64.06 | 35.94 |
ResNet50 | ✓ | - | 61.82 | 38.18 |
ResNet50 | - | ✓ | 65.71 | 34.29 |
ResNet101 | ✓ | - | 61.98 | 38.02 |
ResNet101 | - | ✓ | 66.25 | 33.75 |
Proposed | ✓ | - | 62.34 | 37.66 |
Proposed | - | ✓ | 68.80 | 31.20 |
Reference | Technique | Dataset | Accuracy (%) |
---|---|---|---|
Roshan et al. [25] | Fine-tuning on top layers | Caltech-101 | 91.66 |
Jongbin et al. [26] | Discrete Fourier transform | Caltech-101 | 93.60 |
Qun et al. [27] | Memory banks-based unsupervised learning | Caltech-101 | 91.00 |
Qing et al. [28] | PCA-based reduction on fused features | Caltech-101 | 92.54 |
Xueliang et al. [29] | A fusion of mid-level layers-based features | Caltech-101 | 92.20 |
Rashid et al. [8] | Fusion of SIFT and CNN features | Caltech-101 | 89.70 |
Svetlana [43] | Local affine parts-based approach | Butterflies | 90.40 |
Ma et al. [44] | Genetic CNN designer approach (70:30) | CIFAR-100 | 66.77 |
Alom et al. [45] | IRRCNN (70:30) | CIFAR-100 | 72.78 |
Alom et al. [45] | IRCNN (70:30) | CIFAR-100 | 71.76 |
Alom et al. [45] | EIN (70:30) | CIFAR-100 | 68.29 |
Alom et al. [45] | EIRN (70:30) | CIFAR-100 | 69.22 |
Proposed | MLFFS | Butterflies | 98.00 |
Proposed | MLFFS | Birds | 100.00 |
Proposed | MLFFS | Caltech-101 | 95.50 |
Proposed | MLFFS (50:50) | CIFAR-100 | 65.46 |
Proposed | MLFFS (60:40) | CIFAR-100 | 68.80 |
Proposed | MLFFS (70:30) | CIFAR-100 | 73.16 |
Proposed | MLFFS (80:20) | CIFAR-100 | 77.28 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).