Abstract
Convolutional neural networks (CNNs) have achieved good results in hyperspectral image (HSI) classification in recent years, and HSI classification is widely used in agricultural remote sensing, geological exploration, environmental monitoring, and marine remote sensing. However, the complexity of the network structures used for HSI classification hinders the efficient processing of HSI data. Existing methods carry a large amount of redundancy in their weight parameters during training: they either require huge computational resources or use storage space inefficiently, and many of the parameters that consume computational resources contribute little to conveying the rich spectral and spatial information in HSIs. We therefore introduce LCTCS, a low-memory network approach with fewer parameters. LCTCS aims to use computational resources more efficiently, delivering advanced classification performance at a lower computational cost. Unlike the conventional 2D and 3D convolutions used previously, we use simple and efficient 3D grouped convolution to convey the semantic features of HSIs. More specifically, we design a novel two-channel sparse network to classify HSIs, since grouped 3D convolution conveys the properties of hyperspectral data well in the spectral and spatial domains. We compare LCTCS with eight widely used network methods on four publicly available hyperspectral datasets. A series of experiments shows that the designed model architecture requires \(65.89 \%\) less storage space than the DBDA method, consumes \(67.36 \%\) fewer computational resources than the SSRN method on the IP dataset, and accomplishes highly accurate classification with a number of parameters that is only \(1.99 \%\) of that of the DBMA method.
1 Introduction
Unlike ordinary three-channel RGB images, hyperspectral images are three-dimensional data cubes that combine rich one-dimensional spectral information with two-dimensional spatial information. With the continuous development of spectral processing technology, hyperspectral images have been widely used in medicine [1, 2], agricultural remote sensing [3, 4], geological exploration [5], environmental monitoring [6], and marine remote sensing [7, 8]. Furthermore, HSI classification research in the field of remote sensing is attracting attention. Earlier researchers applied many classical classification methods to HSIs, such as K-nearest neighbors [9], decision trees [10], support vector machines (SVMs) [11, 12], sparse representation-based methods [13], and Bayesian estimation methods [14, 15]. However, these methods are based on the shallow features of spectral information and fail to exploit the spatial features of HSIs at a deeper level, which, in turn, leads to less-than-ideal classification accuracy.
In the field of deep learning, CNNs have achieved remarkable results in extracting layered and nonlinear features in applications such as face recognition, autonomous driving, and drone navigation [16, 17]. In HSI classification, Zhang et al. utilized autoencoders (REA) [18], Zhou et al. introduced stacked sparse autoencoders (SSAE) [19], and Chen et al. designed deep belief networks (DBN) [20] to extract deeper features. However, although these methods acquire deeper local features to some extent, they alter the original data form of the HSI. CNNs [21] not only capture spatial and spectral information [22] but also ensure that the original data structure of the HSI is not corrupted. Meanwhile, their weight sharing further reduces the number of parameters to be computed.
Hu et al. [23] implemented HSI classification using one-dimensional CNNs. Next, Sharma et al. [24] presented a two-dimensional convolutional network to learn HSI spatial features. Finally, Hamida et al. [25] used a joint three-dimensional convolutional approach to learn HSI spatial and spectral features. Although the 3D grouped convolution, dropout, and modified residual connection used in this manuscript are not the most recent methods, we have designed a novel, low-cost spectral branch module and a spatial dynamic convolution. Compared with other similar methods, the spatial dynamic convolution module can mine different spectral and spatial information by changing its filters according to the characteristics of each hyperspectral dataset. The three contributions of this work are as follows:
1) An end-to-end classification framework with good sparsity is designed by combining a spatial branch with a grouped-convolution spectral branch, following a design idea similar to residual networks, which adapts better to the joint spectral-spatial learning of HSI data.
2) The LCTCS network structure performs the HSI classification task effectively compared with other current methods and has been verified on four publicly available hyperspectral datasets.
3) The LCTCS network structure further reduces the number of parameters, the amount of computation, and the storage space compared with other similar 3D grouped convolution methods.
2 Related Work
2.1 Architecture of HSI Classification Models
The main challenges in HSI classification arise from (1) the poor joint utilization of spectral and spatial information [26], (2) the excessive complexity of the HSI models used for classification, and (3) the unsatisfactory classification results achieved with limited training samples.
In recent years, to capture more abstract spatial and spectral information, Roy et al. [27] proposed the HybridSN classification method, which for the first time combines a 2D CNN that learns features in the spatial domain of the HSI with a 3D CNN that learns features in the spatial and spectral domains simultaneously, while reducing the complexity of the model to a certain extent. Zhong et al. [28] designed a spectral-spatial residual network (SSRN) that captures finer spectral-spatial information by using residual connections to train the deeper layers of the classification network. Wang et al. [29] improved an end-to-end dense convolutional classification framework for spatial-spectral learning, which reuses previous spectral-spatial features in a dense structure to improve feature utilization. Zhang et al. [30] proposed a classification method with context-aware extraction of local features, which further improves the applicability of the end-to-end model by extracting features adaptively according to the classification target. Although most of the above classification frameworks alleviate the problem of joint spatial-spectral learning to some extent, they still struggle with limited samples and overly complex network models. Zheng et al. [31] introduced an end-to-end classification framework with an adaptive attention mechanism in the spatial and spectral dimensions. The method focuses spectral-spatial learning on informative spectral bands and pixels, thereby alleviating the problem of limited training samples [32].
With the rise of graph convolutional networks in deep learning, Ding et al. [33] proposed a hybrid mechanism based on graph filters that enables information sharing and interaction among different filters to address small-sample training and allow diverse feature learning. Other classification network constructs, such as fully convolutional networks (FCNs) [34], recurrent neural networks (RNNs) [35], generative adversarial networks (GANs) [36], and capsule networks [37, 38], have also been successfully introduced into HSI classification. Moreover, Yang et al. [31] proposed a classification framework with a self-attention mechanism focused on spectral information, called DBDA. Although these classification frameworks effectively alleviate the problem of limited samples and improve the generalization and robustness of the models, they suffer from redundant weight parameters and inefficient computation.
2.2 Methods for Reducing Computational Resources
Current conventional methods for reducing computational resources include model distillation [39], quantization [40], pruning [41], and parameter sharing [42]. Another way is to design a simple and efficient network architecture to reduce the network weight parameters, storage space, and computation to save computational resources.
To reduce the waste of computational resources, Liu et al. [43] proposed a transfer learning framework for hyperspectral data with different band counts, which trains CNNs on the initial HSI data and then fine-tunes them, reducing the amount of computation required. Li et al. [44] proposed a deep two-channel dense network with a top and local concatenation approach (DDCD), which alleviates the redundancy of weight parameters to a certain extent but still suffers from excessive computational complexity. Meng et al. [45] proposed a lightweight modular approach that replaces the weight parameters of 3 \(\times \) 3 spatial convolutions with pointwise and depthwise separable convolutions. Liu et al. [46] proposed a multi-head knowledge distillation classification framework that uses a self-guided refinement network as a teacher, distills it into a compact student network, and solves the computational overload problem by compressing the student. Yang et al. [47] designed an encoding strategy that encodes the connection operations between nodes in a computational unit and, by optimizing the weight-sharing parameters, saves training cost when training samples are limited. Subba Reddy et al. [48] proposed a combination of the Aquila optimizer and a compressed cooperative deep CNN to learn spatial-spectral information, which reduces computational time and memory space by using the Aquila optimizer to reduce the loss function of the model and the learning complexity of the wavelet band. Wang et al. [49] designed a lightweight spectral-spatial attention feature classification framework based on network architecture search, which reduces computational complexity by adjusting different channel weights with multi-scale Ghost grouping and attention modules. Mei et al. [40] proposed a stepwise activation quantization method that suppresses the input to the original network with nonlinear uniform quantization, thereby saving memory space. Although the above methods have alleviated the problem of computational resources in HSI classification to a certain extent, problems such as too many weight parameters, poor storage space utilization, and heavy computation still exist.
Our study combines grouped convolution and residual structures to construct the feature extraction blocks, reducing computation and storage space. Dynamic adaptive convolution is also used to perform multi-feature fusion extraction and improve the efficiency with which spectral and spatial information is used.
3 Research Methods
3.1 Dimensionality Reduction Process of HSI Data
The HSI dataset X consists of T labeled (ground-truth) pixels \(\left\{ t_1, t_2, \ldots , t_a\right\} \in R^{1 \times 1 \times b}\), where b denotes the number of bands. The corresponding true label vectors are \(\left\{ g_1, g_2, \ldots , g_a\right\} \in R^{1 \times 1 \times c}\), where c denotes the number of land-cover classes. Because an HSI contains rich spectral information across hundreds of bands, we do not reduce the data with principal component analysis (PCA). Instead, each annotated target pixel and its \(p \times p\) neighborhood cube are taken directly as the \(3 \textrm{D}\) input to the feature preprocessing stage for initial feature extraction. The three-dimensional convolution formula is expressed as
\[ v_{i j}^{x y z}=g\left( b_{i j}+\sum _{k} \sum _{p=0}^{P_i-1} \sum _{q=0}^{Q_i-1} \sum _{r=0}^{R_i-1} w_{i j k}^{p q r}\, v_{(i-1) k}^{(x+p)(y+q)(z+r)}\right) \]
where (p, q, r) denotes the position within the convolution kernel; \(w_{i j k}^{p q r}\) is the kernel weight at that position for the k-th feature cube of the previous layer; \(v_{i j}^{x y z}\) represents the value of the \(j\)-th feature cube at spatial location (x, y, z) in layer \(i\); \(b_{i j}\) denotes the j-th bias at layer i; and \(P_i, Q_i, R_i\) refer to the height, width, and number of channels of the \(3 \textrm{D}\) convolution kernel, respectively. g (.) denotes the activation function. The convolution kernel used in the feature preprocessing part of the paper has size \(1 \times 1 \times 7\), and the stride is set to (1, 1, 2), which determines how far the window of each convolutional kernel moves at each step. Because the spectral stride (2) is smaller than the spectral kernel length (7), adjacent windows overlap, so some local features are extracted repeatedly during training, while the spectral dimensionality is reduced and the spectral and spatial features are refined.
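The effect of the \(1 \times 1 \times 7\) kernel with stride (1, 1, 2) on each axis can be checked with the usual output-size formula for a convolution. The sketch below is only illustrative: the helper name `conv_out_len` is hypothetical, and zero padding is assumed because the text does not mention any.

```python
def conv_out_len(n_in: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Length of one axis after a convolution (standard floor formula)."""
    return (n_in + 2 * padding - kernel) // stride + 1


# Indian Pines patch: 200 bands and a 9 x 9 spatial window.
bands = conv_out_len(200, kernel=7, stride=2)   # spectral axis
height = conv_out_len(9, kernel=1, stride=1)    # spatial axes are untouched
width = conv_out_len(9, kernel=1, stride=1)
print(bands, height, width)
```

Under these assumptions the spectral axis shrinks from 200 to 97 while the 9 x 9 window is preserved, which agrees with the 9\(\times \)9\(\times \)97 feature cube reported in Section 3.5.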
3.2 Channel Attention Mechanisms
The HSI is input to the convolutional network as cubic blocks in the neighborhood, and the HSI contains rich spectral information as well as band redundancy. To improve the efficiency and accuracy of processing HSI information in the network framework, a channel attention mechanism similar to the dot-product similarity in [11] is introduced to score and judge important spatial and spectral information, thus improving the accuracy of classification. The HSI annotated pixels with their neighborhood cube data \(\left\{ t_1, t_2, \ldots , t_a\right\} \in R^{1 \times 1 \times b}\) are taken as the 3D input. The N-band information output by the first 3D convolution layer is expressed as two vectors, K and V, in the form of key-value pairs: \(H=(K, V)=\left[ \left( k_1, v_1\right) ,\left( k_2, v_2\right) , \ldots ,\left( k_N, v_N\right) \right] \). The importance of the spectral and spatial features of the input is scored with dot products, and the scores are normalized by the function \(\alpha _i={\text {softmax}}\left( s_i\right) \). The weights of the important spectral and spatial elements are thereby highlighted, and the values are finally weighted and summed to obtain the final formula for determining the importance of the spatial-spectral feature weights:
where \(q_i^{\textrm{T}}\) denotes the vector for the \(i\)-th significant spectral and spatial feature in the 3D block processed by the first convolution layer.
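The scoring, softmax normalization, and weighted sum described above follow the general pattern of dot-product attention. The sketch below illustrates that pattern for a single query vector under stated assumptions: the function names and the toy keys, values, and query are hypothetical, and the exact scoring function used in the paper may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dot_product_attention(query, keys, values):
    """Score each key against the query, normalize, and sum the values."""
    scores = [dot(query, k) for k in keys]          # s_i = q^T k_i
    alphas = softmax(scores)                        # alpha_i = softmax(s_i)
    dim = len(values[0])
    context = [sum(a * v[d] for a, v in zip(alphas, values)) for d in range(dim)]
    return alphas, context


# Toy key-value pairs (k_i, v_i) and one query; values are illustrative only.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
query = [1.0, 0.0]
alphas, context = dot_product_attention(query, keys, values)
print(alphas, context)
```

Keys that align with the query receive larger weights, so their values dominate the weighted sum; this is the sense in which important bands are highlighted.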
3.3 Spectral Branching Modules
Because redundant parameters contribute little to conveying the rich spectral and spatial information in HSIs, the manuscript designs the spectral branching module with a simple and efficient 3D grouped convolution, which addresses the parameter redundancy caused by the excessive number of channels during the training of 3D convolutional networks. Grouped convolution first came from AlexNet [50] in 2012, where the authors distributed the feature maps across multiple GPUs for processing to cope with limited hardware resources and finally fused the computed results. Grouped 3D convolution works similarly to AlexNet [50]: the HSI feature maps input through \(c_1\) channels are divided into S groups, and the filters are divided into S groups accordingly. Each group of channels is convolved only with its corresponding group of convolutional kernels, and the groups are convolved independently without interfering with each other. The \(c_2\) filters generate \(c_2\) feature maps, which are stacked in the last step so that the resulting feature cube is the same as that of a standard convolution. The parameter reduction module is shown in Fig. 2. We assume that the size of the HSI feature cube input to the n-th layer of the ordinary 3D convolution is \(H_n \times W_{\textrm{n}} \times C_n\) (height, width, channel) and that the size of the HSI feature cube at the (n+1)-th layer is \(H_{n+1} \times W_{\textrm{n}+1} \times C_{n+1}\). The filter kernel sizes are \(M_n \times M_n \times \textrm{d}_n\) and \(M_{\textrm{n}+1} \times M_{n+1} \times \textrm{d}_{n+1}\). When the spectral branching structure moves one step in a \(3 \textrm{D}\) convolutional kernel window, the amount of computation (FLOPs) is as follows:
The number of parameters of the \(3 \textrm{D}\) convolution kernel at this spatial location is calculated as:
If the corresponding 3D convolutional channels are divided into S groups, so that each group contains \(C_n / S\) channels, then the filters that correspond to feature map extraction are also divided into S groups that do not interfere with each other. In this case, the number of parameters of the convolutional kernels is calculated as:
According to Equations (4) and (5), \(GrPa=\frac{1}{S}Pa\). The computation and the number of parameters are thus reduced to \(\frac{1}{S}=\left( V \times M_n \times M_n \times \textrm{d}_n \times \frac{1}{S}\right) /\left( M_{\textrm{n}+1} \times M_{n+1} \times \textrm{d}_{n+1} \times V\right) \), where V represents the pixel points in the HSI for which the classification sample is valid. The manuscript combines a 3D grouped convolutional layer with BatchNorm and ReLU as a single unit to simplify the computation, because the ReLU activation function increases the sparsity of the network when the neurons are trained. Figure 2 shows that, after division into S groups, each filter participates in only 1/S of the convolution computation it would otherwise perform, which gives grouped convolution better sparsity than ordinary convolution. Because ordinary 3D convolutional networks have redundant parameters and channels, grouped convolution can in some cases remove redundant parameters while still learning the important spectral and spatial feature information.
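The 1/S reduction can be checked numerically with the standard weight count of a convolution layer, in which each filter only sees the input channels of its own group. The sketch below is illustrative: `conv3d_params` is a hypothetical helper, and the channel counts (24 in, 24 out) and S = 3 are example values rather than the exact layer configuration of the paper.

```python
def conv3d_params(c_in, c_out, kernel, groups=1, bias=False):
    """Weight count of a 3D convolution whose channels are split into groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by the number of groups")
    kh, kw, kd = kernel
    # Each of the c_out filters covers only c_in / groups input channels.
    weights = c_out * (c_in // groups) * kh * kw * kd
    return weights + (c_out if bias else 0)


standard = conv3d_params(24, 24, (1, 1, 7))            # ordinary 3D convolution
grouped = conv3d_params(24, 24, (1, 1, 7), groups=3)   # S = 3 groups
print(standard, grouped, standard / grouped)
```

With these example values the grouped layer holds exactly one third of the weights of the ordinary layer, matching \(GrPa=\frac{1}{S}Pa\) when no bias terms are counted.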
3.4 Spatial Branching and Classification Module
To cut the overhead in future training and reduce parameter redundancy, the manuscript modifies the residual block structure as follows: \(\bigoplus \) (the residual-block-like connection in Fig. 3) denotes the element-wise summation operation, \(T_i\) represents the input hyperspectral 3D data block, and ReLU is replaced with Dropout3d.
After the introduction of Dropout3d in the cropping layer, some channels are randomly set to zero, which is equivalent to randomly discarding some channels and makes the whole spatial module network structure sparse, with an effect similar to regularization. Additionally, we remove the ReLU activation that follows the addition in the traditional residual structure so that spatially localized features are preserved rather than discarded, which enables effective feature reuse. The convolution part also uses a 1\(\times \)1\(\times \)7 convolution kernel to refine the spatial dimension of the feature blocks for dimensionality reduction. The residual equation is expressed as \[ t_{l+1}=h\left( t_l\right) +\mathcal {F}\left( t_l, W_l\right) , \] where \(h\left( t_l\right) \) represents the 3D convolutional 1\(\times \)1\(\times \)7 direct mapping part, \(\mathcal {F}\left( t_l, W_l\right) \) represents the residual component, and \(W_l\) represents the weights of the residual part of the 3D convolution layer.
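The two modifications described here, channel-wise dropout and an addition that is not followed by ReLU, can be sketched in plain Python. This is a minimal illustration under stated assumptions: feature maps are represented as lists of flattened channels, the direct mapping is taken as the identity rather than a 1\(\times \)1\(\times \)7 convolution, and `halve` is a hypothetical stand-in for the learned residual branch.

```python
import random


def channel_dropout(channels, p, rng):
    """Zero whole channels with probability p; rescale the rest by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    scale = 1.0 / (1.0 - p)
    dropped = []
    for channel in channels:
        if rng.random() < p:
            dropped.append([0.0] * len(channel))       # entire channel discarded
        else:
            dropped.append([x * scale for x in channel])
    return dropped


def residual_unit(t, transform, p, rng):
    """Return t + dropout(transform(t)); no ReLU is applied after the sum."""
    residual = channel_dropout(transform(t), p, rng)
    return [[a + b for a, b in zip(ch_t, ch_r)] for ch_t, ch_r in zip(t, residual)]


def halve(t):
    """Hypothetical residual branch used only for this example."""
    return [[0.5 * x for x in channel] for channel in t]


t = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # three channels, two values each
out = residual_unit(t, halve, p=0.0, rng=random.Random(0))
print(out)
```

Because nothing clips the sum, negative or small responses from the shortcut path pass through unchanged, which is the feature-preserving behavior the text attributes to removing ReLU.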
In the classification module, we concatenate the feature cubes from the spectral branch with those from the spatial branch and feed the fused spatial and spectral information into a dynamic grouped 3D convolutional layer. The dynamic 3D convolution layer adjusts the size of its convolution kernel according to the different feature cubes to convey varied spectral and spatial information, and its output is then sent to the global average pooling layer. All feature cubes processed by the dynamic convolution layer are thereby reduced in dimension and finally fed to the linear layer, which outputs the classification results. This paper uses the widely used cross-entropy loss function, which is defined as:
where \(\left\{ g_1, g_2, \ldots , g_a\right\} \in R^{1 \times 1 \times c}\) represents the true label vectors, c represents the number of land-cover classes, and \(\left\{ p_1, p_2, \ldots , p_a\right\} \in R^{1 \times 1 \times c}\) represents the predicted values.
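For one-hot label vectors \(g\) and predicted class probabilities \(p\), cross-entropy sums \(-g \log p\) over the classes. The sketch below shows that computation; averaging over the samples and clamping probabilities with `eps` are assumptions made for the example, since the normalization used in the paper is not shown.

```python
import math


def cross_entropy(labels, probs, eps=1e-12):
    """Mean cross-entropy; labels are one-hot rows, probs are rows summing to 1."""
    total = 0.0
    for g, p in zip(labels, probs):
        # Only the true class contributes, because g is one-hot.
        total -= sum(gi * math.log(max(pi, eps)) for gi, pi in zip(g, p))
    return total / len(labels)


labels = [[0, 1, 0], [1, 0, 0]]
probs = [[0.2, 0.7, 0.1], [0.9, 0.05, 0.05]]
loss = cross_entropy(labels, probs)
print(loss)
```

The loss approaches zero as the probability assigned to the true class approaches one, and grows without bound as that probability approaches zero.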
3.5 LCTCS Network Structure
This section details the designed LCTCS network, as shown in Fig. 4 and Table 1.
The cubic block of size (200\(\times \)9\(\times \)9, 1) in the HSI is input to the feature preprocessing 3D convolution layer (1\(\times \)1\(\times \)7, 24), and the output obtained after the convolution operation has size (9\(\times \)9\(\times \)97, 24); that is, the feature cube after 3D convolution and dimensionality reduction is 97\(\times \)9\(\times \)9. Subsequently, the resulting cubes are sent to the channel attention mechanism, which highlights the important spectral and spatial features through weighting coefficients. We then feed the output into the upper spectral branch module and the lower spatial branch module. The spectral branch module treats a grouped convolution layer, a BN layer, and a ReLU activation layer as a single unit. The first unit, whose convolution is divided into three groups, takes the (9\(\times \)9\(\times \)97, 24) 3D block and outputs (9\(\times \)9\(\times \)97, 12); this output is fed into the second unit to further refine the spectral and spatial feature cubes. Meanwhile, to make the network sparser and save computational resources, the third unit uses grouped convolution with S=6 to refine the (9\(\times \)9\(\times \)97, 12) 3D block and outputs features of the same size. Finally, in the spatial branching module, the (9\(\times \)9\(\times \)97, 24) 3D block is sent to a 3D convolution from which the ReLU activation layer has been removed. The significance of removing the ReLU layer is to keep some of the neurons nonzero and thereby increase the correlation between the parameters, which allows some of the spatial features of the HSI to be extracted accurately.
The output (9\(\times \)9\(\times \)97, 24) obtained after the first two convolutional modules is fused by superposition with the same-size 3D block processed by the channel attention mechanism, so that the previous features are reused, and is then fed to the next 3D convolutional layer in the same form. The superimposed result (9\(\times \)9\(\times \)97, 24) is then fed into the grouped convolutional layer with S=6.
The resulting (9\(\times \)9\(\times \)97, 12) feature block is concatenated with the (9\(\times \)9\(\times \)97, 12) feature block from the spectral module and input into the dynamic grouped convolution layer, whose convolution kernel changes with the number of hyperspectral bands to adapt to different data cubes. Finally, the final 1\(\times \)16 feature map is obtained by global pooling and a linear layer.
The detailed settings of the network parameters in the experiment are shown in Table 2.
4 Experimental Demonstration
All experiments were conducted on Windows 10 using the PyCharm integrated development environment, with the following configuration: CPU: i7-11700K; GPU: RTX 3080 Ti; RAM: 32 GB; Memory: 28 GB; Python: 3.8.13; Torch: 1.11.0+cu113.
4.1 Hyperspectral Dataset
To verify the effectiveness of the proposed method, four widely used public hyperspectral datasets, namely, the Indian Pines (IP), PaviaU (PU), Botswana (BS), and Salinas (SA) datasets, are used for experimental validation. The details of the four datasets are as follows:
1) Indian Pines (IP): Acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor at the Indian Pines test site in northwestern Indiana, USA, this dataset has 145 \(\times \) 145 pixels and 200 spectral reflectance bands. The spectral coverage ranges from 0.4 to 2.5 μm, and the ground truth contains 16 classes of vegetation cover (Fig. 5).
2) PaviaU (PU): This dataset is a portion of the hyperspectral data collected by the German airborne Reflective Optics Spectrographic Imaging System (ROSIS-03) in 2003 over the city of Pavia, Italy. The spectral imager continuously images 115 bands in the wavelength range of 0.43 to 0.86 μm with a spatial resolution of 1.3 m. The dataset size is 610\(\times \)340\(\times \)103. The ground truth is classified into nine urban feature types (Fig. 6).
3) Botswana (BS): Acquired by the NASA EO-1 satellite over the Okavango Delta in Botswana in 2001-2004, this dataset has 1476\(\times \)256 pixels and 145 bands. The spectral coverage ranges from 0.4 to 2.5 μm with a spatial resolution of 30 m, and the ground truth contains 14 land-cover classes (Fig. 7).
4) Salinas (SA): This dataset was captured by the 224-band AVIRIS sensor over the Salinas Valley, California, with 512\(\times \)217 pixels, of which 204 bands were used for the study. The spectral coverage ranges from 0.4 to 2.5 μm, and the spatial resolution is 3.7 m. The ground truth contains 16 crop categories (Fig. 8).
4.2 Experimental Settings
All algorithms in our work are evaluated with three metrics: overall accuracy (OA), average accuracy (AA), and the Kappa coefficient (Kappa). LCTCS is compared with the following state-of-the-art methods: the double-branch dual-attention mechanism network (DBDA) [31], the spectral-spatial residual network (SSRN) [28], the 3D-2D CNN feature hierarchy (HybridSN) [27], HamidaEtAlNet [25], the double-branch multi-attention mechanism network (DBMA) [51], the double-channel dense network (DDCD) [44], the dual multi-head contextual attention network (DMuCA) [52], the fast dense spectral-spatial convolution network framework (FDSSC) [53], and the classical support vector machine (SVM) [11].
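The three metrics can all be derived from a confusion matrix using their standard definitions. The sketch below is illustrative: `classification_metrics` is a hypothetical helper, the 2 x 2 matrix is a toy example rather than a result from the paper, and the code assumes that chance agreement is below 1 so that Kappa is defined.

```python
def classification_metrics(cm):
    """Return (OA, AA, Kappa); cm[i][j] counts true class i predicted as j."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(k)]
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]

    oa = sum(diag) / n                                   # overall accuracy
    recalls = [d / r for d, r in zip(diag, row_tot) if r > 0]
    aa = sum(recalls) / len(recalls)                     # mean per-class accuracy
    pe = sum(r * c for r, c in zip(row_tot, col_tot)) / (n * n)  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa


toy_cm = [[8, 2],
          [1, 9]]
oa, aa, kappa = classification_metrics(toy_cm)
print(oa, aa, kappa)
```

OA weights every sample equally, AA weights every class equally, and Kappa discounts the agreement expected by chance, which is why the three can diverge on class-imbalanced scenes.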
The IP, PU, BS, and \(\textrm{SA}\) datasets are divided into training and test sets: \(10.00 \%\) of the samples of the IP dataset, \(5.00 \%\) of the samples of the PU dataset, \(9.00 \%\) of the samples of the BS dataset, and \(8.00 \%\) of the samples of the SA dataset are selected for training, and the remaining 90.00%, 95.00%, 91.00%, and 92.00%, respectively, are used for testing. The specific training and testing sample divisions are shown in Tables 3, 4, 5 and 6.
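A split of this kind is commonly implemented by sampling a fixed fraction of the labeled pixels from each class. The sketch below shows one such scheme; per-class sampling, the minimum of one training sample per class, and the helper name `stratified_split` are assumptions, as the text gives only the proportions.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction, seed=0):
    """Split sample indices into train/test, drawing the fraction per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Keep at least one training sample for every class.
        n_train = max(1, round(train_fraction * len(idxs)))
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return train, test


# Toy label vector: 100 pixels of class 0 and 40 pixels of class 1.
toy_labels = [0] * 100 + [1] * 40
train_idx, test_idx = stratified_split(toy_labels, train_fraction=0.10)
print(len(train_idx), len(test_idx))
```

Fixing the seed makes a single split reproducible; repeated runs would typically vary the seed so that each run draws a different training set.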
4.3 Comparison with the State-of-the-Art Methods for Different Data Sets Under Single Sample
This section analyzes the classification maps and classification results for the different datasets under a single training-sample proportion. All experiments are run 10 times to obtain the mean and mean square deviation, which verifies the effectiveness of our method in this setting.
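Aggregating repeated runs amounts to computing a mean and a spread for each metric. The sketch below uses the sample standard deviation (divisor n - 1) as the spread; whether the paper uses the sample or the population form is not stated, and the accuracy values are placeholders, not results from the experiments.

```python
import statistics


def summarize_runs(accuracies):
    """Return the mean and sample standard deviation of repeated runs."""
    mean = statistics.fmean(accuracies)
    std = statistics.stdev(accuracies)   # divisor n - 1; use pstdev for divisor n
    return mean, std


# Placeholder values standing in for the OA of repeated runs.
placeholder_oa = [90.0, 92.0, 94.0]
mean_oa, std_oa = summarize_runs(placeholder_oa)
print(f"{mean_oa:.2f} +/- {std_oa:.2f}")
```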
4.3.1 Classification Graph and Classification Results Under the Indian Pines (IP) Dataset
The classification results of the Indian Pines (IP) dataset under the DBDA, SSRN, FDSSC, HybridSN, HamidaEtAlNet, DBMA, and SVM methods are shown in Table 7. The classification maps of the real training labels (a), the test labels (b), and the different methods are shown in Fig. 9. FDSSC, DBDA, and DBMA all use conventional convolution, and their classification results are lower than those of the proposed method, most likely because stacking and fusing the channels after grouped convolution increases feature utilization.
4.3.2 Classification Graph and Classification Results Under the PaviaU (PU) Dataset
The classification results for the PU dataset under different methods are shown in Table 8. The classification maps of the real training labels (a), the test labels (b), and the different methods are shown in Fig. 10. Table 8 and Fig. 10 illustrate that the SVM algorithm achieves the lowest classification results, mainly because it uses only one-dimensional spectral features, resulting in a large loss of spatial information. The deep learning methods, which learn spectral and spatial features jointly, achieved better results than the traditional machine learning algorithm. However, their classification results were still lower than those of the algorithm proposed in our work, which improves overall accuracy by \(1.38 \%\) over the DBDA method.
4.3.3 Classification Graph and Classification Results Under the Botswana (BS) Dataset
The results of classifying the BS dataset under different methods are shown in Table 9, and the classification maps of the real training labels (a), the test labels (b), and the different methods are shown in Fig. 11. Table 9 and Fig. 11 show that the lowest classification result is still that of the traditional machine learning algorithm SVM, whereas the classification results of the other comparison methods are above \(92.00 \%\), and most of the per-class results of the method designed in our work are close to \(100.00 \%\). Meanwhile, the proposed LCTCS method still achieves the highest AA, OA, and Kappa with the minimum computation and number of parameters on the BS dataset, which further demonstrates the robustness and generality of the proposed algorithm.
4.3.4 Classification Graph and Classification Results Under the Salinas (SA) Dataset
The results of the classification of the SA dataset under different methods are shown in Table 10, and the classification maps of the real training labels (a), the test labels (b), and the different methods are shown in Fig. 12. On the SA dataset, the proposed LCTCS method achieves \(96.53 \%, 97.73 \%\), and \(96.14 \%\) for the OA, AA, and Kappa metrics, respectively, with \(8.00 \%\) training samples, which are improvements of \(4.14 \%, 2.30 \%\), and \(4.58 \%\), respectively, over DBMA. In terms of individual classification results, the algorithm proposed in our work achieves desirable results above \(95.00 \%\) for most classes.
4.4 Experimental Results Confusion Matrix Analysis
As can be seen from Fig. 13, the sample points are classified well in all categories on the Indian Pines dataset. The larger the value on the scale bar to the right of each matrix, the larger the number of samples participating in the classification. The PU dataset is rich in samples, with most of the categories containing fewer than 6K. On the BS dataset, which has relatively few samples, each class has few misclassified points. On the SA dataset, the true values and the predicted values are also in good agreement. This shows that the algorithm has good generality and generalization ability.
4.5 Analysis of Experimental Results with Different Training Samples
To further illustrate the generalization and robustness of the method designed in our work, \(1.00 \%, 3.00 \%, 5.00 \%, 10.00 \%\), and \(15.00 \%\) of the data are randomly selected as training samples from the four widely used public HSI datasets, IP, PU, BS, and SA. The classification accuracies of the different methods under different training samples are shown in Fig. 14.
1) The classification results for different training samples on the IP dataset are shown in Fig. 14a. Neither the HybridSN method nor the traditional machine learning algorithm SVM achieves particularly satisfactory classification results when the training sample is \(1.00 \%\), whereas the LCTCS method in our work achieves very good classification results even with only \(1.00 \%\) of the samples.
2) The classification results for different training samples on the PU dataset are shown in Fig. 14b. The LCTCS method in our work still has the best classification results when the training sample is \(1.00 \%\). HybridSN, FDSSC, and the spatial residual method SSRN also all achieve more than \(90.00 \%\) accuracy.
3) The classification results for different training samples on the BS dataset are shown in Fig. 14c. The classification results of the LCTCS method proposed in this paper are not optimal when the training sample is \(1.00 \%\). However, the LCTCS method achieves the best classification results as the training sample size increases.
4) The classification results for different training samples on the SA dataset are shown in Fig. 14d. The proposed algorithm still obtains the optimal classification results when the training sample is \(1.00 \%\). With the increase in training samples, the overall accuracy also improves to \(93.87 \%\). Meanwhile, the traditional machine learning algorithm SVM achieves only \(67.41 \%\) on the SA dataset. An interesting phenomenon is that as the training sample increases from \(1.00 \%\) to \(3.00 \%\), the classification accuracy of the LCTCS method, like that of most other methods, increases rapidly.
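The random selection of training samples at a given proportion can be sketched as follows. This is an illustrative sketch only: drawing the samples class by class (a stratified split) and keeping at least one training sample per class are assumptions, and the function name `stratified_split` and the toy label vector are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction, seed=0):
    """Split sample indices into train/test sets, class by class.

    labels: list of integer class ids, one per labeled pixel.
    train_fraction: e.g. 0.01 for 1.00 % training samples.
    At least one sample per class is kept for training (an assumption).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = max(1, round(train_fraction * len(indices)))
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return train, test


# Hypothetical label vector: 3 classes with 200, 100 and 50 labeled pixels.
labels = [0] * 200 + [1] * 100 + [2] * 50
for fraction in (0.01, 0.03, 0.05, 0.10, 0.15):
    train, test = stratified_split(labels, fraction)
    print(f"{fraction:.2%}: {len(train)} train / {len(test)} test")
```

Fixing the seed keeps the split reproducible, so that all compared methods can be trained on the same samples at each proportion.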
4.6 Computing Resource Analysis
Saving computational resources is a major advantage of the LCTCS method. To verify this advantage, we compare the methods under the same input sizes of 103\(\times \)25\(\times \)25 (0.25 M), 200\(\times \)25\(\times \)25 (0.48 M), 145\(\times \)25\(\times \)25 (0.49 M), and 145\(\times \)25\(\times \)25 (0.35 M), and report the computational resource usage on the four datasets PU, IP, BS, and SA.
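One reading of the bracketed figures, which is an assumption rather than something stated here, is that they give the memory of a single input patch stored as 32-bit floats and expressed in MiB. The short sketch below checks that reading; the function name `patch_size_mib` is hypothetical.

```python
def patch_size_mib(bands, height=25, width=25, bytes_per_value=4):
    """Memory of one input patch in MiB, assuming float32 values."""
    return bands * height * width * bytes_per_value / 2**20


for bands in (103, 200, 145):
    print(f"{bands} x 25 x 25 -> {patch_size_mib(bands):.2f} M")
```

Under this assumption the 103-, 200-, and 145-band patches come out at about 0.25 M, 0.48 M, and 0.35 M, and a value of about 0.49 M would correspond to a patch with 204 bands.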
1) The FLOPs of each method are shown in Fig. 15a. DBDA and DBMA consume the most floating-point operations. The method designed in our work consumes the fewest on all four datasets, staying within 1000 M. This finding also supports the good generality and robustness of our method in terms of FLOPs.
2) The video memory usage is shown in Fig. 15b. Although the HamidaEtAlNet method consumes the least storage space, it uses many redundant parameters and FLOPs. The proposed method in our work is not optimal in terms of storage space, but it compares favorably with FDSSC, SSRN, DBDA, and other algorithms. The storage space usage of HybridSN is worse.
3) The parameter counts are shown in Table 11. Our method has the fewest parameters on all four datasets. In particular, on the IP dataset it has 9.516 K parameters, which is only \(0.40 \%\) of the 2.191 M parameters of the HamidaEtAlNet method. Thus, our method greatly alleviates the computation and storage burden caused by redundant parameters.
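The parameter and FLOP savings discussed above follow from how grouped convolution divides the input channels among groups. The sketch below counts parameters and multiply-accumulate operations for a standard and a grouped 3D convolution layer using the usual weight shape (output channels, input channels per group, kernel depth, height, width). The layer sizes are hypothetical and are not the actual LCTCS configuration.

```python
def conv3d_params(c_in, c_out, kernel, groups=1, bias=True):
    """Number of learnable parameters in a 3D convolution layer."""
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    kd, kh, kw = kernel
    # Each output channel only sees c_in / groups input channels.
    weights = c_out * (c_in // groups) * kd * kh * kw
    return weights + (c_out if bias else 0)


def conv3d_macs(c_in, c_out, kernel, out_shape, groups=1):
    """Multiply-accumulate operations for one forward pass (bias ignored)."""
    kd, kh, kw = kernel
    d, h, w = out_shape
    return c_out * (c_in // groups) * kd * kh * kw * d * h * w


# Hypothetical layer: 24 -> 24 channels, 3x3x3 kernel, with and without 3 groups.
standard = conv3d_params(24, 24, (3, 3, 3), groups=1)
grouped = conv3d_params(24, 24, (3, 3, 3), groups=3)
print(f"standard: {standard} parameters, grouped: {grouped} parameters")
```

With g groups, both the weight count and the multiply-accumulate count of the layer shrink by a factor of g, while the bias term is unchanged.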
4.7 Ablation Experiments
To further illustrate the effectiveness of the proposed method, we conducted a series of ablation experiments on the spectral module, the spatial module, and the attention mechanism module. As can be seen from Table 12, when ASe is not considered, the overall classification accuracy, average classification accuracy, and Kappa coefficient are \(95.94 \%\), \(96.14 \%\), and \(95.60 \%\), respectively, which are \(3.34 \%\), \(3.26 \%\), and \(3.60 \%\) lower than when the spectral module is used together with the spatial module and the attention mechanism module (ASS). Comparing the classification results of ASe, ASa, SS, and ASS, the best result, obtained by ASS, is largely due to the feature reuse of the spatial branch and the extraction of global spectral information by the spectral branch.
4.8 Comparative Analysis of Model Running Time
The model running time in this experiment is recorded after 100 iterations of each method. Table 13 shows that the support vector machine (SVM) method has the shortest running time on the four datasets because it represents the HSI data as high-dimensional vectors and classifies hyperspectral ground objects with one or more hyperplanes. In contrast, the DBDA method takes a relatively long time on the four datasets because its dense block connections make the three-dimensional convolutions expensive to compute.
The running time of LCTCS on the four datasets is not the best. This is because we expand the filter of the dynamic convolution to retain more band spectrum information when designing the dynamic convolution module, which leads to more time consumption. Nevertheless, LCTCS, the modified residual sparse branch network, greatly reduces the storage space during network model learning.
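A running-time comparison of this kind can be sketched with a simple timing harness. This is an illustrative sketch only: averaging wall-clock time over 100 calls is an assumption about the protocol, and `mean_runtime` and `dummy_step` are hypothetical names, with `dummy_step` standing in for one training or inference pass of a model.

```python
import time


def mean_runtime(fn, iterations=100):
    """Average wall-clock seconds per call over `iterations` calls."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) / iterations


# Stand-in workload (hypothetical); replace with one pass of the model under test.
def dummy_step():
    return sum(i * i for i in range(10_000))


print(f"{mean_runtime(dummy_step):.6f} s per iteration")
```

When the model runs on a GPU, the device would also need to be synchronized before each clock reading, since kernel launches are asynchronous.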
5 Conclusion
In our work, a novel HSI classification method called the LCTCS network is proposed. This method organically combines a channel attention mechanism, simple and efficient grouped convolution, and dynamic convolution. It uses ordinary 3D convolution to reduce dimensionality, channel attention to highlight important spectral and spatial weights, the grouped-convolution spectral and spatial modules to extract global features, and the dynamic classification module to complete the HSI classification task efficiently. This method can greatly alleviate the waste of computing resources in traditional HSI classification networks. The following conclusions can be drawn.
1) Experiments under single and multiple training sample proportions show that this method can effectively maintain advanced classification performance with fewer parameters, lower computational cost, and smaller video memory usage. They also show that the method has good universality and generalization.
2) Ablation experiments show that the synergy between the spectral branch network and the spatial branch network allows the model to achieve optimal performance, and the added channel attention mechanism can further improve the utilization efficiency of HSI features.
In the future, we will consider designing a more efficient attention mechanism and adaptive convolution module on top of the existing ones to further improve HSI classification performance. This work also provides a new idea for making better use of grouped convolution to design more efficient network structures in other fields.
Data Availability
All data are generated by relevant algorithms. If you need to reproduce the experimental results, please contact the corresponding author.
Funding
The National Natural Science Foundation of China (Nos. 62166005), Joint Open Fund Project of Key Laboratories of the Ministry of Education (Nos. [2020]248), The Guizhou University Talents Project (Nos. GRJH[2020]14), Science and Technology Project of Guizhou Province (Nos. QKH-ZK[2022]130, QKH[2021]335), Guizhou University Cultivation Project (Nos. GDP[2019]22), Developing Objects and Projects of Scientific and Technological Talents in Guiyang City (Nos. ZKHT[2023]48-8).
Author information
Contributions
Jie Sun: Data Curation, Methodology, Writing—Original Draft; Jing Yang: Methodology, Software, Investigation, Formal Analysis, Writing—Original Draft; Wang Chen: Resources, Methodology, Supervision; Yifan Wang: Software, Validation; Shaobo Li and Jianjun Hu: Visualization, Investigation.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sun, J., Yang, J., Chen, W. et al. LCTCS: Low-Cost and Two-Channel Sparse Network for Hyperspectral Image Classification. Neural Process Lett 56, 181 (2024). https://doi.org/10.1007/s11063-024-11631-y