Nothing Special   »   [go: up one dir, main page]

Next Article in Journal
LighterGAN: An Illumination Enhancement Method for Urban UAV Imagery
Next Article in Special Issue
Self-Matching CAM: A Novel Accurate Visual Explanation of CNNs for SAR Image Interpretation
Previous Article in Journal
The Road to Operationalization of Effective Tropical Forest Monitoring Systems
You seem to have javascript disabled. Please note that many of the page functionalities won't work as expected without javascript enabled.
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Deep Metric Learning with Online Hard Mining for Hyperspectral Classification

1
Hubei Subsurface Multi-Scale Imaging Key Laboratory, Institute of Geophysics and Geomatics, China University of Geosciences, Wuhan 430074, China
2
Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Anhui University, Hefei 230601, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2021, 13(7), 1368; https://doi.org/10.3390/rs13071368
Submission received: 14 March 2021 / Accepted: 30 March 2021 / Published: 2 April 2021
Graphical abstract
">
Figure 1
<p>Fundamental structure of convolutional neural network (CNN), which consists of a set of layers comprising convolutional layer, pooling layer, and fully connected layer.</p> ">
Figure 2
<p>The flowchart of the proposed method. All hyperspectral pixels with a single shape of <math display="inline"><semantics> <mrow> <mn>1</mn> <mo>×</mo> <mn>1</mn> <mo>×</mo> <mi>d</mi> </mrow> </semantics></math> are fed into the embedded network separately in order to obtain embedded feature space <math display="inline"><semantics> <mi mathvariant="sans-serif">Ε</mi> </semantics></math>. From the embedded feature space, the hardest triplets are selected by utilizing a random hardest negative sampling strategy. Finally, all the hardest triplets participate in the calculation of triplet loss in turn and propagate backward to achieve the purpose of obtaining the optimal network parameters.</p> ">
Figure 3
<p>Illustration of triplet loss. Anchor, positive sample, and negative sample are projected to the low-dimensional metric space through the model. In each epoch of training, pull the positive sample closer to the anchor while pushing the negative sample far away by triplet loss.</p> ">
Figure 4
<p>Visualization of Salinas scene. (<b>a</b>) Pseudo-color composite image; (<b>b</b>) ground truth map; (<b>c</b>) land-cover classes.</p> ">
Figure 5
<p>Visualization of Pavia University dataset. (<b>a</b>) Pseudo-color composite image; (<b>b</b>) ground truth map; (<b>c</b>) land-cover classes.</p> ">
Figure 6
<p>Visualization of Dioni scene. (<b>a</b>) Pseudo-color composite image; (<b>b</b>) ground truth map; (<b>c</b>) land-cover classes.</p> ">
Figure 7
<p>(<b>a</b>) Classification accuracy of different datasets under different margin values; (<b>b</b>) convergence performances of different training numbers per class in the Pavia University dataset.</p> ">
Figure 8
<p>Classification accuracy varied as a result of feature dimensions. (<b>a</b>) Salinas dataset; (<b>b</b>) Pavia University dataset; (<b>c</b>) HyRANK dataset.</p> ">
Figure 9
<p>The classification performances using different numbers of training samples for the Salinas dataset. (<b>a</b>) overall accuracy (OA); (<b>b</b>) average accuracy (AA); (<b>c</b>) kappa.</p> ">
Figure 10
<p>The classification performances using different numbers of training samples for the Pavia University dataset. (<b>a</b>) OA; (<b>b</b>) AA; (<b>c</b>) kappa.</p> ">
Figure 11
<p>The classification performances using different numbers of training samples for the HyRANK dataset. (<b>a</b>) OA; (<b>b</b>) AA; (<b>c</b>) kappa.</p> ">
Figure 12
<p>Classification results of all the methods for the Salinas dataset. (<b>a</b>) Ground truth map; (<b>b</b>) auto-encoder (AE); (<b>c</b>) recurrent neural network (RNN); (<b>d</b>) CNN; (<b>e</b>) deep metric learning with triplet network (DMLTN); (<b>f</b>) deep metric learning with online hard mining (DMLOHM).</p> ">
Figure 13
<p>Classification results of all the methods for the Pavia University dataset. (<b>a</b>) Ground truth map; (<b>b</b>) AE; (<b>c</b>) RNN; (<b>d</b>) CNN; (<b>e</b>) DMLTN; (<b>f</b>) DMLOHM.</p> ">
Figure 14
<p>Classification results of all the methods for the HyRANK dataset. (<b>a</b>) Ground truth map; (<b>b</b>) AE; (<b>c</b>) RNN; (<b>d</b>) CNN; (<b>e</b>) DMLTN; (<b>f</b>) DMLOHM.</p> ">
Versions Notes

Abstract

:
Recently, deep learning has developed rapidly, while it has also been quite successfully applied in the field of hyperspectral classification. Generally, training the parameters of a deep neural network to the best is the core step of a deep learning-based method, which usually requires a large number of labeled samples. However, in remote sensing analysis tasks, we only have limited labeled data because of the high cost of their collection. Therefore, in this paper, we propose a deep metric learning with online hard mining (DMLOHM) method for hyperspectral classification, which can maximize the inter-class distance and minimize the intra-class distance, utilizing a convolutional neural network (CNN) as an embedded network. First of all, we utilized the triplet network to learn better representations of raw data so that raw data were capable of having their dimensionality reduced. Afterward, an online hard mining method was used to mine the most valuable information from the limited hyperspectral data. To verify the performance of the proposed DMLOHM, we utilized three well-known hyperspectral datasets: Salinas Scene, Pavia University, and HyRANK for verification. Compared with CNN and DMLTN, the experimental results showed that the proposed method improved the classification accuracy from 0.13% to 4.03% with 85 labeled samples per class.

Graphical Abstract">

Graphical Abstract

1. Introduction

With the exponential development of hyperspectral remote sensing imaging technology, hyperspectral imaging spectrometers can capture high spatial resolution images with hundreds of narrow spectral bands. Meanwhile, hyperspectral remote sensing images have abundant spectral and structural information for the analysis and detection of features. As a result, hyperspectral images have been utilized for a wide variety of applications, such as precision agriculture [1], environmental monitoring [2], and mineral exploration [3,4]. Among these applications, one of the most attractive fields in the research of hyperspectral images is hyperspectral classification.
There have a lot of different hyperspectral classification algorithms being proposed. Depending on whether a priori knowledge is used or not, the popular hyperspectral classification algorithms comprise an unsupervised and supervised classification [5]. Unsupervised learning, in the absence of a given prior knowledge, automatically classifies or clusters the input data to find the model and law of the data. The more representative unsupervised algorithms are principal component analysis [6], locally linear embedding [7], and independent component analysis [8], which utilize the selected prominent features to reduce the dimensionality of original data. However, the classification accuracy of unsupervised algorithms is not as high as that of supervised classification algorithms. Supervised approaches utilize a group of training samples to classify input data for each category, such as maximum likelihood methods, support vector machine [9], sparse/spatial nonnegative matrix underapproximation [10], neural networks [11], and kernel-based methods [12,13,14]. For instance, Li et al. [15] proposed a generalized framework for composite kernel to flexibly balance spatial, spectral information, and computational efficiency. Although hyperspectral classification is widely studied, there are still two problems: (1) hundreds of narrow spectral bands leading to the “curse of dimensionality”, and (2) limited labeled samples for training.
To further handle the Hughes phenomenon [16] (when the number of training samples is limited, the performance of classification decreases as the feature dimension increases), researchers have extensively studied deep learning-based methods. The feature dimension refers to the number of features in the feature space. Deep learning is a hierarchical structure of deep neural networks, usually more than three layers deep. The hierarchical structure attempts to extract the deep features of the input data on a hierarchical basis. Deep learning is a rapidly developing research field that has shown usefulness in many research fields, such as computer vision and pattern recognition [17,18]. In the field of hyperspectral classification, there have been proposed many deep models. Yuan et al. [19] proposed a stacked auto-encoder (SAE) model for hyperspectral classification, where SAE was employed to obtain valuable advanced features. Since then, an increasing number of deep learning models have been proposed, such as deep belief network [20], recurrent neural network (RNN) [21], and convolutional neural network (CNN) [22,23,24]. Although methods based on deep learning have made great strides in dimensional reduction of the hyperspectral image, they all need numerous labeled samples to train many parameters, which is known in deep learning as the small sample set classification problem. Many strategies have been used to better tackle such problems. For instance, Li et al. [25] proved that by constructing pixel-pair samples, the number of training samples will increase significantly. More recently, a multi-grained network has been proposed as a hyperspectral classification method based on deep learning, with the aim of classifying hyperspectral data on a small scale [26]. Wu et al. [27] proposed a semi-supervised deep learning framework whereby large amounts of unlabeled data, with their pseudo labels, were used to pretrain a deep convolutional recurrent neural network, and then refine the network with the limited labeled data available.
In this paper, a model with online hard mining based on deep metric learning (DMLOHM) has been proposed, which is a powerful deep embedded model for extracting features and classifying pixel-level hyperspectral remote sensing images. The first thing to do is to obtain embedded feature space by feeding all samples into the embedded network separately. Secondly, a random hardest negative sampling strategy is utilized to select the hardest triplets from the embedded feature space, which ensures that all triplets are valid. Finally, to obtain the optimal parameters of the model, we used all the hardest triplets to train a deep metric learning-based model. The objective of the proposed model is to project the hyperspectral input features into Euclidian space where the mapping features have a minimum intra-class distance and a maximum inter-class distance. The online hard mining strategy is used to seek valid triplets (a triplet is composed of an anchor, a positive sample of the same class as the anchor, and a negative sample of the different class as the anchor) from mapping features while improving operational efficiency. In comparison to other related advanced methods, our proposed methodology comprises three key contributions, which are summarized below.
(1)
A model based on deep metric learning is proposed for hyperspectral classification. By utilizing the ability of the deep metric learning-based approach to maximize distances between classes and minimize distances within classes, one can effectively reduce the high dimensionality of hyperspectral data while achieving a high classification accuracy.
(2)
We introduce the idea of online hard mining for deep metric learning to mine the most discriminative triplets while improving the performance of triplet network. Triplets obtained through an online hard mining strategy are more effective with limited labeled data, significantly improving the classification accuracy.
(3)
The experimental results show that the proposed method is superior to other comparison methods for comparing multiple hyperspectral image datasets.
The remaining sections of this paper are constructed as follows. Section 2 presents the background of relevant research. Section 3 provides detailed information about the proposed deep metric learning-based model. Section 4 presents the experimental results of the proposed methodology using three actual hyperspectral datasets. Finally, the conclusions are described in Section 5.

2. Related Work

In our proposed approach, convolutional neural networks are used as an embedded network of the deep metric learning model. As a result, the fundamental structure of convolutional neural networks is first presented here. Next, we present the research related to deep metric learning and sample mining strategy.

2.1. Convolutional Neural Network

In recent times, CNN has achieved outstanding achievements in a wide range of applications, notably the analysis of images by remote sensing [28,29,30]. Therefore, CNN [22], which is stress-free handling of high-dimensional data, is used as an embedded network for our proposed model on the basis of deep metric learning. As Figure 1 shows, the fundamental structure of a CNN is constructed as a set of layers, consisting of convolutional layer, pooling layer, and fully connected layer.

2.1.1. Convolutional Layer

These are particularly important in the extraction of feature. The first convolutional layer typically obtains low-level features, while the high-level features can be extracted from the deeper convolutional layers by combining low-level features. In a convolutional layer, the connection between each neuron and the local patch in the feature map of the previous layer is by means of a group of convolutional kernels. Next, the result of this locally weighted sum passes by a nonlinearity operation, such as a hyperbolic function (tanh) and rectified linear unit (ReLU). In a feature map, all neurons share the same convolutional kernels. At the meantime, different feature maps commonly utilize different convolutional kernels in the convolutional layer. Thus, the output volume generated by the layer l is calculated as z l = w l × z l 1 + b l , where w l is the convolutional kernel of layer l , z l 1 is the output volume of layer l 1 , and the bias matrix of the layer l is b l .

2.1.2. Pooling Layer

In general, there is a pooling layer after each convolutional layer, which is created by the calculation of some local non-linear operations on a small spatial region of the feature map. Reducing the dimension of the representation and establishing invariants for small translations or rotations is the objective of the pooling layer [30]. A commonly used pooling operation is the max-pooling operation that calculates the maximum of a local patch of units into a single feature map. The pooling results of the layer l is calculated as p l = max i z i l .

2.1.3. Fully Connected Layer

The last few layers of a convolutional neural network are usually fully connected layers, which helps to better aggregate the information conveyed at lower levels and to make final decisions.

2.2. Deep Metric Learning

Learning a self-defined distance metric that can be utilized to calculate the similarity between two samples is the goal of metric learning. For instance, Wang et al. [31] utilized a locality constraint to assure the local smoothness and preserve correlation between samples for traffic congestion detection. According to Weinberger and Saul [32], an appropriate distance metric can significantly improve the performance of many visual classification tasks. Since Hinton et al. [20] introduced deep learning concept in 2006, more and more deep models have been increasingly proposed. The aim of deep models is to learn valuable semantic representations of data that can then be used to distinguish between available classes [33,34,35]. However, such representations and the corresponding induction measures are often considered to be side effects of the classification task, rather than being explicitly investigated [36]. Therefore, Hadsell et al. [37] proposed the Siamese network variants for distinguishing between similar pairs and dissimilar pairs of examples, in which a contrastive loss is utilized to train the network. As the concepts of similarity and dissimilarity require context, Siamese networks are also sensitive to calibration. A triplet network model was proposed by Hoffer et al. [36], aiming at learning valuable representations through a comparison of distances. Deng et al. [38] utilized a triplet network based on metric learning and the mean square error (MSE) loss to classify hyperspectral image data, which significantly improves the results of limited labeled samples classification. Although triplet network have been successfully implemented as a deep metric learning model to perform classification tasks using only a small amount of training data [38], the triplet generation method is not efficient, which needs to be improved. Our proposed approach, which guarantees valid information about the features, utilizes a hard negative mining strategy with an online method to generate triplets.

2.3. Sampling Mining Strategy

Informative input samples, the structure of the network model, and a metric loss function [39] together constitute the concept of deep metric learning. While the loss function is critical for deep metric learning, the selection of informative samples is also vital for the classification or clustering task. Sampling strategies can improve the success rate of the network and the speed at which the network can be trained. Earlier, a sampling strategy of a random selection of positive and negative sample pairs was adopted in the Siamese network [40]. For face sketch synthesis, Wang et al. [41] reduced the time consumption by utilizing an offline random sampling strategy, which shows strong scalability. However, Simo-Serra et al. [42] pointed out that the learning process may slow down and be adversely affected after the network has achieved a level of acceptable performance. To overcome this issue, it is highly effective to employ more realistic training models with a better sampling strategy, such as semi-hard negative mining and hard negative mining, while using informative samples. Semi-hard negative mining [43] focuses on finding negative samples in the margin. False-positive samples determined by training data correspond to hard negative samples [39]. When the anchor is too near to the negative samples, the variance of the gradient is high, and the signal-to-noise ratio of the gradient is low. Thus, Manmatha et al. [44] proposed a distance-weighted sampling strategy to filter out noisy samples.
In summary, although we can create a good network model and architecture, the network learning ability is still limited by the discriminatory ability of the samples that are presented to the network. Thus, differentiated training samples of each category should be submitted to construct the network, in order for the network to be able to learn better and to obtain a representation of the features.

3. Deep Metric Learning with Online Hard Mining

To tackle the challenge of the Hughes phenomenon and limited labeled samples in hyperspectral classification, we constructed a model based on deep metric learning that embeds samples into a specific metric space in which the distance between any two samples can be characterized. At first, all samples are individually fed into the embedded network mentioned in Section 2.1, rather than in triplet form, to obtain embedded feature space Ε , which will reduce computational consumption. Secondly, the hardest triplets are selected from the embedded feature space by utilizing a random hardest negative sampling strategy, which ensures that all triplets are valid. Finally, the hardest triplets are used to train a deep metric learning-based model to obtain the optimal parameters of the model. Figure 2 shows the flowchart of our proposed method.

3.1. Deep Metric Learning-Based Model

Three same feed-forward embedding network instances with shared parameters form a triplet network, which is a typical model in deep metric learning. The embedding network is represented by f ( X ) d , which embeds a sample X into a d dimensional Euclidean space d . As a result, the input data form of the triplet network is { ( X a , X p , X n ) } , where ( X a , X p ) to have the same class labels and ( X a , X n ) have different class labels. The X a term is known as an anchor of a triplet. In a triplet network, it is important to calculate the distance D ( X a , X i ) i = { p , n } between the positive sample, negative sample, and the anchor.
D ( X a , X i ) = f ( X a ) f ( X i ) 2 2 , i = { p , n } ,
where 2 2 represents the Euclidean distance between two samples. After calculating the distances between the anchor, positive, and negative samples, one can calculate the standard loss function of the triplet network [44] as
t r i p l e t = max ( 0 , D ( X a , X p ) D ( X a , X n ) + m ) ,
where m is a margin, which is enforced between positive and negative pairs. Pulling the positive samples (green dot) closer to the anchor (blue dot) while pushing the negative samples (red square) far away is the objective of triplet loss. The blue arrow represents pulling in, and the red arrow represents pushing away, as shown in Figure 3.

3.2. Deep Metric Learning for Online Hard Mining

When training the triplet network mentioned above, one must often necessarily form triplets of anchor, positive samples, and negative samples at first, and then cast the triplet batches into triplet network to obtain the embedded features. Specifically, assuming that C (C is the number of generated triplets) triplets { ( X a , X p , X n ) } are generated in the manner described above, 3C convolutional neural network operations must be computed to obtain the triplets consisting of C embedded features. Then, the loss of these C triplets is calculated and finally backpropagated to the network. Generally, such a training procedure, which is known as an offline training strategy, is not efficient [43].
Therefore, Schroff et al. [43] proposed an online method for generating triplets. It is assumed that the C triplets { ( X a , X p , X n ) } are generated online. First, the embedded features of the C input samples are obtained by using a convolutional neural network, which is computed C times. Then, these C embedded features are used to generate triplets (up to a maximum of C 3 triplets). Compared to the traditional method of generating triplets, it appears that the online method of generating triplets reduces the number of operations by 2C times, but not all triplets generated by the online method is valid triplets. To overcome the problem of valid triplets, we combine the sampling strategy of Hermans et al. [45] with the online approach for generating triplets to form our deep metric learning with the online hard mining (DMLOHM) method, which can be seen in Algorithm 1. The core idea of valid triplets mining is to form batches by randomly sampling P (P is the total number of classes in all samples) classes, and then randomly sampling K (K is the number of samples by randomly sampling) samples of each class. Thus, the total number of samples in a batch is PK. For each sample in a batch, we can select the most challenging positive and negative samples in the batch. The final loss function of DMLOHM can be formulated as
θ = i = 1 P a = 1 K [ max p = 1 K D ( X a i , X p i ) min j = 1 P n = 1 K j i D ( X a i , X n j ) + m ] + ,
where [ ] + represents hinge function, if the value in the formula is less than 0 , then l θ is equal to 0 ; otherwise, vice versa.
For DMLOHM, the online method and hard triplet mining is adopted for efficiently generating valid triplets. Moreover, generating valid triplets allows for the production of more discriminative information. With the help of valid triplets, it is now possible to perform hyperspectral classification without many training samples.
Algorithm 1 A single iteration for training the DMLOHM
Input: Model: D θ
 The training set χ = { ( X 1 , y 1 ) , , ( X n , y n ) } , where y i { 1 , , K }
 Parameter: the value of margin m (set as 0.1)
Output: Updated model D θ
Begin
  1. Obtain the embedded features Ε through the embedded net.
  2. Utilize the random hardest negative sampling strategy to get the hardest positive samples and hardest negative samples from the embedded features Ε .
  3. Calculate the distance of anchor-positive samples and the distance of anchor-negative samples by (1).
  4. Calculate the loss θ by (3) with margin m.
  5. Update parameter sets θ by backpropagating through θ .
End

4. Experiments and Analysis

Due to several critical problems, including the “curse of dimensionality” and limited labeled samples for training, we proposed a deep metric learning-based method to tackle these two problems. We used high-dimensional hyperspectral images to verify the former problem and used different sample sampling strategies to verify the latter problem. Firstly, to demonstrate the performance of the proposed DMLOHM algorithm, we implemented experiments on three publicly available datasets, which were commonly used in hyperspectral classification, namely, the Salinas dataset, the Pavia University dataset, and the HyRANK dataset [46,47]. These datasets are described in detail as follows. Then, a detailed analysis was made for the hyperparameters in the model.

4.1. Dataset Description

4.1.1. Salinas Dataset

The Salinas dataset was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over Salinas Valley, California, USA [38], which can be downloaded from http://www.ehu.eus/ccwintco/index.php (accessed on 30 March 2021). There are 204 spectral bands available after removing the 20 water absorption bands. The spatial size of this dataset was 512 × 217, with a spatial resolution of 3.7 m. The pseudo-color composite image and ground truth map of the Salinas dataset can be seen in Figure 4. There were 16 types of land cover with 54,129 labeled pixels in total, and the specific information is shown in Table 1.

4.1.2. Pavia University Dataset

The Pavia University dataset was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor during a flight campaign over Pavia, Northern Italy [48], which can be download from http://www.ehu.eus/ccwintco/index.php (accessed on 30 March 2021). After we removed 12 noisy bands, 103 spectral bands remained in the Pavia University dataset. The spectral range of bands was between 430 and 860 nm. The spatial resolution and spatial size were 1.3 m and 610 × 340, respectively. The pseudo-color composite image and ground truth map of the Pavia University dataset can be seen in Figure 5. As shown in Table 2, there were nine types of land cover with a total of 42,776 labeled pixels.

4.1.3. HyRANK Dataset

The HyRANK dataset was acquired by the Hyperion sensor on the Earth Observing-1 satellite with a spatial resolution of 30 m. It includes five images, two of which (i.e., Dioni and Loukia) can be used as training hyperspectral images and three of which (i.e., Erato, Kirki, and Nefeli) can be used as validation hyperspectral images. Since only the Dioni has a sample size greater than 100 in each category, Dioni was chosen as our experimental data. The spatial size of the HyRANK dataset was 250 × 1376, with 176 spectral bands, which can be downloaded from http://www2.isprs.org/commissions/comm3/wg4/ (accessed on 30 March 2021). The pseudo-color composite image and ground truth map of the Dioni dataset can be seen in Figure 6. As Table 3 shows, there were 12 types of land cover with total of 20,024 labeled pixels.

4.2. Experimental Setting

To illustrate the efficiency of the proposed method for reducing hyperspectral dimensionality and classifying limited labeled samples, we compared DMLOHM with four deep learning classification algorithms, i.e., auto-encoder (AE), recurrent neural network (RNN) [21], CNN [22], and deep metric learning with triplet network (DMLTN) [36]. Because of the single-pixel hyperspectral data as sequential data, we chose Hu’s 1-D (one dimensional) CNN [22] as the embedded network of DMLOHM and DMLTN. In this paper, overall accuracy (OA), average accuracy of each class (AA), and kappa coefficient (kappa) were the performance metrics. The OA score assesses the overall classification accuracy, i.e., the number of samples correctly classified in all categories divided by the total size of testing samples and the average of each classification accuracy per class is the AA score. Kappa is a statistical measure relating to the degree of agreement of categorical items [49,50].
For the dimensionality reduction of the hyperspectral image, we performed the following experiments, setting extracted feature dimension to start at 1 dimension and then increment to 200 in 10 intervals, utilizing 85 samples per class as the training set. Since we were also interested in limited labeled samples classification, likewise, the experimental parameter setting was that randomly picking out a very few numbers of labeled samples per class (e.g., 10, 25, 40, 55, 70, 85, and 100) from the labeled set was to constitute the training set, and then the testing set consisted of the rest of the labeled samples. After considering the results of feature dimensions, we set the feature extraction dimension of the limited labeled sample classification to 128, which is abbreviated as Dim, in order to obtain stable and better classification results. Here, we took the average OA and kappa values of 10 experimental results as a measure of the performance of the different classification algorithms.

4.3. Parameter Setting and Convergence Analysis

As for training configuration, we ran our training procedures in a PyTorch environment with Adam optimization algorithm, and experiments were performed on Nvidia RTX2060 with memory usage limited to 6 GB. For the proposed model, the learning process was stopped after 200 training epochs without the validation set, and the learning rate was set as 0.001. The hyperparameter, margin value m, also affects the classification accuracy of DMLOHM. Thus, we performed an experimental analysis of this parameter, utilizing 85 samples per class as a training set while setting the feature extraction dimension as 128, and the result is shown in Figure 7a. From Figure 7a, we set the important margin value m as 1. As shown in Figure 7b, our DMLOHM approach converged smoothly in the training procedure.

5. Results

To tackle the “curse of dimensionality”, many academics have studied the ability of deep learning models to deal with this problem. Thus, in this paper, the dimensionality reduction effect of our proposed method was analyzed by comparing the classification accuracy of different algorithms from 1 to 200 dimensions. Experiments that compared the classification accuracy of various algorithms when sampling 10, 25, 40, 55, 70, 85, and 100 samples per class were also set up to analyze the ability of our proposed approach to deal with limited labeled samples classification problems.

5.1. Dimensionality Reduction

Figure 8a–c shows variation of classification OA values with differential extracted feature dimensions of Salinas dataset, Pavia University dataset, and HyRANK dataset, respectively. The dimension of the experiment was set to take one dimension as a single dimension, and then, starting with 10 dimensions, with 10 as an interval, until 200 dimensions were taken. The following separately analyzed the experimental results. For the Salinas dataset, when the feature dimension reached 10 dimensions, the classification accuracy of DMLOHM was comparable to that of CNN but much higher than those of AE, RNN, and DMLTN. For the Pavia University dataset, DMLOHM outperformed the other comparison algorithms when the feature dimension reached 10. DMLOHM achieved the best classification result of about 40 feature dimensions. Then, the classification accuracy of the DMLOHM algorithm gradually stabilized at a certain numerical range. For the HyRANK dataset, DMLOHM was able to achieve the best classification result when the number of feature dimensions reached 25, which was optimal earlier than any other algorithm.
What is obvious from Figure 8 is that the DMLOHM algorithm can effectively reduce the dimensions of hyperspectral data and improve the Hughes phenomenon. Meanwhile, our proposed method performed with better OA in most situations (when the feature dimension was greater than 10 dimensions).

5.2. Limited Labeled Samples Classification

Figure 9, Figure 10 and Figure 11 show the classification accuracies of the Salinas dataset, Pavia University dataset, and HyRANK dataset, respectively, including OA, AA, and kappa of all comparison methodologies in three datasets with different numbers of training samples per class. The feature extraction dimension of all the comparison methods was set up to 128. For the Salinas dataset and Pavia University dataset, the classification accuracy of DMLOHM always outperformed other algorithms with limited labeled samples, especially the Pavia University dataset. For instance, in terms of the difference in classification accuracy between DMLOHM and DMLTN, CNN was greater when the number of samples was 10 per class than when the number of samples was 85 per class. As for the Pavia University dataset, we can see that our DMLOHM method outperformed the comparison algorithms with the highest classification accuracy and the best robustness. Compared to CNN, which was used as the embedded network of DMLOHM, DMLOHM improved the classification accuracy by about 9%. Likewise, compared to DMLTN, DMLOHM substantially outperformed DMLTN, especially when the training sample was 100 per class, which improved the classification accuracy by 7.72%. The superior performance of the DMLOHM approach was also reflected on the HyRANK dataset, which is shown in Figure 11. From 10 training samples per class to 100 training samples per class, the classification accuracy of DMLOHM has always been better than other algorithms. That is, DMLOHM algorithm can boost the classification accuracy with limited training samples.
From the results of limited labeled samples classification, when the number of samples was small, our proposed method was able to effectively improve the classification accuracy.

5.3. Other Experiments

Table 4, Table 5 and Table 6 summarize the quantitative evaluation results of the three datasets, with 85 labeled samples per class, while reducing the original dimension to 128 dimensions. The indicators for quantitative analysis of the results were classification accuracy of each class, the average accuracy, the average OA with the corresponding standard deviation, and the average kappa coefficient with standard deviation. Here, all the experiments were repeated 10 times. The best results for each indicator are labeled in bold.
As shown in Table 4, the proposed approach achieved the best classification accuracy in most individual classes and obtained the highest OA and kappa coefficient values for the Salinas dataset. It was observed that DMLOHM classified the vinyard untrained much better than the other algorithms and reached 78.20%, being 5.4% higher than DMLTN. Moreover, the classification accuracy of each class showed that DMLOHM presented a higher accuracy with a robust classifier performance because of the lower standard deviation in most situation. From Table 5 and Table 6, we can observe from the comparison of results that our proposed approach presented the highest OA and kappa coefficient values. The classification accuracy of each class in these two datasets showed that DMLOHM presented a higher OA, AA, and kappa in most cases, but it did not show a strong dominance.
The thematic maps of the Salinas dataset are visually shown in Figure 12b–f. It follows that the proposed DMLOHM algorithm achieved the best classification results for most land cover classes. As for vinyard untrained land cover, most methods inaccurately classified it into grapes untrained land cover while DMLOHM was able to handle this aspect elegantly. The thematic maps of the Pavia University dataset are visually shown in Figure 13b–f. It can be observed that most methods incorrectly classified bare soil into meadow while DMLOHM was found to be a great way to solve this problem. Obviously, by comparing Figure 13e,f, we were able to see that the online hard mining strategy is vital to the DMLOHM method, helping it to achieve a great improvement in results. Simultaneously, the DMLOHM algorithm achieved the best classification results for most land cover classes. Figure 14b–f shows the thematic maps of the HyRANK dataset. We can see that the thematic map of DMLOHM was much closer to the ground truth map. It can effectively distinguish water and coastal water in that the spectral bands of them were found to be similar.
In short, deep metric learning and online hard mining strategy can greatly improve the classification accuracy of the embedded network.

5.4. Time Complexity

The time complexity of DMLOHM was also empirically tested, utilizing 85 training samples per class, which we compared with other approaches. The result in Table 7 shows that DMLOHM consumed a slightly longer period of time than DMLTN because of the extra online hard mining strategy. Although DMLOHM consumed a little more time than DMLTN and CNN, the classification accuracy was significantly improved by 0.14%–4.81%, thus proving that our work is worthy.

6. Discussion

We proposed a deep metric learning-based method aimed at improving dimensionality reduction and limited labeled samples classification accuracy using hyperspectral images in order to achieve good results.
For dimensionality reduction, we utilized a CNN based embedded network to map features into lower-dimensional feature space. However, just using a CNN-based embedded network was not enough. We also utilized an online triplet loss to control the over-fitting or under-overfitting. CNN is originally a good dimensionality reduction tool, and at the same time, we used better sampling strategies to make the network training more targeted, and only used less data to achieve a higher dimensionality reduction effect. The performance of DMLOHM was validated from the aspect of three classification metrics using three datasets The classification accuracy of DMLOHM outperformed other algorithms.
For limited labeled samples classification, both online hard mining strategy and deep metric network played an important part in solving this problem. Deep metric network can effectively make the same class more compact and the heterogeneous more scattered, which made the network better perform classification tasks. Simultaneously, the online hard mining strategy can provide hardest triplets for embedded network to train the whole model. Therefore, the classification accuracy of three datasets showed that our proposed method was better than other algorithms.
Finally, the difference between our proposed method and the others is that other algorithms directly classify hyperspectral data with cross-entropy loss or others, which does not consider the influence of intra-class distance and inter-class distance. Since our algorithm imposes constraints on intra-class distance and inter-class distance, our algorithm improves the classification accuracy obviously.

7. Conclusions

In this paper, we proposed a deep metric learning-based method for hyperspectral classification. Different from the traditional model, we utilized CNN as an embedded network to only extract features. For dimensionality reduction, an embedded CNN network was utilized to map the high-dimensional hyperspectral data to low-dimensional feature space, while online triplet loss was used to constrain the training process of the network, making the model more suitable for hyperspectral data. Online hard mining strategy was utilized to tackle the problem of limited labeled samples classification, which improved the classification accuracy under the condition of limited labeled samples. In future work, we will focus on the classification of spectral features in conjunction with a spatial feature to achieve further superiority in real applications.

Author Contributions

Conceptualization, Y.D. and C.Y.; methodology, Y.D. and C.Y.; writing—original draft preparation, Y.D. and C.Y.; writing—review and editing, Y.D. and Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grants 61801444 and 62071438, in part by the State Key Laboratory of Integrated Services Networks (Xidian University) under Grant ISN20-07, and in part by the Hong Kong Scholars Program under Grant XJ2018012.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data used in this paper can be provided by Cong Yang ([email protected]) upon request.

Acknowledgments

The authors would like to thank the handing editor and all anonymous reviewers for providing insightful and helpful comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Scherrer, B.; Sheppard, J.; Jha, P.; Shaw, J.A. Hyperspectral imaging and neural networks to classify herbicide-resistant weeds. J. Appl. Remote Sens. 2019, 13, 044516. [Google Scholar] [CrossRef]
  2. Tan, K.; Wang, H.; Chen, L.; Du, Q.; Du, P.; Pan, C. Estimation of the spatial distribution of heavy metal in agricultural soils using airborne hyperspectral imaging and random forest. J. Hazard. Mater. 2020, 382, 120987. [Google Scholar] [CrossRef]
  3. Tuşa, L.; Khodadadzadeh, M.; Contreras, C.; Rafiezadeh Shahi, K.; Fuchs, M.; Gloaguen, R.; Gutzmer, J. Drill-Core Mineral Abundance Estimation Using Hyperspectral and High-Resolution Mineralogical Data. Remote Sens. 2020, 12, 1218. [Google Scholar] [CrossRef] [Green Version]
  4. Li, X.; Huang, R.; Niu, S.; Cao, Z.; Zhao, L.; Li, J. Local similarity constraint-based sparse algorithm for hyperspectral target detection. J. Appl. Remote. Sens. 2019, 13, 046516. [Google Scholar] [CrossRef]
  5. Richards, J.A. Clustering and unsupervised classification. In Remote Sensing Digital Image Analysis; Springer: Berlin/Heidelberg, Germany, 2013; pp. 319–341. [Google Scholar]
  6. Abdi, H.; Williams, L.J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 433–459. [Google Scholar] [CrossRef]
  7. Roweis, S.T.; Saul, L.K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323–2326. [Google Scholar] [CrossRef] [Green Version]
  8. Hyvärinen, A.; Oja, E. Independent component analysis: Algorithms and applications. Neural Netw. 2000, 13, 411–430. [Google Scholar] [CrossRef] [Green Version]
  9. Bennett, K.P.; Demiriz, A. Semi-supervised support vector machines. In Advances in Neural Information Processing Systems; Mitt Press: Cambridge, MA, USA, 1999; pp. 368–374. [Google Scholar]
  10. Casalino, G.; Gillis, N. Sequential dimensionality reduction for extracting localized features. Pattern Recognit. 2017, 63, 15–29. [Google Scholar] [CrossRef] [Green Version]
  11. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef] [Green Version]
  12. Mika, S.; Ratsch, G.; Weston, J.; Scholkopf, B.; Mullers, K.-R. Fisher discriminant analysis with kernels. In Proceedings of the Neural Networks for Signal Processing IX: Proceedings of the IEEE Signal Processing Society Workshop (Cat. No. 98th8468), Madison, WI, USA, 25–25 August 1999; pp. 41–48. [Google Scholar]
  13. Baudat, G.; Anouar, F. Generalized Discriminant Analysis Using a Kernel Approach. Neural Comput. 2000, 12, 2385–2404. [Google Scholar] [CrossRef]
  14. Schölkopf, B.; Smola, A.; Müller, K.-R. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Comput. 1998, 10, 1299–1319. [Google Scholar] [CrossRef] [Green Version]
  15. Li, J.; Marpu, P.R.; Plaza, A.; Bioucas-Dias, J.M.; Benediktsson, J.A. Generalized composite kernel framework for hyperspectral image classification. IEEE Transact. Geosci. Remote Sens. 2013, 51, 4816–4829. [Google Scholar] [CrossRef]
  16. Hughes, G. On the mean accuracy of statistical pattern recognizers. IEEE Trans. Inf. Theory 1968, 14, 55–63. [Google Scholar] [CrossRef] [Green Version]
  17. Han, J.; Yao, X.; Cheng, G.; Feng, X.; Xu, D. P-CNN: Part-Based Convolutional Neural Networks for Fine-Grained Visual Categorization. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 1. [Google Scholar] [CrossRef]
  18. Wang, N.; Zha, W.; Li, J.; Gao, X. Back projection: An effective postprocessing method for GAN-based face sketch synthesis. Pattern Recognit. Lett. 2018, 107, 59–65. [Google Scholar] [CrossRef]
  19. Yuan, X.; Huang, B.; Wang, Y.; Yang, C.; Gui, W. Deep Learning-Based Feature Representation and Its Application for Soft Sensor Modeling With Variable-Wise Weighted SAE. IEEE Trans. Ind. Inform. 2018, 14, 3235–3243. [Google Scholar] [CrossRef]
  20. Hinton, G.E.; Osindero, S.; Teh, Y.-W. A Fast Learning Algorithm for Deep Belief Nets. Neural Comput. 2006, 18, 1527–1554. [Google Scholar] [CrossRef]
  21. Mou, L.; Ghamisi, P.; Zhu, X.X. Deep Recurrent Neural Networks for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3639–3655. [Google Scholar] [CrossRef] [Green Version]
  22. Hu, W.; Huang, Y.; Wei, L.; Zhang, F.; Li, H.-C. Deep Convolutional Neural Networks for Hyperspectral Image Classification. J. Sens. 2015, 2015, 1–12. [Google Scholar] [CrossRef] [Green Version]
  23. Makantasis, K.; Karantzalos, K.; Doulamis, A.; Doulamis, N. Deep supervised learning for hyperspectral data classification through convolutional neural networks. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 4959–4962. [Google Scholar]
  24. Li, Y.; Zhang, H.; Shen, Q. Spectral–Spatial Classification of Hyperspectral Imagery with 3D Convolutional Neural Network. Remote. Sens. 2017, 9, 67. [Google Scholar] [CrossRef] [Green Version]
  25. Li, W.; Wu, G.; Zhang, F.; Du, Q. Hyperspectral Image Classification Using Deep Pixel-Pair Features. IEEE Trans. Geosci. Remote. Sens. 2016, 55, 844–853. [Google Scholar] [CrossRef]
  26. Pan, B.; Shi, Z.; Xu, X. MugNet: Deep learning for hyperspectral image classification using limited samples. ISPRS J. Photogramm. Remote. Sens. 2018, 145, 108–119. [Google Scholar] [CrossRef]
  27. Wu, H.; Prasad, S. Semi-Supervised Deep Learning Using Pseudo Labels for Hyperspectral Image Classification. IEEE Trans. Image Process. 2018, 27, 1259–1270. [Google Scholar] [CrossRef]
  28. Cheng, G.; Xie, X.; Han, J.; Guo, L.; Xia, G.-S. Remote Sensing Image Scene Classification Meets Deep Learning: Challenges, Methods, Benchmarks, and Opportunities. arXiv 2020, arXiv:2005.01094. Available online: https://ieeexplore.ieee.org/document/9127795 (accessed on 30 March 2021). [CrossRef]
  29. Lu, X.; Sun, H.; Zheng, X. A Feature Aggregation Convolutional Neural Network for Remote Sensing Scene Classification. IEEE Trans. Geosci. Remote. Sens. 2019, 57, 7894–7906. [Google Scholar] [CrossRef]
  30. Cheng, G.; Yang, C.; Yao, X.; Guo, L.; Han, J. When Deep Learning Meets Metric Learning: Remote Sensing Image Scene Classification via Learning Discriminative CNNs. IEEE Trans. Geosci. Remote. Sens. 2018, 56, 2811–2821. [Google Scholar] [CrossRef]
  31. Wang, Q.; Wan, J.; Yuan, Y. Locality constraint distance metric learning for traffic congestion detection. Pattern Recognit. 2018, 75, 272–281. [Google Scholar] [CrossRef]
  32. Weinberger, K.Q.; Saul, L.K. Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. 2009, 10, 207–244. [Google Scholar]
  33. Sundermeyer, M.; Schlüter, R.; Ney, H. LSTM neural networks for language modeling. In Proceedings of the Thirteenth Annual Conference of the International Speech Communication Association, Portland, OR, USA, 9–13 September 2012. [Google Scholar]
  34. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The Graph Neural Network Model. IEEE Trans. Neural Networks 2009, 20, 61–80. [Google Scholar] [CrossRef] [Green Version]
  35. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Advances in Neural Information Processing Systems; Mitt Press: Cambridge, MA, USA, 2014; pp. 2672–2680. [Google Scholar]
  36. Hoffer, E.; Ailon, N. Deep metric learning using triplet network. In International Workshop on Similarity-Based Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2015; pp. 84–92. [Google Scholar]
  37. Hadsell, R.; Chopra, S.; LeCun, Y. Dimensionality reduction by learning an invariant mapping. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006; Volume 2, pp. 1735–1742. [Google Scholar]
  38. Deng, B.; Jia, S.; Shi, D. Deep Metric Learning-Based Feature Embedding for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote. Sens. 2020, 58, 1422–1435. [Google Scholar] [CrossRef]
  39. Kaya, M.; Bilge, H. Şakir Deep Metric Learning: A Survey. Symmetry 2019, 11, 1066. [Google Scholar] [CrossRef] [Green Version]
  40. Bell, S.; Bala, K. Learning visual similarity for product design with convolutional neural networks. ACM Trans. Graph. 2015, 34, 1–10. [Google Scholar] [CrossRef]
  41. Wang, N.; Gao, X.; Li, J. Random sampling for fast face sketch synthesis. Pattern Recognit. 2018, 76, 215–227. [Google Scholar] [CrossRef] [Green Version]
  42. Simo-Serra, E.; Trulls, E.; Ferraz, L.; Kokkinos, I.; Fua, P.; Moreno-Noguer, F. Discriminative learning of deep convolutional feature point descriptors. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 118–126. [Google Scholar]
  43. Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823. [Google Scholar]
  44. Manmatha, R.; Wu, C.; Smola, A.J.; Krahenbuhl, P. Sampling Matters in Deep Embedding Learning. In Proceedings of the International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2859–2867. [Google Scholar]
  45. Hermans, A.; Beyer, L.; Leibe, B. In Defense of the Triplet Loss for Person Re-Identification. arXiv 2017, arXiv:abs/1703.07737. Available online: https://arxiv.org/abs/1703.07737 (accessed on 30 March 2021).
  46. Dong, Y.; Du, B.; Zhang, L.; Zhang, L. Dimensionality reduction and classification of hyperspectral images using ensemble discriminative local metric learning. IEEE Transact. Geosci. Remote Sens. 2017, 55, 2509–2524. [Google Scholar] [CrossRef]
  47. Li, J.; Huang, X.; Gamba, P.; Bioucas-Dias, J.M.B.; Zhang, L.; Benediktsson, J.A.; Plaza, A. Multiple Feature Learning for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2015, 53, 1592–1606. [Google Scholar] [CrossRef] [Green Version]
  48. Cao, X.; Ge, Y.; Li, R.; Zhao, J.; Jiao, L. Hyperspectral imagery classification with deep metric learning. Neurocomputing 2019, 356, 217–227. [Google Scholar] [CrossRef]
  49. Lennon, R. Remote Sensing Digital Image Analysis: An Introduction; Esa/Esrin: Frascati, Italy, 2002. [Google Scholar]
  50. Dong, Y.; Liang, T.; Zhang, Y.; Du, B. Spectral-Spatial Weighted Kernel Manifold Embedded Distribution Alignment for Remote Sensing Image Classification. IEEE Trans. Cybern. 2020, 1–13. [Google Scholar] [CrossRef]
Figure 1. Fundamental structure of convolutional neural network (CNN), which consists of a set of layers comprising convolutional layer, pooling layer, and fully connected layer.
Figure 1. Fundamental structure of convolutional neural network (CNN), which consists of a set of layers comprising convolutional layer, pooling layer, and fully connected layer.
Remotesensing 13 01368 g001
Figure 2. The flowchart of the proposed method. All hyperspectral pixels with a single shape of 1 × 1 × d are fed into the embedded network separately in order to obtain embedded feature space Ε . From the embedded feature space, the hardest triplets are selected by utilizing a random hardest negative sampling strategy. Finally, all the hardest triplets participate in the calculation of triplet loss in turn and propagate backward to achieve the purpose of obtaining the optimal network parameters.
Figure 2. The flowchart of the proposed method. All hyperspectral pixels with a single shape of 1 × 1 × d are fed into the embedded network separately in order to obtain embedded feature space Ε . From the embedded feature space, the hardest triplets are selected by utilizing a random hardest negative sampling strategy. Finally, all the hardest triplets participate in the calculation of triplet loss in turn and propagate backward to achieve the purpose of obtaining the optimal network parameters.
Remotesensing 13 01368 g002
Figure 3. Illustration of triplet loss. Anchor, positive sample, and negative sample are projected to the low-dimensional metric space through the model. In each epoch of training, pull the positive sample closer to the anchor while pushing the negative sample far away by triplet loss.
Figure 3. Illustration of triplet loss. Anchor, positive sample, and negative sample are projected to the low-dimensional metric space through the model. In each epoch of training, pull the positive sample closer to the anchor while pushing the negative sample far away by triplet loss.
Remotesensing 13 01368 g003
Figure 4. Visualization of Salinas scene. (a) Pseudo-color composite image; (b) ground truth map; (c) land-cover classes.
Figure 4. Visualization of Salinas scene. (a) Pseudo-color composite image; (b) ground truth map; (c) land-cover classes.
Remotesensing 13 01368 g004
Figure 5. Visualization of Pavia University dataset. (a) Pseudo-color composite image; (b) ground truth map; (c) land-cover classes.
Figure 5. Visualization of Pavia University dataset. (a) Pseudo-color composite image; (b) ground truth map; (c) land-cover classes.
Remotesensing 13 01368 g005
Figure 6. Visualization of Dioni scene. (a) Pseudo-color composite image; (b) ground truth map; (c) land-cover classes.
Figure 6. Visualization of Dioni scene. (a) Pseudo-color composite image; (b) ground truth map; (c) land-cover classes.
Remotesensing 13 01368 g006
Figure 7. (a) Classification accuracy of different datasets under different margin values; (b) convergence performances of different training numbers per class in the Pavia University dataset.
Figure 7. (a) Classification accuracy of different datasets under different margin values; (b) convergence performances of different training numbers per class in the Pavia University dataset.
Remotesensing 13 01368 g007
Figure 8. Classification accuracy varied as a result of feature dimensions. (a) Salinas dataset; (b) Pavia University dataset; (c) HyRANK dataset.
Figure 8. Classification accuracy varied as a result of feature dimensions. (a) Salinas dataset; (b) Pavia University dataset; (c) HyRANK dataset.
Remotesensing 13 01368 g008
Figure 9. The classification performances using different numbers of training samples for the Salinas dataset. (a) overall accuracy (OA); (b) average accuracy (AA); (c) kappa.
Figure 9. The classification performances using different numbers of training samples for the Salinas dataset. (a) overall accuracy (OA); (b) average accuracy (AA); (c) kappa.
Remotesensing 13 01368 g009
Figure 10. The classification performances using different numbers of training samples for the Pavia University dataset. (a) OA; (b) AA; (c) kappa.
Figure 10. The classification performances using different numbers of training samples for the Pavia University dataset. (a) OA; (b) AA; (c) kappa.
Remotesensing 13 01368 g010
Figure 11. The classification performances using different numbers of training samples for the HyRANK dataset. (a) OA; (b) AA; (c) kappa.
Figure 11. The classification performances using different numbers of training samples for the HyRANK dataset. (a) OA; (b) AA; (c) kappa.
Remotesensing 13 01368 g011
Figure 12. Classification results of all the methods for the Salinas dataset. (a) Ground truth map; (b) auto-encoder (AE); (c) recurrent neural network (RNN); (d) CNN; (e) deep metric learning with triplet network (DMLTN); (f) deep metric learning with online hard mining (DMLOHM).
Figure 12. Classification results of all the methods for the Salinas dataset. (a) Ground truth map; (b) auto-encoder (AE); (c) recurrent neural network (RNN); (d) CNN; (e) deep metric learning with triplet network (DMLTN); (f) deep metric learning with online hard mining (DMLOHM).
Remotesensing 13 01368 g012
Figure 13. Classification results of all the methods for the Pavia University dataset. (a) Ground truth map; (b) AE; (c) RNN; (d) CNN; (e) DMLTN; (f) DMLOHM.
Figure 13. Classification results of all the methods for the Pavia University dataset. (a) Ground truth map; (b) AE; (c) RNN; (d) CNN; (e) DMLTN; (f) DMLOHM.
Remotesensing 13 01368 g013
Figure 14. Classification results of all the methods for the HyRANK dataset. (a) Ground truth map; (b) AE; (c) RNN; (d) CNN; (e) DMLTN; (f) DMLOHM.
Figure 14. Classification results of all the methods for the HyRANK dataset. (a) Ground truth map; (b) AE; (c) RNN; (d) CNN; (e) DMLTN; (f) DMLOHM.
Remotesensing 13 01368 g014
Table 1. Land-cover classes and numbers of samples in Salinas dataset.
Table 1. Land-cover classes and numbers of samples in Salinas dataset.
ClassColorNameTotal Number of Samples
C1 Broccoli green weeds 12009
C2 Broccoli green weeds 23726
C3 Fallow1976
C4 Fallow rough plow1394
C5 Fallow smooth2678
C6 Stubble3959
C7 Celery3579
C8 Grapes untrained11,271
C9 Soil vinyard develop6203
C10 Corn senesced green weeds3278
C11 Lettuce romaine 4wk1068
C12 Lettuce romaine 5wk1927
C13 Lettuce romaine 6wk916
C14 Lettuce romaine 7wk1070
C15 Vinyard untrained7268
C16 Vinyard vertical trellis1807
Total54,129
Table 2. Land-cover classes and numbers of samples in Pavia University dataset.
Table 2. Land-cover classes and numbers of samples in Pavia University dataset.
ClassColorNameTotal Number of Samples
C1 Asphalt6631
C2 Meadow18,649
C3 Gravel2099
C4 Trees3064
C5 Metal sheets1345
C6 Bare soil5029
C7 Bitumen1330
C8 Bricks3682
C9 Shadows947
Total42,776
Table 3. Land-cover classes and numbers of samples in HyRANK dataset.
Table 3. Land-cover classes and numbers of samples in HyRANK dataset.
ClassColorNameTotal Number of Samples
C1 Dense urban fabric1262
C2 Mineral extraction sites204
C3 Non-irrigated arable land614
C4 Fruit trees150
C5 Olive groves1768
C6 Coniferous forest361
C7 Dense sclerophyllous vegetation5035
C8 Sparce sclerophyllous vegetation6374
C9 Sparcely vegetated areas1754
C10 Rocks and sand492
C11 Water1612
C12 Coastal water398
Total20,024
Table 4. Classification results for the Salinas dataset with 85 labeled samples per class (about 2.5% of total sample) as the training set (DIM = 128).
Table 4. Classification results for the Salinas dataset with 85 labeled samples per class (about 2.5% of total sample) as the training set (DIM = 128).
AERNNCNNDMLTNDMLOHM
ClassMeanStdMeanStdMeanStdMeanStdMeanStd
C197.221.362798.651.285798.650.645398.531.846799.770.0483
C299.340.946898.780.528199.660.387499.071.101599.890.0276
C395.092.283496.930.908099.240.436899.150.428899.920.1227
C499.660.106999.540.217899.710.033799.690.035499.720.0538
C597.670.797296.770.358598.190.241897.134.232498.710.2078
C699.710.083299.580.313699.810.025899.740.049399.790.0063
C798.621.840599.190.153599.480.057999.480.108099.590.0658
C872.352.371571.046.457780.322.156369.8015.074576.655.5246
C998.840.518598.800.393399.380.216199.510.495299.890.0462
C1088.781.290086.231.343391.330.565592.110.960896.630.4062
C1193.611.499697.051.017397.240.397598.340.732698.570.2012
C1298.751.365597.341.530598.910.458499.110.778999.900.0583
C1398.460.187298.280.732999.230.278399.270.290999.660.0759
C1492.860.939193.001.580897.290.752697.520.540098.770.1197
C1563.123.393053.369.312669.193.567374.1916.500178.204.5129
C1695.373.374796.271.064498.430.105998.350.098398.680.0705
AA93.090.458092.540.313195.370.204595.060.506196.520.0751
OA87.270.051585.610.55990.580.026489.021.432191.630.0548
Kappa0.85820.00570.83960.00610.89500.00290.87790.01550.90680.0059
Table 5. Classification results for the Pavia University dataset with 85 labeled samples per class (about 1.5% of total sample) as the training set (DIM = 128).
Table 5. Classification results for the Pavia University dataset with 85 labeled samples per class (about 1.5% of total sample) as the training set (DIM = 128).
AERNNCNNDMLTNDMLOHM
ClassMeanStdMeanStdMeanStdMeanStdMeanStd
C167.883.152079.214.086081.140.998281.733.416084.151.4853
C271.503.810176.783.773582.940.996681.263.167786.691.1915
C360.8722.922470.757.668378.586.257479.497.764879.185.7180
C491.262.959290.903.213693.150.945694.791.601096.630.3333
C599.670.119199.460.120699.880.056699.830.045499.880.0566
C654.003.851167.124.545876.041.633083.662.983788.581.4022
C790.771.203886.575.462390.430.757591.742.263092.870.9842
C879.0013.329681.853.827886.083.380087.244.345586.986.1821
C999.990.0379100.000.000099.940.063299.980.050699.960.0580
AA79.441.734583.631.087387.580.375588.860.687390.550.3262
OA72.410.132978.610.102083.700.034084.251.263887.730.5049
Kappa0.64590.01510.72350.01140.78790.00400.79670.01500.84020.0062
Table 6. Classification results for the HyRANK dataset with 85 labeled samples per class (about 5.0% of total sample) as the training set (DIM = 128).
Table 6. Classification results for the HyRANK dataset with 85 labeled samples per class (about 5.0% of total sample) as the training set (DIM = 128).
AERNNCNNDMLTNDMLOHM
ClassMeanStdMeanStdMeanStdMeanStdMeanStd
C161.156.455569.482.057983.642.022480.602.721084.151.6429
C292.780.708493.701.494896.560.619896.640.792097.560.2656
C385.994.362287.073.950493.330.922494.081.298193.930.7888
C497.692.205296.921.918099.850.4870100.000.0000100.000.0000
C573.233.658972.684.031082.900.887581.981.950484.430.6411
C699.351.103698.410.7690100.000.000099.930.1518100.000.0000
C789.910.715190.310.904792.600.376891.490.410391.220.3840
C886.041.703084.021.678586.470.829688.051.287286.760.4438
C974.093.863880.322.708090.090.826990.202.667690.011.7745
C1094.132.320998.600.751199.560.153999.750.231099.680.2834
C1196.801.767898.581.369999.910.119897.515.805199.970.0827
C1294.154.106299.550.895098.880.458898.910.856099.620.4475
AA87.110.610589.140.574493.650.152693.260.468493.940.1763
OA84.760.062485.560.047089.950.026389.760.577089.890.2216
Kappa0.80910.00740.81960.00570.87430.00320.87180.00700.87360.0027
Table 7. Time consumption (seconds) of different methods.
Table 7. Time consumption (seconds) of different methods.
DatasetAERNNCNNDMLTNDMLOHM
Salinas251.8425239.506912.602619.989576.5048
Pavia University196.826492.36277.612511.869874.8602
HyRANK95.922190.66887.282315.379174.5944
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Dong, Y.; Yang, C.; Zhang, Y. Deep Metric Learning with Online Hard Mining for Hyperspectral Classification. Remote Sens. 2021, 13, 1368. https://doi.org/10.3390/rs13071368

AMA Style

Dong Y, Yang C, Zhang Y. Deep Metric Learning with Online Hard Mining for Hyperspectral Classification. Remote Sensing. 2021; 13(7):1368. https://doi.org/10.3390/rs13071368

Chicago/Turabian Style

Dong, Yanni, Cong Yang, and Yuxiang Zhang. 2021. "Deep Metric Learning with Online Hard Mining for Hyperspectral Classification" Remote Sensing 13, no. 7: 1368. https://doi.org/10.3390/rs13071368

APA Style

Dong, Y., Yang, C., & Zhang, Y. (2021). Deep Metric Learning with Online Hard Mining for Hyperspectral Classification. Remote Sensing, 13(7), 1368. https://doi.org/10.3390/rs13071368

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop