Article

Multi-Scale Spatial Attention-Based Multi-Channel 2D Convolutional Network for Soil Property Prediction

College of Information Engineering, Sichuan Agricultural University, Ya’an 625014, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Sensors 2024, 24(14), 4728; https://doi.org/10.3390/s24144728
Submission received: 26 June 2024 / Revised: 19 July 2024 / Accepted: 19 July 2024 / Published: 21 July 2024
(This article belongs to the Section Environmental Sensing)
Figure 1. The initial absorbance spectra and the seven corresponding spectral preprocessing methods. The 5th, 16th, 50th, 84th, and 95th percentiles are depicted.
Figure 2. The procedure for converting a visible–near-infrared spectral sequence into a GADF image is as follows: (a1) is the original spectral sequence, (a2) is the spectral sequence after PAA dimensionality reduction, (a3) is the polar coordinate transformation, and (a4) is the resulting GADF image.
Figure 3. The overall framework of the CNNSANet.
Figure 4. Multi-scale spatial selection mechanism model.
Figure 5. Multi-scale channel information fusion model.
Figure 6. RMSE and R² comparison between 1D raw spectral data and 2D single-channel GADF images constructed using the same 1D raw spectral data as inputs.
Figure 7. Boxplot of prediction accuracies for different properties of 2D inputs constructed from spectral information obtained using various preprocessing methods and raw spectral information.
Figure 8. Training and validation losses of the CNNSANet model for seven soil properties.
Figure 9. Scatter plot of CNNSANet model for measured and predicted values of seven soil properties.
Figure 10. Results of the CNNSANet and other deep learning models for soil property prediction.

Abstract

Visible near-infrared spectroscopy (VNIR) is extensively researched for obtaining soil property information due to its rapid, cost-effective, and environmentally friendly advantages. Despite its widespread application and significant achievements in soil property analysis, current soil prediction models continue to suffer from low accuracy. To address this issue, we propose a convolutional neural network model that can achieve high-precision soil property prediction by creating 2D multi-channel inputs and applying a multi-scale spatial attention mechanism. Initially, we explored two-dimensional multi-channel inputs for seven soil properties in the public LUCAS spectral dataset using the Gramian Angular Field (GAF) method and various preprocessing techniques. Subsequently, we developed a convolutional neural network model with a multi-scale spatial attention mechanism to improve the network’s extraction of relevant spatial contextual information. Our proposed model showed superior performance in a statistical comparison with current state-of-the-art techniques. The RMSE (R²) values for various soil properties were as follows: organic carbon content (OC) of 19.083 (0.955), calcium carbonate content (CaCO3) of 24.901 (0.961), nitrogen content (N) of 0.969 (0.933), cation exchange capacity (CEC) of 6.52 (0.803), pH in H2O of 0.366 (0.927), clay content of 4.845 (0.86), and sand content of 12.069 (0.789). Our proposed model can effectively extract features from visible near-infrared spectroscopy data, contributing to the precise detection of soil properties.

1. Introduction

Soil is a critical natural resource, and the accurate and timely acquisition of soil property information is essential for ensuring soil health and achieving sustainable agriculture [1]. Traditional methods typically entail on-site sampling and laboratory testing; however, these approaches are plagued by high costs, low efficiency, and environmental unfriendliness. In recent years, soil visible–near-infrared reflectance spectroscopy has emerged as a rapid, cost-effective, environmentally friendly, non-destructive, and reproducible analytical technique [2]. Therefore, it has gradually emerged as an effective alternative to traditional methods. However, soil property prediction is challenging due to the spectral data’s numerous spectral bands, strong collinearity, and intricate interrelationships. With the advancement of machine learning, numerous nonlinear regression algorithms have been developed and applied. Said et al. [3] conducted a comparative analysis of three regression techniques—Partial Least Squares Regression (PLSR), Support Vector Machine (SVM), and Multivariate Adaptive Regression Splines (MARS)—for the prediction of the organic matter and clay content in saline soils. Similarly, Yang et al. [4] employed four methods—PLSR, Least Squares Support Vector Machine (LS-SVM), Extreme Learning Machine (ELM), and the Cubist regression model—to forecast the soil organic matter and pH levels. Notwithstanding these advancements, these machine learning methods demonstrate computational efficiency and modeling capability limitations.
In contrast to conventional machine learning methods, deep learning models, particularly convolutional neural networks (CNNs), are highly effective on multi-dimensional data and large-scale problems because their hierarchical structure allows them to learn the patterns of complex problems [5]. They have been extensively utilized across domains such as image classification [6,7], natural language processing [8], and speech recognition [9]. By leveraging sparse local connections and weight sharing, CNNs have been proven to effectively and automatically learn and extract local and abstract features from complex spectral data [10]. By stacking multiple convolutional and pooling layers, CNNs can efficiently capture intricate patterns within the data, making them well-suited for soil property prediction tasks [11]. In recent years, the application of deep learning in soil spectroscopy has become increasingly widespread. In 2015, Veres et al. [12] pioneered the integration of deep learning into soil spectroscopy, successfully validating the efficacy of one-dimensional convolutional neural networks (1D CNNs) in predicting specific soil properties. To extract deep feature information, Zhong et al. [13] proposed deep CNN models for the regression prediction of seven soil properties. Spectral data are commonly considered to exhibit a temporal structure [14]. The presence of identical feature peaks at different positions in spectral data may indicate varying information, and the sequential nature of spectral data can affect the accuracy of soil property predictions [15]. However, CNNs are insensitive to positional information during feature extraction, which can lead to a decline in model performance. To address this issue, some studies have adopted recurrent neural networks (RNNs), which are better suited for handling sequential data. RNNs can use feedback connections to store historical information over time. Singh et al. [16] used long short-term memory (LSTM) to predict six soil physical and chemical properties from the LUCAS spectral library. The network can effectively capture and retain short-term and long-term dependencies in sequential data. Yang et al. [17] proposed a novel approach, the Combined CNN and RNN model (CCNVR), that exploits the strengths of both CNNs and RNNs. Initially, the model employs a CNN to extract features from the raw soil spectra. Subsequently, it utilizes an RNN to analyze the relationships among these features. This integration method effectively distills soil spectral features while also investigating the interconnections among these features in depth. Furthermore, certain studies use two-dimensional transformations to convert one-dimensional spectral data into two-dimensional spectral images to enhance the feature extraction capabilities of the model. Padarian et al. [18] employed a short-time fast Fourier transformation to convert the raw spectra from the LUCAS database into two-dimensional spectrograms. Then, they used a 2D multi-task CNN to predict six soil properties. Li et al. [19] similarly used a short-time fast Fourier transformation to construct a dual-stream convolutional neural network model (Multi-CNN), which integrates both one-dimensional and two-dimensional convolutions to achieve accurate prediction of multiple soil properties. Jin et al. [20] investigated four methods for converting one-dimensional spectra into two-dimensional spectral images: slicing and reshaping, the Gramian angular difference field, the Gramian angular field, and the Markov transition field. They combined the transformed images with the Swin Transformer to predict six soil properties. Additionally, they demonstrated that the spatial positional correlations preserved in the Gramian angular field method could enhance the information extraction capability of deep neural networks.
This paper introduces a multi-scale spatial attention mechanism module to tackle the issues previously outlined. The spatial attention mechanism, a pivotal element within convolutional neural networks, functions as an adaptive process that selectively focuses on key spatial areas, thus addressing the question of “where to focus” [21]. This approach significantly improves the network’s capacity to discern essential objects within the feature maps by identifying and emphasizing critical regions. It accomplishes this through the application of weighted operations across different areas of the input feature map along the spatial dimension, allowing the network to give precedence to pertinent information [22,23]. We aim to enhance the prediction of soil properties by employing a multi-scale spatial attention mechanism. This mechanism captures information at different scales using convolutional kernels of varying sizes, thereby improving the feature extraction capabilities of convolutional neural networks.
Furthermore, researchers utilize various algorithms to preprocess spectral data to advance the creation of more robust calibration models for soil property prediction. This preprocessing endeavor aims to diminish or eradicate noise in the spectra while highlighting relevant information. Ultimately, this assists calibration models in recognizing the correlation between the input spectra and output soil properties [24]. Common soil spectral preprocessing methods include Savitzky–Golay smoothing, standardization, and normalization techniques. Zhao et al. [25] employed four preprocessing methods—first-order derivative, standard normal variate transformation, multiple scatter correction, and detrending—to process the original spectra. Tsakiridis et al. [26] utilized absorbance spectra and some preprocessed spectra developed using standard techniques as one-dimensional multi-channel inputs for their model. It has been confirmed that effectively combining different preprocessing techniques in one-dimensional multi-input methods produces more robust prediction results than single-input methods. However, research on two-dimensional multi-channel inputs in soil visible–near-infrared spectroscopy prediction studies is scarce. We aim to explore whether two-dimensional multi-channel input methods can improve the prediction accuracy, thus providing more reliable tools for soil property analysis.

2. Materials and Methods

2.1. The Soil Dataset

The soil spectral dataset utilized in this study is derived from the LUCAS soil spectral dataset. This dataset, collected during the 2009 survey, includes 19,036 topsoil samples from 23 European Union countries. All samples underwent standardization and chemical analysis to determine their primary topsoil characteristics, such as coarse fragments, particle size distribution (clay, silt, and sand), pH, organic carbon, carbonates, soluble phosphorus, total nitrogen, extractable potassium, and cation exchange capacity. Spectral data were captured using a diffuse reflectance spectrometer (XDS™ Rapid Content Analyzer, NIRSystems, Inc., Laurel, MD, USA) across a range of 400–2500 nm with a spectral resolution of 0.5 nm, resulting in 4200 data points per sample [27,28,29]. In this study, seven soil properties were selected as target prediction variables: the calcium carbonate content (CaCO3, g·kg−1), cation exchange capacity (CEC, cmol(+)·kg−1), clay fraction (Clay, %), sand fraction (Sand, %), nitrogen content (N, g·kg−1), organic carbon content (OC, g·kg−1), and pH in H2O (pH). We considered all available soil samples in the dataset, encompassing both mineral and organic soils, without considering any additional information such as geographic origin or soil category.

2.2. Method

The entire experimental process was divided into four steps. First, the raw data underwent various preprocessing techniques. Second, the one-dimensional data were transformed into two-dimensional spectral images using the Gramian Angular Difference Field transformation. Third, the best combination of preprocessing methods for the multi-channel input of each soil property was determined using the Vgg16 network model [30]. Finally, the proposed deep learning model was employed to achieve high-precision predictions of soil properties.

2.2.1. Preprocessing Methods

Spectral preprocessing techniques optimize raw spectral data and provide more accurate inputs for subsequent analysis and modeling. Different preprocessing methods also emphasize different spectral information, so their outputs complement each other. To fully leverage this complementary information, we selected spectra processed with a series of common preprocessing methods, along with the original absorbance spectra, as multi-channel inputs for the model, with each spectrum forming an independent channel. Several preprocessing methods commonly used in soil science (such as SG filtering, standard normal variate transformation, and scatter correction) were chosen to create a spectral information pool. The following seven methods were selected to transform the original absorbance spectra: (1) standard normal variate transformation followed by detrending (SNV + DT); (2) the zero-order Savitzky–Golay filter with a window width of 9, followed by standard normal variate transformation (SG0-SNV); (3) the first-order Savitzky–Golay filter with a window width of 9, followed by standard normal variate transformation (SG1-SNV); (4) the second-order Savitzky–Golay filter with a window width of 9, followed by standard normal variate transformation (SG2-SNV); (5) the zero-order Savitzky–Golay filter with a window width of 9, followed by multiple scatter correction (SG0-MSC); (6) the first-order Savitzky–Golay filter with a window width of 9, followed by multiple scatter correction (SG1-MSC); and (7) the second-order Savitzky–Golay filter with a window width of 9, followed by multiple scatter correction (SG2-MSC). The original spectra and the corresponding spectral transformations are depicted in Figure 1.
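The spectral information pool described above can be sketched with NumPy and SciPy. This is a minimal illustration, not the authors' code. It assumes that "zero-, first-, and second-order" refers to the derivative order of the Savitzky–Golay filter, that the fitted polynomial order is 2 (the text gives only the window width of 9), and that scatter correction uses the mean spectrum as its reference. The helper names `snv`, `msc`, `sg`, and `build_channels` are hypothetical.

```python
import numpy as np
from scipy.signal import detrend, savgol_filter


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Scatter correction: regress each spectrum on a reference spectrum.

    Assumption: the reference is the mean spectrum of the input set.
    """
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, 1)  # row ~ slope * ref + intercept
        corrected[i] = (row - intercept) / slope
    return corrected


def sg(spectra, deriv, window=9, polyorder=2):
    """Savitzky-Golay filter along the wavelength axis (polyorder is an assumption)."""
    return savgol_filter(spectra, window_length=window, polyorder=polyorder,
                         deriv=deriv, axis=1)


def build_channels(absorbance):
    """Return the raw absorbance plus the seven transformed versions, keyed by name."""
    return {
        "RAW": absorbance,
        "SNV+DT": detrend(snv(absorbance), axis=1),
        "SG0-SNV": snv(sg(absorbance, 0)),
        "SG1-SNV": snv(sg(absorbance, 1)),
        "SG2-SNV": snv(sg(absorbance, 2)),
        "SG0-MSC": msc(sg(absorbance, 0)),
        "SG1-MSC": msc(sg(absorbance, 1)),
        "SG2-MSC": msc(sg(absorbance, 2)),
    }
```

Each entry of the returned dictionary has the same shape as the input (samples × wavelengths), so any subset can later be converted to images and stacked as channels.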

2.2.2. 2D Transformation Methods

In time series processing, the Gramian Angular Field (GAF) method [31] transforms time series data into image data. This technique retains the complete information of the signal while preserving its temporal dependencies. Visible near-infrared spectroscopy can be viewed as a type of time series. Utilizing the GAF transformation to preserve the spatial position correlations of spectral sequences enables data augmentation and improves the information extraction ability of neural networks [20]. After converting sequence data into image data, we can fully utilize the advantages of convolutional neural networks in image classification and recognition and explore the methods suitable for deep learning algorithm models. We can obtain a two-dimensional GAF image for a given sequence $X = \{x_t,\ t = 1, 2, \ldots, M\}$ by following the steps listed below: To reduce the dimensionality of the sequence, this study employs the Piecewise Aggregate Approximation (PAA) method [32]. Using this method, we obtain the aggregated sequence $\bar{X} = \{\bar{x}_t,\ t = 1, 2, \ldots, N\}$. It should be noted that in this study, the value of $N$ is set to 64. The formula for the sequence $\bar{X}$ is as follows:
$$\bar{x}_t = \frac{1}{k} \sum_{j = k(t-1)+1}^{kt} x_j, \quad 1 \le t \le N, \tag{1}$$
where $k = M/N$, $N < M$;
Next, the data obtained from the first step, $\bar{X}$, need to be processed using min–max normalization to scale their range to [0, 1]. This results in a new data set $\tilde{\bar{X}}$. The specific transformation method is shown in Equation (2).
$$\tilde{\bar{x}}_t = \frac{\bar{x}_t - \bar{x}_{\min}}{\bar{x}_{\max} - \bar{x}_{\min}} \tag{2}$$
For the data obtained in the second step, $\tilde{\bar{X}}$, a polar coordinate transformation can be applied to obtain the corresponding angle and radius for each data point.
$$\phi_t = \arccos(\tilde{\bar{x}}_t), \quad -1 \le \tilde{\bar{x}}_t \le 1, \quad \tilde{\bar{x}}_t \in \tilde{\bar{X}}; \qquad r = \frac{t}{N}, \quad t \in \mathbb{N}, \tag{3}$$
where $\phi_t$ is the angle and $r$ is the radius;
Using Equations (4) and (5), the cosine of the sum of the angles and the sine of the difference between the angles for two different points $i$ and $j$ can be calculated. Consequently, the Gramian Angular Summation Field ($X_{GASF}$) and Gramian Angular Difference Field ($X_{GADF}$) can be obtained.
$$X_{GASF} = \cos(\phi_i + \phi_j) \tag{4}$$
$$X_{GADF} = \sin(\phi_i - \phi_j) \tag{5}$$
In this study, we applied the GADF transformation, as shown in Figure 2.
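The conversion steps above (PAA reduction to N = 64 points, min–max scaling to [0, 1], the arccos mapping to angles, and the sine of pairwise angle differences) can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' implementation. Because 4200 is not a multiple of 64, the example splits the sequence into nearly equal segments with `np.array_split`, which is an assumption; the text does not say how the remainder is handled. The function names `paa` and `gadf` are hypothetical.

```python
import numpy as np


def paa(x, n_segments=64):
    """Piecewise Aggregate Approximation: mean of each contiguous segment.

    np.array_split accepts lengths that are not a multiple of n_segments
    (assumption: nearly equal segment sizes are acceptable).
    """
    segments = np.array_split(np.asarray(x, dtype=float), n_segments)
    return np.array([seg.mean() for seg in segments])


def gadf(x, n_segments=64):
    """Gramian Angular Difference Field image of a 1D sequence."""
    x_bar = paa(x, n_segments)
    # Min-max normalisation to [0, 1]
    x_tilde = (x_bar - x_bar.min()) / (x_bar.max() - x_bar.min())
    # Polar encoding: angle of each point (clip guards against rounding error)
    phi = np.arccos(np.clip(x_tilde, 0.0, 1.0))
    # Entry (i, j) is sin(phi_i - phi_j)
    return np.sin(phi[:, None] - phi[None, :])
```

The result is an N × N matrix that is antisymmetric with a zero diagonal, since sin(φ_i − φ_j) = −sin(φ_j − φ_i). One such image per preprocessed spectrum forms one input channel.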

2.2.3. Construction of Multi-Channel Input

To validate the effectiveness of the GADF method, we generated single-channel 2D images from the original soil spectral data. The original spectral sequences and the 2D images were used to train 1D_Vgg16 and 2D_Vgg16 models. Table 1 presents the 2D_Vgg16 network framework in detail. The following hyperparameters were used: SGD was the optimizer, the learning rate was 0.001, the mean squared error was the loss function (MSELoss), the training batch size was 64 samples, and there were 100 training epochs. With the network structure and hyperparameters fixed, only the input data could affect the prediction results.
Next, we applied the preprocessing methods mentioned in Section 2.2.1 to the original spectral sequences, obtaining a series of spectral information. Subsequently, we transformed the spectral information into 2D images. We combined these image data in various ways to construct input data with different channel numbers, which were then fed into the 2D_VGG16 model for training.
To investigate the relationship between the soil property prediction performance and the number of channels in the preprocessing method combination, we gradually increased the number of considered channels to observe the variations in the prediction performance of different properties. Firstly, considering only one channel, we selected one of the preprocessing methods mentioned earlier and obtained a one-channel spectral image by using a two-dimensional transformation as the input variable, denoted as NCC1. Next, considering two channels, we selected any two preprocessing methods and obtained a two-channel spectral image by using a two-dimensional transformation as the input variable, denoted as NCC2, and so on for other channels. According to the permutation and combination methods, the number of NCC1 and NCC2 combinations was 8 and 28, respectively (Table 2). Finally, we compared the prediction accuracy of each property under different channel inputs. We selected the preprocessing method combination with the highest prediction accuracy for each property as the input for that property’s multi-channel, two-dimensional image.
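The combination counts quoted above follow from choosing k of the eight available spectra (the raw absorbance plus seven preprocessed versions) without regard to order. The short sketch below enumerates them; the list `METHODS` and the function `channel_combinations` are illustrative names, not part of the original study.

```python
from itertools import combinations

# Raw absorbance plus the seven preprocessing variants of Section 2.2.1
METHODS = ["RAW", "SNV+DT", "SG0-SNV", "SG1-SNV", "SG2-SNV",
           "SG0-MSC", "SG1-MSC", "SG2-MSC"]


def channel_combinations(n_channels):
    """All unordered selections of n_channels preprocessing methods."""
    return list(combinations(METHODS, n_channels))


# Number of candidate inputs for each channel count NCC1 ... NCC8
counts = {f"NCC{k}": len(channel_combinations(k))
          for k in range(1, len(METHODS) + 1)}
print(counts["NCC1"], counts["NCC2"])  # 8 28
```

Summed over all channel counts, this gives 255 candidate inputs per soil property, each of which requires training the 2D_Vgg16 model once in an exhaustive search.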

2.2.4. Structure of the CNN Network

As illustrated in Figure 3, this paper introduces a two-dimensional convolutional neural network model with a spatial attention mechanism called CNNSANet. The model employs a hierarchical architecture divided into four stages, akin to certain studies in computer vision [33,34,35]. Each stage comprises a downsampling layer followed by a sequential stack of blocks. Each block contains a multi-scale spatial selection mechanism module and a multi-channel information fusion module. Downsampling is performed using layer normalization and a 2 × 2 convolution layer with a stride of 2.
To enhance the network’s focus on the most relevant spatial contextual information, we introduce a Multi-Scale Spatial Selection Mechanism (MSSM), as illustrated in Figure 4. This module can select feature maps from convolutional kernels at different scales. First, to extract rich contextual information features from the input $X$, we utilize a series of depth-wise separable convolutions with varying receptive fields.
$$D_0 = X, \quad D_i = F_i^{dw}(D_0) \tag{6}$$
Here, $F_i^{dw}(\cdot)$ represents a depthwise separable convolution with a kernel size of $k_i$. Assuming there are $N$ convolutional kernels, the output of each kernel is further refined by a 1 × 1 convolution $F_i^{1\times1}(\cdot)$, as shown in Equation (7).
$$\tilde{D}_i = F_i^{1\times1}(D_i), \quad \text{for } i \text{ in } [1, N] \tag{7}$$
To obtain more detailed and comprehensive feature information, the features obtained from convolutional kernels with different receptive field sizes are concatenated. This approach fully leverages the multi-level information extraction capabilities of the different convolutional kernels, thereby further enhancing the model’s representative capacity and performance.
$$\tilde{D} = [\tilde{D}_1; \ldots; \tilde{D}_N] \tag{8}$$
Next, we apply channel-wise average pooling (represented as $P_{avg}(\cdot)$) to the concatenated features, which yields the spatial feature map $SA$. Then, through convolutional processing, we transform the pooled features (with only one channel) into $N$ spatial attention maps, denoted as $\widetilde{SA}$.
$$SA = P_{avg}(\tilde{D}) \tag{9}$$
$$\widetilde{SA} = F^{1 \to N}(SA) \tag{10}$$
To acquire individual spatial selection masks for each convolutional kernel, we apply the Sigmoid activation function to each spatial attention map $\widetilde{SA}_i$:
$$\widetilde{SA}_i = \sigma(\widetilde{SA}_i) \tag{11}$$
Here, $\sigma(\cdot)$ denotes the Sigmoid function. Following this, the corresponding spatial selection mask is used to weight the features extracted by each convolutional kernel. The weighted features are then combined using a convolutional layer $F(\cdot)$, thereby producing the attention feature $S$:
$$S = F\left(\sum_{i=1}^{N} \widetilde{SA}_i \cdot \tilde{D}_i\right) \tag{12}$$
Finally, the input feature $X$ is multiplied elementwise with $S$, yielding the final output $Y$.
$$Y = X \cdot S \tag{13}$$
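To make the data flow of the MSSM concrete, the following PyTorch sketch mirrors the steps described above: parallel depthwise branches on the input, a 1 × 1 refinement per branch, concatenation, channel-wise average pooling, a convolution producing one attention map per branch, Sigmoid masks, weighted fusion, and elementwise multiplication with the input. It is a hedged reading of the description, not the authors' released code. The kernel sizes (3, 5, 7), the 7 × 7 kernel of the mask convolution, and the class and attribute names are assumptions.

```python
import torch
import torch.nn as nn


class MSSM(nn.Module):
    """Sketch of a multi-scale spatial selection block (hyperparameters assumed)."""

    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        n = len(kernel_sizes)
        # One depthwise convolution per scale, each applied to the same input
        self.dw = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        ])
        # 1x1 convolution refining each branch
        self.pw = nn.ModuleList([nn.Conv2d(channels, channels, 1)
                                 for _ in kernel_sizes])
        # Maps the single pooled channel to N spatial attention maps
        self.to_masks = nn.Conv2d(1, n, kernel_size=7, padding=3)
        # Fuses the mask-weighted sum of branch features
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        feats = [pw(dw(x)) for dw, pw in zip(self.dw, self.pw)]
        d = torch.cat(feats, dim=1)                # concatenate branch features
        sa = d.mean(dim=1, keepdim=True)           # channel-wise average pooling
        masks = torch.sigmoid(self.to_masks(sa))   # shape (B, N, H, W)
        weighted = sum(masks[:, i:i + 1] * f for i, f in enumerate(feats))
        s = self.fuse(weighted)                    # attention feature S
        return x * s                               # elementwise product with input
```

For an input of shape (batch, channels, height, width) the output has the same shape, so the block can be stacked inside each stage without changing tensor dimensions.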
Furthermore, we propose a Multi-Scale Channel Information Fusion (MCIF) module to enhance the model’s representative ability and performance, as depicted in Figure 5. This module improves the network’s ability to learn complex features and enhances information fusion between channels. The MCIF module consists of the following components: a parallel depthwise convolution module with four different scales, a pair of 1 × 1 convolutions for channel compression and expansion, and a residual connection. In the parallel depthwise convolution module, each convolution processes one-fourth of the channels. The depthwise convolution kernels with sizes {3, 5, 7} capture multi-scale information, while the 1 × 1 depthwise convolution kernel acts as a learnable channel-wise scaling factor. This design ensures that features at different scales are fully utilized, improving the model’s ability to recognize and learn complex features. The 1 × 1 convolutions for channel compression and expansion reduce the computational cost, and the residual connection preserves and transmits the information about the original features. The MCIF module can be represented by the following equations:
$$\mathrm{MCIF}(X) = \mathrm{Conv}_{1\times1}^{C/r \to C}\left(\mathrm{Conv}_{1\times1}^{C \to C/r}\left(\mathop{\mathrm{concat}}_{i \in N}\left(\mathrm{DWConv}_{k\times k}(X_i)\right)\right)\right) + X, \quad k = 2i - 1, \quad N = \{1, 2, 3, 4\} \tag{14}$$
$$X_1, X_2, X_3, X_4 = \mathtt{torch.chunk}(X, 4, \mathtt{dim}=1) \tag{15}$$
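A compact PyTorch sketch of the MCIF module follows. It splits the channels into four groups with `torch.chunk`, applies depthwise convolutions with kernel sizes 1, 3, 5, and 7 (k = 2i − 1), concatenates the results, compresses and re-expands the channels with two 1 × 1 convolutions, and adds the residual. The reduction ratio `r = 4`, the absence of an activation between the two 1 × 1 convolutions, and all names are assumptions made for illustration; the text does not specify them.

```python
import torch
import torch.nn as nn


class MCIF(nn.Module):
    """Sketch of multi-scale channel information fusion (reduction ratio assumed)."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        if channels % 4 != 0:
            raise ValueError("channels must be divisible by 4")
        quarter = channels // 4
        # Depthwise convolutions with k = 2i - 1 for i = 1..4, i.e. 1, 3, 5, 7
        self.branches = nn.ModuleList([
            nn.Conv2d(quarter, quarter, k, padding=k // 2, groups=quarter)
            for k in (1, 3, 5, 7)
        ])
        hidden = max(channels // reduction, 1)
        self.compress = nn.Conv2d(channels, hidden, 1)  # C -> C/r
        self.expand = nn.Conv2d(hidden, channels, 1)    # C/r -> C

    def forward(self, x):
        chunks = torch.chunk(x, 4, dim=1)               # X_1 ... X_4
        multi = torch.cat([branch(c) for branch, c in zip(self.branches, chunks)],
                          dim=1)
        return self.expand(self.compress(multi)) + x    # residual connection
```

Because each depthwise branch sees only a quarter of the channels, the two 1 × 1 convolutions are the only place where information is exchanged across channel groups.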

2.3. Evaluation

The Root Mean Square Error (RMSE), Coefficient of Determination (R2), and Ratio of Performance to Inter-Quartile Distance (RPIQ) are utilized to assess the training model’s performance. These metrics are validated on the test set, facilitating an objective and thorough evaluation of the model’s performance. RMSE is used to quantify the discrepancy between the predicted values and the actual observations, and it is calculated as follows:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{16}$$
R² is a statistical indicator used to evaluate the fit of a regression model. It represents the proportion of the variance in the actual data that the model explains. R² values typically range between 0 and 1, with higher values signifying greater explanatory capability of the model. The calculation formula for R² is as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{17}$$
The RPIQ relates the spread of the observed values to the prediction error. IQR represents the interquartile range of the observed values, while RMSE is the root mean square error between the predicted and observed values. The formula for calculating the RPIQ is as follows:
$$RPIQ = \frac{IQR}{RMSE} \tag{18}$$
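The three metrics can be computed directly from arrays of observed and predicted values. The NumPy sketch below is illustrative; the function names are hypothetical, and the quartiles use NumPy's default linear interpolation, which may differ slightly from other quantile conventions.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between observations and predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rpiq(y_true, y_pred):
    """Interquartile range of the observations divided by the RMSE."""
    q1, q3 = np.percentile(np.asarray(y_true, float), [25, 75])
    return float((q3 - q1) / rmse(y_true, y_pred))
```

As a small worked example, for observations [1, 2, 3, 4, 5] and predictions [1, 2, 3, 4, 7], the squared errors sum to 4, so RMSE = √(4/5) ≈ 0.894, R² = 1 − 4/10 = 0.6, and with IQR = 4 − 2 = 2 the RPIQ is about 2.24.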
All deep learning models were trained and tested on a single machine. They were implemented using PyTorch (version 1.11.0), and the training process was accelerated with an NVIDIA TITAN V 12GB GPU.

3. Results and Discussion

Before the experiment, we randomly split the spectral dataset into two subsets, with 70% of the data used for training and 30% for independent testing. The descriptive statistics for the seven soil properties of the calibration and test set samples are summarized in Table 3. The soil properties show a wide range of values, and the means and standard deviations of the soil properties in the calibration and test sets are similar, indicating that the dataset was divided reasonably. To improve the model’s generalization performance, we applied five-fold cross-validation to the training set. Specifically, the training dataset was randomly divided into five equal-sized subsets. Then, we performed five iterations of training and validation. In each iteration, one subset was used as the validation set, while the remaining four subsets were used as the training set. Each iteration yielded a model, which we evaluated on the independent test set. The final evaluation result of the model was obtained by averaging the performance metrics of the five models generated from the five iterations.
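The splitting protocol can be sketched as index bookkeeping in NumPy. This is a minimal sketch under stated assumptions: the random seed, the rounding of the 30% test share, and the function name `split_indices` are illustrative and not taken from the study.

```python
import numpy as np


def split_indices(n_samples, test_fraction=0.3, n_folds=5, seed=0):
    """Return test indices and a list of (fit, validation) index pairs."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]

    # Five nearly equal folds of the training portion
    folds = np.array_split(train_idx, n_folds)
    cv_pairs = []
    for k in range(n_folds):
        val_idx = folds[k]
        fit_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        cv_pairs.append((fit_idx, val_idx))
    return test_idx, cv_pairs


test_idx, cv_pairs = split_indices(19036)
```

Each of the five (fit, validation) pairs would train one model; every model is then scored on the same `test_idx`, and the five test scores are averaged to give the reported value.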

3.1. Analysis of 2D Multi-Channel Inputs

Initially, we verified the effectiveness of the GADF method. As seen in Figure 6, the test performance of converting original spectral information into single-channel GADF images outperformed that of the 1D spectral sequences for each soil property. This observation indicates that preserving spatial positional correlations in the GADF method can enhance the information extraction capability of convolutional neural networks.
Table 4 shows the prediction accuracy for various soil properties using single-channel inputs built from the spectral information obtained via the proposed preprocessing methods and raw spectral information. For different soil properties, the improvement in model performance using different preprocessing methods is limited, with some methods even causing a decline in performance. For the five soil properties of CaCO3, N, CEC, pH, and Clay, the preprocessing methods that yielded the best prediction performance for single-channel 2D inputs were SG0 + SNV, SG1 + SNV, SG2 + SNV, SG0 + MSC, and SNV + Detrend, respectively. Compared to the results without using any preprocessing methods, the R2 increased by 0.5−1.1%, while the RMSE values decreased by 1.3−5.9%. However, for the soil properties of OC and Sand, applying the previously mentioned preprocessing methods resulted in a decrease in model performance. This suggests that the single-channel 2D inputs created using these preprocessing techniques do not effectively enhance the relative positional information, leading to limited improvements in the prediction accuracy of the soil property content. Figure 7 illustrates the box plots representing the prediction accuracy for different soil properties using spectral information derived from various preprocessing methods and the original spectral data used to form different multi-channel 2D inputs. The outcomes are largely consistent across different soil properties. Compared to single-channel 2D inputs, multi-channel 2D inputs show a marked improvement in the average coefficient of determination and a significant reduction in RMSE. For instance, for OC, the RMSE of its multi-channel 2D input decreased by 3.06−6.51%, and the R2 increased by 0.4−1.0%. However, the prediction accuracy for different soil properties does not always positively correlate with the number of channels. 
By comparing the average R² of different multi-channel inputs, the optimal number of channels for each property can be determined, and the combination of preprocessing methods that yields the highest R² for that number of channels can then be selected. For OC, the optimal number of channels is three, with the highest prediction accuracy achieved using a three-channel 2D input constructed with SNV, SG1 + MSC, and SG2 + MSC methods. The optimal number of channels is seven for CaCO3, N, and CEC, eight for pH, five for Clay, and six for Sand. Table 5 presents the optimal number of channels for each property, the highest accuracy corresponding to that number of channels, and the preprocessing methods used. These findings suggest that multi-channel two-dimensional images constructed with diverse preprocessing methods can enrich the input information, facilitate data augmentation, and improve the predictive performance of soil properties.

3.2. Training and Evaluating the CNNSANet Model

Based on the multi-channel input analysis experiment results, we selected the 2D spectral images with the optimal number of channels for different properties as inputs (Table 5). Subsequently, we used the proposed CNNSANet model to predict seven soil properties. In our experiment, the loss function was the root mean square error, and we used stochastic gradient descent (SGD) with a batch size of 64. Figure 8 shows the loss variation over 100 training epochs. For the prediction tasks of the seven soil properties, the training loss and validation loss for OC, CaCO3, N, pH, and Clay decreased rapidly during the first 10 epochs and then stabilized, with the training and validation loss curves almost overlapping. For the soil properties CEC and Sand, the training loss and validation loss decreased slowly, and the validation loss exhibited significant fluctuations. This indicates that the prediction performance for these two properties is not as strong as for the other five properties. Overall, the loss of each model decreases as training progresses, indicating that our models perform well in predicting soil properties and exhibit strong generalization capabilities. To evaluate the effectiveness of the MSSM block and MCIF block in the CNNSANet model, we conducted ablation experiments on our proposed spatial attention mechanism module as follows: We used single-channel 2D images constructed from raw spectra and multi-channel 2D images constructed using the optimal preprocessing methods for each soil property as inputs. Initially, we replaced the MSSM block with a 1 × 1 convolutional block, then used the MSSM block alone, and finally employed the MSSM block along with the MCIF block. As shown in Table 6, the MSSM and MCIF blocks significantly improved the performance. The MSSM block enhanced the R2 by 0.4−0.9% and reduced the RMSE by 1.2−7.8% when predicting the seven soil properties. 
The MCIF block increased the R² by 0.7−2.6% and decreased the RMSE by 3.4−11.0%. These results indicate that the MSSM and MCIF blocks can improve the predictive performance of CNN, regardless of whether single-channel or multi-channel 2D images are used as input. This confirms the effectiveness of the MSSM and MCIF blocks. Our findings suggest that the proposed spatial attention mechanism enhances the feature extraction abilities of CNNs, leading to an improved soil property prediction performance.
Figure 9 presents scatter plots of the measured versus predicted values for the seven soil properties using the CNNSANet model. Among the predicted soil properties, CaCO3 and OC show the highest prediction accuracy (R² > 0.95). The best models for predicting N and pH achieve R² values of 0.935 and 0.93, respectively. The predictive performance for CEC and Clay is comparatively weaker, with R² values of 0.803 and 0.86, respectively, while Sand shows the lowest R² value of only 0.789.

3.3. Comparisons of Different Methods

To assess the performance of our model against established architectures, we fed the same optimal multi-channel 2D inputs for each soil property to other image processing models and compared the results. We selected several representative models: ResNet50, a deep residual convolutional network; Vision Transformer (ViT) [36], which applies the Transformer architecture from natural language processing to images; and ConvNeXt, a recent convolutional neural network. All models were trained with consistent network hyperparameters to predict the soil properties. The resulting prediction performance (RMSE and R²) is presented in Figure 10. The results indicate that our model outperforms the other models and can be used effectively for soil property prediction.
To further evaluate the predictive performance of the proposed modeling method, we compared the CNNSANet model with the two-dimensional convolutional neural network (2D-CNN) employed by Padarian et al. [18], the one-dimensional long short-term memory neural network (1D-LSTM) used by Singh and Kasana [16], the two-dimensional Swin Transformer network (2D-Swin Transformer) utilized by Jin et al. [20], and the one-dimensional machine learning model (1D-PCR-Poly) proposed by Tavakoli et al. [37]. As shown in Table 7, the CNNSANet model improves the prediction performance for most soil properties. Compared to the 2D-Swin Transformer, which also uses a 2D transformation, our model reduces the RMSE for OC, N, CEC, pH, Clay, and Sand by 17.9%, 23.1%, 23.7%, 32.2%, 21.1%, and 21.3%, respectively. We attribute this improvement to the multi-channel 2D images we constructed, which enrich the input information, and to the proposed convolutional neural network with multi-scale spatial attention, which offers stronger feature extraction capabilities and therefore a better fit and higher prediction accuracy. It should be noted that some studies used both organic and mineral soils from the dataset [18,20,37], while others focused only on mineral soils [17,26]. Our approach treats organic and mineral soils as a single set to enhance the model's generalization performance.
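The relative RMSE reductions quoted above can be reproduced from the Table 7 values. The short sketch below assumes the reduction is computed as (baseline − proposed) / baseline; the function and variable names are illustrative only.

```python
# RMSE values copied from Table 7.
cnnsanet = {"OC": 19.083, "N": 0.969, "CEC": 6.52, "pH": 0.366, "Clay": 4.845, "Sand": 12.062}
swin = {"OC": 23.25, "N": 1.26, "CEC": 8.55, "pH": 0.54, "Clay": 6.14, "Sand": 15.33}


def rmse_reduction_pct(baseline: float, proposed: float) -> float:
    """Relative RMSE reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


reductions = {prop: round(rmse_reduction_pct(swin[prop], cnnsanet[prop]), 1) for prop in cnnsanet}
for prop, pct in reductions.items():
    print(f"{prop}: {pct}%")
```

CaCO3 is omitted because Table 7 reports no 2D-Swin Transformer result for it.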

4. Conclusions

This study proposes a CNN architecture based on multi-channel 2D inputs and a multi-scale spatial attention mechanism. First, we find that combining multi-channel inputs with 2D spectral images effectively improves the prediction accuracy for various soil properties, and we investigate how the number of input channels affects the prediction results for each of the seven properties. Second, the proposed convolutional neural network with a spatial attention mechanism, CNNSANet, better captures the spatial positional correlations in 2D spectral images, which enhances the feature extraction capability of the network and thereby improves the prediction of soil properties. On the large-scale LUCAS dataset, the CNNSANet model improves the prediction accuracy and outperforms the methods compared in this study for most soil properties. Unlike laboratory data, VNIR spectra collected in the field are influenced by multiple environmental factors such as the weather, light intensity, and humidity. These factors can introduce higher data variability and thus complicate soil property prediction. Based on the favorable results obtained in this study, we will evaluate our model on more challenging field-collected soil VNIR spectra in future research.

Author Contributions

Conceptualization, G.F. and Z.L.; Data curation, G.F. and Z.L.; Funding acquisition, M.W.; Methodology, G.F. and Z.L.; Project administration, M.W.; Resources, M.W.; Software, G.F.; Supervision, Z.L. and M.W.; Validation, G.F. and J.Z.; Writing—original draft, G.F. and Z.L.; Writing—review and editing, G.F., Z.L., J.Z. and M.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Research on Intelligent Monitoring and Early Warning Technology for rice pests and diseases of the Sichuan Provincial Department of Science and Technology, grant number 2022NSFSC0172.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The authors do not have permission to share data.

Acknowledgments

The LUCAS topsoil dataset used in this work was made available by the European Commission through the European Soil Data Centre managed by the Joint Research Centre (JRC), https://esdac.jrc.ec.europa.eu/content/lucas-2009-topsoil-data (accessed on 8 March 2023).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Pahalvi, H.N.; Rafiya, L.; Rashid, S.; Nisar, B.; Kamili, A.N. Chemical fertilizers and their impact on soil health. In Microbiota and Biofertilizers, Ecofriendly Tools for Reclamation of Degraded Soil Environs; Springer: Berlin/Heidelberg, Germany, 2021; Volume 2, pp. 1–20.
  2. Soriano-Disla, J.M.; Janik, L.J.; Rossel, R.A.V.; Macdonald, L.M.; McLaughlin, M.J. The performance of visible, near-, and mid-infrared reflectance spectroscopy for prediction of soil physical, chemical, and biological properties. Appl. Spectrosc. Rev. 2014, 49, 139–186.
  3. Nawar, S.; Buddenbaum, H.; Hill, J.; Kozak, J.; Mouazen, A.M. Estimating the soil clay content and organic matter by means of different calibration methods of vis-NIR diffuse reflectance spectroscopy. Soil Tillage Res. 2016, 155, 510–522.
  4. Yang, M.; Xu, D.; Chen, S.; Li, H.; Shi, Z. Evaluation of machine learning approaches to predict soil organic matter and pH using Vis-NIR spectra. Sensors 2019, 19, 263.
  5. Safaie, M.; Hosseinpour-Zarnaq, M.; Omid, M.; Sarmadian, F.; Ghasemi-Mobtaker, H. Using deep neural networks for evaluation of soil quality based on VIS–NIR spectroscopy. Earth Sci. Inform. 2024, 17, 271–281.
  6. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  7. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2015, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
  8. Tang, D.; Wei, F.; Qin, B.; Liu, T.; Zhou, M. Coooolll: A deep learning system for twitter sentiment classification. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), Dublin, Ireland, 23–24 August 2014; pp. 208–212.
  9. Maas, A.L.; Qi, P.; Xie, Z.; Hannun, A.Y.; Lengerich, C.T.; Jurafsky, D.; Ng, A.Y. Building DNN acoustic models for large vocabulary speech recognition. Comput. Speech Lang. 2017, 41, 195–213.
  10. Carvalho, M.; Cardoso-Fernandes, J.; Lima, A.; Teodoro, A.C. Convolutional Neural Networks Applied to Antimony Quantification via Soil Laboratory Reflectance Spectroscopy in Northern Portugal: Opportunities and Challenges. Remote Sens. 2024, 16, 1964.
  11. Mamalakis, A.; Barnes, E.A.; Ebert-Uphoff, I. Investigating the fidelity of explainable artificial intelligence methods for applications of convolutional neural networks in geoscience. Artif. Intell. Earth Syst. 2022, 1, e220012.
  12. Veres, M.; Lacey, G.; Taylor, G.W. Deep learning architectures for soil property prediction. In Proceedings of the 2015 12th Conference on Computer and Robot Vision, Halifax, NS, Canada, 3–5 June 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 8–15.
  13. Zhong, L.; Guo, X.; Xu, Z.; Ding, M. Soil properties: Their prediction and feature extraction from the LUCAS spectral library using deep convolutional neural networks. Geoderma 2021, 402, 115366.
  14. Mou, L.; Ghamisi, P.; Zhu, X.X. Deep recurrent neural networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3639–3655.
  15. Zhang, R.; Xie, H.; Cai, S.; Hu, Y.; Liu, G.-K.; Hong, W.; Tian, Z.Q. Transfer-learning-based Raman spectra identification. J. Raman Spectrosc. 2020, 51, 176–186.
  16. Singh, S.; Kasana, S.S. Estimation of soil properties from the EU spectral library using long short-term memory networks. Geoderma Reg. 2019, 18, e00233.
  17. Yang, J.; Wang, X.; Wang, R.; Wang, H. Combination of convolutional neural networks and recurrent neural networks for predicting soil properties using Vis–NIR spectroscopy. Geoderma 2020, 380, 114616.
  18. Padarian, J.; Minasny, B.; McBratney, A.B. Using deep learning to predict soil properties from regional spectral data. Geoderma Reg. 2019, 16, e00198.
  19. Li, R.; Yin, B.; Cong, Y.; Du, Z. Simultaneous prediction of soil properties using multi_cnn model. Sensors 2020, 20, 6271.
  20. Jin, X.; Zhou, J.; Rao, Y.; Zhang, X.; Zhang, W.; Ba, W.; Zhou, X.; Zhang, T. An innovative approach for integrating two-dimensional conversion of Vis-NIR spectra with the Swin Transformer model to leverage deep learning for predicting soil properties. Geoderma 2023, 436, 116555.
  21. Guo, M.H.; Xu, T.X.; Liu, J.J.; Liu, Z.-N.; Jiang, P.-T.; Mu, T.-J.; Zhang, S.-H.; Martin, R.R.; Cheng, M.-M.; Hu, S.-M. Attention mechanisms in computer vision: A survey. Comput. Vis. Media 2022, 8, 331–368.
  22. Hassanin, M.; Anwar, S.; Radwan, I.; Khan, F.S.; Mian, A. Visual attention methods in deep learning: An in-depth survey. Inf. Fusion 2024, 108, 102417.
  23. Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803.
  24. Tsimpouris, E.; Tsakiridis, N.L.; Theocharis, J.B. Using autoencoders to compress soil VNIR–SWIR spectra for more robust prediction of soil properties. Geoderma 2021, 393, 114967.
  25. Zhao, W.; Wu, Z.; Yin, Z.; Li, D. Attention-Based CNN Ensemble for Soil Organic Carbon Content Estimation with Spectral Data. IEEE Geosci. Remote Sens. Lett. 2022, 19, 7000805.
  26. Tsakiridis, N.L.; Keramaris, K.D.; Theocharis, J.B.; Zalidis, G.C. Simultaneous prediction of soil properties from VNIR-SWIR spectra using a localized multi-channel 1-D convolutional neural network. Geoderma 2020, 367, 114208.
  27. Orgiazzi, A.; Ballabio, C.; Panagos, P.; Jones, A.; Fernández-Ugalde, O. LUCAS Soil, the largest expandable soil dataset for Europe: A review. Eur. J. Soil Sci. 2018, 69, 140–153.
  28. Panagos, P.; Van Liedekerke, M.; Jones, A.; Montanarella, L. European Soil Data Centre: Response to European policy support and public data requirements. Land Use Policy 2012, 29, 329–338.
  29. Tóth, G.; Jones, A.; Montanarella, L. The LUCAS topsoil database and derived information on the regional variability of cropland topsoil properties in the European Union. Environ. Monit. Assess. 2013, 185, 7409–7425.
  30. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
  31. Wang, Z.; Oates, T. Imaging time-series to improve classification and imputation. arXiv 2015, arXiv:1506.00327.
  32. Guo, C.; Li, H.; Pan, D. An improved piecewise aggregate approximation based on statistical features for time series mining. In Knowledge Science, Engineering and Management, Proceedings of the 4th International Conference, KSEM 2010, Belfast, Northern Ireland, UK, 1–3 September 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 234–244.
  33. Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022, New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986.
  34. Wang, W.; Xie, E.; Li, X.; Fan, D.-P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pvt v2: Improved baselines with pyramid vision transformer. Comput. Vis. Media 2022, 8, 415–424.
  35. Yu, W.; Luo, M.; Zhou, P.; Si, C.; Zhou, Y.; Wang, X.; Feng, J.; Yan, S. Metaformer is actually what you need for vision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022, New Orleans, LA, USA, 18–24 June 2022; pp. 10819–10829.
  36. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
  37. Tavakoli, H.; Correa, J.; Sabetizade, M.; Vogel, S. Predicting key soil properties from Vis-NIR spectra by applying dual-wavelength indices transformations and stacking machine learning approaches. Soil Tillage Res. 2023, 229, 105684.
Figure 1. The initial absorbance spectra and the seven corresponding spectral preprocessing methods. The 5th, 16th, 50th, 84th, and 95th percentiles are depicted.
Figure 2. The procedure for converting a visible–near-infrared spectral sequence into a GADF image is as follows: (a1) is the original spectral sequence, (a2) is the spectral sequence after PAA dimensionality reduction, (a3) is the polar coordinate transformation, and (a4) is the resulting GADF image.
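The conversion steps in Figure 2 can be illustrated with a minimal pure-Python sketch. It assumes the standard Gramian angular difference field formulation of Wang and Oates [31]: min-max rescaling to [−1, 1], φ = arccos(x), and GADF[i, j] = sin(φ_i − φ_j). The function names, the synthetic spectrum, and the way segment boundaries are chosen when the spectrum length is not divisible by the number of segments are illustrative assumptions, not the authors' implementation. The 4200-point input and 64 × 64 output follow Table 1.

```python
import math


def paa(series, n_segments):
    """Piecewise aggregate approximation: mean of each of n_segments chunks."""
    n = len(series)
    reduced = []
    for k in range(n_segments):
        # Integer boundaries so that uneven lengths (e.g. 4200 / 64) still work.
        start = k * n // n_segments
        end = (k + 1) * n // n_segments
        chunk = series[start:end]
        reduced.append(sum(chunk) / len(chunk))
    return reduced


def gadf(series):
    """Gramian angular difference field of a 1D sequence."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must not be constant")
    # Rescale to [-1, 1] and clamp to guard against rounding just outside the range.
    scaled = [max(-1.0, min(1.0, 2.0 * (x - lo) / (hi - lo) - 1.0)) for x in series]
    # Polar encoding: the angle is the arccosine of the rescaled value.
    phi = [math.acos(x) for x in scaled]
    return [[math.sin(p_i - p_j) for p_j in phi] for p_i in phi]


# Synthetic stand-in for one 4200-point absorbance spectrum.
spectrum = [math.sin(i / 300.0) + 0.001 * i for i in range(4200)]
image = gadf(paa(spectrum, 64))
print(len(image), len(image[0]))
```

Stacking one such matrix per preprocessing method along a leading axis would give the C × 64 × 64 multi-channel input listed in Table 1.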
Figure 3. The overall framework of the CNNSANet.
Figure 4. Multi-scale spatial selection mechanism model.
Figure 5. Multi-scale channel information fusion model.
Figure 6. RMSE and R² comparison between 1D raw spectral data and 2D single-channel GADF images constructed using the same 1D raw spectral data as inputs.
Figure 7. Boxplot of prediction accuracies for different properties of 2D inputs constructed from spectral information obtained using various preprocessing methods and raw spectral information.
Figure 8. Training and validation losses of the CNNSANet model for seven soil properties.
Figure 9. Scatter plot of CNNSANet model for measured and predicted values of seven soil properties.
Figure 10. Results of the CNNSANet and other deep learning models for soil property prediction.
Table 1. 1D and 2D-VGG16 network architecture.

| 1D_Vgg16 | 2D_Vgg16 |
| --- | --- |
| Input (1 × 4200) | Input (C × 64 × 64) |
| Conv1d 3-64 | Conv2D 3 × 3-64 |
| Conv1d 3-64 | Conv2D 3 × 3-64 |
| Maxpooling 2 | Maxpooling 2 × 2 |
| Conv1d 3-128 | Conv2D 3 × 3-128 |
| Conv1d 3-128 | Conv2D 3 × 3-128 |
| Maxpooling 2 | Maxpooling 2 × 2 |
| Conv1d 3-256 | Conv2D 3 × 3-256 |
| Conv1d 3-256 | Conv2D 3 × 3-256 |
| Conv1d 3-256 | Conv2D 3 × 3-256 |
| Maxpooling 2 | Maxpooling 2 × 2 |
| Conv1d 3-512 | Conv2D 3 × 3-512 |
| Conv1d 3-512 | Conv2D 3 × 3-512 |
| Conv1d 3-512 | Conv2D 3 × 3-512 |
| Maxpooling 2 | Maxpooling 2 × 2 |
| Conv1d 3-512 | Conv2D 3 × 3-512 |
| Conv1d 3-512 | Conv2D 3 × 3-512 |
| Conv1d 3-512 | Conv2D 3 × 3-512 |
| Maxpooling 2 | Maxpooling 2 × 2 |
| FC Dense | FC Dense |

Note: C: the number of channels in the two-dimensional input data; Conv1d 3-64: 1D convolutional layer with a kernel size of 3, outputting 64 channels; Conv2D 3 × 3-64: 2D convolutional layer with a kernel size of 3 × 3, outputting 64 channels; Maxpooling 2: 1D max pooling with a pool size of 2; Maxpooling 2 × 2: 2D max pooling with a pool size of 2 × 2; FC Dense: fully connected layer.
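As a quick sanity check on the two branches in Table 1, the sketch below tracks how the five pooling layers shrink the input. It assumes that the convolutions preserve the spatial size ("same" padding, as in the standard VGG16) and that each pooling layer uses a stride equal to its pool size with floor rounding; neither detail is stated in the table, and the names are illustrative.

```python
def size_after_pooling(size: int, n_pools: int = 5, pool: int = 2) -> int:
    """Spatial size after n_pools max-pooling layers (stride = pool, floor rounding)."""
    for _ in range(n_pools):
        size //= pool
    return size


FINAL_CHANNELS = 512  # output channels of the last convolutional stage in Table 1

len_1d = size_after_pooling(4200)   # 1D branch: input length 4200
side_2d = size_after_pooling(64)    # 2D branch: input 64 x 64

features_1d = FINAL_CHANNELS * len_1d
features_2d = FINAL_CHANNELS * side_2d * side_2d
print(len_1d, features_1d)
print(side_2d, features_2d)
```

Under these assumptions the 1D branch ends with 131 positions and the 2D branch with a 2 × 2 map, so the fully connected layer of the 2D network receives far fewer features than that of the 1D network.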
Table 2. The number of permutations and combinations of different preprocessing methods after two-dimensional transformation.

| CN | PCN | Abbreviation | CN | PCN | Abbreviation |
| --- | --- | --- | --- | --- | --- |
| 1 | 8 | NCC1 | 5 | 56 | NCC5 |
| 2 | 28 | NCC2 | 6 | 28 | NCC6 |
| 3 | 56 | NCC3 | 7 | 8 | NCC7 |
| 4 | 70 | NCC4 | 8 | 1 | NCC8 |

Note: CN indicates the number of channels considered; PCN indicates the number of outcomes from permutation and combination; NCC indicates the number of combined channels.
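The PCN values in Table 2 equal the binomial coefficients C(8, k), i.e., the number of ways to choose k channels from the eight available inputs (raw absorbance plus seven preprocessing methods) when channel order is ignored. A short check, with illustrative variable names:

```python
from math import comb

N_INPUTS = 8  # raw absorbance + seven preprocessing methods

# Number of distinct k-channel inputs for k = 1..8.
pcn = {k: comb(N_INPUTS, k) for k in range(1, N_INPUTS + 1)}
total = sum(pcn.values())

print(pcn)
print(total)
```

Summing over all channel counts gives 255 candidate inputs per soil property, which is 2^8 − 1 (every non-empty subset of the eight inputs).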
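Table 3. Information statistics of seven soil properties for training and testing sets.

| Soil Property | Valid Samples | Set | Samples | Min | Q1 | Q2 | Q3 | Max | Mean | Standard Deviation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OC (g·kg⁻¹) | 19,036 | Training | 13,325 | 0 | 12.7 | 20.8 | 39.3 | 586.8 | 50.17 | 91.85 |
| OC (g·kg⁻¹) | 19,036 | Testing | 5710 | 0 | 12.7 | 20.6 | 40.7 | 577 | 49.62 | 90.03 |
| CaCO3 (g·kg⁻¹) | 19,036 | Training | 13,325 | 0 | 0 | 1 | 12 | 944 | 51.31 | 124.75 |
| CaCO3 (g·kg⁻¹) | 19,036 | Testing | 5710 | 0 | 0 | 1 | 11 | 909 | 52.29 | 126.63 |
| N (g·kg⁻¹) | 19,036 | Training | 13,325 | 0 | 1.2 | 1.7 | 1.9 | 38.6 | 2.92 | 3.76 |
| N (g·kg⁻¹) | 19,036 | Testing | 5710 | 0 | 1.2 | 1.7 | 2.9 | 34.2 | 2.93 | 3.74 |
| pH | 19,036 | Training | 13,325 | 3.21 | 5.02 | 6.2 | 7.47 | 10.08 | 6.2 | 1.35 |
| pH | 19,036 | Testing | 5710 | 3.41 | 5.01 | 6.22 | 7.47 | 9.75 | 6.2 | 1.35 |
| CEC (cmol(+)·kg⁻¹) | 19,036 | Training | 13,325 | 0 | 7 | 12.4 | 20.4 | 234 | 15.77 | 14.39 |
| CEC (cmol(+)·kg⁻¹) | 19,036 | Testing | 5710 | 0 | 7.1 | 12.3 | 20.1 | 227.7 | 15.7 | 14.7 |
| Clay (%) | 17,939 | Training | 12,557 | 1 | 8 | 17 | 27 | 79 | 18.84 | 13.02 |
| Clay (%) | 17,939 | Testing | 5382 | 1 | 8 | 17 | 26 | 79 | 18.99 | 12.95 |
| Sand (%) | 17,939 | Training | 12,557 | 1 | 20 | 42 | 64 | 98 | 42.89 | 26.03 |
| Sand (%) | 17,939 | Testing | 5382 | 1 | 19 | 42 | 64 | 98 | 42.81 | 26.24 |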
Table 4. Test set results of seven soil properties (OC, CaCO3, N, CEC, pH, Clay, Sand) for single-channel 2D input constructed using different preprocessing methods based on the Vgg16 network model.

| Preprocessing Algorithm | OC R² | OC RMSE | CaCO3 R² | CaCO3 RMSE | N R² | N RMSE | CEC R² | CEC RMSE | pH R² | pH RMSE | Clay R² | Clay RMSE | Sand R² | Sand RMSE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Absorbances | 0.928 | 24.202 | 0.933 | 32.72 | 0.887 | 1.259 | 0.724 | 7.72 | 0.87 | 0.487 | 0.801 | 5.785 | 0.687 | 14.669 |
| SNV + Detrend | 0.925 | 24.62 | 0.935 | 32.16 | 0.89 | 1.242 | 0.732 | 7.613 | 0.863 | 0.501 | 0.776 | 6.128 | 0.651 | 15.506 |
| SG0 + SNV | 0.925 | 24.705 | 0.935 | 32.329 | 0.887 | 1.259 | 0.718 | 7.813 | 0.873 | 0.481 | 0.802 | 5.767 | 0.685 | 14.73 |
| SG1 + SNV | 0.922 | 25.13 | 0.937 | 31.902 | 0.88 | 1.296 | 0.713 | 7.881 | 0.887 | 0.454 | 0.709 | 6.989 | 0.667 | 15.151 |
| SG2 + SNV | 0.926 | 24.54 | 0.938 | 31.422 | 0.881 | 1.29 | 0.717 | 7.824 | 0.885 | 0.458 | 0.701 | 7.0787 | 0.647 | 15.59 |
| SG0 + MSC | 0.924 | 24.75 | 0.936 | 32.129 | 0.892 | 1.231 | 0.725 | 7.708 | 0.883 | 0.462 | 0.807 | 5.693 | 0.685 | 14.728 |
| SG1 + MSC | 0.922 | 25.119 | 0.935 | 32.187 | 0.874 | 1.331 | 0.709 | 7.927 | 0.885 | 0.458 | 0.71 | 6.979 | 0.662 | 15.248 |
| SG2 + MSC | 0.925 | 24.66 | 0.938 | 31.538 | 0.876 | 1.32 | 0.697 | 8.09 | 0.877 | 0.473 | 0.683 | 7.292 | 0.655 | 15.411 |
Table 5. The highest accuracy and multi-channel combination method for different multi-channel numbers based on different properties.

| Soil Property | CN | Preprocessing Algorithm Combination | R² | RMSE |
| --- | --- | --- | --- | --- |
| OC | 3 | SG0 + SNV, SG1 + MSC, SG2 + MSC | 0.937 | 22.627 |
| CaCO3 | 7 | SG0 + MSC, SG0 + SNV, SG1 + SNV, SNV + DT, SG1 + SNV, SG1 + MSC, SG2 + MSC | 0.948 | 28.941 |
| N | 7 | SG0 + SNV, SG0 + MSC, SG1 + SNV, SG2 + SNV, SNV + DT, SG1 + MSC, SG2 + MSC | 0.908 | 1.133 |
| CEC | 6 | Absorbances, SNV + DT, SG1 + SNV, SG2 + SNV, SG1 + MSC, SG2 + MSC | 0.782 | 6.863 |
| pH | 8 | Absorbances, SG0 + MSC, SG0 + SNV, SG1 + SNV, SG2 + SNV, SNV + DT, SG1 + MSC, SG2 + MSC | 0.896 | 0.436 |
| Clay | 5 | Absorbances, SG1 + SNV, SG2 + SNV, SG1 + MSC, SG2 + MSC | 0.812 | 5.609 |
| Sand | 6 | Absorbances, SG1 + SNV, SG2 + SNV, SG1 + MSC, SG2 + MSC, SG0 + SNV | 0.717 | 14.086 |
Table 6. The results of the ablation experiments on the MSSM block and MCIF block, using single-channel 2D images constructed from raw spectra and multi-channel 2D images constructed with the optimal preprocessing method for each soil property.

| Soil Property | 1 × 1 Conv2D (SC) RMSE | 1 × 1 Conv2D (SC) R² | MSSM Block (SC) RMSE | MSSM Block (SC) R² | MSSM Block + MCIF Block (SC) RMSE | MSSM Block + MCIF Block (SC) R² | 1 × 1 Conv2D (MC) RMSE | 1 × 1 Conv2D (MC) R² | MSSM Block (MC) RMSE | MSSM Block (MC) R² | MSSM Block + MCIF Block (MC) RMSE | MSSM Block + MCIF Block (MC) R² |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OC | 23.965 | 0.929 | 22.07 | 0.94 | 20.776 | 0.947 | 22.13 | 0.94 | 21.34 | 0.944 | 19.08 | 0.955 |
| CaCO3 | 31.321 | 0.939 | 29.133 | 0.947 | 27.428 | 0.953 | 28.99 | 0.948 | 26.73 | 0.955 | 24.9 | 0.961 |
| N | 1.24 | 0.89 | 1.13 | 0.909 | 1.065 | 0.919 | 1.16 | 0.904 | 1.09 | 0.915 | 0.97 | 0.933 |
| CEC | 7.36 | 0.749 | 7.183 | 0.761 | 6.931 | 0.778 | 6.9 | 0.78 | 6.75 | 0.789 | 6.52 | 0.803 |
| pH | 0.469 | 0.879 | 0.412 | 0.907 | 0.39 | 0.917 | 0.4 | 0.912 | 0.39 | 0.917 | 0.37 | 0.927 |
| Clay | 5.849 | 0.796 | 5.35 | 0.829 | 5.14 | 0.846 | 5.31 | 0.83 | 5.22 | 0.838 | 4.85 | 0.86 |
| Sand | 15.268 | 0.661 | 13.883 | 0.72 | 13.21 | 0.749 | 13.26 | 0.745 | 13.1 | 0.751 | 12.06 | 0.789 |

Note: SC indicates the input of single-channel 2D images based on raw spectra, whereas MC indicates the input of multi-channel 2D images constructed with the optimal preprocessing methods for each attribute.
Table 7. The comparison between the proposed CNNSANet model in this paper and other methods from previous studies.

| Model | Assessment Indicator | OC | CaCO3 | N | CEC | pH | Clay | Sand |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CNNSANet (this study) | RMSE | 19.083 | 24.901 | 0.969 | 6.52 | 0.366 | 4.845 | 12.062 |
| | R² | 0.955 | 0.961 | 0.933 | 0.803 | 0.927 | 0.86 | 0.789 |
| | RPIQ | 1.467 | 0.442 | 1.754 | 1.994 | 6.72 | 3.715 | 3.731 |
| 2D-CNN [18] | RMSE | 32.14 | NA | 1.54 | 8.58 | 0.5 | 7.55 | 18.15 |
| | R² | 0.88 | NA | 0.83 | 0.66 | 0.87 | 0.7 | 0.53 |
| 1D-LSTM [16] | RMSE | 23.25 | NA | 1.15 | 6.75 | 0.42 | NA | NA |
| | R² | 0.94 | NA | 0.91 | 0.77 | 0.9 | NA | NA |
| 2D-Swin Transformer [20] | RMSE | 23.25 | NA | 1.26 | 8.55 | 0.54 | 6.14 | 15.33 |
| | R² | 0.95 | NA | 0.94 | 0.79 | 0.9 | 0.84 | 0.74 |
| | RPIQ | 1.32 | NA | 1.27 | 1.25 | 5.2 | 2.77 | 2.74 |
| 1D-PCR-poly [37] | RMSE | 21.33 | 25.71 | 1.11 | 6.89 | NA | 5.41 | 13.41 |
| | R² | 0.95 | 0.96 | 0.92 | 0.8 | NA | 0.82 | 0.73 |
| | RPIQ | 1.28 | 0.43 | 1.54 | 1.88 | NA | 3.33 | 3.28 |

Note: NA, not available.
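For readers who want to recompute the indicators in Table 7, the following sketch gives the usual definitions of RMSE, R² (as 1 − SS_res/SS_tot), and RPIQ (interquartile range of the measured values divided by the RMSE). These definitions are assumed here to be the ones used in the study; the function names are illustrative. As a consistency check, the RPIQ values of CNNSANet are recomputed from the test-set quartiles in Table 3 and the RMSE values in Table 7.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rpiq(q1, q3, rmse_value):
    """Ratio of performance to interquartile distance: (Q3 - Q1) / RMSE."""
    return (q3 - q1) / rmse_value


# (test-set Q1, test-set Q3) from Table 3 and CNNSANet RMSE from Table 7.
test_stats = {
    "OC": (12.7, 40.7, 19.083),
    "CaCO3": (0, 11, 24.901),
    "N": (1.2, 2.9, 0.969),
    "CEC": (7.1, 20.1, 6.52),
    "pH": (5.01, 7.47, 0.366),
    "Clay": (8, 26, 4.845),
    "Sand": (19, 64, 12.062),
}
rpiq_values = {prop: round(rpiq(*stats), 3) for prop, stats in test_stats.items()}
print(rpiq_values)
```

The recomputed values agree with the RPIQ row reported for CNNSANet in Table 7 to within rounding, which supports the reading that RPIQ was computed on the test set.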
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Feng, G.; Li, Z.; Zhang, J.; Wang, M. Multi-Scale Spatial Attention-Based Multi-Channel 2D Convolutional Network for Soil Property Prediction. Sensors 2024, 24, 4728. https://doi.org/10.3390/s24144728