Article

Rapid Non-Destructive Detection of Rice Seed Vigor via Terahertz Spectroscopy

School of Mechatronics & Vehicle Engineering, East China Jiaotong University, Nanchang 330013, China
* Author to whom correspondence should be addressed.
Agriculture 2025, 15(1), 34; https://doi.org/10.3390/agriculture15010034
Submission received: 14 November 2024 / Revised: 21 December 2024 / Accepted: 23 December 2024 / Published: 26 December 2024
(This article belongs to the Special Issue Rice Ecophysiology and Production: Yield, Quality and Sustainability)

Abstract

Rice seed vigor significantly affects yield, so selecting high-vigor seeds is crucial for agricultural production. Traditional methods for assessing seed vigor are time-consuming and destructive. This study aimed to develop a rapid, non-destructive method for evaluating rice seed vigor using terahertz spectroscopy. Rice seeds with varying vigor levels were prepared through high-temperature, high-humidity aging and classified into high-, low-, and non-vigor groups based on germination performance. Terahertz transmission imaging (0.1–3 THz) was performed on 420 seeds, and informative spectral features were extracted using competitive adaptive reweighted sampling (CARS), uninformative variable elimination (UVE), and principal component analysis (PCA). Three chemometric models, namely random forest (RF), K-nearest neighbors (KNN), and partial least squares–discriminant analysis (PLS-DA), were established. The CARS-KNN model, built on the selected bands, achieved the highest prediction accuracy of 97.14%. The results indicate that terahertz spectroscopy combined with band selection provides a reliable, non-destructive approach to rice seed vigor assessment, with significant potential for agricultural quality control.

1. Introduction

Rice is one of the major staple crops in China, widely cultivated across various regions, with an annual production that ranks among the highest of all grain crops [1]. Seed vigor is a comprehensive physiological trait that reflects a seed's potential for germination and early growth under optimal environmental conditions. It encompasses the developmental integrity of the embryo, environmental adaptability, and stress tolerance, and it serves as a critical indicator of seed quality and agricultural production potential [2]. Seed vigor is a key determinant of germination rate and seedling growth, directly influencing crop yield and the stability of agricultural production. Non-destructive detection technology plays a crucial role in seed vigor assessment: it improves detection efficiency, reduces resource waste, and supports reliable seed quality control [3]. Sowing high-vigor seeds promotes robust seedling growth and significantly increases the likelihood of producing high-quality rice. Furthermore, high-vigor seeds are more resilient to biotic and abiotic stresses such as drought, freezing, diseases, and pests, which helps ensure stable crop growth and yield quality [4]. Because rice seeds are the foundation of the rice industry, selecting high-quality, high-vigor seeds is crucial for sustaining and improving rice yield.
Currently, the methods commonly used to detect rice seed vigor, both in China and internationally, include external seed morphology identification, standard germination tests, seedling counts in field plots, seedling length measurement, physiological and biochemical indicator measurements, electrophoretic band identification, and molecular marker identification based on DNA polymorphism. Anhar et al. [5] studied the effect of gibberellin-producing bacteria from the rhizosphere of paddy soils in West Sumatra, Indonesia, on the seedling growth of local rice. That study also identified two Trichoderma species from the same region, T. harzianum and T. minutisporum, and found that these strains enhanced rice seedling growth with varying effectiveness. Sukkaew et al. [6] assessed the vigor of stored Thai fragrant rice seeds using the tetrazolium (TTC) method; the results correlated strongly with both standard germination rates (r = 0.98, p < 0.01) and soil seedling emergence rates (r = 0.98, p < 0.01). Alahakoon et al. [7] used the accelerated aging (AA) protocol to predict the field emergence of direct-seeded rice and found that, for medium-width grain types, AA treatment at 43 °C for 72 h gave the best results (r > 0.8, α = 0.01). However, these methods often damage the seeds and suffer from high subjectivity, low specificity, long detection times, and limited accuracy. They also consume substantial human and material resources, so they cannot meet the seed industry's current need for steady, continued development.
In recent years, various spectroscopic techniques have been employed to detect seed vigor. Jin et al. [8] combined near-infrared hyperspectral imaging with convolutional neural network (CNN), logistic regression (LR), and support vector machine (SVM) models to predict the viability and vigor of naturally aged rice seeds. The deep learning and traditional chemometric methods achieved similar prediction accuracies, all exceeding 85%, and models built on full-spectrum data performed comparably to those built on characteristic wavelengths selected by PCA. Al Siam et al. [9] differentiated viable and non-viable seeds with a PLS-DA model that combined the average spectra extracted from the region of interest (ROI) of hyperspectral images with image features, achieving a classification accuracy of 90.9%. He et al. [10] identified rice seed viability by combining near-infrared hyperspectral imaging and various data preprocessing methods with extreme learning machine (ELM) and least squares–support vector machine (LS-SVM) models, achieving classification accuracies of 93.67% and 94.38%, respectively. Zou et al. [11] used peanut seeds with different vigor gradients (healthy original seeds and seeds artificially aged for 24 and 72 h) and combined hyperspectral imaging with an MF-LightGBM-RF model to establish a vigor prediction system with a prediction accuracy of 92.59%. Lakshmanan et al. [12] employed near-infrared spectroscopy with the successive projections algorithm (SPA) for variable selection to evaluate spinach seed vigor non-destructively; the method achieved an error rate of 1.7% and was simpler and more accurate than existing variance-based methods. Ambrose et al. [13] compared FT-NIR and Raman spectroscopy for evaluating corn seed viability: FT-NIR achieved 100% classification accuracy and over 95% predictive ability, whereas Raman spectroscopy performed well with PLS-DA but showed substantial class overlap in PCA. Pardo et al. [14] used photoacoustic spectroscopy (PAS) to evaluate the aging of vegetable seeds by measuring the optical parameters of red leaf lettuce and beet seeds, which provided information on their composition and physiological quality. Aged and non-aged seeds differed significantly in the light absorption coefficient (β) over 250–750 nm, with non-aged seeds exhibiting higher β values. However, hyperspectral imaging mainly captures surface information, involves complex data processing, and has difficulty penetrating beneath the seed surface. Near-infrared spectroscopy offers some penetration but is limited in penetration depth, resistance to surface interference, and sensitivity to moisture. Raman spectroscopy is constrained by weak signal intensity, strong surface effects, complex equipment, and high cost, which restrict its widespread application.
In contrast, terahertz (THz) waves, with frequencies between 0.1 and 10 THz, offer unique advantages such as low photon energy, strong penetration, fast measurement, and non-invasiveness. As a non-destructive and cost-effective method, THz technology is widely applied in the food and agricultural fields and offers an efficient way to overcome the limitations of traditional techniques. Wang et al. [15] applied THz reflection imaging to detect rice freshness and used an improved 1D-VGG19-Inception-ResNet-A deep learning network to raise the freshness identification accuracy to 99.80%, demonstrating the method's potential for monitoring rice freshness and safeguarding food quality and safety. Wu et al. [16] combined THz time-domain spectroscopy reflection imaging with generalized two-dimensional correlation spectroscopy and a support vector machine to build a rapid, non-destructive model for distinguishing seed vigor, achieving identification accuracies of 88.61% and 91.73% on the endosperm and embryo test sets, respectively.
This study applies THz transmission spectroscopy combined with chemometric modeling to rice seed vigor detection, which to our knowledge is the first application of this approach in this context, and provides a new tool for agricultural seed quality control and production optimization. Rice seeds were subjected to accelerated aging under high temperature and humidity to obtain seeds at different stages of aging, and traditional germination tests were then performed to obtain germination parameters. To assess vigor more accurately, each seed was individually numbered and its germination day recorded, and the seeds were categorized into vigor levels based on their actual germination days. Chemometric methods were then used to model and analyze the THz transmission imaging spectra of the seeds, with the aim of identifying the most effective modeling strategy for rapid, non-destructive vigor assessment.

2. Materials and Methods

2.1. Preparation of Rice Seeds with Different Vigor Levels

Naturally aged seeds are scarce, and natural aging is time-consuming, which creates challenges for seed vigor research. Artificial aging is therefore widely used, as many studies have shown that it can effectively simulate natural aging [17]. In this study, “Y Liangyou” hybrid rice seeds harvested in 2023 were purchased from Hunan Jumen Biotechnology Co., Ltd., Changsha, China, with an initial moisture content of approximately 10%. To widen the differences in seed vigor, the seeds were subjected to high-temperature, high-humidity aging before the experiment. This treatment accelerates deterioration, thereby inducing variations in viability and vigor, and provides suitable experimental conditions for seed vigor research and quality assessment [18]. The seeds were weighed and evenly divided into seven groups of approximately 10 g each (about 500 seeds per group), and the seeds in each group were spread uniformly on aging trays. The artificial aging chamber was maintained at 45 ± 2 °C and a relative humidity of 95% ± 3%. Six of the trays were placed in the chamber and one was removed each day, from the first tray on day 1 to the sixth tray on day 6; the seventh group was not aged and served as the 0-day group. This yielded samples representing 0, 1, 2, 3, 4, 5, and 6 days of aging. During aging, the monitoring equipment was checked every 12 h to ensure stable operation. After treatment, all samples were dried for two days in the same room-temperature environment to stabilize their moisture content and eliminate the influence of moisture variability.
From each aging gradient group, 60 rice seeds that were uniform in appearance, plump, and free from discoloration or mold were selected, totaling 420 rice seed samples, which were then used for terahertz spectroscopy imaging.

2.2. Overview of Terahertz Imaging Equipment

The equipment utilized in this experiment is a Terahertz Transmission 3D Imaging Scanner (Model: QT-TO1000, Qingdao Quenda Terahertz Technology Co., Ltd., Qingdao, China). Figure 1 illustrates the schematic representation of the working principle of the QT-TO1000 Terahertz Spectral Transmission Imaging Scanner.
In this system, a femtosecond laser generates pulses that are split into pump and probe beams. The pump beam is directed to the THz emitter to produce THz radiation, the THz pulses pass through the sample, and the transmitted THz signal, which carries detailed sample information, is captured by the detector. The system operates from 0.1 to 3.0 THz and supports a maximum scanning area of 100 mm × 100 mm, a detection depth of up to 9 mm, and an imaging speed of 60 pixels per second.
During the experiment, the laboratory temperature was controlled at 24 ± 0.1 °C with humidity kept below 10%. After a 30 min warm-up, THz transmission images were acquired with the device. The samples were placed on a movable platform whose height was adjusted to obtain optimal THz transmission signals, and scanning was performed point by point. The X-Y platform moved in 0.3 mm steps while the software-controlled THz imaging camera collected images of the samples. To avoid absorption interference from the platform material, 1 mm-thick polyethylene (PE) sheets, which have negligible absorption in the THz range, were used as the sample base. Transparent double-sided tape secured the rice seeds to the PE sheet to prevent displacement during platform movement, which could affect the spectral data. The seeds in each group on the PE sheet were numbered sequentially from top to bottom and left to right. Figure 2 shows the data processing flowchart of this study.

2.3. Optical Parameter Extraction

The terahertz time-domain spectroscopy (THz-TDS) signal contains rich information from which optical parameters such as the absorption coefficient, refractive index, and absorbance can be derived, so THz-TDS can directly reflect a substantial amount of information about the internal properties of materials. This study focuses on the THz time-domain signal data, analyzing time-domain features to reveal the internal characteristics of the samples while preserving the integrity of the original time-domain information. The time-domain signal and its frequency-domain representation are related by the Fourier transform, as shown in Equation (1) [19]:
$$E(\omega) = A(\omega)\, e^{i\phi(\omega)} = \int_{-\infty}^{\infty} E(t)\, e^{i\omega t}\, dt \tag{1}$$
where A(ω) is the amplitude of the signal in the frequency domain, ϕ(ω) is the phase of the signal, E(t) is the time-domain signal, and E(ω) is the signal in the frequency domain.
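As a rough illustration of Equation (1), the amplitude A(ω) and phase ϕ(ω) of a sampled time-domain waveform can be obtained with a discrete Fourier transform. This is a sketch under stated assumptions, not the authors' processing code: a synthetic Gaussian-derivative pulse stands in for a measured waveform, the 3375 points are assumed to span 0–180 ps uniformly, and NumPy's `rfft` uses the e^{-iωt} sign convention, so its phase has the opposite sign to Equation (1) while the amplitude is identical.

```python
import numpy as np

# Assumed sampling: 3375 points spanning 0-180 ps (the acquisition window in Section 3.2)
n_points = 3375
t = np.linspace(0.0, 180e-12, n_points)   # time axis (s)
dt = t[1] - t[0]

# Hypothetical stand-in for a measured waveform: first derivative of a Gaussian pulse
t0, tau = 20e-12, 0.5e-12
e_t = -(t - t0) / tau**2 * np.exp(-((t - t0) ** 2) / (2 * tau**2))
e_t /= np.abs(e_t).max()                  # normalize to unit peak amplitude

# Discrete approximation of the Fourier integral in Eq. (1)
e_w = np.fft.rfft(e_t) * dt               # complex spectrum E(omega)
freq = np.fft.rfftfreq(n_points, d=dt)    # frequency axis (Hz)
amplitude = np.abs(e_w)                   # A(omega)
phase = np.unwrap(np.angle(e_w))          # phi(omega), NumPy sign convention

# Keep only the instrument's nominal 0.1-3.0 THz band
band = (freq >= 0.1e12) & (freq <= 3.0e12)
peak_freq = freq[band][np.argmax(amplitude[band])]
print(f"{band.sum()} bins in 0.1-3.0 THz, amplitude peak near {peak_freq / 1e12:.2f} THz")
```

For this synthetic pulse the amplitude spectrum peaks near 1/(2πτ), about 0.32 THz; a measured seed waveform would of course give a different spectrum.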

2.4. Seed Germination Experiment

In order to determine the true vigor level of 420 rice seed samples across seven aging gradients, a germination experiment on the samples was necessary. The germination experiment was conducted after the completion of terahertz spectrum collection. The entire germination experiment procedures and temperature environment were designed according to the “Rules for Agricultural Seed Testing—Germination Test” (GB/T 3543.4-1995) [20]. A constant temperature and humidity environment was provided by the MGC-300H artificial climate chamber. Before germination, each seed was soaked in pure water for 24 h to break its dormancy. To prevent the rice seed samples from floating in the water and disrupting the numbering sequence of the seeds, each seed was individually soaked in a separate paper cup filled with pure water. The paper cups were arranged according to the corresponding seed numbers.
In this experiment, the paper germination method was used, as shown in Figure 3. A layer of germination paper soaked with deionized water was laid flat in the germination box, and each group of rice seeds, arranged in a 10 × 6 grid, was transferred as a whole into the box. The seeds were spaced apart to prevent fermentation and spoilage caused by insufficient oxygen.
After a small amount of deionized water was added, the germination box was covered and placed in an intelligent germination incubator at constant temperature and humidity, set to 8 h of light at 28 °C and 16 h of darkness at 20 °C. The germination period lasted 14 days, during which the seeds were sprayed with water daily to maintain moisture, and any seeds showing signs of mold were immediately removed to prevent fungal contamination of the other seeds. A seed was considered germinated when its shoot length reached half the seed length and its radicle length equaled the seed length. Germination progress was recorded throughout the experiment [21,22]. The seed germination and vigor indices are calculated using Equations (2)–(5):
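$$\mathrm{GR} = \frac{\text{number of seeds germinated within 14 days}}{\text{total number of seeds}} \times 100\% \tag{2}$$
$$\mathrm{GE} = \frac{\text{number of seeds germinated within 5 days}}{\text{total number of seeds}} \times 100\% \tag{3}$$
$$\mathrm{GI} = \sum_{t} \frac{G_t}{D_t} \tag{4}$$
$$\mathrm{VI} = \mathrm{GI} \times S \tag{5}$$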
GR is the germination rate, GE is the germination energy, GI is the germination index, Gt is the number of seeds germinated each day, Dt is the corresponding day for Gt, and S (cm) is the average shoot length per seedling on the 14th day of germination.
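Because each seed's germination day was recorded individually, Equations (2)–(5) can be computed directly from a list of per-seed germination days. The sketch below is illustrative only: the helper name `germination_indices`, the use of `None` for seeds that never germinated, and the six-seed example are hypothetical, and "within 5 days" is read as "on or before day 5".

```python
from typing import Optional, Sequence


def germination_indices(germ_days: Sequence[Optional[int]], mean_shoot_cm: float):
    """Compute GR, GE, GI and VI (Eqs. (2)-(5)) from per-seed germination days.

    germ_days: day on which each seed germinated (1-14), or None if it never did.
    mean_shoot_cm: S, the average shoot length per seedling on day 14 (cm).
    """
    total = len(germ_days)
    days = [d for d in germ_days if d is not None]
    gr = sum(1 for d in days if d <= 14) / total * 100   # germination rate (%)
    ge = sum(1 for d in days if d <= 5) / total * 100    # germination energy (%)
    # GI = sum over days t of G_t / D_t, where G_t seeds germinated on day t
    gi = sum(days.count(t) / t for t in set(days))
    vi = gi * mean_shoot_cm                              # vigor index
    return gr, ge, gi, vi


# Hypothetical example: six seeds, one of which never germinated
gr, ge, gi, vi = germination_indices([1, 2, 2, 5, 6, None], mean_shoot_cm=3.0)
print(f"GR={gr:.2f}%  GE={ge:.2f}%  GI={gi:.3f}  VI={vi:.3f}")
# GR=83.33%  GE=66.67%  GI=2.367  VI=7.100
```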

2.5. Algorithm Principle

2.5.1. Competitive Adaptive Reweighted Sampling (CARS)

The CARS algorithm is a feature selection method that optimizes variable subsets through competitive adaptive reweighted sampling, thereby improving the predictive accuracy of the model; it is widely applied in spectral analysis and data mining. Inspired by Darwin’s principle of “natural selection”, CARS uses adaptive reweighted sampling (ARS) to retain the variables whose PLS regression coefficients have large absolute values and to progressively eliminate those with small ones, forming a new variable subset at each iteration. Over multiple iterations, the subset whose PLS model yields the lowest root mean square error of cross-validation (RMSECV) is selected, thereby identifying the key feature wavelengths. The specific steps are as follows:
Monte Carlo sampling (MCS): In each sampling run, a fixed proportion of the modeling samples (usually 80–90%) is randomly drawn to establish a partial least squares (PLS) regression model.
Variable elimination based on an exponential decay function: Assume that the sample spectral matrix is X (n × p), where n is the total number of samples and p is the number of variables, and that the sample concentration matrix is Y (n × 1). The PLS regression model can be expressed as follows:
$$Y = Xb + e$$
where the regression coefficient b is a p-dimensional vector and e is the residual. The absolute value of the i-th element of b, |b_i| (1 ≤ i ≤ p), indicates the contribution of the i-th variable to the regression model: the larger this value, the more important the variable is for predicting the component concentration.
Variables with relatively small |b_i| values are progressively eliminated using an exponential decay function. For the i-th sampling run, the retention rate of the variables is determined by
$$r_i = a\, e^{-k i}$$
where a and k are constants. In the first sampling run, all p variables are included in modeling, so r_1 = 1; by the N-th sampling run, only two variables remain, so r_N = 2/p. The constants a and k are therefore
$$a = \left(\frac{p}{2}\right)^{1/(N-1)}$$
$$k = \frac{\ln(p/2)}{N-1}$$
where ln denotes the natural logarithm. The exponential decay function gradually and effectively eliminates a large number of unimportant wavelength variables.
Based on the adaptive reweighted sampling technique, the variables then undergo competitive selection that simulates the “survival of the fittest” principle of Darwin’s theory of evolution. Variable selection is carried out by evaluating the weight w_i of each wavelength variable, calculated as follows:
$$w_i = \frac{|b_i|}{\sum_{j=1}^{p} |b_j|}, \quad i = 1, 2, \ldots, p$$
Wavelengths with larger absolute regression coefficient weights are more likely to be selected, while those with smaller weights are more likely to be eliminated. After N samplings, N variable subsets are obtained. The optimal variable subset is determined by calculating and comparing the cross-validation RMSE of each variable subset, with the subset yielding the smallest error being selected as the optimal subset.

2.5.2. Uninformative Variable Elimination (UVE) Algorithm

The UVE algorithm identifies and removes wavelength variables that contribute little to model performance, thereby simplifying the model and retaining only informative features. Based on the partial least squares (PLS) method, the algorithm appends as many random noise variables as there are original variables. Leave-one-out cross-validation yields a set of regression coefficients for every variable, and the stability of each variable is computed as the mean of its coefficients divided by their standard deviation. These stability values are compared with those of the random noise variables, and variables that perform no better than noise are eliminated. This reduces the number of variables, improves model efficiency, and can improve predictive accuracy. The UVE process is detailed as follows:
(1) Establish the initial model and add noise variables: A PLS model is constructed using all variables, and a noise variable matrix E is introduced to simulate the effect of uninformative variables. The model is expressed as follows:
$$Y = XB + EC + \varepsilon$$
where Y is the response variable, X is the original variable matrix, B and C are the coefficients of the original and noise variables, respectively, and ε is the error term.
(2) Assess the stability of the variables: The stability S_i of each variable's coefficient, including those of the noise variables, is calculated to evaluate its contribution. It is computed as the ratio of the mean of the variable's coefficients over the leave-one-out models to their standard deviation:
$$S_i = \frac{\operatorname{mean}(\beta_i)}{\operatorname{std}(\beta_i)}$$
(3) Variable selection and model optimization: A threshold T is set based on the stability distribution of the noise variables (for example, the maximum |S| among them). Variables with |S_i| below this threshold are considered uninformative and are eliminated, and the retained variables (|S_i| ≥ T) are used to construct the optimized model.

2.6. Evaluation Criteria for the Rice Vigor Detection Model

To objectively assess the performance of the proposed models, this study adopts a confusion matrix-based evaluation method using three metrics: precision, recall, and accuracy. Higher values of these metrics generally indicate better model performance [23]. For a given vigor level (high vigor, low vigor, or no vigor), TP is the number of samples of that level correctly identified as that level, FP is the number of samples of other levels incorrectly identified as that level, FN is the number of samples of that level incorrectly identified as another level, and TN is the number of samples of other levels correctly identified as not belonging to that level.
(1) Precision: This metric is the proportion of samples that actually belong to a certain vigor level (high vigor, low vigor, or no vigor) among all samples the model identifies as that level, i.e., the proportion of correct identifications among all predictions of that level. The formula is as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
(2) Recall: This metric is the proportion of samples that actually belong to a certain vigor level (high vigor, low vigor, or no vigor) that the model correctly identifies as that level. The formula is as follows:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
(3) Accuracy: Computed over the entire dataset, accuracy is the proportion of samples that are correctly identified. For a single vigor level treated one-versus-rest, it is the number of samples correctly identified as that level plus the number correctly identified as not belonging to it, divided by the total number of samples:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
For the three-class problem as a whole, the overall accuracy is the number of correctly classified samples divided by the total number of samples.
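The metric definitions above can be checked on a small confusion matrix. The sketch below uses NumPy only; the helper name `confusion_metrics`, the label encoding (0 = high vigor, 1 = low vigor, 2 = non-vigor), and the six example predictions are hypothetical.

```python
import numpy as np


def confusion_metrics(y_true, y_pred, n_classes=3):
    """Per-class precision/recall and overall accuracy from a confusion matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=int)   # rows: true class, cols: predicted
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)      # TP + FP for each class (column sums)
    actual = cm.sum(axis=1)         # TP + FN for each class (row sums)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    accuracy = tp.sum() / cm.sum()  # correctly classified / all samples
    return cm, precision, recall, accuracy


# Hypothetical labels: 0 = high vigor, 1 = low vigor, 2 = non-vigor
cm, prec, rec, acc = confusion_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
print(cm)
print("precision:", prec.round(3), "recall:", rec.round(3), "accuracy:", round(acc, 3))
# precision: [0.5 0.667 1.] recall: [0.5 1. 0.5] accuracy: 0.667
```

Note that precision divides by the column sums of the confusion matrix (everything predicted as a class), while recall divides by the row sums (everything that truly belongs to it).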

3. Results and Discussion

3.1. Germination Results

The germination status of the seeds was recorded daily. In Table 1, “Aging days” represents the number of days the seeds were artificially aged, “GE” stands for germination energy, “GI” for germination index, “VI” for vigor index, and “GR” for germination rate.
As the aging period increases, indicators such as germination rate and vigor index decline to varying degrees. This indicates that accelerated artificial aging impairs the physiological functions of the seeds and thus their vigor, leading to lower germination rates or even complete loss of germination ability. However, the table also shows that the differences between adjacent aging days in the earlier groups are small, which is attributed to individual variation among the rice seeds within each aging group. Classifying seeds solely by aging days is therefore not sufficiently accurate, so in this experiment the germination date of each seed was recorded individually according to its assigned number.
Table 2 shows that the germination days of most seeds increase with the aging grade. Building on previous research and to improve the precision of seed vigor detection, rice seeds that germinated on days 1–5 were classified as high-vigor seeds, those that germinated on days 6–14 as low-vigor seeds, and those that did not germinate within 14 days as non-vigor seeds [24,25]. Among the tested seeds, there were 160 high-vigor, 139 low-vigor, and 121 non-vigor seeds.
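The day-based grading rule above maps directly onto a small function. The function name `vigor_class` and the use of `None` for seeds that never germinated are illustrative assumptions.

```python
from typing import Optional


def vigor_class(germ_day: Optional[int]) -> str:
    """Map a seed's germination day to the vigor classes used in this study."""
    if germ_day is None or germ_day > 14:
        return "non-vigor"      # no germination within the 14-day test
    if germ_day <= 5:
        return "high vigor"     # germinated on days 1-5
    return "low vigor"          # germinated on days 6-14


labels = [vigor_class(d) for d in [3, 5, 6, 14, None]]
print(labels)
# ['high vigor', 'high vigor', 'low vigor', 'low vigor', 'non-vigor']
```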

3.2. Spectral Feature Extraction of Rice Seeds

Extraction of Spectral Regions of Interest in Rice Seeds

In this study, each pixel of the THz image contains a full THz time-domain waveform (THz-TDS), so spectra can be measured at every pixel and aggregated over regions. The raw data are structured as a three-dimensional array: each pixel in the XOY plane holds a waveform, and the Z-axis indexes the time-domain sampling points, spanning 0 to 180 ps with 3375 data points. The region of interest (ROI) was manually selected as a rectangular area at the center of each rice seed, with a similar size across all seeds, and the spectral information within this region was extracted to minimize interference from thickness variations among seeds.
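Extracting a seed's spectrum from the image cube amounts to averaging the waveforms of all pixels inside its rectangular ROI. The sketch below assumes a NumPy array of shape (rows, columns, 3375); the helper name `roi_mean_waveform`, the 20 × 12 pixel scan, and the ROI bounds are hypothetical, and random numbers stand in for measured data.

```python
import numpy as np


def roi_mean_waveform(cube, rows, cols):
    """Average the time-domain waveforms inside a rectangular ROI.

    cube: array of shape (ny, nx, nt), one waveform per (y, x) pixel.
    rows, cols: (start, stop) pixel index ranges of the rectangle.
    """
    roi = cube[rows[0]:rows[1], cols[0]:cols[1], :]
    return roi.reshape(-1, cube.shape[2]).mean(axis=0)


# Hypothetical 20 x 12 pixel scan with 3375 time points per pixel
rng = np.random.default_rng(0)
cube = rng.normal(size=(20, 12, 3375))
seed_waveform = roi_mean_waveform(cube, rows=(6, 14), cols=(3, 9))
print(seed_waveform.shape)
# (3375,)
```

Averaging over many pixels also suppresses pixel-level noise, which is one reason a central ROI of similar size is used for every seed.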
As shown in Figure 4, after aging treatment, the lower the vigor of the rice seeds, the higher the peak value of the time-domain signal. Aging-induced damage to the cell membranes and internal structures makes the seeds more porous, reducing starch granule size and forming voids between the granules. This reduces the obstruction of THz waves and thereby strengthens the transmitted signal. These observations provide a physical basis for applying THz spectroscopy and pattern recognition methods to the grading and quality control of rice seed vigor.

3.3. Terahertz Spectral Band Selection for Rice Seeds

The terahertz spectrometer generates a broad spectral range, leading to numerous spectral variables, many of which include redundant, covariant, or background information that can reduce modeling accuracy. To address this, effective signal selection from the large spectral dataset is crucial for further analysis. Feature variable extraction was conducted using CARS, UVE, and PCA.

3.3.1. Wavelength Variable Selection Based on the UVE Algorithm

The THz spectra of the rice seeds were analyzed using the UVE algorithm, as shown in Figure 5. The two horizontal lines mark the selection thresholds at 27.22 and −27.22. A green vertical line separates the stable wavelengths on the left from the unstable wavelengths on the right. Spectral variables whose stability values fell outside the thresholds were retained as model inputs, while those within the thresholds were removed. UVE thereby reduced the number of spectral variables from 3375 to 141, eliminating 95.82% of the variables and substantially reducing model complexity.

3.3.2. Wavelength Variable Selection Based on the CARS Algorithm

As shown in Figure 6, the number of wavelengths retained by CARS decreases as the number of sampling runs increases. Figure 6a shows that the number of wavelengths drops sharply between 0 and 50 sampling runs, decreases more gradually from 50 to 136 runs, and stabilizes with little change beyond 136 runs. This reflects the progression of CARS from coarse to refined selection of wavelength variables. Figure 6b shows the RMSECV during screening: RMSECV decreases from 0 to 136 sampling runs and reaches its minimum of 0.3037 at 136 runs, indicating that CARS effectively removes irrelevant variables up to this point. Beyond it, the algorithm begins to eliminate informative variables rather than irrelevant ones, so choosing an appropriate number of sampling runs is critical when applying CARS. Figure 6c displays the changes in the regression coefficients of the wavelength variables during screening; the “*” marks the sampling run with the minimum RMSECV (136 runs), at which 22 wavelengths are retained.

3.3.3. Feature Extraction Based on the PCA Algorithm

PCA was applied to reduce the dimensionality of the terahertz transmission spectra. Unlike UVE and CARS, which select a subset of the original wavelengths, PCA projects the spectra onto orthogonal principal components, each a linear combination of all wavelengths. The number of principal components was determined from the cumulative variance contribution rate, with the criterion that at least 95% of the total variance be retained; this preserves most of the spectral information while simplifying the data representation and potentially improving classification performance. On this basis, the terahertz spectra were reduced to 53 principal components. Figure 7 shows the distribution of the samples in the space of the first three principal components.
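The component-count rule described above amounts to scanning the cumulative explained variance until it reaches 95%. A minimal sketch, assuming the per-component variances (eigenvalues) have already been computed and sorted in descending order; the function name and eigenvalues are hypothetical. In scikit-learn, passing a float such as `PCA(n_components=0.95, svd_solver="full")` applies the same criterion.

```python
def n_components_for_variance(explained_variance, target=0.95):
    """Smallest k whose first k components explain at least `target` of the variance.

    explained_variance: per-component variances (eigenvalues), sorted in
    descending order as returned by a PCA routine.
    """
    total = sum(explained_variance)
    cumulative = 0.0
    for k, var in enumerate(explained_variance, start=1):
        cumulative += var
        if cumulative / total >= target:
            return k
    return len(explained_variance)


# Hypothetical eigenvalues: cumulative shares are 0.60, 0.80, 0.92, 0.97, 1.00.
print(n_components_for_variance([6.0, 2.0, 1.2, 0.5, 0.3]))  # 4
```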

3.4. Development of a Qualitative Detection Model for Rice Seed Viability Based on Terahertz Spectra

3.4.1. Establishment of an RF Qualitative Model for the Terahertz Spectra of Rice Seeds

RF is an ensemble learning algorithm that improves prediction accuracy and robustness by aggregating the outputs of multiple decision trees, each built from a different subset of the data. It uses bootstrap resampling to draw multiple subsets from the original training data and fits a decision tree to each, which reduces the risk of overfitting associated with a single tree. At each node split, only a random subset of features is searched for the optimal split point, which increases the diversity of the trees and improves generalization. The forest then combines the predictions of all trees by voting (classification) or averaging (regression) to produce the final result. RF handles high-dimensional data and complex nonlinear relationships among features well and is robust to noise, and it is widely used for classification tasks in practice [26]. The 420 rice seed samples were randomly split into training and test sets at a 3:1 ratio, yielding 315 samples for model training and 105 for performance evaluation. RF models were built on the full spectra and on the variables obtained with CARS, UVE, and PCA, respectively.
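The three ingredients named above (bootstrap resampling, a random feature subset at each split, and majority voting) can be seen in a deliberately small pure-Python sketch. To stay short it assumes depth-1 trees (decision stumps) split by Gini impurity; because each stump has a single node, drawing the feature subset per tree is the same as drawing it per node. It illustrates the technique only, not the implementation or settings used in this study, and all names and the toy data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def fit_stump(X, y, features):
    """Depth-1 tree: best (feature, threshold) split by weighted Gini impurity.

    Returns (feature, threshold, left_class, right_class).
    """
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send samples both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # candidate features are constant: predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (features[0], float("inf"), majority, majority)
    return best[1:]


def fit_forest(X, y, n_trees=25, max_features=None, seed=0):
    """Bagged stumps, each trained on a random subset of the features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max_features or max(1, int(p ** 0.5))  # default: about sqrt(p) features
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        feats = rng.sample(range(p), m)
        forest.append(fit_stump([X[i] for i in boot], [y[i] for i in boot], feats))
    return forest


def predict(forest, row):
    """Majority vote over the trees in the forest."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


X = [[0.1, 1.0], [0.2, 1.1], [0.3, 1.2], [0.8, 3.0], [0.9, 3.1], [1.0, 3.2]]
y = ["high", "high", "high", "low", "low", "low"]
forest = fit_forest(X, y, n_trees=25, seed=1)
print(predict(forest, [0.0, 0.9]), predict(forest, [1.2, 3.5]))
```

Full random forests grow each tree much deeper; library implementations such as scikit-learn's `RandomForestClassifier` expose the number of trees through `n_estimators`.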
Table 3 shows the performance of the RF models in identifying rice seed viability from terahertz spectra with different feature extraction algorithms. The full-wavelength RF model achieved the highest prediction-set accuracy, 94.29% (99 of 105 samples correct, six misclassifications), outperforming all RF models built on wavelength-selected data. Figure 8 shows the confusion matrix for the prediction set of the full-wavelength model: one high-viability seed was misclassified as low-viability, two low-viability seeds were misclassified as non-viable, and three non-viable seeds were misclassified as low-viability.
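The split sizes and the reported accuracies follow from simple arithmetic, which the sketch below reproduces. It assumes a plain, non-stratified random shuffle (the text states only that the split was random), an arbitrary seed, and hypothetical helper names; the misclassification counts are those read from the confusion matrix in Figure 8 as described above.

```python
import random


def split_3_to_1(n_samples, seed=0):
    """Randomly split sample indices into training and test sets at 3:1."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = n_samples * 3 // 4
    return idx[:n_train], idx[n_train:]


def accuracy_from_errors(n_test, off_diagonal_counts):
    """Test accuracy (%) given the misclassified counts from a confusion matrix."""
    n_wrong = sum(off_diagonal_counts)
    return round(100 * (n_test - n_wrong) / n_test, 2)


train, test = split_3_to_1(420)
print(len(train), len(test))  # 315 105
# Full-wavelength RF (Figure 8): 1 high->low, 2 low->non-viable, 3 non-viable->low
print(accuracy_from_errors(len(test), [1, 2, 3]))  # 94.29
```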

3.4.2. Establishment of a PLS-DA Qualitative Model for the Terahertz Spectra of Rice Seeds

Partial least squares–discriminant analysis (PLS-DA) is a supervised classification method based on partial least squares (PLS) regression. It extracts latent variables that maximize the covariance between the predictor variables and the class labels (the response variables), which improves predictive accuracy. The approach is well suited to high-dimensional datasets, particularly when the number of variables exceeds the number of samples, and it copes with multicollinearity among the predictors. PLS-DA is widely used in chemometrics because it handles complex data structures while remaining interpretable and predictive [27]. In this study, the 420 rice seed samples were divided into training and test sets at a 3:1 ratio, giving 315 samples for training and 105 for evaluation. PLS-DA models were built on the raw spectral data and on the variables obtained with CARS, UVE, and PCA.
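PLS-DA adapts PLS regression to classification by coding the class labels as a dummy (one-hot) response matrix; each new sample is then assigned to the class with the largest predicted response. The sketch shows only these two steps. It assumes the PLS regression itself is fitted with an existing implementation (for example scikit-learn's `PLSRegression`), uses the argmax rule as one common decision rule (the text does not state which rule was used), and the function names and predicted values are hypothetical.

```python
def dummy_code(labels, classes):
    """One-hot (dummy) response matrix Y for PLS-DA: one column per class."""
    return [[1.0 if lab == c else 0.0 for c in classes] for lab in labels]


def assign_class(y_pred_row, classes):
    """Assign the class whose predicted response is largest (argmax rule)."""
    best = max(range(len(classes)), key=lambda j: y_pred_row[j])
    return classes[best]


classes = ["high", "low", "non-viable"]
print(dummy_code(["low", "high"], classes))  # [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
# Hypothetical PLS predictions for one test sample:
print(assign_class([0.12, 0.71, 0.20], classes))  # low
```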
Table 4 summarizes the recognition results of the PLS-DA models for rice seed viability with different feature extraction methods. Compared with the raw spectral data, feature extraction with CARS and UVE improved the prediction accuracy of the PLS-DA model from 92.38% to 95.24%. The CARS-PLS-DA and UVE-PLS-DA models reached the same prediction accuracy, but CARS-PLS-DA had the higher training set accuracy (94.29% versus 93.97%) and used fewer variables (22 versus 141). Figure 9 presents the confusion matrix for the prediction set of the CARS-PLS-DA model: one high-viability seed was misclassified as low-viability, one low-viability seed was misclassified as non-viable, and three non-viable seeds were misclassified as low-viability.

3.4.3. Establishment of a KNN Qualitative Model

The KNN algorithm is a non-parametric, instance-based classification technique that determines class membership from the distances between the sample to be classified and the samples in the training set. The K nearest training samples are identified, and the sample is assigned to the class that receives the majority vote among these K neighbors. KNN requires no explicit training phase, which is one of its practical advantages, and its decision boundaries in feature space are non-linear, allowing it to handle complex classification problems. Because it classifies each sample directly from the stored training data, it is applicable to a wide range of real-world pattern recognition tasks [28]. The raw spectra and the variables selected with UVE, CARS, and PCA were used as inputs to the KNN models. Following the procedure described in Section 3.4.1, the 420 rice seed samples were randomly split into training and test sets at a 3:1 ratio, yielding 315 samples for training and 105 for evaluation.
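The neighbor-voting rule described above fits in a few lines of Python. This minimal sketch assumes Euclidean distance, which the text does not specify, and uses toy two-dimensional points in place of spectral features; the function name and data are hypothetical. In practice K is tuned for each input set (the values used in this study are listed in Table 5), and features are often scaled first because the distance is sensitive to their ranges.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Assign x to the majority class among its k nearest training samples.

    Distance is Euclidean (an assumption); ties in the vote go to the tied
    class encountered first, i.e. the one with the nearer neighbor.
    """
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy 2-D data standing in for (selected) spectral features.
train_X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
           [1.0, 1.0], [1.1, 1.0], [1.0, 1.1],
           [2.0, 2.0], [2.1, 2.0], [2.0, 2.1]]
train_y = ["high"] * 3 + ["low"] * 3 + ["non-viable"] * 3
print(knn_predict(train_X, train_y, [0.05, 0.05], k=3))  # high
```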
Table 5 presents the recognition results of the KNN qualitative model for the terahertz spectra of rice seed viability, combined with various feature extraction algorithms. Compared to the KNN model based on the raw spectral data, the KNN model with feature extraction using the CARS algorithm demonstrated improved prediction accuracy, with only three misclassifications in the prediction set. The prediction accuracy of the CARS-KNN model reached 97.14%, which is higher than the prediction accuracies of the models based on raw data, UVE-KNN, and PCA-KNN, all of which achieved 95.24%. Figure 10 presents the confusion matrix for the prediction set of the CARS-KNN model. One high-viability rice seed sample was misclassified as a low-viability seed, one low-viability seed sample was misclassified as a non-viable seed, and one non-viable seed sample was incorrectly classified as a low-viability seed.

3.4.4. Qualitative Model Analysis of Rice Seed Vigor Using RF, PLS-DA, and KNN

Table 6 compares the accuracy of the three classification models (RF, PLS-DA, and KNN) in predicting rice seed vigor. Applying the CARS algorithm for feature selection reduced the input from 3375 to 22 variables, lowering the computational load of the KNN and PLS-DA models, and also improved their accuracy. As shown in the table, the CARS-KNN model outperformed both the RF and CARS-PLS-DA models, achieving an accuracy of 97.14%.

4. Conclusions

This study combined terahertz transmission imaging spectroscopy with chemometrics for the non-destructive detection of vitality in rice seeds under different vitality conditions. The analysis revealed that the peak value of the terahertz time-domain signal gradually increases as seed vitality decreases. To address the inaccuracy of classifying vitality solely by aging days, a new method that classifies rice seed vitality according to actual germination days was proposed. Models were first built on the full-spectrum terahertz time-domain signals, and band selection was then combined with KNN to improve predictive ability; the CARS-KNN model performed best, with a prediction accuracy of 97.14%. This study validates the feasibility of the technology for identifying rice seed vitality and provides a new method for non-destructive qualitative vitality detection. With a larger sample size and a wider range of seed varieties, the method has considerable potential for practical, non-destructive field screening of high-vitality rice seeds.

Author Contributions

J.H., investigation, writing—review and editing, experimental scheme design, and formal analysis; S.X., writing—original draft, formal analysis, and experiment; Z.H., experiment; W.L., formal analysis; J.Z. and Y.L., experimental scheme design and review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Youth Natural Science Foundation of China (32302261); Jiangxi Provincial Youth Science Fund Project (20224BAB215042); Jiangxi Ganpo Talented Support Plan—Young science and technology talent Lift Project (2023QT04); Natural Science Foundation of Jiangxi Province (20242BAB202053); National Key R&D Program of China (2022YFD2001805).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data supporting the findings of this study are not publicly accessible currently but can be provided by the authors upon reasonable request.

Conflicts of Interest

The authors declare no financial or personal relationships with any individuals or organizations that could improperly influence their work. There are no professional, personal, or commercial interests in any product, service, or company that could be perceived as affecting the content or evaluation of the submitted manuscript. Jun Hu, Sijie Xu, Zhikai Huang, Wennan Liu, Jiahao Zheng, and Yuxi Liao declare that they have no conflicts of interest.

References

  1. Tang, L.; Risalat, H.; Cao, R.; Hu, Q.; Pan, X.; Hu, Y.; Zhang, G. Food Security in China: View of Rice Production in Recent 20 Years. Foods 2022, 11, 3324.
  2. Zhao, J.; He, Y.; Huang, S.; Wang, Z. Advances in the identification of quantitative trait loci and genes involved in seed vigor in rice. Front. Plant Sci. 2021, 12, 659307.
  3. Xing, M.Y.; Long, Y.; Wang, Q.Y.; Tian, X.; Fan, S.; Zhang, C.; Huang, W. Physiological Alterations and Nondestructive Test Methods of Crop Seed Vigor: A Comprehensive Review. Agriculture 2023, 13, 527.
  4. Wang, X.; Zheng, H.; Tang, Q. Early Harvesting Improves Seed Vigor of Hybrid Rice Seeds. Sci. Rep. 2018, 8, 11092.
  5. Anhar, A.; Putri, D.H.; Advinda, L.; Atika, V.; Amimi, S.; Aldo, W.; Ruchi, W. Correction to: Molecular characterization of Trichoderma strains from West Sumatera, Indonesia and their beneficial effects on rice seedling growth. J. Crop Sci. Biotechnol. 2021, 24, 441–448.
  6. Sukkaew, N.; Kaewnaborn, J.; Soonsuwon, W.; Wongvarodom, V. Tetrazolium test for evaluating viability of stored rice (Oryza sativa) seeds. Seed Sci. Technol. 2023, 51, 97–109.
  7. Alahakoon, A.; Abeysiriwardena, D.S.D.; Damunupola, J.W.; Hay, F.R.; Gama-Arachchige, N.S. Accelerated aging test of seed vigour for predicting field emergence of wet direct-seeded rice. Crop Pasture Sci. 2021, 72, 773–781.
  8. Jin, B.; Qi, H.; Jia, L.; Tang, Q.; Gao, L.; Li, Z.; Zhao, G. Determination of viability and vigor of naturally-aged rice seeds using hyperspectral imaging with machine learning. Infrared Phys. Technol. 2022, 122, 104097.
  9. Al Siam, A.; Salehin, M.M.; Alam, M.S.; Ahamed, S.; Islam, M.H.; Rahman, A. Paddy seed viability prediction based on feature fusion of color and hyperspectral image with multivariate analysis. Heliyon 2024, 10, e36999.
  10. He, X.; Feng, X.; Sun, D.; Liu, F.; Bao, Y.; He, Y. Rapid and nondestructive measurement of rice seed vitality of different years using near-infrared hyperspectral imaging. Molecules 2019, 24, 2227.
  11. Zou, Z.Y.; Chen, J.; Zhou, M.; Zhao, Y.; Long, T.; Wu, Q.; Xu, L. Prediction of peanut seed vigor based on hyperspectral images. Food Sci. Technol. 2022, 42, e32822.
  12. Lakshmanan, M.K.; Boelt, B.; Gislum, R. A chemometric method for the viability analysis of spinach seeds by near infrared spectroscopy with variable selection using successive projections algorithm. J. Near Infrared Spectrosc. 2023, 31, 24–32.
  13. Ambrose, A.; Lohumi, S.; Lee, W.H.; Cho, B.K. Comparative nondestructive measurement of corn seed viability using Fourier transform near-infrared (FT-NIR) and Raman spectroscopy. Sens. Actuators B Chem. 2016, 224, 500–506.
  14. Pardo, G.P.; Pacheco, A.D.; Tomás, S.A.; Orea, A.C.; Aguilar, C.H. Characterization of aged lettuce and chard seeds by photothermal techniques. Int. J. Thermophys. 2018, 39, 118.
  15. Wang, Q.; Zhang, Y.; Ge, H.Y.; Jiang, Y.; Qin, Y. Identification of rice freshness using terahertz imaging and deep learning. Photonics 2023, 10, 547.
  16. Wu, J.Z.; Li, X.Q.; Liu, C.L.; Le, Y.U.; Sun, X.; Sun, L.J. Research on nondestructive testing of corn seed vigor based on THz-TDS reflection imaging. Spectrosc. Spectr. Anal. 2020, 40, 2840.
  17. Sano, N.; Rajjou, L.; North, H.M.; Debeaujon, I.; Marion-Poll, A.; Seo, M. Staying alive: Molecular aspects of seed longevity. Plant Cell Physiol. 2016, 57, 660–674.
  18. Chen, B.X.; Fu, H.; Gao, J.D.; Zhang, Y.X.; Huang, W.J.; Chen, Z.J.; Yan, S.J.; Liu, J. Identification of metabolomic biomarkers of seed vigor and aging in hybrid rice. Rice 2022, 15, 7.
  19. Corbineau, F. The effects of storage conditions on seed deterioration and ageing: How to improve seed longevity. Seeds 2024, 3, 56–75.
  20. GB/T 3543.4-1995; Rules for Agricultural Seed Testing—Germination Test. China Standard Press: Beijing, China, 1995.
  21. Liu, H.; Zhu, Y.F.; Liu, X.; Jiang, Y.; Deng, S.; Ai, X.; Deng, Z. Effect of artificially accelerated aging on the vigor of Metasequoia glyptostroboides seeds. J. For. Res. 2020, 31, 769–779.
  22. Wu, F.; Fang, Q.; Yan, S.W.; Pan, L.; Tang, X.; Ye, W. Effects of zinc oxide nanoparticles on arsenic stress in rice (Oryza sativa L.): Germination, early growth, and arsenic uptake. Environ. Sci. Pollut. Res. 2020, 27, 26974–26981.
  23. Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437.
  24. Fan, Y.; Ma, S.; Wu, T. Individual wheat kernels vigor assessment based on NIR spectroscopy coupled with machine learning methodologies. Infrared Phys. Technol. 2020, 105, 103213.
  25. Kandpal, L.M.; Lohumi, S.; Kim, M.S.; Kang, J.S.; Cho, B.K. Near-infrared hyperspectral imaging system coupled with multivariate methods to predict viability and vigor in muskmelon seeds. Sens. Actuators B Chem. 2016, 229, 534–544.
  26. Gadotti, G.I.; Ascoli, C.A.; Bernardy, R.; Monteiro, R.D.C.; Pinheiro, R.D.M. Machine learning for soybean seeds lots classification. Eng. Agrícol. 2022, 42, e20210101.
  27. Ahmad, N.A. Numerically stable locality-preserving partial least squares discriminant analysis for efficient dimensionality reduction and classification of high-dimensional data. Heliyon 2024, 10, e26157.
  28. Ye, L.; Su, W.; Zou, J.; Ding, Z.; Luo, Y.; Li, W.; Zhou, Y.; Wu, H.; Yao, H. Ultra-broadband composite terahertz absorber prediction based on K-nearest neighbor. Opt. Laser Technol. 2024, 170, 110208.
Figure 1. Schematic diagram of the working principle of the QT-TO1000 Terahertz Spectral Transmission 3D Imaging Scanner.
Figure 2. Flowchart of Rice Seed Viability Detection.
Figure 3. Seed Germination Box with Neatly Arranged Rice Seeds.
Figure 4. THz Time-Domain Spectra of Rice Seeds at Various Vigor Levels.
Figure 5. The results of feature extraction from the terahertz spectra of rice seeds obtained through the UVE algorithm.
Figure 6. Feature Extraction Results of Terahertz Spectra of Rice Seeds Using the CARS Algorithm.
Figure 7. Visualization of Rice Seed Terahertz Spectra Based on the First Three Principal Components Extracted by PCA.
Figure 8. Confusion Matrix of the Full-Spectrum RF Model for the Prediction Results.
Figure 9. Confusion Matrix of the CARS-PLS-DA Model for the Prediction Results.
Figure 10. Confusion Matrix of the CARS-KNN Model for the Prediction Results.
Table 1. Traditional Vigor Indicators of Rice Seeds Under Different Aging Days.
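| Aging Days (D) | Total Number of Samples | GE (%) | GI | VI | GR (%) |
|---|---|---|---|---|---|
| 0 | 60 | 85 | 16.53 | 735.92 | 95 |
| 1 | 60 | 78.3 | 12.67 | 554.31 | 90 |
| 2 | 60 | 70 | 10.34 | 426.32 | 83.3 |
| 3 | 60 | 16.7 | 7.35 | 259.08 | 73.3 |
| 4 | 60 | 11.7 | 6.43 | 173.70 | 65 |
| 5 | 60 | 5 | 4.81 | 110.73 | 51.7 |
| 6 | 60 | 0 | 2.90 | 36.48 | 40 |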
Table 2. Rice Seed Vigor Classified by Different Germination Days.
Aging Days (D)Total SamplesD1D2D3D4D5D6D7D8D9D 10D 11D 12No Germination
High ViabilityLow ViabilityNo Viability
060112421451 3
16000727133 31 6
260 11229422 10
360 37227121 116
460 2520821 1 21
560 314111 2 29
660 149613 36
Note: “D” represents the specific day on which seed germination was observed.
Table 3. RF Model Validation Results for Full-Concentration THz Spectra of Rice Seeds with Different Feature Extraction Methods.
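| Modeling Method | Pre-Processing Method | Number of Variables | Number of Trees | Number of Test Samples | Number of Correct Predictions | Test Accuracy (%) | Out-of-Bag Error Rate (%) |
|---|---|---|---|---|---|---|---|
| RF | None | 3375 | 42 | 105 | 99 | 94.29 | 5.71 |
| RF | CARS | 22 | 30 | 105 | 96 | 91.43 | 8.5 |
| RF | UVE | 141 | 93 | 105 | 98 | 93.33 | 6.67 |
| RF | PCA | 53 | 23 | 105 | 92 | 87.62 | 12.38 |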
Table 4. Performance Validation of the PLS-DA Model on Full-Concentration THz Spectra of Rice Seeds with Different Feature Extraction Techniques.
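| Modeling Method | Pre-Processing Method | Number of Variables | PC | Training Set Accuracy (%) | Number of Test Samples | Number of Correct Predictions | Prediction Accuracy (%) |
|---|---|---|---|---|---|---|---|
| PLS-DA | None | 3375 | 5 | 94.60 | 105 | 97 | 92.38 |
| PLS-DA | CARS | 22 | 5 | 94.29 | 105 | 100 | 95.24 |
| PLS-DA | UVE | 141 | 5 | 93.97 | 105 | 99 | 95.24 |
| PLS-DA | PCA | 53 | 7 | 94.60 | 105 | 97 | 92.38 |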
Table 5. Validation Results of the THz Spectral Grouping KNN Model Using Different Feature Extraction.
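| Modeling Method | Pre-Treatment Method | Number of Variables | K-Value | Training Set Accuracy (%) | Number of Test Set Samples | Number of Correct Predictions | Test Accuracy (%) |
|---|---|---|---|---|---|---|---|
| KNN | None | 3375 | 5 | 93.02 | 105 | 100 | 95.24 |
| KNN | CARS | 22 | 7 | 94.60 | 105 | 102 | 97.14 |
| KNN | UVE | 141 | 23 | 88.89 | 105 | 94 | 95.24 |
| KNN | PCA | 53 | 3 | 95.56 | 105 | 100 | 95.24 |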
Table 6. Results of Different Classification Models for Rice Seed Vigor Based on Terahertz Spectroscopy.
| Modeling Method | Pre-Treatment Method | Number of Variables | Number of Test Set Samples | Number of Correct Predictions | Number of Incorrect Predictions | Test Accuracy (%) |
|---|---|---|---|---|---|---|
| RF | None | 3375 | 105 | 99 | 6 | 94.29 |
| PLS-DA | CARS | 22 | 105 | 100 | 5 | 95.24 |
| KNN | CARS | 22 | 105 | 102 | 3 | 97.14 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hu, J.; Xu, S.; Huang, Z.; Liu, W.; Zheng, J.; Liao, Y. Rapid Non-Destructive Detection of Rice Seed Vigor via Terahertz Spectroscopy. Agriculture 2025, 15, 34. https://doi.org/10.3390/agriculture15010034


