Article

Quantitatively Detecting Camellia Oil Products Adulterated by Rice Bran Oil and Corn Oil Using Raman Spectroscopy: A Comparative Study Between Models Utilizing Machine Learning Algorithms and Chemometric Algorithms

School of Physical Science and Technology, Tiangong University, Tianjin 300387, China
*
Author to whom correspondence should be addressed.
Foods 2024, 13(24), 4182; https://doi.org/10.3390/foods13244182
Submission received: 15 November 2024 / Revised: 15 December 2024 / Accepted: 19 December 2024 / Published: 23 December 2024
Figure 1. The structure of back propagation neural network (BPNN).
Figure 2. A comparison of seven edible oil species: (a) pre-processed Raman spectra, and (b) scatterplot of projections for the first two principal components on a 2D plane.
Figure 3. Representative Raman spectra for (a) camellia–rice blended-oil samples and (b) camellia–corn–rice blended-oil samples with varying adulteration rates. Note that in panel (b), it is the total adulteration rate that is noted in the plot. (c,d) are zoom-ins of the different spectral ranges for two spectra of the camellia–corn–rice oil sample when the total adulteration rate reaches 50%. The adulteration rates for rice bran oil and corn oil are noted accordingly.
Figure 4. Prediction of adulteration concentration for camellia–rice blended-oil samples using ICA coupled regression models: regression results for (a) BPNN, (b) PLSR, and (c) RF; residual distribution for (d) BPNN, (e) PLSR, and (f) RF. In panels (d–f), the dash-dotted horizontal lines are for eye guidance purposes. The red (blue) dots show residuals with magnitudes larger (no larger) than 2%.
Figure 5. Comparison among different models for camellia–rice blended-oil samples.
Figure 6. Prediction for camellia–corn–rice blended-oil samples using CARS-ICA-PLSR model: regression results for (a) camellia oil, (b) corn oil, and (c) rice bran oil; residual distributions for (d) camellia oil, (e) corn oil, and (f) rice bran oil. In panels (d–f), the dash-dotted horizontal lines are for eye guidance purposes. The red (blue) dots show residuals with magnitudes larger (no larger) than 2%.
Figure 7. Feature extraction results for camellia–corn–rice blended-oil samples using different methods: (a) ICA, (b) CARS, (c) CARS-ICA, and (d) ICA-CARS. Note that the ICA (CARS) selected spectral ranges are indicated by the shadowed areas (red circles). The overlaps of the shadowed areas and red circles are the spectral variables selected by the corresponding dual feature extraction methods.

Abstract

Fast and accurate quantitative detection of adulteration in camellia oil products is important because adulteration harms consumers' economic interests and poses potential health risks. In this study, rice bran oil and corn oil, whose Raman spectra both closely resemble that of camellia oil, are blended with camellia oil, and the concentration of each component is predicted by models that combine different feature extraction methods and regression algorithms. A back propagation neural network (BPNN), which has rarely been investigated in previous work, is used to construct regression models, and their performance is compared with that of models using random forest (RF) and partial least squares regression (PLSR). Independent component analysis (ICA), competitive adaptive reweighted sampling (CARS), and their dual combinations are used to extract spectral features. For camellia oil adulterated with rice bran oil, both the ICA-BPNN and ICA-PLSR models achieve satisfactory performance. For camellia oil adulterated with both rice bran oil and corn oil, on the other hand, the performance of the BPNN-based models deteriorates substantially, and the best prediction accuracy is achieved by a PLSR model coupled with CARS-ICA. In addition to the choice of regression algorithm, the output of the feature extraction method also plays a vital role in the final prediction performance.

1. Introduction

Camellia oil is an edible oil extracted from camellia seeds. Remarkably high in oleic acid and other unsaturated fatty acids, camellia oil has a fatty acid composition very similar to that of olive oil and has therefore been dubbed “oriental olive oil” [1,2]. Regular consumption of a camellia oil-rich diet can not only lower cholesterol levels but also reduce the probability of cardiovascular diseases [3]. Moreover, as a rich source of antioxidants, camellia oil can effectively prevent liver damage and gastrointestinal ulcers [4], and has inhibitory effects on cancer-inducing viruses [5]. Its high nutritional value makes camellia oil an attractive target for adulteration, which not only harms the economic interests of consumers but also poses potential health risks to them. It is, therefore, increasingly urgent to develop a reliable and rapid method to quantitatively detect camellia oil-targeted adulteration.
Various analytical methods have been used to analyze camellia oil adulteration. Chromatography and mass spectrometry, which are probably the most widely used techniques in the field, have been applied individually or in combination to provide predictions with high specificity and sensitivity [6]. However, these techniques generally require extensive sample preparation, and the analysis is tedious and invasive. Spectroscopic techniques (Raman, IR, etc.), on the other hand, are receiving increasing attention since they usually require only minimal sample pre-treatment and the measurements are easy to perform. These techniques are also non-invasive, which is very appealing for food-related analysis. Among them, Raman spectroscopy is one of the most promising for oil adulteration detection. In addition to being fast and non-invasive, Raman spectroscopy is very sensitive when discriminating materials of similar structure or composition, and holds great potential for instrument miniaturization. The technique has also been reported to be highly sensitive to lipids [7], which makes it especially appropriate for analyzing edible oil adulteration [8,9,10,11].
It is necessary to combine Raman spectroscopy with suitable algorithms to reveal the relationship between spectra and the corresponding chemical compositions. In addition to traditional chemometric algorithms such as partial least squares regression (PLSR) and linear discriminant analysis (LDA) [12,13], algorithms originating from machine learning are receiving increasing attention due to their strengths in data analysis. The back propagation neural network (BPNN), for instance, is widely used to solve complex nonlinear problems and has strong approximation and generalization abilities. Because it handles large, high-dimensional data well, BPNN is a very promising algorithm for spectra-based quantitative analysis [14].
Currently, however, only a very limited number of quantitative investigations on camellia oil adulteration use algorithms such as BPNN [15,16]. Furthermore, ongoing research on camellia oil-targeted adulteration considers only a limited set of oil species [6]. Previous work has investigated adulteration with sunflower oil, soybean oil, and corn oil [17,18,19,20]. However, adulteration with rice bran oil, whose Raman spectrum is very similar to that of camellia oil and whose price is substantially lower, has scarcely been considered [6].
In this work, Raman spectroscopy is used to quantitatively detect binary and ternary camellia oil adulteration. BPNN, random forest (RF), and PLSR are each used to establish regression models in combination with feature extraction methods, including independent component analysis (ICA), competitive adaptive reweighted sampling (CARS), and their dual combinations. This allows us not only to find the optimal model for predicting the adulteration rates, but also to explore and compare how performance changes with the algorithm. A commercial Raman spectrometer is used to collect the spectra, which are then pre-processed before further analysis. Based on the principal component analysis (PCA) in this work, rice bran oil and corn oil are the most similar to camellia oil among all the oil species characterized. For the sake of brevity, camellia and rice bran blended oil (abbreviated as camellia–rice oil hereafter) is selected as the binary adulterated oil, since a considerable amount of work has already been conducted on camellia–corn blended oil products [12,21]. The blended oil obtained by mixing corn oil and rice bran oil into camellia oil (abbreviated as camellia–corn–rice oil hereafter), on the other hand, serves as the ternary blended oil. The prediction results for binary and ternary adulteration are analyzed and compared, and the optimal regression model is identified for each case.

2. Materials and Methods

2.1. Sample Preparation

Pure edible oil products including camellia oil, rice bran oil, corn oil, walnut oil, sunflower oil, sesame oil, and soybean oil were purchased at local supermarkets. Three pure oil samples were prepared for each oil species. Binary blended-oil samples were prepared by mixing camellia oil and rice bran oil. A total of 37 different adulteration rates were prepared, with a step of 2% when the concentration of rice bran oil (the adulterating oil) ranged from 2% to 60%, and a step of 5% when the concentration of rice bran oil ranged from 65% to 95%. For each adulteration rate, three samples were prepared, resulting in 111 samples in total. Ternary blended-oil samples, on the other hand, were prepared by mixing camellia oil, corn oil, and rice bran oil. The total adulteration concentration ranged from 10% to 50%, with a step of 5%. For each fixed total adulteration, the individual concentrations of rice bran oil and corn oil were varied, resulting in a set of 4 different concentration ratios. For each adulteration rate, 2 samples were prepared, resulting in 72 samples in total. When preparing the blended-oil samples, each component oil was pipetted into beakers that had been washed thoroughly with deionized water and dried afterwards. Subsequently, the beakers were placed on a magnetic stirrer (Surui Instruments, Changzhou, China) and stirred for 5 min to ensure even mixing. All of the samples were stored in a cold and dark environment before spectral collection. For Raman measurements, 0.1 mL of oil from each sample was drawn with a pipette and dropped onto a glass slide.
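The sample counts above follow directly from the stated gradients. The short Python check below is only an arithmetic sketch; the variable names are illustrative and not taken from the original work.

```python
# Binary design: rice bran oil concentration in percent.
# 2% steps from 2% to 60%, then 5% steps from 65% to 95%.
binary_rates = list(range(2, 61, 2)) + list(range(65, 96, 5))
binary_samples = len(binary_rates) * 3          # three replicates per rate

# Ternary design: total adulteration in percent, 5% steps from 10% to 50%,
# four corn/rice splits per total level, two replicates per split.
ternary_totals = list(range(10, 51, 5))
ternary_samples = len(ternary_totals) * 4 * 2

print(len(binary_rates), binary_samples, ternary_samples)
```

The three printed values correspond to the 37 adulteration rates, 111 binary samples, and 72 ternary samples reported in the text.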

2.2. Raman Spectra Collection

In this work, a commercial Raman spectrometer (Horiba Scientific, Kyoto, Japan) was used for spectral collection. The excitation wavelength was 532 nm, with a maximum power of 100 mW that could be finely tuned using a set of built-in neutral density filters. A long-working-distance objective lens with a magnification of 50× was used to collect the Raman signals. The spectral range was 300–3200 cm⁻¹ and the spectral resolution was 0.5 cm⁻¹. For each collected spectrum, the integration time was 20 s and the accumulation number was 2 to ensure a good signal-to-noise ratio (SNR). To account for the inevitable non-uniformity in focus position, the spectra of each sample were collected at 10 randomly selected positions and averaged afterwards. Thus, we have a total of 1110 and 720 spectra for the binary and ternary adulteration cases, respectively. For modeling, the whole set of spectra was divided into a training set and a testing set in a 3:1 ratio, and the validation set was extracted from the training set by 10-fold cross-validation [22,23].
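As a rough illustration of the 3:1 split followed by 10-fold cross-validation, the sketch below partitions spectrum indices with NumPy. It assumes a random split at the level of individual spectra with a fixed seed; the text does not state whether the authors split by spectrum or by sample, and the function name `split_and_folds` is hypothetical.

```python
import numpy as np

def split_and_folds(n_spectra, n_folds=10, seed=0):
    """Return train indices, test indices, and cross-validation folds."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_spectra)
    n_test = n_spectra // 4                      # one quarter held out (3:1 ratio)
    test_idx, train_idx = order[:n_test], order[n_test:]
    # Each fold serves once as the validation set; the rest is used for fitting.
    folds = np.array_split(train_idx, n_folds)
    return train_idx, test_idx, folds

# Binary case from the text: 1110 spectra in total.
train_idx, test_idx, folds = split_and_folds(1110)
print(len(train_idx), len(test_idx), [len(f) for f in folds])
```

If replicate spectra of the same sample should never appear on both sides of the split, the permutation would have to be applied to sample identifiers instead of spectrum indices.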

2.3. Raman Data Pre-Processing

The collected spectra are often affected by interference from the light source, instrumental noise, and fluorescence background, resulting in noise and baseline drift in the raw data. Thus, it is necessary to perform pre-processing before further analysis. Multiplicative scatter correction (MSC) [24,25] was used to remove the interference from scattering, and a polynomial fitting method was used to remove the fluorescence background. Subsequently, the spectra were smoothed by Savitzky–Golay (S-G) [26,27] convolution to further reduce the noise. The corresponding formulas are listed as follows:
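$$X_{MSC} = \frac{X - \bar{X}}{\sqrt{\dfrac{\sum_{k=1}^{m}\left(X_{k} - \bar{X}\right)^{2}}{m - 1}}}$$
where $m$ is the number of variables and $\bar{X}$ is the average for all spectra.
$$x_{k,SG} = \bar{x}_{k} = \frac{1}{H}\sum_{i=-\omega}^{+\omega} x_{k+i}\,h_{i}$$
where $h_{i}$ is the smoothing coefficient and $H$ is the normalization factor, $H = \sum_{i=-\omega}^{+\omega} h_{i}$.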
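The two formulas can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: the scaling step reads $\bar{X}$ as the mean intensity of the spectrum over its $m$ variables, the S-G weights are derived by least-squares polynomial fitting and are already normalized (so they play the role of $h_i / H$), and the window half-width and polynomial order are illustrative values. All function names are hypothetical.

```python
import numpy as np

def standardize_spectrum(x):
    """Centre a spectrum and divide by its sample standard deviation (m - 1)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

def savgol_weights(half_width, poly_order):
    """Normalized smoothing weights for a symmetric window of 2*half_width + 1 points."""
    offsets = np.arange(-half_width, half_width + 1)
    design = np.vander(offsets, poly_order + 1, increasing=True)
    # Row 0 of the pseudo-inverse yields the fitted polynomial value at the window centre.
    return np.linalg.pinv(design)[0]

def savgol_smooth(x, half_width=2, poly_order=2):
    """Smooth a spectrum; edges are handled by repeating the end values."""
    x = np.asarray(x, dtype=float)
    weights = savgol_weights(half_width, poly_order)
    padded = np.pad(x, half_width, mode="edge")
    # np.convolve flips its kernel, so the weights are reversed to apply them as written.
    return np.convolve(padded, weights[::-1], mode="valid")

print(savgol_weights(2, 2) * 35)   # classic 5-point quadratic weights, scaled by 35
```

For production work, a tested library routine such as SciPy's `savgol_filter` would normally be preferred over a hand-written filter.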

2.4. Spectral Feature Extraction

Multi-collinearity and redundant information in the original spectra substantially lower modeling efficiency and degrade model accuracy and robustness. Feature extraction effectively addresses these issues and is therefore a necessary step in quantitative modeling. The algorithms used in this work include PCA, ICA, and CARS. While PCA was used solely to visualize the vibrational distinctions among different oil species, the other two algorithms were used to aid quantitative modeling. In the ternary case, ICA and CARS were combined to realize dual feature extraction and further improve the regression performance. Python 3.9 was used for spectra pre-processing, feature extraction, and modeling.
PCA projects the original data into a new, lower-dimensional coordinate system by a linear transformation that maximizes the retained variance. The eigenvalues and eigenvectors of the covariance matrix are computed, and the eigenvectors corresponding to the largest eigenvalues are selected since they are most representative of the original data [28,29,30].
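The eigen-decomposition described above can be sketched as follows. The data here are synthetic and only stand in for a spectra matrix (rows are spectra, columns are spectral variables); the function name `pca` and the demo values are assumptions for illustration.

```python
import numpy as np

def pca(spectra, n_components=2):
    """Project mean-centred data onto the leading eigenvectors of its covariance matrix."""
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :n_components]
    explained = eigvals / eigvals.sum()           # variance contribution of each component
    return scores, explained[:n_components]

# Synthetic stand-in: 60 "spectra" with 5 variables of very different variance.
rng = np.random.default_rng(0)
demo = rng.normal(size=(60, 5)) * np.array([10.0, 3.0, 1.0, 0.5, 0.1])
scores, explained = pca(demo)
print(scores.shape, explained.sum())
```

The sum of the first two entries of `explained` is the cumulative variance contribution rate of the first two principal components for this synthetic data.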
ICA aims to extract mutually statistically independent components by linear transformation. Unlike methods such as PCA, ICA maximizes the statistical independence of the resulting components, which generally reveals data structure and hidden features more clearly [31,32].
CARS is a method based on the “survival of the fittest” principle. Through adaptive reweighted sampling, wavelength variables with large absolute regression coefficients are selected, while variables with trivial coefficients are removed. The optimal subset of variables is determined by cross-validation as the subset that minimizes the root mean square error (RMSE) [33,34].
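To make the "survival of the fittest" idea concrete, the sketch below shows only the variable-retention part of a CARS-style procedure. It assumes the exponentially decreasing retention schedule commonly described for CARS, $r_i = a\,e^{-k i}$, chosen so that all variables are kept in the first run and two remain in the last. It deliberately omits the Monte Carlo sampling, the PLS fits that produce the coefficients, the adaptive reweighted sampling step, and the cross-validation, so it is not a full implementation. The function names are hypothetical.

```python
import math

def cars_retention_ratios(n_variables, n_runs):
    """Fraction of variables kept after each run: 1 in the first run, 2/n_variables in the last."""
    a = (n_variables / 2) ** (1 / (n_runs - 1))
    k = math.log(n_variables / 2) / (n_runs - 1)
    return [a * math.exp(-k * i) for i in range(1, n_runs + 1)]

def keep_largest(coefficients, ratio):
    """Indices of the variables with the largest absolute regression coefficients."""
    n_keep = max(1, round(len(coefficients) * ratio))
    ranked = sorted(range(len(coefficients)),
                    key=lambda j: abs(coefficients[j]), reverse=True)
    return sorted(ranked[:n_keep])

ratios = cars_retention_ratios(n_variables=1000, n_runs=50)
print(ratios[0], ratios[-1])
print(keep_largest([0.1, -2.0, 0.5, 0.01], ratio=0.5))
```

In a complete procedure, each run would refit the regression on the surviving variables, and the subset with the lowest cross-validated RMSE would be returned.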

2.5. Modeling Algorithms

2.5.1. Back Propagation Neural Network (BPNN)

BPNN is a widely used feedforward neural network which consists of an input layer, a series of hidden layers, and an output layer. As the algorithm’s most distinctive feature, the signals propagate forwards in the network while the errors propagate backwards [16]. By continuously adjusting the weights and the neuron bias, the actual outputs gradually approach the expected ones after a number of iterations [35]. The structure of BPNN is shown in Figure 1.
Despite its advantages, BPNN also suffers from serious disadvantages, such as a tendency to fall into local optima. In this work, the Levenberg–Marquardt (LM) algorithm, which combines gradient descent with the Newton method, is used to optimize the BPNN [36]. The modeling strategy is adaptively adjusted between the two, which not only helps circumvent this shortcoming of BPNN but also yields faster convergence [37].
The weight and threshold adjustment formulas for LM-coupled BPNN are listed as follows [38]:
$$\Delta W = -\left[J^{T}(X)\,J(X) + U\,I\right]^{-1} J^{T}(X)\,e(X)$$
$$W_{k+1} = W_{k} - \left[J^{T}(X)\,J(X) + U\,I\right]^{-1} J^{T}(X)\,e(X)$$
$$e(X) = \left[e_{1}(X),\ e_{2}(X),\ \ldots,\ e_{i}(X)\right]^{T}$$
where $X$ is the input of BPNN, $W_{k}$ represents the weight vector of the $k$th iteration, $I$ is the identity matrix, $J(X)$ is the Jacobian matrix, $U$ is a non-negative value, and $e_{i}(X)$ indicates the error between the network output and the actual value.
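A single LM update can be written directly from the formulas above. The sketch below applies it to a toy linear model instead of a neural network, purely to show the mechanics; the damping value, the toy data, and the function name `lm_step` are assumptions, and the error is taken as output minus target so that the minus sign in the update reduces it.

```python
import numpy as np

def lm_step(weights, jacobian, errors, damping):
    """One update W_{k+1} = W_k - (J^T J + U I)^{-1} J^T e."""
    J = np.asarray(jacobian, dtype=float)
    e = np.asarray(errors, dtype=float)
    damped = J.T @ J + damping * np.eye(J.shape[1])
    # Solving the linear system avoids forming the matrix inverse explicitly.
    return weights - np.linalg.solve(damped, J.T @ e)

# Toy problem: recover w0 and w1 in y = w0 + w1 * x.
x = np.linspace(0.0, 1.0, 20)
y = 1.5 + 0.8 * x
w = np.zeros(2)
J = np.column_stack([np.ones_like(x), x])   # derivative of each error w.r.t. (w0, w1)
for _ in range(50):
    e = (w[0] + w[1] * x) - y                # error = model output - actual value
    w = lm_step(w, J, e, damping=1e-3)
print(w)
```

A small damping value makes the step behave like a Gauss-Newton step, while a large one shortens it towards a gradient descent step; practical implementations adjust the damping between iterations.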

2.5.2. Partial Least Squares Regression (PLSR)

Roughly speaking, PLSR is a “modified” version of least squares regression carried out in a latent space, which avoids the problem of a singular covariance matrix [39,40,41]. The latent variables, or PLSR factors, are extracted from the original data and then used in multiple linear regression to determine the relationship between the original spectral variables and the responses. PLSR is especially advantageous when the number of variables is considerably larger than the number of samples, or when multi-collinearity exists among the variables.
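The latent-variable idea can be illustrated with a compact single-response PLS fit using the NIPALS deflation scheme. This is a generic textbook-style sketch, not the implementation used in the study; the synthetic data are built from two hidden factors so that the 50 variables are strongly collinear and outnumber what ordinary least squares could handle stably. The names `pls1_fit`, `X_demo`, and `y_demo` are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit single-response PLS and return regression coefficients and intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)        # weight vector: direction of maximal covariance with y
        t = E @ w                        # latent variable (score)
        tt = t @ t
        p = E.T @ t / tt                 # X loading
        c = f @ t / tt                   # regression of y on the score
        E = E - np.outer(t, p)           # deflate X
        f = f - c * t                    # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, y_mean - x_mean @ beta

# Synthetic collinear data: 40 samples, 50 variables driven by 2 hidden factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(40, 2))
X_demo = latent @ rng.normal(size=(2, 50))
y_demo = latent @ np.array([1.0, -0.5]) + 3.0
beta, intercept = pls1_fit(X_demo, y_demo, n_components=2)
pred = X_demo @ beta + intercept
print(np.max(np.abs(pred - y_demo)))
```

With noise-free data generated from two factors, two latent variables are enough to reproduce the response; on real spectra the number of components would be chosen by cross-validation.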

2.5.3. Random Forest (RF)

RF, being an ensemble of decision trees, is a type of un-biased machine learning algorithm [40,42]. In the case of regression, the predicted result is the average of the predictions of all trees in the forest. In the training process, decision trees are constructed from randomly selected subsets of the data, and the corresponding performance is evaluated by cross-validation. Generally, the number of trees, the number of predictors to select at each node, and the minimum size of the nodes need to be tuned. In this work, the number of decision trees is set to 33 after multi-parameter tuning.

2.6. Model Evaluation Metrics

For regression models, the coefficient of determination (R²) and RMSE are commonly used evaluation metrics. R² measures how well the predicted values agree with the actual values, and an R² close to 1 indicates that the model predicts the target data well. The calculation formula is listed as follows:
$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_{i} - \hat{y}_{i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i} - \bar{y}\right)^{2}}$$
where $y$ represents the true value of the sample, $\hat{y}$ represents the predicted value, $\bar{y}$ represents the average of the true values, and $n$ is the number of samples.
RMSE measures the prediction error of the model, and the closer its value is to 0, the better. The calculation formula is listed as follows:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - \hat{y}_{i}\right)^{2}}$$
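Both metrics are simple to compute by hand. The sketch below implements the two formulas in plain Python; the four concentration values are made-up numbers used only to exercise the functions, not data from the study.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the concentrations."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Hypothetical true and predicted adulteration rates in percent.
true_pct = [10.0, 20.0, 30.0, 40.0]
pred_pct = [12.0, 18.0, 31.0, 39.0]
print(r2_score(true_pct, pred_pct), rmse(true_pct, pred_pct))
```

For these illustrative values the residual sum of squares is 10 and the total sum of squares is 500, so R² is 0.98 and the RMSE is the square root of 2.5, roughly 1.58 percentage points.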

3. Results and Discussion

3.1. Raman Spectra

Figure 2a presents the pre-processed Raman spectra for seven species of pure edible oil. Some distinctive features appear depending on the oil species: for instance, walnut oil shows an evident peak at 1150 cm⁻¹, which is absent in the other oil species. These distinctive features, however, are much weaker than the features the spectra share. Except for sesame oil, all the oil species show very similar spectral features in the 1200–1340 cm⁻¹ range, the 1400–1500 cm⁻¹ range, and the vicinity of the 1654 cm⁻¹ peak, which are all strong spectral features. However, there are also nontrivial relative differences in peak intensities, for instance, at the 1265 cm⁻¹ peak, the 1300 cm⁻¹ peak, and the vicinity of the 1654 cm⁻¹ peak.
PCA is conducted on the oil species in Figure 2a for quick discrimination, and Table S1 lists the percentage of variance for each principal component and the cumulative variances. The first two principal components have a cumulative variance contribution rate exceeding 90%, indicating that they carry the vast majority of the information in the Raman spectra. Based on the PCA results shown in Figure 2b, most of the oil species can be clearly distinguished, except for camellia oil, rice bran oil, and corn oil, whose PCA projections lie close to each other, demonstrating the substantial similarity of their Raman spectra. Thus, it will be quite challenging to quantitatively analyze camellia–rice and camellia–corn–rice blended-oil samples.
Representative Raman spectra for the camellia–rice and camellia–corn–rice blended-oil samples with varying adulteration rates are shown in Figure 3, while the whole set of spectra is provided in the Supplementary Materials (SM) as Figures S1 and S2. The binary blended-oil samples in panel (a) show very similar spectral profiles regardless of the concentration. However, as the concentration of rice bran oil increases, the intensities of the 1600 cm⁻¹ peak and the 1635 cm⁻¹ peak clearly increase. For the ternary blended-oil samples in panel (b), spectral variations can also be tracked in certain ranges. Panels (c) and (d) compare two Raman spectra that share the same total adulteration rate (50%) but differ in the individual adulteration rates for rice bran oil and corn oil. When the concentration of corn oil increases, an enhancement at the 1265 cm⁻¹ peak and an attenuation at the 1605 cm⁻¹ peak are observed simultaneously. Conversely, an increasing concentration of rice bran oil is readily revealed by the enhancement at the 1605 cm⁻¹ and 1635 cm⁻¹ peaks. These observations indicate that Raman spectroscopy can be used effectively to quantitatively detect camellia–rice and camellia–corn–rice adulteration.

3.2. Regression Models

Regression on the camellia–rice blended-oil samples was conducted by coupling each of the two feature extraction methods (ICA and CARS) with BPNN, PLSR, and RF. Figure 4 presents the prediction results when ICA is coupled with the three modeling algorithms, and the results for CARS are given in Figure S3 in SM. For the ICA-BPNN model, the predicted concentrations lie close to the true ones without any obvious deviation, following a generally linear relationship with a slope of 1. The majority of the errors between true and predicted concentrations, as indicated by panel (d), are within ±2%. Only a very limited number of outliers are found, with values within ±5%. With PLSR, the predicted results also follow a linear relationship with a slope close to 1. The residuals, however, are more widely spread. The errors now show more outliers with larger magnitudes, indicating slightly lower stability for the PLSR model. With RF, the prediction results deteriorate further. While the deviation between the “Y = X” line (red) and the fitted line for the prediction (blue) is trivial in panels (a) and (b), it is non-negligible in panel (c). This enlarged deviation is also reflected in the increasing number of outliers shown in panel (f).
The evaluation metrics for the different models are shown in Figure 5, with the values of R² and RMSE listed in Table S2. In terms of R², all the BPNN- and PLSR-based models achieve generally satisfactory results, while the performance of the RF-based models is slightly inferior. The differences in prediction error are larger. The RMSE is generally smallest in the BPNN-based models, somewhat larger in the PLSR-based models, and larger still in the RF-based models. When CARS is applied instead of ICA, the prediction performance deteriorates slightly. For instance, the R² for the ICA-PLSR model is 0.969, which drops to 0.967 when CARS is applied instead, and the RMSE increases from 3.80 to 4.64. While all the models show performance differences, the BPNN-based models undergo the smallest performance fluctuation when switching from ICA to CARS. Overall, both the ICA-BPNN and the ICA-PLSR models provide satisfactory predictions for binary adulteration.
The quantitative analysis of camellia–corn–rice blended-oil samples is conducted using the same set of regression algorithms, that is, BPNN, PLSR, and RF. Unlike the binary case, in addition to CARS and ICA, the dual feature extraction methods ICA-CARS and CARS-ICA are also applied to extract characteristic spectral ranges. Figure 6 presents the regression results for each component oil (that is, camellia oil, corn oil, and rice bran oil) by PLSR using CARS-ICA. As panels (a–c) show, the predictions for all three component oils are highly accurate, since the predicted results follow a linear relationship with a slope very close to 1. For rice bran oil and corn oil concentrations, the predictions are slightly more accurate than for camellia oil, as shown by panels (d–f). The errors for the former two components are generally within the ±2% range, with only one outlier (less than 4%) when predicting rice bran oil concentration. The errors in predicting camellia oil concentration are a little larger. While the majority of the errors are within the ±2% range, there are more outliers, but the error magnitude is still no larger than 5%.
Tables S3–S5 list the evaluation metrics for predicting the concentrations of camellia oil, corn oil, and rice bran oil using the four feature extraction methods coupled with the three regression algorithms. In contrast to the binary case, the feature extraction method now plays a vital role in prediction performance. Taking the prediction of corn oil concentration using PLSR as an example, when ICA is used, the R² is only 0.7549, which improves to 0.848 when switching to CARS. When the dual feature extraction method CARS-ICA is used, R² further improves to 0.989. This observation holds in the vast majority of the predictions. Furthermore, the CARS-ICA coupled regression models are the most consistently accurate across oil species. Taking BPNN-based models as an example, the CARS-BPNN model achieves an R² of 0.919 for camellia oil, but the prediction accuracy for corn oil and rice bran oil deteriorates drastically, with R² values of 0.688 and 0.457, respectively. The CARS-ICA counterpart, on the other hand, provides an R² over 0.94 for all the component oil species. Among the CARS-ICA coupled models, the CARS-ICA-PLSR model provides the best prediction performance. Taking camellia oil as an example, the R² obtained by PLSR, BPNN, and RF is 0.968, 0.945, and 0.967, respectively. The prediction results for the BPNN and RF algorithms when coupled with CARS-ICA are presented in Figures S4 and S5 in SM.
Motivated by the significant influence of the feature extraction methods, we show all the spectral ranges extracted by the four methods for the ternary adulteration case in Figure 7. A comparison between panels (a) and (b) shows that CARS captures spectral ranges that are characteristic of the component oil species: for instance, the 1265 cm⁻¹ peak, at which different oil species show relative differences in peak intensity. ICA, on the other hand, fails to extract such characteristic features. CARS-ICA further narrows the selected spectral range and improves the final prediction performance. This improvement is possibly attributable to the further reduction in redundancy in the modeling inputs. ICA-CARS, on the other hand, gives the poorest performance among all the methods. This is possibly because the method rules out most of the characteristic spectral ranges by performing ICA first, and the selected ranges are narrowed further by the subsequent CARS step, so that ICA-CARS fails to provide an effective input for modeling. This interpretation is supported by panel (d): none of the isolated circles, which represent the output of ICA-CARS, are characteristic of the component oil species. To provide further insight into the role of feature extraction methods, Figure S6 presents the spectral ranges selected by CARS and ICA in the binary case. Unlike the ternary case, several selections of CARS and ICA overlap and both methods cover some spectral characteristics of the component oil species, which explains why both CARS and ICA provide predictions with a similar level of accuracy.
The models developed in this work are still a long way from actual industrial application. Within the scope of this work, however, we can still consider the time cost of different models and compare them from a more application-oriented perspective. The number of spectral variables serves as the indicator of a model's complexity, and the total time needed to finish feature extraction and regression modeling serves as the indicator of its time cost. The corresponding values are listed in Tables S6 and S7 for the binary and ternary cases. Since both cases show very similar trends, we focus here on the ternary case. Although feature extraction methods lower the complexity of the model, the time cost differs between methods. In our work, the CARS duration is roughly 40 s, in contrast to the ICA duration of nearly 100 s. This difference carries over to the contrast between the CARS-ICA and ICA-CARS durations, that is, 54 s vs. 116 s. As discussed above, the CARS-ICA-PLSR model not only demonstrates the best performance but is also very time-efficient. When switching from PLSR/BPNN to RF, the model duration increases from tens of seconds to hundreds of seconds. Thus, the RF-based models not only perform worse but are also more time-consuming. For the BPNN- and PLSR-based models, feature extraction takes up the majority of the total duration. Thus, there is a general anti-correlation between the complexity and the total time cost. However, as indicated by our work, lowering complexity does not always lead to better performance (e.g., CARS-ICA-PLSR vs. ICA-CARS-PLSR), and it is possible to achieve quite satisfactory performance in a time-efficient manner.

4. Conclusions

In this work, both binary and ternary camellia oil-targeted adulterations are analyzed quantitatively, using different feature extraction methods and regression algorithms. For camellia–rice blended-oil samples, both the ICA-BPNN and ICA-PLSR models give generally satisfactory predictions, with a high R² and low RMSE. The performance of the RF-based models is somewhat inferior. The performance fluctuation when switching from ICA to CARS is not large. For the camellia–corn–rice blended-oil samples, the best prediction is achieved by a PLSR model coupled with CARS-ICA dual feature extraction, with a high R² for all three component oil species. The performance difference among models is much larger in the ternary case than in the binary case. This difference is mostly rooted in the fact that CARS outperforms ICA in capturing the most characteristic spectral ranges in the ternary case, and the corresponding dual extraction method CARS-ICA further removes redundancy and improves the subsequent regression modeling. The time costs of the models are also considered. The CARS-ICA-PLSR model, which achieves the best performance in the ternary case, is also very time-efficient.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/foods13244182/s1: Figure S1: (a–g) Raman spectra for camellia–rice blended-oil samples with varying adulteration concentrations. The adulteration concentrations are marked in the plots accordingly; Figure S2: Raman spectra for camellia–corn–rice blended-oil samples with varying total adulteration concentrations. The adulteration concentrations are marked in the plots accordingly; Figure S3: Prediction of adulteration concentration for camellia–rice blended-oil samples using CARS coupled regression models: regression results for (a) BPNN, (b) PLSR, and (c) RF; residual distribution for (d) BPNN, (e) PLSR, and (f) RF. In panels (d–f), the dash-dotted horizontal lines are for eye guidance purposes; Figure S4: Prediction for camellia–corn–rice blended-oil samples using CARS-ICA-BPNN model: regression results for (a) camellia oil, (b) corn oil, and (c) rice bran oil; residual distributions for (d) camellia oil, (e) corn oil, and (f) rice bran oil. In panels (d–f), the dash-dotted horizontal lines are for eye guidance purposes; Figure S5: Prediction for camellia–corn–rice blended-oil samples using CARS-ICA-RF model: regression results for (a) camellia oil, (b) corn oil, and (c) rice bran oil; residual distributions for (d) camellia oil, (e) corn oil, and (f) rice bran oil. In panels (d–f), the dash-dotted horizontal lines are for eye guidance purposes; Figure S6: Feature extraction results for camellia–rice blended-oil samples using different methods: (a) ICA, (b) CARS; Table S1: Cumulative Variance by different number of principal components; Table S2: Evaluation metrics of regression models for camellia–rice blended-oil samples; Table S3: Evaluation metrics of regression models for predicting camellia oil concentration in camellia–corn–rice blended-oil samples; Table S4: Evaluation metrics of regression models for predicting corn oil concentration in camellia–corn–rice blended-oil samples; Table S5: Evaluation metrics of regression models for predicting rice bran oil concentration in camellia–corn–rice blended-oil samples; Table S6: Time cost (duration) and complexity (number of spectral variables) for predicting rice bran oil concentration in camellia–rice blended-oil samples; Table S7: Time cost (duration) and complexity (number of spectral variables) for predicting camellia oil concentration in camellia–corn–rice blended-oil samples; Table S8: List of abbreviations.

Author Contributions

Conceptualization, H.L. and S.M.; methodology, H.L., S.M., N.L., and X.W.; software, S.M.; validation, S.M.; formal analysis, H.L. and S.M.; investigation, H.L. and S.M.; resources, H.L.; data curation, H.L. and S.M.; writing—original draft preparation, H.L.; writing—review and editing, H.L., S.M., N.L., and X.W.; visualization, H.L. and S.M.; supervision, H.L.; project administration, H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Materials. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ma, J.; Ye, H.; Rui, Y.; Chen, G.; Zhang, N. Fatty acid composition of Camellia oleifera oil. J. Für Verbraucherschutz Leb. 2010, 6, 9–12. [Google Scholar] [CrossRef]
  2. Chou, T.; Lu, Y.; Inbaraj, B.; Chen, B. Camelia oil and soybean-camelia oil blend enhance antioxidant activity and cardiovascular protection in hamsters. Nutrition 2018, 51–52, 86–94. [Google Scholar] [CrossRef] [PubMed]
  3. Wang, R.; Tung, Y.; Chen, S.; Lee, Y.; Yen, G. Protective effects of camellia oil (Camellia brevistyla) against indomethacin-induced gastrointestinal mucosal damage in vitro and in vivo. J. Funct. Foods 2019, 62, 103539. [Google Scholar] [CrossRef]
  4. Cheng, Y.; Lu, C.; Yen, G. Beneficial Effects of Camellia Oil (Camellia oleifera Abel.) on Hepatoprotective and Gastroprotective Activities. J. Nutr. Sci. Vitaminol. 2015, 61, S100–S102. [Google Scholar] [CrossRef]
  5. Akihisa, T.; Tokuda, H.; Ukiya, M.; Suzuki, T.; Enjo, F.; Koike, K.; Nikaido, T.; Nishino, H. 3-epicabraleahydroxylactone and other triterpenoids from camellia oil and their inhibitory effects on Epstein-Barr virus activation. Chem. Pharm. Bull. 2004, 52, 153–156. [Google Scholar] [CrossRef]
  6. Shi, T.; Wu, G.; Jin, Q.; Wang, X. Camellia oil authentication: A comparative analysis and recent analytical techniques developed for its assessment. A review. Trends Food Sci. Technol. 2020, 97, 88–99. [Google Scholar] [CrossRef]
  7. Czamara, K.; Majzner, K.; Pacia, M.; Kochan, K.; Kaczor, A.; Baranska, M. Raman spectroscopy of lipids: A review. J. Raman Spectrosc. 2014, 46, 4–20. [Google Scholar] [CrossRef]
  8. Barros, I.; Santos, L.; Filgueiras, P.; Romao, W. Design experiments to detect and quantify soybean oil in extra virgin olive oil using portable Raman spectroscopy. Vib. Spectrosc. Int. J. Devoted Appl. Infrared Raman Spectrosc. 2021, 116, 103294. [Google Scholar] [CrossRef]
  9. El-Abassy, R.M.; Donfack, P.; Materny, A. Rapid Determination of Free Fatty Acid in Extra Virgin Olive Oil by Raman Spectroscopy and Multivariate Analysis. J. Am. Oil Chem. Soc. 2009, 86, 507–511. [Google Scholar] [CrossRef]
  10. Xue, Y.; Jiang, H. Monitoring of Chlorpyrifos Residues in Corn Oil Based on Raman Spectral Deep-Learning Model. Foods 2023, 12, 2402. [Google Scholar] [CrossRef]
  11. Luo, J.; Liu, T.; Liu, Y. FT-NIR and Confocal Microscope Raman Spectroscopic Studies of Sesame Oil Adulteration. In Proceedings of the Computer & Computing Technologies in Agriculture V-ifip Tc 5/sig 51 Conference, Beijing, China, 29–31 October 2011. [Google Scholar]
  12. Du, Q.; Zhu, M.; Shi, T.; Luo, X.; Gan, B.; Tang, L.; Chen, Y. Adulteration detection of corn oil, rapeseed oil and sunflower oil in camellia oil by in situ diffuse reflectance near-infrared spectroscopy and chemometrics. Food Control 2021, 121, 107577. [Google Scholar] [CrossRef]
  13. Oussama, A.; Elabadi, F.; Platikanov, S.; Kzaiber, F.; Tauler, R. Detection of Olive Oil Adulteration Using FT-IR Spectroscopy and PLS with Variable Importance of Projection (VIP) Scores. J. Am. Oil Chem. Soc. 2012, 89, 1807–1812. [Google Scholar] [CrossRef]
  14. Zhong, Z.; Tang, S.; Peng, G.; Zhang, Y. A novel quantitative spectral analysis method based on parallel BP neural network for dissolved gas in transformer oil. In Proceedings of the 2016 IEEE PES Asia-Pacific Power and Energy Engineering Conference (APPEEC), Xi’an, China, 25–28 October 2016. [Google Scholar]
  15. Li, J.; Li, T.; Zhang, J.; Zhang, W.; Gu, W. Adulteration detection model of Tea Oil research based on FTIR back-propagation neural network. In Proceedings of the 2021 International Conference on Computer Technology and Media Convergence Design (CTMCD), Sanya, China, 23–25 April 2021. [Google Scholar]
  16. Kuang, J.; Luo, N.; Hao, Z.; Xu, J.; He, X.; Shi, J. NI-Raman spectroscopy combined with BP-Adaboost neural network for adulteration detection of soybean oil in camellia oil. J. Food Meas. Charact. 2022, 16, 3208–3215. [Google Scholar] [CrossRef]
  17. Chu, X.; Wang, W.; Li, C.; Zhao, X.; Jiang, H. Identifying camellia oil adulteration with selected vegetable oils by characteristic near-infrared spectral regions. J. Innov. Opt. Health Sci. 2017, 11, 1850006. [Google Scholar] [CrossRef]
  18. Li, S.; Zhu, X.; Zhang, J.; Li, G.; Su, D.; Shan, Y. Authentication of Pure Camellia Oil by Using Near Infrared Spectroscopy and Pattern Recognition Techniques. J. Food Sci. 2012, 77, 374–380. [Google Scholar] [CrossRef]
  19. Han, J.; Sun, R.; Zeng, X.; Zhang, J.; Xing, R.; Sun, C.; Chen, Y. Rapid Classification and Quantification of Camellia (Camellia oleifera Abel.) Oil Blended with Rapeseed Oil Using FTIR-ATR Spectroscopy. Molecules 2020, 25, 2036. [Google Scholar] [CrossRef]
  20. Luo, Q.; Yu, Y.; Xu, Q.; Chen, Y.; Zheng, X. Detection of Adulteration in Camellia Oil Using Near-Infrared Spectroscopy. MATEC Web Conf. 2018, 232, 04081. [Google Scholar] [CrossRef]
  21. Liu, Q.; Gong, Z.; Li, D.; Wen, T.; Guan, J.; Zheng, W. Rapid and Low-Cost Quantification of Adulteration Content in Camellia Oil Utilizing UV-Vis-NIR Spectroscopy Combined with Feature Selection Methods. Molecules 2023, 28, 5943. [Google Scholar] [CrossRef]
  22. Malakouti, S.; Menhaj, M.; Suratgar, A. The usage of 10-fold cross-validation and grid search to enhance ML methods performance in solar farm power generation prediction. Clean. Eng. Technol. 2023, 15, 100664. [Google Scholar] [CrossRef]
  23. Ali, S.; Lalji, S.; Awan, Z.; Qasim, M.; Alshahrani, T.; Khan, F.; Ullah, S.; Ashraf, A. Prediction of asphaltene stability in crude oils using machine learning algorithms. Chemom. Intell. Lab. Syst. 2023, 235, 104784. [Google Scholar] [CrossRef]
  24. Dotto, A.; Dalmolin, R.; ten Caten, A.; Grunwald, S. A systematic study on the application of scatter-corrective and spectral-derivative preprocessing for multivariate prediction of soil organic carbon by Vis-NIR spectra. Geoderma 2018, 314, 262–274. [Google Scholar] [CrossRef]
  25. Lu, Y.; Qu, Y.; Feng, Z.; Song, M. Research on the method of choosing optimum wavelengths combination by using multiple scattering correction technique. Spectrosc. Spectr. Anal. 2007, 27, 58–61. [Google Scholar]
  26. Chen, H.; Pan, T.; Chen, J.; Lu, Q. Waveband selection for NIR spectroscopy analysis of soil organic matter based on SG smoothing and MWPLS methods. Chemom. Intell. Lab. Syst. 2011, 107, 139–146. [Google Scholar] [CrossRef]
  27. Delwiche, S.; Reeves, J. A graphical method to evaluate spectral preprocessing in multivariate regression calibrations: Example with Savitzky-Golay filters and partial least squares regression. Appl. Spectrosc. 2010, 64, 73–82. [Google Scholar] [CrossRef] [PubMed]
  28. Beattie, J.; Esmonde-White, F. Exploration of Principal Component Analysis: Deriving Principal Component Analysis Visually Using Spectra. Appl. Spectrosc. 2021, 75, 361–375. [Google Scholar] [CrossRef]
  29. Bossart, R.; Keller, H.; Kellerhals, H.; Oelichmann, J. Principal components analysis as a tool for identity control using near-infrared spectroscopy. J. Mol. Struct. 2003, 661, 319–323. [Google Scholar] [CrossRef]
  30. Zhang, X.; Qi, X.; Zou, M.; Liu, F. Rapid Authentication of Olive Oil by Raman Spectroscopy Using Principal Component Analysis. Anal. Lett. 2011, 44, 2209–2220. [Google Scholar] [CrossRef]
  31. Liu, H.; Wang, J. Integrating Independent Component Analysis and Principal Component Analysis with Neural Network to Predict Chinese Stock Market. Math. Probl. Eng. 2011, 2011, 382659. [Google Scholar] [CrossRef]
  32. Yang, W.; Si, Y.; Wang, D.; Zhang, G. A novel method for identifying electrocardiograms using an independent component analysis and principal component analysis network. Measurement 2020, 152, 107363. [Google Scholar] [CrossRef]
  33. Li, H.; Liang, Y.; Xu, Q.; Cao, D. Key wavelengths screening using competitive adaptive reweighted sampling method for multivariate calibration. Anal. Chim. Acta 2009, 648, 77–84. [Google Scholar] [CrossRef]
  34. Pang, L.; Chen, H.; Yin, L.; Cheng, J.; Jin, J.; Zhao, H.; Liu, Z.; Dong, L.; Yu, H.; Lu, X. Rapid fatty acids detection of vegetable oils by Raman spectroscopy based on competitive adaptive reweighted sampling coupled with support vector regression. Food Qual. Saf. 2022, 6, fyac053. [Google Scholar] [CrossRef]
  35. Xu, Y.; Hassan, M.; Kutsanedzie, F.; Li, H.; Chen, Q. Evaluation of extra-virgin olive oil adulteration using FTIR spectroscopy combined with multivariate algorithms. Qual. Assur. Saf. Crops Foods 2018, 10, 411–421. [Google Scholar] [CrossRef]
  36. Song, J. Evaluation and Analysis of an Industrial Cluster Based on the BP Neural Network and LM Algorithm. Wireless Commun. Mob. Comput. 2022, 2022, 8964573. [Google Scholar] [CrossRef]
  37. Huang, X.; Cao, H.; Jia, B. Optimization of Levenberg Marquardt Algorithm Applied to Nonlinear Systems. Processes 2023, 11, 1794. [Google Scholar] [CrossRef]
  38. Hu, N.; Zhao, J.; Liu, Y.; Wang, M.; Liu, D.; Gong, Y.; Rao, X. Spectral Level Prediction Model of Ocean Ambient Noise Based on GA-LM-BP Neural Network. Acoust. Aust. 2023, 51, 265–278. [Google Scholar] [CrossRef]
  39. Weng, S.; Chen, C.; Li, M.; Zeng, X.; Zheng, S.; Zhang, J.; Chen, J.; Chen, L. Quantitative analysis of thiram based on SERS and PLSR combined with wavenumber selection. Anal. Methods 2014, 6, 242–247. [Google Scholar]
  40. Zhu, C.; Jiang, H.; Chen, Q. High Precisive Prediction of Aflatoxin B1 in Pressing Peanut Oil Using Raman Spectra Combined with Multivariate Data Analysis. Foods 2022, 11, 1565. [Google Scholar] [CrossRef]
  41. Zhu, J.; Jiang, X.; Rong, Y.; Wei, W.; Wu, S.; Jiao, T.; Chen, Q. Label-free detection of trace level zearalenone in corn oil by surface-enhanced Raman spectroscopy (SERS) coupled with deep learning models. Food Chem. 2023, 414, 135705. [Google Scholar] [CrossRef]
  42. Tian, H.; Wu, D.; Chen, B.; Yuan, H.; Yu, H.; Lou, X.; Chen, C. Rapid identification and quantification of vegetable oil adulteration in raw milk using a flash gas chromatography electronic nose combined with machine learning. Food Control 2023, 150, 109758. [Google Scholar] [CrossRef]
Figure 1. The structure of back propagation neural network (BPNN).
Figure 2. A comparison of seven edible oil species: (a) pre-processed Raman spectra, and (b) scatterplot of projections for the first two principal components on a 2D plane.
Figure 3. Representative Raman spectra for (a) camellia–rice blended-oil samples and (b) camellia–corn–rice blended-oil samples with varying adulteration rates. Note that in panel (b), it is the total adulteration rate that is noted in the plot. Panels (c,d) are zoom-ins of different spectral ranges for two spectra of the camellia–corn–rice oil sample when the total adulteration rate reaches 50%. The adulteration rates of rice bran oil and corn oil are noted accordingly.
Figure 4. Prediction of adulteration concentration for camellia–rice blended-oil samples using ICA coupled regression models: regression results for (a) BPNN, (b) PLSR, and (c) RF; residual distribution for (d) BPNN, (e) PLSR, and (f) RF. In panels (d–f), the dash-dotted horizontal lines are for eye guidance purposes. The red (blue) dots show residuals with magnitudes larger (no larger) than 2%.
Figure 5. Comparison among different models for camellia–rice blended-oil samples.
Figure 6. Prediction for camellia–corn–rice blended-oil samples using CARS-ICA-PLSR model: regression results for (a) camellia oil, (b) corn oil, and (c) rice bran oil; residual distributions for (d) camellia oil, (e) corn oil, and (f) rice bran oil. In panels (d–f), the dash-dotted horizontal lines are for eye guidance purposes. The red (blue) dots show residuals with magnitudes larger (no larger) than 2%.
Figure 7. Feature extraction results for camellia–corn–rice blended-oil samples using different methods: (a) ICA, (b) CARS, (c) CARS-ICA, and (d) ICA-CARS. Note that the spectral ranges selected by ICA (CARS) are indicated by the shadowed areas (red circles). The overlap of the shadowed areas and red circles marks the spectral variables selected by the corresponding dual feature extraction method.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, H.; Ma, S.; Liang, N.; Wang, X. Quantitatively Detecting Camellia Oil Products Adulterated by Rice Bran Oil and Corn Oil Using Raman Spectroscopy: A Comparative Study Between Models Utilizing Machine Learning Algorithms and Chemometric Algorithms. Foods 2024, 13, 4182. https://doi.org/10.3390/foods13244182

