CN101430276B - Wavelength variable optimization method in spectrum analysis - Google Patents
- Publication number
- CN101430276B (application CN200810239588A)
- Authority
- CN
- China
- Prior art keywords
- wavelength
- wavelength variable
- value
- variable
- variables
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Investigating Or Analysing Materials By Optical Means (AREA)
Abstract
The invention discloses a method for optimizing wavelength variables in spectral analysis. The method comprises the following steps: the original spectra are preprocessed to obtain a spectral matrix from which useless information has been eliminated; the purity value of each wavelength variable in this matrix is calculated, and the wavelength variable with the maximum purity value is selected as the 1st wavelength variable; the correlation weight function between the j-th wavelength variable and the (j-1) wavelength variables already selected is calculated, the purity value of each wavelength variable weighted by this function is calculated, and the wavelength variable with the maximum purity value is selected as the j-th wavelength variable, where j is an integer greater than or equal to 2; partial least squares regression models are built with different numbers of the selected wavelength variables, and the root mean square error of prediction is calculated; the wavelength variable combination that gives the minimum root mean square error of prediction is the optimal wavelength variable combination. The method selects only a small number of wavelength variables, minimizes redundant information, and markedly improves modeling speed and efficiency.
Description
Technical Field
The invention relates to spectral analysis technology, and in particular to a method for optimizing wavelength variables in spectral analysis.
Background
Spectral analysis combined with a multivariate calibration model is a new technology for rapid, nondestructive determination of the content or properties of components in a sample. Because the absorption spectrum changes when the content or property of a component in the sample changes, a multivariate calibration model is established by correlating the spectra of samples with their concentrations or properties; the unknown concentration or property of a component in a test sample is then predicted from this model and the sample's spectrum. However, spectral information is complex and, because of various interferences, bands easily overlap, so useful information must be extracted and redundant information removed in order to build an efficient and accurate multivariate calibration model.
When a multivariate calibration model is built, the spectral data over the entire wavelength range may be used so that no spectral information is lost. However, full-spectrum modeling involves a large amount of computation and poor spectral selectivity, and at some wavelengths the spectra carry much noise and little useful information. How to select wavelength variables so as to extract the most useful information from the spectrum, simplify the computation, and give the multivariate calibration model the best predictive ability is therefore an important problem in building the model.
Many wavelength variable selection methods already exist. For example, forward selection, backward elimination, or stepwise regression from multiple regression analysis can be used; in theory, however, these methods assume uncorrelated data, and when multicollinearity is severe, the reliability of their conclusions suffers to some extent. Other examples include correlation analysis, significant-variation analysis, and selection according to the regression coefficients and spectral residuals of a multivariate calibration model for the measured component; these methods, however, suit only spectral applications whose measurement conditions are not particularly complex, and they do little to improve the quality of the multivariate calibration model.
Global optimization methods such as simulated annealing and genetic algorithms have also been applied to wavelength variable selection. Simulated annealing is a random search method inspired by the annealing of metals; a genetic algorithm searches for an optimal solution by simulating the natural evolution of organisms on a computer. Although such search algorithms have strong search capability, their parameter settings are complex and affect whether the global optimum or only a local optimum is found; the settings also depend on the researcher's experience and understanding of the problem, which makes these methods somewhat subjective and random. In addition, when a genetic algorithm is used for wavelength variable selection, the multivariate calibration model may show high predictive ability in a single run, but the randomness of the algorithm reduces the model's adaptability, so wavelengths selected by a genetic algorithm do not improve the robustness and adaptability of the multivariate calibration model.
In view of the above, the main objective of the present invention is to provide a method for optimizing wavelength variables in spectral analysis, which can improve modeling efficiency and prediction accuracy.
To achieve the above object, the present invention provides a method for optimizing wavelength variables in spectral analysis, comprising:
acquiring near infrared spectral data of samples with a near infrared spectrometer, and preprocessing the acquired data to obtain near infrared spectra free of useless information; calculating the purity value of every wavelength variable from the preprocessed spectra, selecting the wavelength variable with the maximum purity value as the 1st wavelength variable, and using a MATLAB program to select the first j wavelength variables automatically in sequence; calculating the correlation weight function between the j-th wavelength variable and the (j-1) wavelength variables already selected, calculating the purity value of each wavelength variable weighted by this function, and selecting the wavelength variable with the maximum purity value as the j-th wavelength variable, where j is an integer greater than or equal to 2; performing partial least squares regression with different numbers of the selected wavelength variables to build multivariate calibration models, and calculating the root mean square error of prediction; taking the wavelength variable combination used for modeling when the root mean square error of prediction is minimum as the optimal wavelength variable combination; and predicting the samples prepared in advance as a prediction set with the multivariate calibration model. The preprocessing processes the acquired near infrared spectral data with a correlation analysis method, a useless information variable elimination method, or a wavelet transform method. The purity value is the ratio, expressed as a percentage, of the standard deviation of a wavelength variable to its mean plus a compensation factor.
The different numbers of wavelength variables used for modeling may preferably be the first j wavelength variables in order of selection.
In the wavelength variable optimization method provided by the invention, the spectral data of the samples in the calibration set are first preprocessed to remove noise, background interference, and information unrelated to the analyte. The purity value of each wavelength variable in the preprocessed spectral matrix is then calculated, and the wavelength variable with the maximum purity value is selected as the 1st wavelength variable; when the purity value for the j-th (j ≥ 2) wavelength variable is calculated, it is weighted by the correlation weight function between that variable and the (j-1) wavelength variables already selected. This process is deterministic: it has no random component and is fully repeatable. The only parameter to set is the compensation factor used in the purity value, which avoids the complex parameter settings of existing wavelength variable selection methods. The wavelength variable optimization method of the invention is therefore simple and easy to implement.
In addition, the method of the invention is a self-modeling wavelength variable selection method: its variable selection analyzes only the spectral data themselves, unlike some prior-art wavelength variable selection methods that require concentration information. Moreover, because the selected wavelength variables are original wavelengths, they can serve as a reference when the prediction results are analyzed, to evaluate the molecules or functional groups in the substance being measured.
In addition, when the selected wavelength variables are used for modeling, different numbers of them can be taken in sequence to build partial least squares (PLS) regression models and to calculate the root mean square error of prediction (RMSEP); the wavelength variable combination that gives the minimum RMSEP is the optimal combination, which markedly improves the prediction accuracy of the resulting multivariate calibration model.
The wavelength variable optimization method of the invention thus minimizes redundant information. Because the correlation weight function is introduced, the method overcomes collinearity among wavelength variables, so only a small number of wavelength variables are selected; these few variables suffice to build a quantitative spectral calibration model with higher prediction accuracy, which markedly improves modeling speed and efficiency.
Drawings
FIG. 1 is a schematic flow diagram of the method for optimizing wavelength variables in spectral analysis according to the present invention;
FIG. 2(a) shows the original near infrared spectra before preprocessing in an embodiment of the method;
FIG. 2(b) shows the near infrared spectra after preprocessing by the wavelet transform method in an embodiment of the method;
FIG. 3(a) shows the purity value at each wavelength when the second wavelength variable is selected in an embodiment of the method;
FIG. 3(b) shows the standard deviation at each wavelength when the second wavelength variable is selected in an embodiment of the method;
FIG. 4 shows the RMSEP values obtained when different numbers of wavelength variables, taken in order of selection, are used for modeling in an embodiment of the method;
FIG. 5 shows the selected wavelength variables in an embodiment of the method;
FIG. 6 shows the prediction results of the PLS multivariate calibration model built with the optimal wavelength variable combination in an embodiment of the method.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
The basic idea of the invention is as follows. The spectral data of the samples in the calibration set are first preprocessed, and self-modeling wavelength variable selection is then performed on the preprocessed data: the wavelength variable with the maximum purity value is selected as the 1st wavelength variable, and when the purity value for the j-th wavelength variable (j ≥ 2) is calculated, it is weighted by the correlation weight function between the j-th wavelength variable and the (j-1) wavelength variables already selected. Different numbers of wavelength variables are then taken in sequence to build PLS regression models, and the root mean square error of prediction is calculated; the modeling wavelength variable combination with the smallest root mean square error of prediction is the optimal wavelength variable combination.
Before wavelength variable optimization, the spectral samples obtained experimentally can be divided into a training set and a prediction set: the training set is used for wavelength variable optimization and for training the multivariate calibration model, and the prediction set is used to evaluate the selected wavelength variables and the prediction accuracy of the multivariate calibration model.
Spectral measurements generally contain useless information, such as high-frequency noise caused by instrument noise and changes in measurement conditions, and background interference from light absorption by other chemical components. Preprocessing therefore mainly removes information unrelated to the concentration or property of the component to be measured, so that the selected variables are as closely related as possible to that concentration or property, and it further improves spectral quality.
Several methods can be used to preprocess the raw spectra, such as correlation analysis, useless information variable elimination, and wavelet transform; their implementation is described in detail below by way of example.
The first spectral preprocessing method, correlation analysis, comprises the following steps:
Step A101: correlate the measured component concentration or property data Y (n × 1) with the spectral data X (n × m) of the samples, and obtain the correlation coefficient $C_k$ at each wavelength k according to formula (1):
$$C_k = \frac{\sum_{i=1}^{n}(x_{ik}-\bar{x}_k)(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_{ik}-\bar{x}_k)^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}} \qquad (1)$$
where n is the number of samples, m is the number of variables, $x_{ik}$ is an element of the spectral data X, $y_i$ is an element of Y, and $\bar{x}_k$ and $\bar{y}$ are the means of the corresponding columns.
Step A102: set a threshold $C_0$ for the correlation coefficient; the k-th wavelength is selected if its correlation coefficient exceeds the threshold, i.e. $C_k > C_0$.
Step A103: the selected wavelength variables form a new matrix $X_{NEW}$ for further spectral processing.
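Steps A101 to A103 can be sketched in a few lines of NumPy. This is a minimal illustration, assuming X holds one spectrum per row and y the reference values; the function name `correlation_select` and the toy data are hypothetical. Like step A102 as written, it keeps only correlations above $C_0$; selecting on $|C_k|$ instead would also keep strongly anticorrelated wavelengths.

```python
import numpy as np


def correlation_select(X, y, c0):
    """Steps A101-A103: keep wavelengths whose correlation with y exceeds c0.

    X: (n, m) spectra, one sample per row; y: (n,) concentrations or properties.
    Returns the kept column indices, the correlation vector C and X_NEW.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    Xc = X - X.mean(axis=0)                # x_ik minus the mean of column k
    yc = y - y.mean()                      # y_i minus the mean of y
    num = Xc.T @ yc                        # numerator of formula (1) per wavelength
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Formula (1); constant columns (den == 0) get C_k = 0 instead of NaN.
    C = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    keep = np.flatnonzero(C > c0)          # step A102: C_k > C_0
    return keep, C, X[:, keep]             # step A103: X_NEW


# Toy data: column 0 tracks y, column 1 is anticorrelated, column 2 is uncorrelated.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([y, -y, [2.0, 1.0, 2.0, 1.0, 2.0]])
keep, C, X_new = correlation_select(X, y, c0=0.9)
```

With the positive-only rule, only column 0 is kept here even though column 1 is perfectly (negatively) correlated with y.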
The second spectral preprocessing method, useless information variable elimination, comprises the following steps:
Step A201: perform PLS regression on the calibration-set spectral matrix X (n × m) and the concentration matrix Y (n × 1), and select the number of principal components f, where f is a positive integer;
here n is the number of samples and m is the number of wavelength variables.
Step A202: generate a random noise matrix R (n × m), and combine X and R into a matrix XR (n × 2m);
the first m columns of the combined matrix XR are X and the last m columns are R.
Step A203: perform PLS regression on XR and Y with leave-one-out cross-validation, removing one sample at a time, to obtain a regression coefficient vector b each time; the n PLS regression coefficient vectors form a matrix B (n × 2m).
Step A204: calculate the column-wise standard deviation std($b_i$) and mean mean($b_i$) of B (n × 2m), and then calculate $C_i = \mathrm{mean}(b_i)/\mathrm{std}(b_i)$, where i = 1, 2, …, 2m.
Step A205: in the interval [m + 1, 2m], find the maximum absolute value of C: $C_{max} = \max(|C|)$.
Step A206: in the interval [1, m], remove the variables of X with $|C_i| < C_{max}$; the remaining variables form the new matrix $X_{NEW}$ selected by useless information variable elimination, ready for subsequent spectral processing.
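Steps A202 to A206 can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: PLS is a hand-written NIPALS PLS1 (to avoid depending on a particular library version), the number of components f from step A201 is simply passed in, and the noise is scaled by a small hypothetical factor `noise_scale` so that it barely affects the model. The names `pls1_coef` and `uve_select` are hypothetical.

```python
import numpy as np


def pls1_coef(X, y, n_components):
    """Minimal NIPALS PLS1 regression; returns coefficients b and intercept b0."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean        # centered copies that get deflated
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # y already fully explained
            break
        w = w / norm                       # weight vector
        t = Xr @ w                         # score vector
        tt = t @ t
        p = Xr.T @ t / tt                  # X loading
        c = (yr @ t) / tt                  # y loading
        Xr = Xr - np.outer(t, p)           # deflate X and y
        yr = yr - c * t
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.zeros(X.shape[1]), y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)    # b = W (P^T W)^-1 q
    return b, y_mean - x_mean @ b


def uve_select(X, y, n_components, noise_scale=1e-10, seed=0):
    """Steps A202-A206 of useless information variable elimination."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    n, m = X.shape
    rng = np.random.default_rng(seed)
    R = noise_scale * rng.random((n, m))   # A202: small random noise matrix
    XR = np.hstack([X, R])                 # first m columns X, last m columns R
    B = np.empty((n, 2 * m))
    for i in range(n):                     # A203: leave one sample out each time
        mask = np.arange(n) != i
        B[i], _ = pls1_coef(XR[mask], y[mask], n_components)
    C = B.mean(axis=0) / B.std(axis=0, ddof=1)     # A204: C_i = mean(b_i)/std(b_i)
    c_max = np.max(np.abs(C[m:]))                  # A205: largest |C| among noise columns
    keep = np.flatnonzero(np.abs(C[:m]) >= c_max)  # A206: drop |C_i| < C_max
    return keep, X[:, keep]
```

Because $C_i$ is a ratio of mean to standard deviation, it does not depend on the scale of a column, which is why the noise columns can be made very small without changing the cutoff logic.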
The third spectral preprocessing method is the wavelet transform method. The wavelet transform provides multi-resolution analysis. In a spectral signal, noise appears mostly in the low-scale detail coefficients, and the background caused by light absorption of other components appears mostly in the high-scale approximation component, so the wavelet transform can remove noise, background interference, and other useless information at the same time. Preprocessing the original spectra by wavelet transform comprises the following steps:
Step A301: generate a random noise and background matrix R (n × m), and combine the calibration-set spectral matrix X (n × m) and R into a matrix XR (n × 2m);
where n is the number of samples and m is the number of variables; the first m columns of XR are X and the last m columns are R.
Step A302: perform wavelet decomposition on each signal of XR, choosing a wavelet basis and the number of decomposition levels k, to obtain the wavelet detail coefficient matrices D (k × 2m/(i × 2)) and the approximation component matrix A (k × 2m), where i = 1, 2, ….
Step A303: calculate the column-wise standard deviation std($d_i$) and mean mean($d_i$) of D (k × 2m/(i × 2)), and then calculate $C_{d_i} = \mathrm{mean}(d_i)/\mathrm{std}(d_i)$.
Step A304: calculate the column-wise standard deviation std($a_i$) and mean mean($a_i$) of A (k × 2m), and then calculate $C_{a_i} = \mathrm{mean}(a_i)/\mathrm{std}(a_i)$.
Step A305: if the $|C_{d_i}|$ values in the interval [1, m/(i × 2)] are similar to those in the interval [m + 1, 2m], the component represented by these detail coefficients is removed as noise, where i = 1, 2, …. If the $|C_{a_i}|$ values in [1, m] are smaller than those in [m + 1, 2m], the corresponding component is removed as background.
Step A306: reconstruct the signals from the denoised and background-corrected low-frequency and high-frequency coefficients of the k-th level, build a calibration model with the reconstructed spectra, and choose the optimal wavelet basis according to the root mean square error of prediction; the reconstructed spectra form the new spectral matrix $X_{NEW}$ for subsequent processing such as wavelength variable optimization.
It should be noted that the preprocessing method according to the preferred embodiment of the present invention is not limited to the above-mentioned method, and any other preprocessing method for removing useless information such as noise and background should fall within the scope of the present invention.
Based on the preprocessing methods described above, the method for optimizing wavelength variables in spectral analysis of the present invention, whose flow is shown in FIG. 1, comprises the following steps:
Step S101: preprocess the original spectral data of all samples in the experiment to obtain a spectral matrix from which useless information has been eliminated;
Eliminating useless information through preprocessing improves spectral quality and tightens the relationship between the spectrum and the concentration or properties of the analyte. Of the preprocessing methods, correlation analysis and useless information variable elimination suit spectra that are not complex and are generally used only to remove noise, whereas the wavelet transform method can remove noise, background, and other useless information simultaneously by virtue of its multi-resolution analysis. The spectral preprocessing method can be chosen as appropriate.
Step S102: in the preprocessed spectral matrix $X_{NEW}$, calculate the purity value of each wavelength variable, and select the wavelength variable with the maximum purity value as the 1st wavelength variable;
The purity value represents the contribution of each variable to the multivariate calibration model and can be expressed as the ratio, as a percentage, of the dispersion of the wavelength variable to its central tendency after a compensation factor is added; the dispersion is the standard deviation of the wavelength variable and the central tendency is its mean. When signals are weak and comparable to noise, the result can be adjusted through the compensation factor, which is generally set to 1% to 5% of the mean value.
For each wavelength variable i of the spectral matrix $X_{NEW}$, the purity value is calculated with formula (2):
$$p_{i,1} = \frac{\sigma_i}{\mu_i + \alpha} \qquad (2)$$
where $\sigma_i$ is the standard deviation, $\mu_i$ the mean, and $\alpha$ the compensation factor. After the purity value $p_{i,1}$ of each wavelength variable i has been obtained from formula (2), the values are compared, and the wavelength variable i with the maximum $p_{i,1}$ is the selected 1st wavelength variable.
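Formula (2) and the choice of the 1st wavelength variable can be sketched as below. Two details are assumptions not fixed by the text: the standard deviation is the population value (ddof = 0), and $\alpha$ is taken as a percentage (`offset_pct`, a hypothetical parameter name) of the largest column mean, as in the usual SIMPLISMA convention; the text only says 1% to 5% of the mean value.

```python
import numpy as np


def purity_values(X, offset_pct=3.0):
    """Formula (2): p_i1 = sigma_i / (mu_i + alpha) for every column of X_NEW."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                     # central tendency per wavelength
    sigma = X.std(axis=0)                   # dispersion (population std, ddof=0)
    alpha = offset_pct / 100.0 * mu.max()   # compensation factor (assumed convention)
    return sigma / (mu + alpha)


# Toy spectral matrix: 3 samples (rows) x 2 wavelengths (columns).
X_new = np.array([[1.0, 10.0],
                  [3.0, 10.5],
                  [5.0, 9.5]])
p1 = purity_values(X_new)
first = int(np.argmax(p1))                  # index of the 1st selected wavelength
```

Adding $\alpha$ to the denominator keeps wavelengths with a near-zero mean (weak signal dominated by noise) from receiving an artificially large purity value.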
Step S103: calculate the correlation weight function for the j-th wavelength variable, calculate the purity value of each wavelength variable weighted by this function, and select the wavelength variable with the maximum purity value as the j-th wavelength variable;
where j is an integer greater than or equal to 2; the correlation weight function represents how strongly the j-th wavelength variable is related to the (j-1) wavelength variables already selected.
The general procedure for selecting the j-th (j ≥ 2) wavelength variable is as follows:
The length $l_i$ of each wavelength variable i of the spectral matrix $X_{NEW}$ is calculated as shown in equation (3), where $d_{i,j}$ is the element in row i and column j of $X_{NEW}$. The length-scaled elements $d(l)_{i,j}$ form the matrix $D(l)$, from which the relationship matrix $C = D(l)^{T} D(l)$ is obtained, and the correlation weight function $\rho_{i,j}$ is calculated as shown in equation (4).
Here j is the index of the j-th wavelength variable to be determined, $P_{j-1}$ is the index in the relationship matrix C of the (j-1)-th wavelength variable selected so far, and $P_1$ is the index in C of the selected 1st wavelength variable. The purity value $p_{i,j}$ for the j-th wavelength variable is:
$$p_{i,j} = \rho_{i,j}\,\frac{\sigma_i}{\mu_i + \alpha} \qquad (5)$$
The wavelength variable with the maximum $p_{i,j}$ is the selected j-th wavelength variable, and its corresponding standard deviation value $s_{i,j}$ is given by formula (6):
$$s_{i,j} = \rho_{i,j}\,\sigma_i \qquad (6)$$
In general, the wavelength variable with the maximum $p_{i,j}$ also has a relatively high standard deviation $s_{i,j}$, so $s_{i,j}$ can be used as a reference value to check the selected wavelength variable.
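Steps S102 and S103 together can be sketched as a sequential selection loop. Equations (3) and (4) are not legible in this text, so the sketch assumes the SIMPLISMA definitions this procedure resembles: $l_i = \sqrt{\mu_i^2 + (\sigma_i + \alpha)^2}$, $d(l)_{i,j} = d_{i,j}/l_j$, $C = D(l)^T D(l)/n$, and $\rho_{i,j}$ equal to the determinant of the submatrix of C on rows and columns $i, P_1, \dots, P_{j-1}$. The $1/n$ factor is also an assumption: it scales every determinant at a given step by the same amount, so it does not change which variable is picked, but it does change the reported $s_{i,j}$. The function name `select_pure_variables` is hypothetical.

```python
import numpy as np


def select_pure_variables(X, n_select, offset_pct=3.0):
    """Sequentially select wavelength variables by weighted purity (steps S102-S103).

    Uses the SIMPLISMA-style definitions assumed in the lead-in.
    Returns the selected column indices and the s_ij value of each pick.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    alpha = offset_pct / 100.0 * mu.max()
    base = sigma / (mu + alpha)                        # formula (2)
    length = np.sqrt(mu ** 2 + (sigma + alpha) ** 2)   # assumed form of l_i
    D = X / length                                     # length-scaled matrix D(l)
    C = D.T @ D / n                                    # relationship matrix (1/n assumed)
    selected, s_values = [], []
    for _ in range(n_select):
        if not selected:
            rho = np.ones(m)                           # 1st variable: no weighting
        else:
            rho = np.empty(m)
            for i in range(m):
                idx = [i] + selected                   # rows/cols i, P_1, ..., P_{j-1}
                # Already selected or collinear variables give a (near) singular
                # submatrix, so their weight is close to zero.
                rho[i] = np.linalg.det(C[np.ix_(idx, idx)])
        p = rho * base                                 # formula (5)
        k = int(np.argmax(p))
        selected.append(k)
        s_values.append(rho[k] * sigma[k])             # formula (6)
    return selected, s_values


rng = np.random.default_rng(0)
X = 5 + rng.random((30, 6))          # low-purity background columns
X[:, 0] = 1 + 3 * rng.random(30)     # high-purity column
X[:, 3] = 2 * X[:, 0]                # exact multiple of column 0 (collinear)
sel, s = select_pure_variables(X, 3)
```

In this toy example column 3 wins the first round, and its collinear partner, column 0, is then suppressed by the weight function even though its unweighted purity is almost as high. This is the mechanism by which the correlation weight function counters collinearity.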
Step S104: take different numbers of wavelength variables in sequence to build PLS regression models, and calculate the root mean square error of prediction (RMSEP);
In general, RMSEP is calculated as:
$$\mathrm{RMSEP} = \sqrt{\frac{1}{n_p}\sum_{i=1}^{n_p}\left(\hat{y}_i - y_i\right)^2}$$
where $n_p$ is the number of prediction samples, $\hat{y}_i$ is the predicted value and $y_i$ the reference value of sample i.
RMSEP reflects how far the predicted values deviate from the true values; in general, the smaller the RMSEP, the higher the accuracy, so RMSEP can be used as the criterion for assessing prediction accuracy. When RMSEP is minimal, the selected modeling wavelength variable combination is the optimal wavelength variable combination. In step S104, PLS regression is performed iteratively on the wavelength variables in the order in which they were selected, which determines the number of modeling wavelength variables.
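Step S104 can be sketched as building one PLS model per prefix of the selection order and scoring each on a separate prediction set. Assumptions: PLS is a hand-written NIPALS PLS1 (the same minimal routine as a library would provide, kept here so the block is self-contained), and the cap on PLS components, `max_components`, is a hypothetical parameter that the text does not specify. The names `rmsep` and `rmsep_curve` are hypothetical.

```python
import numpy as np


def pls1_coef(X, y, n_components):
    """Minimal NIPALS PLS1 regression; returns coefficients b and intercept b0."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # y already fully explained
            break
        w = w / norm
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        c = (yr @ t) / tt
        Xr = Xr - np.outer(t, p)
        yr = yr - c * t
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.zeros(X.shape[1]), y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)
    return b, y_mean - x_mean @ b


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over the prediction set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def rmsep_curve(X_train, y_train, X_test, y_test, order, max_components=5):
    """RMSEP of PLS models built on the first k selected variables, k = 1..len(order)."""
    values = []
    for k in range(1, len(order) + 1):
        cols = list(order[:k])
        f = min(k, max_components, X_train.shape[0] - 1)  # components <= variables
        b, b0 = pls1_coef(X_train[:, cols], y_train, f)
        values.append(rmsep(y_test, X_test[:, cols] @ b + b0))
    return values


rng = np.random.default_rng(3)
X_train, X_test = rng.random((20, 3)), rng.random((10, 3))
y_train = 2 * X_train[:, 0] + X_train[:, 1] + 3
y_test = 2 * X_test[:, 0] + X_test[:, 1] + 3
curve = rmsep_curve(X_train, y_train, X_test, y_test, order=[0, 1, 2])
```

Here the target depends only on the first two variables, so the RMSEP drops to essentially zero once both are included. For the cross-validated RMSEP mentioned in the text, the held-out prediction set would be replaced by leave-one-out predictions on the training set.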
Step S105: determine whether RMSEP has reached its minimum; if so, go to step S106, otherwise return to step S103;
Generally, when the RMSEP of the current model is greater than that of the previous model, the previous RMSEP is taken as the minimum; if RMSEP has not reached its minimum, step S103 is repeated, iterating until the optimal wavelength variable combination is selected.
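The stopping rule of step S105 (stop as soon as the current RMSEP exceeds the previous one) can be written as a small helper. The function name `first_rmsep_minimum` and the example values are hypothetical.

```python
def first_rmsep_minimum(rmsep_values):
    """Return (number of variables, RMSEP) at the first point where RMSEP rises.

    rmsep_values[k - 1] is the RMSEP of the model built on the first k variables.
    """
    for k in range(1, len(rmsep_values)):
        if rmsep_values[k] > rmsep_values[k - 1]:
            return k, rmsep_values[k - 1]      # the previous model is the minimum
    return len(rmsep_values), rmsep_values[-1]  # no rise seen: keep all variables


n_vars, best = first_rmsep_minimum([3.0, 2.0, 1.5, 1.8, 1.2])
```

This rule stops at the first local minimum (3 variables, RMSEP 1.5 in the example) and never evaluates later models, which keeps the iteration cheap. If the RMSEP curve is noisy, computing the whole curve and taking its global minimum (1.2 at 5 variables here) is an alternative, at the cost of building every model.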
Alternatively, a minimum RMSEP value can be preset at the start of the algorithm as the condition for ending the loop.
Step S106: when RMSEP reaches its minimum, wavelength variable optimization ends and modeling is complete.
The wavelength variable optimization method can be implemented as a MATLAB program that selects wavelength variables automatically. A multivariate calibration model is built with the iteratively selected wavelength variables and evaluated by the cross-validated RMSEP; the variable combination used for modeling when RMSEP is minimum is taken as the optimal wavelength variable combination.
It should be emphasized that the wavelength variable optimization method of the invention may also first select a certain number of wavelength variables and then build PLS regression models with them. Preferably, the different numbers of wavelength variables used for modeling are taken in order of selection; alternatively, other variable selection methods (such as exhaustive search, genetic algorithms, or Monte Carlo methods) can be combined to choose subsets of the wavelength variables for modeling. RMSEP is then calculated and checked for a minimum: once the minimum appears, wavelength variable selection can stop; otherwise, selection continues.
The wavelength variable optimization method for spectroscopic analysis of the present invention is described in detail below with reference to a specific embodiment.
Take as an example a near-infrared spectroscopic experiment for detecting blood glucose in human plasma, in which the glucose concentration of each sample is predicted. In this embodiment, the plasma spectra are measured with a Fourier-transform infrared spectrometer over an acquisition range of 900-3600 nm, using a liquid-nitrogen-cooled InSb detector. The experiment also uses a 1 mm quartz sample cell, an automatic sample-feeding system driven by a peristaltic pump and a fully automatic biochemical analyzer, among other instruments.
The plasma samples are prepared as follows. Heparin anticoagulant is added to whole blood, and the blood is centrifuged at 1500 rpm for 10 min. Glucose is then added to the separated plasma, and the reference blood glucose value is determined with the fully automatic biochemical analyzer by the glucose oxidase method. In total, 33 samples are obtained. Of these, 22 form the training set used for wavelength variable optimization and for training the multivariate calibration model. The other 11 form the prediction set used to evaluate the prediction accuracy of the optimized variables and of the model. The glucose concentrations range from 10.4 to 44.4 mg/dL, are randomly distributed and have a standard deviation of 8.5 mg/dL.
The implementation process of the wavelength variable optimization method of the embodiment comprises the following steps:
Step S201, preprocess the spectral data of all samples in the experiment to remove uninformative components such as noise and background.
The spectral analysis range is 1000-1890.36 nm, giving 4711 wavelength variables per spectrum. Uninformative components are removed from each spectrum with a wavelet transform. Using the db3 wavelet basis and Mallat's algorithm, each spectrum is decomposed to 12 scales. The components at scales 1, 2, 3 and 10 are removed, and the spectrum is then reconstructed.
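The wavelet step can be sketched with the PyWavelets package (`pywt`) in place of the MATLAB wavelet tools. The sketch assumes that the "components" removed at scales 1, 2, 3 and 10 are the detail coefficients at those levels, and it uses pywt's default symmetric signal extension. `wavelet_filter` is a hypothetical helper name.

```python
import warnings

import numpy as np
import pywt


def wavelet_filter(spectrum, wavelet="db3", level=12, drop_scales=(1, 2, 3, 10)):
    """Zero the detail coefficients at `drop_scales` and rebuild the spectrum."""
    x = np.asarray(spectrum, dtype=float)
    with warnings.catch_warnings():
        # pywt warns when `level` exceeds the maximum useful depth for the
        # signal length (boundary effects); the warning is silenced here.
        warnings.simplefilter("ignore", UserWarning)
        coeffs = pywt.wavedec(x, wavelet, level=level)
    # wavedec returns [cA_level, cD_level, ..., cD_2, cD_1], so the detail
    # coefficients of scale s sit at index -s.
    for s in drop_scales:
        coeffs[-s] = np.zeros_like(coeffs[-s])
    rebuilt = pywt.waverec(coeffs, wavelet)
    return rebuilt[: x.size]  # waverec can return one extra sample
```

Applying this function row by row to the spectral matrix would give the preprocessed matrix. A 4711-point spectrum supports only about nine full db3 levels, so the deepest of the twelve levels are dominated by boundary handling. The sketch keeps twelve levels only to mirror the text.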
Fig. 2(a) shows the original near-infrared spectra of this embodiment before preprocessing. Fig. 2(b) shows the near-infrared spectra after wavelet-transform preprocessing, which form the new spectral matrix X_NEW. As Fig. 2(b) shows, X_NEW contains a large amount of spectral data (a 33 × 4711 matrix).
Step S202, perform wavelength variable optimization on the preprocessed spectral data X_NEW. Calculate the purity value of each wavelength variable i in the training-set spectral matrix X_NEW, and select the 1st wavelength variable.
In this embodiment, the compensation factor α is set to 5% of the mean value. Comparing the purity values and standard deviation values calculated for each wavelength variable in step B1 gives a maximum purity value of p_{1,1} = 0.0431. The 1st selected wavelength variable is therefore the one with variable index 1 (wavelength 1000 nm).
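The purity value defined in claim 1 and the choice of the 1st variable can be sketched in plain Python. Claim 1 defines purity as the standard deviation divided by the mean plus a compensation factor α. The helper names below are hypothetical. The text says only "5% of the mean value", so the sketch assumes α equals 5% of the average of the per-wavelength means. Standard SIMPLISMA implementations often use a percentage of the largest mean instead.

```python
from statistics import fmean, pstdev


def purity_values(columns, alpha_frac=0.05):
    """Purity p_i = sigma_i / (mu_i + alpha) for each wavelength variable.

    columns[i] holds the training-set values of wavelength variable i.
    alpha = alpha_frac * (average of the column means) is an assumption.
    """
    means = [fmean(c) for c in columns]
    stds = [pstdev(c) for c in columns]       # population standard deviation
    alpha = alpha_frac * fmean(means)
    return [s / (m + alpha) for s, m in zip(stds, means)]


def first_variable(columns, alpha_frac=0.05):
    """0-based index of the wavelength variable with the maximum purity."""
    p = purity_values(columns, alpha_frac)
    return max(range(len(p)), key=p.__getitem__)
```

The offset α keeps the ratio finite at wavelengths whose mean is close to zero. Without it, such wavelengths would dominate the selection.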
Step S203, select the 2nd wavelength variable.
Calculate the purity values and standard deviation values after applying the correlation weight function, according to formulas (5) and (6). Fig. 3(a) shows the resulting purity value at each wavelength when selecting the 2nd wavelength variable in this embodiment, and Fig. 3(b) shows the corresponding standard deviation value at each wavelength. As Fig. 3 shows, the 2nd selected wavelength variable is the 4711th (wavelength 1890.36 nm), which has the maximum value.
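Formulas (5) and (6) are not reproduced in this section. The sketch below assumes they follow standard SIMPLISMA (simple-to-use interactive self-modeling mixture analysis). Under that assumption, the correlation weight of candidate variable i is the determinant of the correlation-around-origin matrix of the length-scaled columns for i and for the variables already selected. This weight multiplies both the purity value and the standard deviation. Repeating the step yields the later variables. `simplisma_select` and its parameters are hypothetical.

```python
import numpy as np


def simplisma_select(X, n_vars, alpha_frac=0.05):
    """Sequentially select `n_vars` column indices of X (samples x wavelengths)."""
    X = np.asarray(X, dtype=float)
    n_samples, n_wl = X.shape
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    alpha = alpha_frac * mu.mean()             # compensation factor (assumption)
    purity = sigma / (mu + alpha)
    # Length-scale each column so that the weights behave like correlations.
    lam = np.sqrt(mu ** 2 + (sigma + alpha) ** 2)
    Y = X / lam
    selected = [int(np.argmax(purity))]        # 1st variable: maximum purity
    while len(selected) < n_vars:
        weights = np.empty(n_wl)
        for i in range(n_wl):
            Z = Y[:, [i] + selected]           # candidate plus selected columns
            weights[i] = np.linalg.det(Z.T @ Z / n_samples)
        # weights * sigma would give the weighted standard-deviation curve.
        weighted_purity = weights * purity     # purity after the weight function
        weighted_purity[selected] = -np.inf    # never re-pick a variable
        selected.append(int(np.argmax(weighted_purity)))
    return selected
```

A candidate that duplicates an already selected column gives a zero determinant and is suppressed. The explicit `-inf` guard makes that behavior robust to rounding.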
Repeating step S203 gives the 3rd to 16th wavelength variables: the 4223rd (1730.7 nm), 1944th (1241.16 nm), 2655th (1361.29 nm), 4700th (1886.44 nm), 3281st (1488.1 nm), 4684th (1880.76 nm), 2973rd (1422.88 nm), 3857th (1627.6 nm), 2814th (1391.4 nm), 1232nd (1140.38 nm), 2558th (1343.54 nm) and 4078th (1688.33 nm) variables.
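The index/wavelength pairs listed above are consistent with variables spaced uniformly in wavenumber rather than in wavelength. Variable i appears to sit at 10001 - i cm^-1, which gives 1 cm^-1 steps from 10000 cm^-1 (1000 nm) down to 5290 cm^-1 (1890.36 nm). That spacing fits both the 4711 variables and a Fourier-transform instrument. This is an inference from the listed values, not something the text states. The helper below converts indices under that assumption, and its name and defaults are hypothetical.

```python
def wavelength_nm(index: int, first_wavenumber: float = 10000.0,
                  step: float = 1.0) -> float:
    """Wavelength (nm) of 1-based variable `index` on a uniform wavenumber grid.

    Assumes variable 1 sits at `first_wavenumber` cm^-1 and each later
    variable is `step` cm^-1 lower (so variable 4711 sits at 5290 cm^-1).
    """
    wavenumber = first_wavenumber - (index - 1) * step   # cm^-1
    return 1e7 / wavenumber                               # nm
```

For example, variable 4223 maps to 5778 cm^-1, which is 1730.70 nm, matching the list above.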
Step S204, determine the number of selected wavelength variables.
Build PLS regression multivariate calibration models using the sequentially selected 1st to 16th wavelength variables, and obtain by cross-validation the RMSEP of the models built with different numbers of wavelength variables.
Fig. 4 shows the RMSEP values obtained in this embodiment when different numbers of sequentially selected wavelength variables are used for modeling. The inverted triangles mark the RMSEP values, and the curve shows how RMSEP changes with the number of wavelength variables. RMSEP is smallest when the first 14 wavelength variables are used. The optimal number of wavelength variables is therefore 14, and these first 14 variables form the optimal wavelength variable combination. Fig. 5 shows the distribution of the selected wavelength variables in this embodiment and the range they cover: the circles mark the selected wavelength variables and the curve is the spectrum.
A PLS regression multivariate calibration model is built on the variables selected by the wavelength variable optimization method of the present invention. On the prediction-set samples, it gives an RMSEP of 1.9 mg/dL and a correlation coefficient (Correlation) of 0.94. Fig. 6 shows the prediction results of the PLS multivariate calibration model built on the optimal wavelength variable combination in this embodiment. Each black dot plots a predicted value against its reference value, and the straight line serves as the reference. The closer the dots lie to the line, the stronger the correlation between predicted and reference values.
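The two figures of merit quoted above can be computed as follows. The sketch assumes that RMSEP is the square root of the mean squared prediction error over the prediction set. It also assumes that "Correlation" is the Pearson correlation coefficient between reference and predicted values, since this section does not define either formula. The function names are hypothetical.

```python
from math import sqrt


def rmsep(reference, predicted):
    """Root mean square error of prediction over the prediction set."""
    n = len(reference)
    return sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)


def pearson_r(reference, predicted):
    """Pearson correlation between reference and predicted values."""
    n = len(reference)
    mr = sum(reference) / n
    mp = sum(predicted) / n
    cov = sum((r - mr) * (p - mp) for r, p in zip(reference, predicted))
    var_r = sum((r - mr) ** 2 for r in reference)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / sqrt(var_r * var_p)
```

Both functions take the 11 reference glucose values and the 11 model predictions in the same order.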
Table 1 compares the prediction parameters of models built over different wavelength variable ranges. It sets 14 wavelength variables selected by the self-modeling wavelength variable optimization method of this embodiment against full-spectrum modeling with all 4711 wavelength variables. The comparison shows that the proposed self-modeling method is simple to implement and models efficiently. It also shows that the resulting multivariate calibration model has significantly higher prediction accuracy.
TABLE 1
In the foregoing embodiments, each embodiment emphasizes different aspects. For any part not described in detail in one embodiment, reference may be made to the related descriptions in the other embodiments.
The above description only illustrates the present invention and is not intended to limit its scope. For simplicity of explanation, the foregoing embodiments are described as a series of acts or combinations of acts. Those skilled in the art will appreciate, however, that the invention is not limited by the order of the acts, since some steps may be performed in other orders or concurrently with other steps in accordance with the invention. In addition, any modification or variation of the present invention within its spirit and the scope of the claims falls within the scope of the present invention.
Claims (4)
1. A method for wavelength variable optimization in spectroscopic analysis, the method comprising:
acquiring near-infrared spectral data of a sample with a near-infrared spectrometer, and preprocessing the acquired data to obtain a near-infrared spectrum free of uninformative components;
calculating, from the preprocessed near-infrared spectrum, the purity value of each wavelength variable, selecting the wavelength variable with the maximum purity value as the 1st wavelength variable, and automatically selecting the first j wavelength variables in sequence with a MATLAB program;
calculating a correlation weight function between the jth wavelength variable and the previously selected (j-1) wavelength variables, calculating the purity value of each wavelength variable after applying the correlation weight function, and selecting the wavelength variable with the maximum purity value as the jth wavelength variable, where j is an integer greater than or equal to 2;
building multivariate calibration models by partial least squares regression using different numbers of the selected wavelength variables, and calculating the root mean square error of prediction; the wavelength variable combination that gives the minimum root mean square error of prediction is the optimal wavelength variable combination; and using the multivariate calibration model to predict pre-prepared samples that serve as the prediction set;
wherein the preprocessing processes the acquired near-infrared spectral data by a correlation analysis method, an uninformative variable elimination method or a wavelet transform method;
wherein the purity value is the ratio, expressed as a percentage, of the standard deviation of the wavelength variable to its mean value plus a compensation factor.
2. The method according to claim 1, wherein the compensation factor is 1% to 5% of the mean value.
3. The method according to claim 1, further comprising: presetting a minimum value of the root mean square error of prediction.
4. The method according to claim 1 or 3, wherein, when the root mean square error of prediction obtained by the current modeling is larger than that obtained by the previous modeling, the previous value is taken as the minimum.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2008102395880A (CN101430276B) | 2008-12-15 | 2008-12-15 | Wavelength variable optimization method in spectrum analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101430276A | 2009-05-13 |
CN101430276B | 2012-01-04 |
Family ID: 40645798
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2008102395880A (CN101430276B, Expired - Fee Related) | Wavelength variable optimization method in spectrum analysis | 2008-12-15 | 2008-12-15 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101430276B |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104502306A * | 2014-12-09 | 2015-04-08 | Northwest Normal University | Near infrared spectrum wavelength selecting method based on variable significance |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102313712B * | 2011-05-30 | 2013-04-24 | China Agricultural University | Correction method of difference between near-infrared spectrums with different light-splitting modes based on fiber material |
CN103955711B * | 2014-05-20 | 2017-07-07 | Beihang University | Pattern recognition method in imaging spectral target identification analysis |
CN104865228B * | 2015-06-02 | 2017-08-15 | Shanghai Institute of Technical Physics, Chinese Academy of Sciences | The quantitative LIBS detection method solved based on fusion entropy optimization |
CN105630743B * | 2015-12-24 | 2018-05-01 | Zhejiang University | Method for selecting spectral wavenumbers |
CN105891141A * | 2016-03-30 | 2016-08-24 | Nanjing Fudao Information Engineering Co., Ltd. | Method for rapidly measuring gasoline property data |
CN106950193B * | 2017-05-24 | 2019-04-26 | Changchun University of Science and Technology | Near infrared spectrum variable selection method based on self-weighted variable combination cluster analysis |
CN107728602A * | 2017-09-28 | 2018-02-23 | Hefei University of Technology | Personalized service method for hydroforming equipment failure |
CN108918446B * | 2018-04-18 | 2021-05-04 | Tianjin University | Ultra-low concentration sulfur dioxide ultraviolet difference feature extraction algorithm |
CN109409350B * | 2018-10-23 | 2022-05-31 | Guilin University of Technology | PCA modeling feedback type load weighting-based wavelength selection method |
CN109596558B * | 2018-12-17 | 2020-05-19 | Huazhong University of Science and Technology | Spectrogram basic dimension correction and differential analysis method based on moving least square method |
CN110069895B * | 2019-05-20 | 2021-06-01 | China Institute of Water Resources and Hydropower Research | Method for establishing winter wheat nitrogen content full-growth period spectrum monitoring model |
CN110338813B * | 2019-06-04 | 2022-04-12 | Xi'an University of Technology | Noninvasive blood glucose detection method based on spectrum analysis |
CN110726694A * | 2019-10-22 | 2020-01-24 | Changzhou University | Characteristic wavelength selection method and system of spectral variable gradient integrated genetic algorithm |
CN111982949B * | 2020-08-19 | 2022-06-07 | East China University of Technology | Method for separating EDXRF spectrum overlapping peak by combining fourth derivative with three-spline wavelet transform |
CN113406058A * | 2021-05-28 | 2021-09-17 | Shenyang Institute of Automation, Chinese Academy of Sciences | LIBS iron ore pulp quantitative analysis method for screening PLS based on mutual information characteristics |
CN113592743B * | 2021-08-11 | 2024-01-23 | North China Institute of Aerospace Engineering | Spectral high-frequency information and low-frequency information separation and coupling method based on complex wavelet transformation |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101299022A * | 2008-06-20 | 2008-11-05 | Henan College of Traditional Chinese Medicine | Method for evaluating the comprehensive quality of traditional Chinese medicine using near-infrared spectroscopy |
Non-Patent Citations (3)
Title |
---|
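Li Lina et al. Near-infrared spectral wavelength variable selection method based on interactive self-modeling mixture analysis. Chinese Journal of Analytical Chemistry, 2009, 37(6): 823-827. * |
Liu Yanyun et al. Progress and application of spectral wavelength selection methods in near-infrared analysis. Chinese Journal of Pharmaceutical Analysis, 2010, 30(5): 968-975. * |
Chu Xiaoli et al. Progress and application of spectral preprocessing and wavelength selection methods in near-infrared analysis. Progress in Chemistry, 2004, 16(4): 528-542. * |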
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2012-01-04; termination date: 2014-12-15 |
| | EXPY | Termination of patent right or utility model | |