CN118549367B - Seawater nitrate concentration measurement method based on improved least square method - Google Patents
- Publication number
- CN118549367B, CN202410976960.5A
- Authority
- CN
- China
- Prior art keywords
- nitrate
- training
- seawater
- data
- improved
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
- G01N21/31—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/01—Arrangements or apparatus for facilitating the optical investigation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/27—Regression, e.g. linear or logistic regression
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/01—Arrangements or apparatus for facilitating the optical investigation
- G01N2021/0106—General arrangement of respective parts
- G01N2021/0112—Apparatus in one mechanical, optical or electronic block
Abstract
The invention relates to the field of seawater quality analysis, in particular to a seawater nitrate concentration measurement method based on an improved least squares method, which comprises the following steps: Step 1: preparing solution samples and obtaining the absorbance difference array of each sample; Step 2: preprocessing the absorbance difference arrays and constructing an absorbance difference matrix; Step 3: extracting from the matrix the samples at the characteristic wavelengths with the highest weights to form an $n \times 3$ matrix, and separating the feature variables and the target variable; Step 4: adding a nitrate spectrum adjustment factor to the least squares model to obtain an improved cost function, training the cost function, and selecting the optimal spectrum adjustment factor $\gamma$; Step 5: dividing the data into a training set and a test set, substituting $\gamma$ into the cost function for training, evaluating the training result, storing the trained model, and importing spectral data to obtain the seawater nitrate concentration. The technical scheme provided by the invention effectively addresses the challenges of spectral overlap interference, non-Gaussian noise, high-dimensional data, multicollinearity and the like.
Description
Technical Field
The invention relates to the field of rapid analysis of seawater quality by spectrometry, in particular to a seawater nitrate concentration measurement method based on an improved least squares method.
Background
Nitrate is a common contaminant derived from agricultural emissions, industrial waste water and other human activities, and constitutes a potential risk to the ecosystem and public health. Therefore, the development of efficient, sensitive nitrate sensors is critical for environmental monitoring and management.
The measurement of nitrate by ultraviolet spectroscopy is based on the characteristic absorption of nitrate ions ($\mathrm{NO_3^-}$) in the ultraviolet band (200-240 nm). By measuring the ultraviolet absorption spectrum of a water sample in this band with an ultraviolet light source and a spectral detector, the nitrate concentration can be derived through calculation and modeling. Ultraviolet spectral measurement is fast, sensitive and non-destructive, and is currently a common method for detecting nitrate in seawater.
Because seawater contains high concentrations of chloride ions, bromide ions, organic matter and other substances whose absorption spectra overlap with the nitrate spectrum in the ultraviolet band, establishing a seawater nitrate spectral model is more difficult. The key to seawater nitrate measurement is how to effectively separate the nitrate spectral data from the interfering spectra and obtain the nitrate concentration.
In spectral nitrate detection, least squares fitting is very common. The least squares method assumes that the errors are Gaussian, independent and identically distributed, but actual spectral data may be affected by noise, baseline drift and other interference, so the least squares method cannot effectively handle such non-Gaussian noise and systematic errors. In multivariate spectral data, the absorbance values at different wavelengths may be highly correlated, which leads to multicollinearity, making the least squares estimates unstable and increasing the variance of the regression coefficients.
In view of the foregoing, there is a need to devise a seawater nitrate measurement method that improves least squares estimation to solve the above-mentioned problems.
Disclosure of Invention
In view of the problems existing in the prior art, the object of the invention is to provide a seawater nitrate concentration measurement method based on an improved least squares method that effectively addresses the problems of spectral overlap interference, non-Gaussian noise, systematic error and multicollinearity in the prior art, improves the fitting and prediction capabilities of the model, and provides a new solution for rapid and accurate measurement of seawater nitrate concentration.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows: a seawater nitrate concentration measuring method based on an improved least square method comprises the following steps:
Step 1: preparing n groups of artificial seawater and nitrate-artificial seawater solution samples with different concentrations, measuring the absorption spectrum data of each sample, and calculating the absorbance data of each sample by utilizing the absorption spectrum data to obtain n groups of absorbance difference value arrays of nitrate-artificial seawater samples;
Step 2: preprocessing the absorbance difference arrays with a filtering algorithm, and constructing the processed data into an absorbance difference matrix $\mathbf{NOS}$;
Step 3: extracting from the absorbance difference matrix $\mathbf{NOS}$ the samples at the characteristic wavelengths with the highest weights to form an $n \times 3$ matrix, and separating the feature variables and the target variable;
Step 4: adding a nitrate spectrum adjustment factor to the least squares model to obtain an improved cost function, substituting the feature variables and the target variable into the cost function, training multiple times using ten-fold cross-validation, and selecting the best-performing value of the nitrate spectrum adjustment factor as the optimal adjustment factor $\gamma$;
Step 5: dividing the absorption spectrum data into a training set and a test set, substituting the optimal adjustment factor $\gamma$ obtained in step 4 into the cost function to train the machine learning model, evaluating the training result, storing the trained model, and importing spectral data in use to obtain the predicted seawater nitrate concentration.
The above method for measuring the concentration of nitrate in seawater based on the improved least square method, wherein the step 1 comprises the following steps:
step 1-1: preparing artificial seawater solution and different nitrate-artificial seawater solutions;
Step 1-2: measuring the spectral data of the artificial seawater solution and of the different nitrate-artificial seawater solutions, calculating the absorbance at each wavelength, and recording the results as $S_0$ and $S_n$, with $S_0 = [s^0_1, s^0_2, \dots, s^0_m]$ and $S_n = [s^n_1, s^n_2, \dots, s^n_m]$ over the m measured wavelengths, where $s^0_j$ is the absorbance of sample 0 at wavelength $\lambda_j$, the superscript of $s$ denotes the sample number, and the subscript denotes the wavelength index;
Step 1-3: calculating, at each corresponding wavelength, the absorbance difference between each nitrate-artificial seawater solution and the artificial seawater solution, $nos^n_j = s^n_j - s^0_j$, to obtain n groups of absorbance difference arrays of the different nitrate-artificial seawater solutions, $NOS_n = [nos^n_1, nos^n_2, \dots, nos^n_m]$, where $nos^n_j$ is the absorbance difference of sample n at wavelength $\lambda_j$.
The above method for measuring the concentration of nitrate in seawater based on the improved least square method, wherein the step 2 comprises the following steps:
Step 2-1: applying a filtering algorithm to the absorbance difference arrays $NOS_n$ to smooth the data, improve the signal-to-noise ratio, and suppress or eliminate random noise and high-frequency noise in the signal;
Step 2-2: arranging the processed absorbance difference arrays of the n groups of different nitrate-artificial seawater solutions as a matrix:
$$\mathbf{NOS} = \begin{bmatrix} nos^1_1 & nos^1_2 & \cdots & nos^1_m \\ nos^2_1 & nos^2_2 & \cdots & nos^2_m \\ \vdots & \vdots & & \vdots \\ nos^n_1 & nos^n_2 & \cdots & nos^n_m \end{bmatrix}.$$
the above method for measuring the concentration of nitrate in seawater based on the improved least square method, wherein the step 3 comprises the following steps:
Step 3-1: training a linear regression model on all wavelength column vectors in $\mathbf{NOS}$: $y = \sum_{j=1}^{m} w_j x_j + b$, where x represents the feature variables, y represents the target variable, $w_j$ is the weight corresponding to the column vector of wavelength $\lambda_j$, and b is the offset;
Step 3-2: repeating the calculation in the regression model with different wavelengths to obtain the three wavelengths $\lambda_a, \lambda_b, \lambda_c$ whose feature vectors have the highest absolute weights;
Step 3-3: extracting the samples at these three wavelengths, $[nos^i_a, nos^i_b, nos^i_c]$ for $i = 1, \dots, n$, matching them with the corresponding nitrate concentrations $y_1, \dots, y_n$, and converting the sample data format, where 1 … n are the numbers of the different nitrate-artificial seawater solutions;
Step 3-4: separating the feature variables x and the target variable y, where the n groups of feature variables x are expressed as $x_i = [nos^i_a, nos^i_b, nos^i_c]$, $i = 1, \dots, n$, in which a, b, c are wavelength indices, and the n groups of target variables y are expressed as $y = [y_1, y_2, \dots, y_n]$.
In the above method for measuring the concentration of nitrate in seawater based on the improved least squares method, in step 4, the cost function is $J(\beta) = \sum_{i=1}^{n} (y_i - x_i^{T}\beta)^2 + \gamma\,\beta^{T}\beta$, where $\beta_1, \beta_2, \beta_3$ are the coefficients of the three feature variables, $\beta$ is the column vector composed of the feature variable coefficients $\beta_1, \beta_2, \beta_3$, $\gamma$ represents the optimal adjustment factor, T represents the matrix transpose, and i is an index traversing all elements in the range 1 to n.
In the above method, the extremum is obtained by differentiating the cost function with respect to $\beta$, which gives the coefficient formula $\beta = (X^{T}X + \gamma I)^{-1}X^{T}y$, where X is the $n \times 3$ matrix whose i-th row is $x_i^{T}$ and I is the identity matrix.
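The step from the cost function to the coefficient formula can be written out explicitly. The sketch below assumes the cost function has the usual ridge-type form in matrix notation; the symbols X (the n × 3 feature matrix) and y (the vector of nitrate concentrations) are introduced here for illustration.

```latex
\begin{aligned}
J(\beta) &= (y - X\beta)^{T}(y - X\beta) + \gamma\,\beta^{T}\beta \\
\frac{\partial J}{\partial \beta} &= -2X^{T}(y - X\beta) + 2\gamma\beta = 0 \\
(X^{T}X + \gamma I)\,\beta &= X^{T}y \\
\beta &= (X^{T}X + \gamma I)^{-1}X^{T}y
\end{aligned}
```

For any $\gamma > 0$ the matrix $X^{T}X + \gamma I$ is positive definite and therefore invertible, even when the columns of X are collinear, which is why the added term stabilizes the estimate.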
The above method for measuring the concentration of nitrate in seawater based on the improved least square method, wherein the step 4 comprises the following steps:
step 4-1: the method comprises the steps of (1) standardizing a training set and a testing set, and processing data by using a standardized method to enable the mean value of the data to be 0 and the standard deviation to be 1;
step 4-2: adding a nitrate spectrum adjustment factor into the least square method model to obtain an improved cost function;
Step 4-3: substituting the feature variables and the target variable into the cost function, training on all samples using ten-fold cross-validation, and determining the optimal adjustment factor $\gamma$ of the regression model according to the model's coefficient of determination $R^2$.
The above method for measuring the concentration of nitrate in seawater based on the improved least square method, wherein the step 4-3 comprises the following steps:
Step a: dividing the range from $10^{-5}$ up to an upper power of 10 logarithmically to generate 50 values as candidate parameters for the optimal adjustment factor $\gamma$;
Step b: applying ten-fold cross-validation to the training set, that is, training on nine folds and validating on the remaining fold, repeated ten times, and taking the coefficient of determination $R^2$ over the ten validations as the performance of the model on the training set;
Step c: performing the validation of step b for each of the 50 candidate parameters, and selecting the candidate with the highest coefficient of determination $R^2$ as the optimal adjustment factor $\gamma$.
In the above method for measuring the concentration of nitrate in seawater based on the improved least squares method, in step b, the coefficient of determination is $R^2 = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2}$, where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the actual values.
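The coefficient of determination described above can be computed in a few lines. This is a minimal sketch using NumPy; the function name `r2_score` and the toy values are illustrative assumptions, not part of the patent.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Toy values (illustrative only).
perfect = r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # exact predictions
baseline = r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])  # always predicts the mean
```

A model that reproduces the actual values exactly scores 1, while a model that always predicts the mean of the actual values scores 0.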
The above method for measuring the concentration of nitrate in seawater based on the improved least square method, wherein the step 5 comprises the following steps:
step 5-1: carrying out training set and test set division on absorption spectrum data of n samples by using an SPXY algorithm;
Step 5-2: calculating the distances between samples from the spectral data and the nitrate concentrations to obtain a distribution that characterizes the samples;
Step 5-3: dividing the data into a training set and a test set according to a given proportion, and optimizing the cost function on the training set to obtain the regression coefficients;
Step 5-4: evaluating the performance of the trained cost function on the test set.
The seawater nitrate concentration measurement method based on the improved least squares method has the following beneficial effects: applied to the measurement of seawater nitrate concentration, the improved least squares method effectively addresses the challenges of spectral overlap interference, non-Gaussian noise, high-dimensional data and multicollinearity and, compared with the traditional method, significantly improves the fitting ability, prediction accuracy and generalization ability of the model, providing a new solution for rapid and accurate measurement of seawater nitrate concentration. By adding the nitrate spectrum adjustment factor to the cost function and using the optimal adjustment factor obtained for it as a hyperparameter, excessively large feature weights are avoided, the influence of outliers on model training is reduced, the model is optimized more effectively, the risk of overfitting is reduced, and the performance of the model on the training set and the test set is improved. The model therefore fits the training data well and generalizes to unseen data.
Drawings
FIG. 1 is a flow chart for selecting the optimal nitrate spectrum adjustment factor of the present invention;
FIG. 2 is a training flow chart of the present invention;
FIG. 3 is a graph showing the relationship between training set actual and predicted values according to the present invention;
FIG. 4 is a graph showing the relationship between actual and predicted values of a test set according to the present invention.
Detailed Description
In order that those skilled in the art will better understand the technical solutions of the present invention, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described examples are only some examples of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
As shown in fig. 1-2, a seawater nitrate concentration measurement method based on an improved least square method comprises the following steps:
Step 1: preparing n groups of artificial seawater and nitrate-artificial seawater solution samples with different concentrations, measuring the absorption spectrum data of each sample, and calculating the absorbance data of each sample from the absorption spectrum data, where the absorbance data $S_0$ to $S_n$ comprise n+1 groups of data, so as to obtain n groups of absorbance difference arrays of the nitrate-artificial seawater samples.
Step 1-1: preparing artificial seawater solution and different nitrate-artificial seawater solutions.
Step 1-2: measuring the spectral data of the artificial seawater solution and of the different nitrate-artificial seawater solutions, calculating the absorbance at each wavelength, and recording the results as $S_0$ and $S_n$, with $S_0 = [s^0_1, s^0_2, \dots, s^0_m]$ and $S_n = [s^n_1, s^n_2, \dots, s^n_m]$ over the m measured wavelengths, where $s^0_j$ is the absorbance of sample 0 at wavelength $\lambda_j$, the superscript of $s$ denotes the sample number, and the subscript denotes the wavelength index.
Step 1-3: calculating, at each corresponding wavelength, the absorbance difference between each nitrate-artificial seawater solution and the artificial seawater solution, $nos^n_j = s^n_j - s^0_j$, to obtain n groups of absorbance difference arrays of the different nitrate-artificial seawater solutions, $NOS_n = [nos^n_1, nos^n_2, \dots, nos^n_m]$, where $nos^n_j$ is the absorbance difference of sample n at wavelength $\lambda_j$.
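Steps 1-2 and 1-3 amount to subtracting the artificial seawater absorbance from each nitrate sample's absorbance, wavelength by wavelength. A minimal NumPy sketch follows; the function name `absorbance_differences` and the three-wavelength toy values are assumptions for illustration only.

```python
import numpy as np

def absorbance_differences(s0, s_samples):
    """Subtract the blank (artificial seawater) absorbance from every sample.

    s0        : shape (m,)   absorbance of sample 0 at m wavelengths
    s_samples : shape (n, m) absorbance of n nitrate-artificial seawater samples
    returns   : shape (n, m) absorbance difference arrays, one row per sample
    """
    s0 = np.asarray(s0, dtype=float)
    s_samples = np.asarray(s_samples, dtype=float)
    return s_samples - s0  # broadcasting subtracts s0 from every row

# Toy data: 2 samples, 3 wavelengths (values are illustrative only).
s0 = np.array([0.10, 0.20, 0.30])
s_samples = np.array([[0.15, 0.40, 0.35],
                      [0.20, 0.60, 0.40]])
nos = absorbance_differences(s0, s_samples)
```

Each row of `nos` is the absorbance difference array of one sample, so stacking the rows already gives the layout of the matrix built in step 2-2.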
Step 2: preprocessing the absorbance difference arrays with a filtering algorithm, and constructing the processed data into the absorbance difference matrix $\mathbf{NOS}$.
Step 2-1: smoothing $NOS_1, NOS_2, NOS_3, \dots, NOS_n$ with the Savitzky-Golay filtering algorithm, which improves the signal-to-noise ratio and suppresses or eliminates random noise and high-frequency noise in the signal.
Step 2-2: arranging the preprocessed absorbance difference arrays of the n groups of different nitrate-artificial seawater solutions as a matrix:
$$\mathbf{NOS} = \begin{bmatrix} nos^1_1 & nos^1_2 & \cdots & nos^1_m \\ nos^2_1 & nos^2_2 & \cdots & nos^2_m \\ \vdots & \vdots & & \vdots \\ nos^n_1 & nos^n_2 & \cdots & nos^n_m \end{bmatrix}.$$
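The smoothing and matrix construction of step 2 can be sketched with SciPy's `savgol_filter`. The window length (11), polynomial order (2), wavelength grid and synthetic spectra below are assumptions chosen for illustration; the patent does not state the filter settings.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)

# Synthetic absorbance differences: 5 samples on an assumed 200-240 nm grid, 1 nm step.
wavelengths = np.arange(200.0, 241.0)
scales = np.linspace(0.2, 1.0, 5)[:, None]
clean = scales * np.exp(-0.5 * ((wavelengths - 220.0) / 12.0) ** 2)
noisy = clean + rng.normal(0.0, 0.01, clean.shape)

# Savitzky-Golay smoothing along the wavelength axis of each row.
# window_length=11 and polyorder=2 are assumed settings, not values from the patent.
nos_matrix = savgol_filter(noisy, window_length=11, polyorder=2, axis=1)

# Mean squared deviation from the noise-free spectra, before and after smoothing.
mse_noisy = float(np.mean((noisy - clean) ** 2))
mse_smooth = float(np.mean((nos_matrix - clean) ** 2))
```

Because the filter is applied row by row, `nos_matrix` keeps the n × m layout of the absorbance difference matrix, with one row per sample and one column per wavelength.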
Step 3: extracting from the absorbance difference matrix $\mathbf{NOS}$ the samples at the characteristic wavelengths with the highest weights to form an $n \times 3$ matrix, and separating the feature variables and the target variable.
This step prepares the data for training. Because the preceding matrix has a very high dimension and covers a wide wavelength range, this step finds the three wavelengths with the highest weights, that is, extracts the characteristic wavelengths, and then extracts the absorbance data of all samples at those three wavelengths; the three-wavelength spectral data of each sample form a $1 \times 3$ matrix.
Step 3-1: training a linear regression model on all wavelength column vectors in $\mathbf{NOS}$: $y = \sum_{j=1}^{m} w_j x_j + b$, where x represents the feature variables, y represents the target variable, $w_j$ is the weight corresponding to the column vector of wavelength $\lambda_j$, and b is the offset.
Step 3-2: repeating the calculation in the regression model with different wavelengths to obtain the three wavelengths $\lambda_a, \lambda_b, \lambda_c$ whose feature vectors have the highest absolute weights.
Step 3-3: extracting the samples at these three wavelengths, $[nos^i_a, nos^i_b, nos^i_c]$ for $i = 1, \dots, n$, matching them with the corresponding nitrate concentrations $y_1, \dots, y_n$, and converting the sample data to DataFrame format, where 1 … n are the numbers of the different nitrate-artificial seawater solutions. The DataFrame format is used because it is easier to import in Python.
Step 3-4: separating the feature variables x and the target variable y, where the n groups of feature variables x are expressed as $x_i = [nos^i_a, nos^i_b, nos^i_c]$, $i = 1, \dots, n$, in which a, b, c are wavelength indices, and the n groups of target variables y are expressed as $y = [y_1, y_2, \dots, y_n]$.
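One way to read steps 3-1 to 3-4 is sketched below: fit an ordinary least squares model on all wavelength columns, rank the wavelengths by absolute weight, and keep the top three as features. The function name `select_top_wavelengths`, the use of `numpy.linalg.lstsq`, and the synthetic data are assumptions; the patent does not specify the fitting routine.

```python
import numpy as np

def select_top_wavelengths(nos, conc, k=3):
    """Return the indices of the k wavelengths with the largest absolute weight."""
    nos = np.asarray(nos, dtype=float)
    conc = np.asarray(conc, dtype=float)
    design = np.column_stack([nos, np.ones(len(nos))])      # last column models the offset b
    coef, *_ = np.linalg.lstsq(design, conc, rcond=None)
    weights = coef[:-1]                                     # drop the offset
    top = np.argsort(np.abs(weights))[::-1][:k]
    return np.sort(top), weights

# Synthetic data: 60 samples, 8 wavelengths; only columns 2, 5 and 7 drive the target.
rng = np.random.default_rng(1)
nos = rng.normal(size=(60, 8))
conc = 3.0 * nos[:, 2] - 2.0 * nos[:, 5] + 1.5 * nos[:, 7] + 0.5
conc = conc + rng.normal(0.0, 0.05, size=60)

idx, weights = select_top_wavelengths(nos, conc)
x = nos[:, idx]   # feature variables, shape (n, 3)
y = conc          # target variable, shape (n,)
```

Raw regression weights depend on the scale of each column, so in practice the columns would usually be brought to a comparable scale before ranking; when there are more wavelengths than samples, `lstsq` returns the minimum-norm solution.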
Step 4: adding a nitrate spectrum adjustment factor to the least squares model to obtain an improved cost function, $J(\beta) = \sum_{i=1}^{n} (y_i - x_i^{T}\beta)^2 + \gamma\,\beta^{T}\beta$, where $\beta_1, \beta_2, \beta_3$ are the coefficients of the three feature variables, corresponding respectively to the three characteristic wavelengths $\lambda_a, \lambda_b, \lambda_c$, $\beta$ is the column vector composed of the feature variable coefficients $\beta_1, \beta_2, \beta_3$, $\gamma$ represents the optimal adjustment factor, T represents the matrix transpose, and i is an index traversing all elements in the range 1 to n. Differentiating the cost function with respect to $\beta$ and setting the derivative to zero gives the coefficient formula $\beta = (X^{T}X + \gamma I)^{-1}X^{T}y$, where X is the $n \times 3$ matrix whose i-th row is $x_i^{T}$ and I is the identity matrix. The parameter values of $\beta$ can be obtained from this coefficient formula.
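The coefficient formula can be evaluated directly with NumPy. The sketch below assumes the closed form $\beta = (X^{T}X + \gamma I)^{-1}X^{T}y$ with no intercept term; the function name `ridge_fit` and the toy inputs are illustrative.

```python
import numpy as np

def ridge_fit(x, y, gamma):
    """Solve (X^T X + gamma * I) beta = X^T y for beta."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = x.shape[1]
    # Solving the linear system is preferred over forming the inverse explicitly.
    return np.linalg.solve(x.T @ x + gamma * np.eye(n_features), x.T @ y)

# Toy check: with X = I the formula reduces to beta = y / (1 + gamma).
x = np.eye(3)
y = np.array([2.0, 4.0, 6.0])
beta_plain = ridge_fit(x, y, gamma=0.0)    # ordinary least squares
beta_shrunk = ridge_fit(x, y, gamma=1.0)   # coefficients shrunk toward zero
```

Setting `gamma` to 0 recovers ordinary least squares, while larger values shrink the coefficients, which is the mechanism that keeps feature weights from growing too large.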
The feature variables and the target variable are substituted into the cost function, training is repeated using ten-fold cross-validation, and the value of the nitrate spectrum adjustment factor that performs best is selected as the optimal adjustment factor $\gamma$.
Step 4-1: standardizing the training set and the test set, that is, processing the data with a standardization method so that the mean is 0 and the standard deviation is 1.
Step 4-2: adding the nitrate spectrum adjustment factor to the least squares model to obtain the improved cost function.
Step 4-3: substituting the feature variables and the target variable into the cost function, training on all samples using ten-fold cross-validation, and determining the optimal adjustment factor $\gamma$ of the regression model according to the model's coefficient of determination $R^2$.
During training, in order to obtain the optimal adjustment factor $\gamma$, ten-fold cross-validation is applied to all samples, and the hyperparameter of the regression model is determined from the model's coefficient of determination $R^2$. Specifically:
Step a: dividing the range from $10^{-5}$ up to an upper power of 10 logarithmically to generate 50 values as candidate parameters for the optimal adjustment factor $\gamma$. Here the parameter is the nitrate spectrum adjustment factor, and the hyperparameter is the optimal adjustment factor.
Step b: applying ten-fold cross-validation to the training set: in each of ten rounds, nine folds are used for training and one fold for validation, and the coefficient of determination $R^2$ of the regression model over the ten rounds is taken as the performance of the model on the training set. Different nitrate spectrum adjustment factors are tried through ten-fold cross-validation, and the value giving the highest $R^2$ is used as the optimal adjustment factor $\gamma$ for the final formal training.
Step c: performing the validation of step b for each of the 50 candidate parameters, and selecting the candidate with the highest coefficient of determination $R^2$ as the optimal adjustment factor $\gamma$.
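Steps 4-1 to 4-3 and steps a to c can be sketched together as a grid search. Several details below are assumptions, because the text does not fix them: the upper end of the grid ($10^{2}$), averaging $R^2$ over the ten folds, computing the standardization statistics on the training folds only, centering y so that no intercept is needed, and the synthetic data.

```python
import numpy as np

def ridge_fit(x, y, gamma):
    """Closed-form coefficients: (X^T X + gamma * I)^-1 X^T y."""
    return np.linalg.solve(x.T @ x + gamma * np.eye(x.shape[1]), x.T @ y)

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def select_gamma(x, y, gammas, n_folds=10, seed=0):
    """Return the candidate gamma with the highest mean validation R^2."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    best_gamma, best_score = None, -np.inf
    for gamma in gammas:
        scores = []
        for k in range(n_folds):
            val = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            # Standardize with training-fold statistics (mean 0, standard deviation 1).
            mu, sd = x[train].mean(axis=0), x[train].std(axis=0)
            y_mu = y[train].mean()
            beta = ridge_fit((x[train] - mu) / sd, y[train] - y_mu, gamma)
            pred = ((x[val] - mu) / sd) @ beta + y_mu
            scores.append(r2_score(y[val], pred))
        mean_score = float(np.mean(scores))
        if mean_score > best_score:
            best_gamma, best_score = float(gamma), mean_score
    return best_gamma, best_score

# 50 log-spaced candidates from 1e-5; the upper bound 1e2 is an assumed value.
gammas = np.logspace(-5, 2, 50)

# Synthetic three-wavelength features and concentrations (illustrative only).
rng = np.random.default_rng(42)
x = rng.normal(size=(100, 3))
y = x @ np.array([2.0, -1.0, 0.5]) + rng.normal(0.0, 0.1, size=100)

best_gamma, best_score = select_gamma(x, y, gammas)
```

Keeping the fold assignment fixed across all candidate values means every candidate is scored on the same splits, so differences in $R^2$ reflect the adjustment factor rather than the partition.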
Step 5: dividing the absorption spectrum data into a training set and a test set, substituting the hyperparameter $\gamma$ obtained in step 4 into the cost function to train the machine learning model, and evaluating the training result. The trained model is stored; in use, spectral data are imported to obtain the seawater nitrate concentration.
Step 5-1: dividing the absorption spectrum data of the n samples into a training set and a test set using the SPXY algorithm;
Step 5-2: calculating the distances between samples from the spectral data and the nitrate concentrations to obtain a distribution that characterizes the samples;
Step 5-3: dividing the data into a training set and a test set according to a given proportion, and training the cost function on the training set to obtain the regression coefficients;
Step 5-4: evaluating the performance of the trained cost function on the test set.
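The SPXY division of steps 5-1 to 5-3 can be sketched as a Kennard-Stone style selection on a joint distance that adds the normalized spectral distance and the normalized concentration distance. The function name `spxy_split`, the 70/30 proportion and the random data are assumptions; the patent does not state the proportion.

```python
import numpy as np

def spxy_split(x, y, n_train):
    """Pick n_train samples that spread out in joint (x, y) space; the rest form the test set."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float).reshape(len(x), -1)
    # Pairwise Euclidean distances in spectral space and in concentration space.
    dx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    dy = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=2)
    d = dx / dx.max() + dy / dy.max()   # joint distance, each part scaled to [0, 1]
    # Start from the two samples that are farthest apart.
    first, second = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(first), int(second)]
    remaining = [i for i in range(len(x)) if i not in selected]
    # Repeatedly add the sample whose nearest selected neighbour is farthest away.
    while len(selected) < n_train:
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)

# 20 synthetic samples with three-wavelength features; 70 % go to training (assumed ratio).
rng = np.random.default_rng(3)
x = rng.normal(size=(20, 3))
y = rng.uniform(0.0, 60.0, size=20)
train_idx, test_idx = spxy_split(x, y, n_train=14)
```

Because the selection favours samples far from those already chosen, the training set tends to cover the range of both spectra and concentrations, leaving interior samples for testing.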
The training set and the test set are divided by using the SPXY algorithm to divide the previous samples, so that training is facilitated. Using an optimal adjustment factor determined before a cost functionSubstituting the optimal adjustment factor obtained in the step 4 as the finally determined parameter to carry out final training.
Finally training to obtain a regression coefficient beta, and constructing a regression modelThe concentration of nitrate in seawater can be predicted by introducing the absorbance difference.
Example 2
As shown in Figs. 1, 2, 3 and 4, the method for measuring the nitrate concentration of seawater based on the improved least squares method comprises the following steps:
Step one: prepare n groups of artificial seawater and nitrate-artificial seawater solution samples with different concentrations, measure the absorption spectrum of each sample, and calculate the absorbance data of each sample from the absorption spectrum. The absorbance data $S_0$ to $S_n$ form n+1 groups of data, from which n groups of absorbance difference sequences of the nitrate-artificial seawater samples are obtained.
Step 1: an artificial seawater solution is prepared, containing 35 g of sodium chloride and 86 g of sodium bromide per liter of artificial seawater solution.
Step 2: potassium nitrate is added to part of the artificial seawater solution to prepare a nitrate-artificial seawater standard solution containing 1 g of potassium nitrate per liter.
Step 3: to prepare the nitrate sample solutions of the training set accurately, V mL of the nitrate-artificial seawater standard solution is taken, transferred into a volumetric flask, and diluted to 100 mL with artificial seawater. By the dilution balance with the 1 g/L (1000 mg/L) standard solution, $V = \dfrac{0.3\,n\,V_0}{1000}$, where V is the volume (mL) of nitrate-artificial seawater standard solution to be taken; n is the number of the solution, which corresponds to a nitrate concentration of 0.3n mg/L and has a maximum value of 200; and $V_0$ is the volume (mL) of the currently numbered solution to be prepared.
Step 4: step 3 is repeated to prepare nitrate-artificial seawater solutions with different concentrations. The nitrate concentration ranges from a minimum of 0.3 mg/L to a maximum of 60 mg/L in steps of 0.3 mg/L, for a total of 200 samples.
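The dilution series of Steps 3 and 4 can be tabulated with a short script. This is a minimal sketch, assuming that the target concentration 0.3n mg/L is expressed as mass of potassium nitrate (so the 1 g/L standard solution counts as 1000 mg/L) and that every sample is made up to 100 mL. The function name `stock_volume_ml` and the constants are illustrative and do not come from the source.

```python
STOCK_MG_PER_L = 1000.0    # assumed: 1 g of potassium nitrate per litre of standard solution
FINAL_VOLUME_ML = 100.0    # volumetric flask volume V0
STEP_MG_PER_L = 0.3        # concentration step between consecutive samples
N_SAMPLES = 200


def stock_volume_ml(n: int) -> float:
    """Volume of standard solution (mL) needed for sample number n (1-based)."""
    if not 1 <= n <= N_SAMPLES:
        raise ValueError("n must be between 1 and 200")
    target_mg_per_l = STEP_MG_PER_L * n
    # Dilution balance: C_stock * V = C_target * V0
    return target_mg_per_l * FINAL_VOLUME_ML / STOCK_MG_PER_L


# Volumes for the whole series, from 0.3 mg/L (n = 1) to 60 mg/L (n = 200).
volumes = [stock_volume_ml(n) for n in range(1, N_SAMPLES + 1)]
```

Under these assumptions the most concentrated sample (n = 200) requires 6 mL of standard solution and the most dilute (n = 1) requires 0.03 mL, which suggests that the low end of the series needs a micropipette or an intermediate dilution.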
Step 5: calculate the absorbance at each wavelength from the absorption spectrum of the artificial seawater solution:
The ultraviolet absorption spectrum of pure water is measured using a deuterium-halogen lamp light source and a fiber-optic spectrometer. The spectrometer has 2048 pixels, of which 136 record light intensity in the wavelength range from 200 nm to 240 nm. The wavelengths corresponding to these pixels are recorded as a sequence in ascending order: $\lambda_1, \lambda_2, \ldots, \lambda_{136}$.
The transmitted light intensity of pure water is recorded pixel by pixel as a sequence: $I^{w}_{1}, I^{w}_{2}, \ldots, I^{w}_{136}$.
For the ultraviolet absorption spectrum of the artificial seawater solution, the transmitted light intensity is recorded as a sequence in ascending order of wavelength: $I^{0}_{1}, I^{0}_{2}, \ldots, I^{0}_{136}$.
Taking the pure water sample as the reference, the Lambert-Beer absorbance formula is applied: $A^{0}_{i} = \lg\left(I^{w}_{i} / I^{0}_{i}\right)$.
The absorbance at each wavelength is calculated one by one, and the absorbance data of the artificial seawater solution are recorded as a sequence in ascending order of wavelength: $S_0 = \{A^{0}_{1}, A^{0}_{2}, \ldots, A^{0}_{136}\}$.
Step 6: step 5 is repeated to measure the ultraviolet absorption spectra of the 200 nitrate-artificial seawater solutions, and the spectral light intensity of each is recorded as a sequence $I^{n}_{1}, I^{n}_{2}, \ldots, I^{n}_{136}$, where $I^{n}_{i}$ denotes the transmitted light intensity of sample n at wavelength $\lambda_i$; the superscript n is the serial number of the nitrate-artificial seawater solution, and the subscript is the serial number of the wavelength.
The absorbance data of the nitrate-artificial seawater solutions are then calculated: $S_n = \{A^{n}_{1}, A^{n}_{2}, \ldots, A^{n}_{136}\}$ with $A^{n}_{i} = \lg\left(I^{w}_{i} / I^{n}_{i}\right)$, where the superscript n of A is the number of the nitrate-artificial seawater solution.
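The absorbance calculation of Steps 5 and 6 might look as follows. This is a sketch under stated assumptions: NumPy is used, the intensities are already cropped to the 136 pixels between 200 nm and 240 nm, and the numbers are synthetic placeholders rather than measurements. The function name `absorbance` is illustrative.

```python
import numpy as np


def absorbance(sample_intensity, reference_intensity):
    """Lambert-Beer absorbance A = log10(I_reference / I_sample), per pixel."""
    sample = np.asarray(sample_intensity, dtype=float)
    reference = np.asarray(reference_intensity, dtype=float)
    if np.any(sample <= 0) or np.any(reference <= 0):
        raise ValueError("intensities must be strictly positive")
    return np.log10(reference / sample)


# Synthetic data (not measurements): pure-water reference, artificial
# seawater (sample 0) and two nitrate-artificial seawater samples.
n_pixels = 136
pure_water = np.full(n_pixels, 1000.0)
seawater = pure_water * 10 ** -0.2
# A (2, 1) column of attenuation factors broadcasts to shape (2, 136).
nitrate_samples = pure_water * 10 ** -np.array([[0.3], [0.5]])

S0 = absorbance(seawater, pure_water)          # shape (136,)
Sn = absorbance(nitrate_samples, pure_water)   # shape (2, 136)
```

Passing a two-dimensional array with one row per sample computes the absorbance of all samples in a single call, because the pure-water reference broadcasts across rows.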
Step two: carry out data preprocessing and construct a data matrix.
Step 7: calculate the absorbance difference between each nitrate-artificial seawater solution and the artificial seawater solution at each corresponding wavelength: $\Delta A^{n}_{i} = A^{n}_{i} - A^{0}_{i}$,
where $\Delta A^{n}_{i}$ is the absorbance difference of sample n at wavelength $\lambda_i$.
Step 8: a Savitzky-Golay filtering algorithm is applied to $\Delta A^{n}_{i}$ to smooth the data, improve the signal-to-noise ratio, and suppress or eliminate random noise and high-frequency noise in the signal.
Step 9: the preprocessed absorbance difference sequences of the n groups of nitrate-artificial seawater solutions are arranged as a matrix: $$\Delta\mathbf{A} = \begin{pmatrix} \Delta A^{1}_{1} & \Delta A^{1}_{2} & \cdots & \Delta A^{1}_{136} \\ \Delta A^{2}_{1} & \Delta A^{2}_{2} & \cdots & \Delta A^{2}_{136} \\ \vdots & \vdots & & \vdots \\ \Delta A^{n}_{1} & \Delta A^{n}_{2} & \cdots & \Delta A^{n}_{136} \end{pmatrix}$$ where each row represents one sample and each column represents the absorbance difference data of the 200 samples at the same wavelength.
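Steps 7 to 9 can be sketched as below. The source does not state the Savitzky-Golay window length or polynomial order, so the defaults here (window 11, order 2) are assumptions, and the filter is written out with `np.polyfit` instead of calling a library routine. The names `savgol_smooth` and `difference_matrix` are illustrative.

```python
import numpy as np


def savgol_smooth(signal, window=11, order=2):
    """Savitzky-Golay smoothing: a local least-squares polynomial fit per point."""
    y = np.asarray(signal, dtype=float)
    n = y.size
    if window % 2 == 0 or window > n or order >= window:
        raise ValueError("need an odd window <= len(signal) and order < window")
    half = window // 2
    smoothed = np.empty(n)
    for i in range(n):
        # Centre the window on i; near the ends reuse the first or last full window.
        start = min(max(i - half, 0), n - window)
        offsets = np.arange(start, start + window) - i
        coeffs = np.polyfit(offsets, y[start:start + window], order)
        smoothed[i] = coeffs[-1]   # fitted polynomial evaluated at offset 0
    return smoothed


def difference_matrix(sample_absorbance, seawater_absorbance, window=11, order=2):
    """Rows are samples, columns are wavelengths: subtract sample 0, then smooth each row."""
    diff = (np.asarray(sample_absorbance, dtype=float)
            - np.asarray(seawater_absorbance, dtype=float))
    return np.vstack([savgol_smooth(row, window, order) for row in diff])
```

With 200 samples and 136 wavelengths the result has shape (200, 136). A longer window smooths more strongly but can flatten the nitrate absorption feature, so the window is worth checking against the raw spectra.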
Step three: the characteristic variable and the target variable are separated.
Step 10: using the absorbance difference matrix $\Delta\mathbf{A}$, a linear regression model is trained on all wavelength column vectors: $y = \sum_{i=1}^{136} w_i\, x_{\lambda_i} + b$, where $x_{\lambda_i}$ is the column vector corresponding to wavelength $\lambda_i$, $w_i$ is its weight, and b is the offset.
For a linear regression model, the importance of a feature can be measured by the absolute value of its weight, $|w_i|$, where $w_i$ is the weight of the column vector corresponding to wavelength $\lambda_i$.
Step 11: the calculation of step 10 is repeated to find the wavelengths corresponding to the three feature vectors with the highest absolute weights, denoted $\lambda_a, \lambda_b, \lambda_c$.
Step 12: the samples corresponding to the three wavelengths, $\Delta A^{n}_{a}, \Delta A^{n}_{b}, \Delta A^{n}_{c}$, are extracted and matched with the corresponding nitrate concentrations $y_n$, and the data are converted to DataFrame format.
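The wavelength selection of Steps 10 to 12 can be illustrated as follows. This sketch assumes a single multivariate least-squares fit over all wavelength columns (the source could also be read as fitting one wavelength at a time), uses plain NumPy arrays instead of a DataFrame, and introduces the hypothetical helper `top_wavelength_indices`.

```python
import numpy as np


def top_wavelength_indices(X, y, k=3):
    """Fit y = X w + b by least squares and return the k columns with largest |w|."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Append a column of ones so the last coefficient plays the role of the offset b.
    design = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    weights = coef[:-1]
    ranked = np.argsort(np.abs(weights))[::-1]   # descending by absolute weight
    return np.sort(ranked[:k]), weights


def extract_features(X, indices):
    """Keep only the selected wavelength columns (one row per sample)."""
    return np.asarray(X, dtype=float)[:, indices]
```

Absorbance values at neighbouring wavelengths are usually strongly correlated, so unregularized weights can be unstable on real spectra; the ranking is best treated as a screening step and checked against the known nitrate absorption region.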
Step 13: the characteristic variable and the target variable are separated. The characteristic variable is x: $x_n = \left(\Delta A^{n}_{a}, \Delta A^{n}_{b}, \Delta A^{n}_{c}\right)$,
where the superscript n of $\Delta A$ is the number of the nitrate-artificial seawater solution, and the subscript is the wavelength number.
The target variable is y: $y = (y_1, y_2, \ldots, y_n)^{T}$,
where $y_n$ is the nitrate concentration of sample solution n, in mg/L.
Step 14: the training set and the test set are standardized so that the data have a mean of 0 and a standard deviation of 1.
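The standardization of Step 14 might be implemented as below. The source only says that both sets are standardized to zero mean and unit standard deviation; computing the statistics on the training set and reusing them for the test set is an assumption made here (a common way to avoid leaking test information). The function names are illustrative.

```python
import numpy as np


def fit_standardizer(train):
    """Column-wise mean and standard deviation of the training features."""
    train = np.asarray(train, dtype=float)
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)   # leave constant columns unscaled
    return mean, std


def standardize(data, mean, std):
    """Apply previously fitted statistics to any data set."""
    return (np.asarray(data, dtype=float) - mean) / std
```

A typical use is `mean, std = fit_standardizer(X_train)` followed by `standardize(X_train, mean, std)` and `standardize(X_test, mean, std)`; only the training set then has exactly zero mean and unit standard deviation.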
Step four: construct the cost function.
Step 15: the cost function of the improved least squares regression model is calculated by the following formula: $$J(\beta) = \sum_{i=1}^{n} \left(y_i - x_i^{T}\beta\right)^2 + \gamma\,\beta^{T}\beta$$
where the subscript of $x_i$ and $y_i$ is the number of the nitrate-artificial seawater solution; $\beta_1, \beta_2, \beta_3$ are the coefficients of the three characteristic variables, corresponding respectively to the absorbance differences at $\lambda_a, \lambda_b, \lambda_c$; $\beta = (\beta_1, \beta_2, \beta_3)^{T}$ is the column vector composed of the characteristic variable coefficients; $\gamma$ is the optimal adjustment factor; $x_n^{T}$ is the transpose of the feature vector of the nitrate-artificial seawater solution sample numbered n; and $y_n$ is the actual solution concentration of the nitrate-artificial seawater solution sample numbered n.
Step 16: setting the derivative of the cost function with respect to $\beta$ to zero gives the coefficient formula $\beta = \left(X^{T}X + \gamma I\right)^{-1} X^{T} y$, where X is the matrix whose rows are the feature vectors $x_i^{T}$, y is the column vector of concentrations, and I is the identity matrix. The parameter value of $\beta$ is obtained from this coefficient formula.
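The coefficient formula of Step 16 has the form of a regularized (ridge-type) least-squares solution and can be evaluated directly. The sketch below assumes NumPy and a model without an explicit intercept, matching the formula above, so the variables are taken to be centred beforehand. The function names `ridge_coefficients` and `predict` are illustrative.

```python
import numpy as np


def ridge_coefficients(X, y, gamma):
    """Return beta = (X^T X + gamma I)^{-1} X^T y for feature matrix X and target y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    # Solve the linear system instead of forming an explicit matrix inverse.
    return np.linalg.solve(X.T @ X + gamma * np.eye(n_features), X.T @ y)


def predict(X, beta):
    """Predicted concentrations for each row of X."""
    return np.asarray(X, dtype=float) @ beta
```

With `gamma = 0` the function reduces to ordinary least squares; larger values of `gamma` shrink the coefficients towards zero, which is the effect the adjustment factor contributes to the cost function.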
Step 17: the model is evaluated by the mean square error (MSE) and the coefficient of determination. Mean square error: $\mathrm{MSE} = \dfrac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$, where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and n is the number of samples.
$R^2$ score (the coefficient of determination of the regression model): $R^2 = 1 - \dfrac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$, where $\bar{y}$ is the average of the actual values.
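The two evaluation metrics of Step 17 translate directly into code. This is a minimal NumPy sketch with illustrative function names; the numbers in the usage note are made up for the example.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

For actual values `[1, 2, 3, 4]` and predictions `[1.5, 2, 3, 3.5]`, the squared errors sum to 0.5, so the MSE is 0.125; the total sum of squares around the mean 2.5 is 5, so $R^2$ is 0.9.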
Step 18: during training, to obtain the optimal adjustment factor $\gamma$, ten-fold cross-validation is applied to all samples, and the optimal adjustment factor in the regression model is determined according to the coefficient of determination $R^2$ of the regression model. Specifically, the range from $10^{-5}$ to $10^{2}$ is divided logarithmically to generate 50 candidate values of the optimal adjustment factor $\gamma$. Each candidate is then cross-validated on the training set: nine folds are used for training and one fold for validation, the validation is performed ten times, and the coefficient of determination $R^2$ of the regression model over the ten validations is taken as the model's performance on the training set. This validation is carried out for all 50 candidate parameters, and the parameter with the highest coefficient of determination is selected as the optimal adjustment factor $\gamma$.
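The search of Step 18 can be sketched as a grid search with ten-fold cross-validation. Two points are assumptions, since the source does not specify them: the ten $R^2$ values of each candidate are averaged, and the samples are shuffled once with a fixed seed before being split into folds. All function names are illustrative.

```python
import numpy as np


def _ridge_fit(X, y, gamma):
    """Regularized least-squares coefficients (X^T X + gamma I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + gamma * np.eye(X.shape[1]), X.T @ y)


def _r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def select_gamma(X, y, n_folds=10, seed=0):
    """Return the candidate gamma with the highest mean validation R^2, plus all scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    gammas = np.logspace(-5, 2, 50)           # 50 candidates from 1e-5 to 1e2
    shuffled = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(shuffled, n_folds)
    mean_scores = []
    for gamma in gammas:
        scores = []
        for k in range(n_folds):
            val = folds[k]                     # one fold for validation
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            beta = _ridge_fit(X[train], y[train], gamma)
            scores.append(_r2(y[val], X[val] @ beta))
        mean_scores.append(np.mean(scores))    # assumed aggregation: mean over folds
    mean_scores = np.array(mean_scores)
    return float(gammas[int(np.argmax(mean_scores))]), mean_scores
```

Plotting the returned scores against `np.logspace(-5, 2, 50)` shows how flat the optimum is; when many candidates give nearly the same $R^2$, the selected value has little practical effect on the predictions.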
Step five: evaluate the training result.
Step 19: after the optimal adjustment factor $\gamma$ is determined, the absorption spectrum data of the 200 samples are divided into a training set and a test set using the SPXY algorithm. When computing the distance between samples, SPXY takes both the X variable (spectral data) and the Y variable (nitrate concentration) into account, so that the sample distribution is characterized as fully as possible; the training set samples are therefore diverse and representative, which improves the stability of the model. The training set (160 samples) and the test set (40 samples) are divided in a ratio of 8:2. The regression model is trained on the whole training set to obtain the regression coefficient $\beta$, that is, the column vector of the characteristic variable coefficients $\beta_1, \beta_2, \beta_3$. The prediction result is shown in the figure. From the regression coefficient $\beta$ obtained by training, the regression model $\hat{y} = x^{T}\beta$ is constructed, and the nitrate concentration in seawater is predicted by substituting the absorbance difference values. Finally, the performance of the model is evaluated using the test set.
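The SPXY division of Step 19 can be sketched as follows. The code follows the commonly described form of the algorithm (Euclidean distances in X and in y, each normalized by its maximum and summed, followed by Kennard-Stone style max-min selection); whether the patent uses exactly this variant is an assumption, and `spxy_split` is an illustrative name.

```python
import numpy as np


def spxy_split(X, y, train_fraction=0.8):
    """Return (train_indices, test_indices) chosen by joint X-y distances."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n = len(X)
    n_train = int(round(train_fraction * n))
    # Pairwise distances in spectral space and in concentration space.
    dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    dy = np.abs(y - y.T)
    d = dx / dx.max() + dy / dy.max()
    # Start with the two samples that are farthest apart.
    first, second = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(first), int(second)]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_train:
        # For each remaining sample, distance to its nearest selected sample;
        # pick the sample for which that distance is largest.
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)
```

For 200 samples and `train_fraction=0.8` the function returns 160 training indices and 40 test indices. Because the selection favours samples at the edges of the X-y space, the training set spans the concentration range, while the test set consists of samples lying inside that range.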
Example 3
This embodiment is the same as the above embodiments except for the following, and the common details are not repeated:
The coefficient of determination $R^2$ and the mean square error (MSE) are used as the model evaluation criteria. The MSE measures the difference between the model's predicted values and the actual observed values: it is the average of the squared differences between predicted and actual values and is commonly used to evaluate the predictive performance of a regression model. A smaller MSE indicates more accurate predictions. The coefficient of determination $R^2$ measures how well the regression model fits the observed data. It reflects the proportion of the variance of the dependent variable (target variable) that is explained by the independent variables (characteristic variables), and for a model that fits at least as well as a constant prediction it lies between 0 and 1. Specifically, when $R^2 = 1$ the model accounts for all of the variation in the dependent variable; when $R^2 = 0$ the model accounts for none of it, and its predictions are no better than always predicting the mean of the observed values. In practice, the closer $R^2$ is to 1, the better the model fits the data; the closer to 0, the worse the fit. After the model is trained on the whole training set, the evaluation results on the training set and the test set are shown in Table 1.
Table 1: evaluation results
Data set | Mean square error (MSE) | Coefficient of determination ($R^2$) |
---|---|---|
Training set | 2.8776992859 | 0.980878803 |
Test set | 4.4262397690 | 0.966265766 |
As can be seen from Table 1, the coefficient of determination $R^2$ of the regression model on the test set is 0.966, indicating that the model accounts for about 96.6% of the variation in the dependent variable; in other words, the model fits the data well. A smaller MSE indicates a smaller prediction error: the test-set MSE of 4.43 is the mean of the squared prediction errors, which corresponds to a root-mean-square error of about 2.10 in the units of the target variable. Taken together, an $R^2$ of 0.966 and an MSE of 4.43 indicate that the regression model fits and predicts the target variable, the nitrate concentration, well from the characteristic variable, the spectrum.
The invention provides a seawater nitrate concentration measurement method based on an improved least squares method. The modeling method adopted in the technical scheme can effectively improve the adaptability of the model, offers a new solution for detecting nitrate nitrogen in water bodies, and provides a reference for rapid, pollution-free online water quality monitoring.
The above embodiments are only for illustrating the inventive concept and features of the present invention, and are intended to enable those skilled in the art to understand the content of the present invention and implement it accordingly, and are not intended to limit the scope of the present invention. All equivalent changes or modifications made in accordance with the essence of the present invention should be included in the scope of the present invention.
Claims (8)
1. The seawater nitrate concentration measuring method based on the improved least square method is characterized by comprising the following steps of:
Step 1: preparing n groups of artificial seawater and nitrate-artificial seawater solution samples with different concentrations, measuring the absorption spectrum data of each sample, and calculating the absorbance data of each sample from the absorption spectrum data to obtain n groups of absorbance difference sequences of the nitrate-artificial seawater samples;
Step 2: preprocessing the absorbance difference sequences with a filtering algorithm, and constructing the processed data into an absorbance difference matrix $\Delta\mathbf{A}$;
Step 3: extracting from the absorbance difference matrix $\Delta\mathbf{A}$ the samples corresponding to the characteristic wavelengths with the highest weights to form a sample matrix, and separating the characteristic variable and the target variable;
Step 4: adding the nitrate spectrum adjustment factor to the least squares model to obtain an improved cost function, substituting the characteristic variable and the target variable into the improved cost function, training multiple times using ten-fold cross-validation, and selecting the trained value of the nitrate spectrum adjustment factor as the optimal adjustment factor $\gamma$, the improved cost function being: $$J(\beta) = \sum_{i=1}^{n} \left(y_i - x_i^{T}\beta\right)^2 + \gamma\,\beta^{T}\beta$$ wherein $\beta_1, \beta_2, \beta_3$ are the coefficients of the three characteristic variables, $\beta$ is the column vector composed of the characteristic variable coefficients $\beta_1, \beta_2, \beta_3$, $\gamma$ represents the optimal adjustment factor, T represents the matrix transpose, and i is the index traversing all elements in the range 1 to n; step 4 specifically comprising:
Step 4-1: standardizing the training set and the test set so that the data have a mean of 0 and a standard deviation of 1;
Step 4-2: adding the nitrate spectrum adjustment factor to the least squares model to obtain the improved cost function;
Step 4-3: substituting the characteristic variable and the target variable into the improved cost function, training all samples using ten-fold cross-validation, and determining the optimal adjustment factor $\gamma$ in the improved cost function according to the coefficient of determination $R^2$ of the improved cost function;
Step 5: dividing the absorption spectrum data into a training set and a test set, substituting the optimal adjustment factor $\gamma$ obtained in step 4 into the improved cost function to train the machine learning model, evaluating the training result, saving the trained model, and importing spectral data in use to obtain the predicted seawater nitrate concentration.
2. The method for measuring the nitrate concentration of the seawater based on the improved least square method according to claim 1, wherein the step 1 comprises:
Step 1-1: preparing an artificial seawater solution and different nitrate-artificial seawater solutions;
Step 1-2: measuring spectral data of the artificial seawater solution and of the different nitrate-artificial seawater solutions, calculating the absorbance at each wavelength, and recording the results as $S_0 = \{A^{0}_{i}\}$ and $S_n = \{A^{n}_{i}\}$, wherein $A^{0}_{i}$ is the absorbance of sample 0 at wavelength $\lambda_i$, the superscript of A represents the number of the sample, and the subscript represents the serial number of the wavelength;
Step 1-3: calculating the absorbance difference between each of the different nitrate-artificial seawater solutions and the artificial seawater solution at each corresponding wavelength, $\Delta A^{n}_{i} = A^{n}_{i} - A^{0}_{i}$, to obtain n groups of absorbance difference sequences of the different nitrate-artificial seawater solutions, wherein $\Delta A^{n}_{i}$ is the absorbance difference of sample n at wavelength $\lambda_i$.
3. The method for measuring the nitrate concentration of the seawater based on the improved least square method according to claim 2, wherein the step 2 comprises:
Step 2-1: applying a filtering algorithm to the absorbance differences $\Delta A^{n}_{i}$ to smooth the data, improve the signal-to-noise ratio, and suppress or eliminate random noise and high-frequency noise in the signal;
Step 2-2: arranging the processed absorbance difference sequences of the n groups of different nitrate-artificial seawater solutions as a matrix: $$\Delta\mathbf{A} = \begin{pmatrix} \Delta A^{1}_{1} & \Delta A^{1}_{2} & \cdots \\ \vdots & \vdots & \\ \Delta A^{n}_{1} & \Delta A^{n}_{2} & \cdots \end{pmatrix}$$
4. the method for measuring the nitrate concentration of the seawater based on the improved least square method according to claim 3, wherein the step 3 comprises the following steps:
Step 3-1: training a linear regression model using all wavelength column vectors in $\Delta\mathbf{A}$: $y = \sum_{i} w_i\, x_{\lambda_i} + b$, where x represents a characteristic variable, y represents the target variable, $x_{\lambda_i}$ represents the column vector corresponding to wavelength $\lambda_i$, $w_i$ is its weight, and b is the offset;
Step 3-2: repeating the calculation in the regression model with different wavelengths to obtain the wavelengths $\lambda_a, \lambda_b, \lambda_c$ corresponding to the feature vectors with the highest absolute weights;
Step 3-3: extracting the samples corresponding to the three wavelengths, $\Delta A^{1 \ldots n}_{a}, \Delta A^{1 \ldots n}_{b}, \Delta A^{1 \ldots n}_{c}$, matching them with the corresponding nitrate concentrations $y_{1 \ldots n}$, and converting the sample data format, wherein 1, …, n represents the numbers of the different nitrate-artificial seawater solutions;
Step 3-4: separating the characteristic variable x and the target variable y, wherein the n groups of characteristic variables x are expressed as $x_j = \left(\Delta A^{j}_{a}, \Delta A^{j}_{b}, \Delta A^{j}_{c}\right)$, $j = 1, \ldots, n$, wherein a, b, c represent wavelength numbers; and the n groups of target variables y are expressed as $y = (y_1, y_2, \ldots, y_n)^{T}$.
5. The method for measuring the nitrate concentration of the seawater based on the improved least square method as claimed in claim 1, wherein the coefficient formula is obtained by setting the derivative of the cost function with respect to $\beta$ to zero: $\beta = \left(X^{T}X + \gamma I\right)^{-1} X^{T} y$, wherein X is the matrix whose rows are the feature vectors, y is the column vector of target values, and I is an identity matrix.
6. The method for measuring the nitrate concentration of the seawater based on the improved least square method according to claim 5, wherein the step 4-3 comprises:
Step a: dividing the range from $10^{-5}$ to $10^{2}$ logarithmically to generate 50 candidate values of the optimal adjustment factor $\gamma$;
Step b: applying ten-fold cross-validation to the training set, that is, using nine folds for training and one fold for validation in each of ten rounds of training, and taking the coefficient of determination $R^2$ over the ten rounds as the model's performance on the training set;
Step c: carrying out the validation of step b for the 50 candidate parameters, and taking the candidate with the highest coefficient of determination $R^2$ as the optimal adjustment factor $\gamma$.
7. The method for measuring nitrate concentration in seawater based on the improved least square method as claimed in claim 6, wherein in said step b, the coefficient of determination is $R^2 = 1 - \dfrac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$, wherein $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the average of the actual values.
8. The method for measuring the nitrate concentration of the seawater based on the improved least square method according to claim 1, wherein the step 5 comprises:
step 5-1: carrying out training set and test set division on absorption spectrum data of n samples by using an SPXY algorithm;
Step 5-2: calculating the distance between samples from both the spectral data and the nitrate concentration, so as to characterize the sample distribution;
Step 5-3: dividing a training set and a test set according to the proportion, and optimizing the cost function on the training set to obtain the regression coefficient;
Step 5-4: evaluating the performance of the trained model using the test set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410976960.5A CN118549367B (en) | 2024-07-22 | 2024-07-22 | Seawater nitrate concentration measurement method based on improved least square method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118549367A CN118549367A (en) | 2024-08-27 |
CN118549367B CN118549367B (en) | 2024-10-18 |
Family
ID=92448358
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410976960.5A Active CN118549367B (en) | 2024-07-22 | 2024-07-22 | Seawater nitrate concentration measurement method based on improved least square method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118549367B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||