
CN103544392B - Medical gas identification method based on deep learning - Google Patents

Medical gas identification method based on deep learning

Info

Publication number
CN103544392B
CN103544392B (application CN201310503402.9A)
Authority
CN
China
Prior art keywords
layer
theta
sigma
sample
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310503402.9A
Other languages
Chinese (zh)
Other versions
CN103544392A (en)
Inventor
刘启和
陈雷霆
蔡洪斌
邱航
蒲晓蓉
胡晓楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201310503402.9A priority Critical patent/CN103544392B/en
Publication of CN103544392A publication Critical patent/CN103544392A/en
Application granted granted Critical
Publication of CN103544392B publication Critical patent/CN103544392B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses a medical gas identification method based on deep learning. The method takes the raw frequency response signal, applies a simple normalization, and feeds the result into a stacked autoencoder network; through layer-by-layer extraction, the network learns an abstract representation of the original data, hiding steps such as feature extraction, dimensionality reduction, and drift suppression from the outside, while a classification layer appended to the network allows these features to be passed directly to a classifier. Training is divided into two phases, pre-training and fine-tuning, which effectively improves the learning capacity of the network; once training is complete, a new sample fed into the network directly yields the predicted class. The method automatically extracts features that effectively discriminate between medical gases and merges feature extraction, feature selection, and drift suppression into a single step, greatly simplifying traditional methods and improving the efficiency of gas detection and identification.

Description

Medical gas identification method based on deep learning
Technical Field
The invention belongs to the technical field of biomedicine, and particularly relates to a medical gas identification method.
Background
Machine olfaction is an artificial intelligence system whose basic principle is as follows: odor molecules adsorbed by a sensor array generate electrical signals, various signal processing techniques then extract features, and a computer pattern recognition system makes a judgment, completing tasks such as gas identification and concentration measurement. The electronic nose is a typical application of machine olfaction and plays a very important role in the medical field, for example in diagnosing certain diseases, identifying bacterial species in blood, and detecting gases harmful to the respiratory system.
Gas detection and identification by sensors has important applications in medicine: an electronic nose device can collect sample data from the oral cavity, chest cavity, and blood; various signal processing techniques analyze and process the data; and a computer pattern recognition system makes a judgment, completing tasks such as disease diagnosis, pathogen identification, and drug concentration determination.
Conventional gas detection and identification methods generally comprise steps such as feature extraction and feature selection, and finally reach the intended goal by means of classification, regression, or clustering. Devices intended for long-term use must also employ an effective sensor drift compensation technique to suppress the effects of drift. In medical applications, the complexity and inefficiency of these conventional methods often force a compromise between accuracy and real-time performance.
The data sampled by the sensors can be regarded as a time series signal that is complex in structure, difficult to interpret, and often high-dimensional. For better identification, features are usually designed around various attributes of the signal and then selected, for example by dimensionality reduction, before being fed to a classification algorithm such as a support vector machine.
Sensor drift is the slow, random variation of a sensor's response over time. Because of this variation, the model currently learned by the pattern recognition system no longer applies well to subsequent samples, so the accuracy of gas detection and identification gradually degrades. In medical applications, two measures are generally taken to suppress the influence of sensor drift. The first is to develop an effective drift compensation technique; this process is often independent of feature extraction, complicated to operate, and very inefficient. The second, since drift over short periods is small, is to regularly maintain and update the electronic nose device so that the sampled data remain stable and reliable; this undoubtedly increases cost considerably and shortens the service life of the device.
In fact, some well-designed features are very robust to drift. From this perspective, the two processes can be merged: extracting better features by itself suppresses sensor drift. Deep learning analyzes, learns, and interprets data by building an artificial neural network with multiple hidden layers in imitation of the human brain; it can obtain highly abstract representations of the data, is good at discovering latent patterns, and is therefore well suited to this problem.
In the literature "m.trincolvilleli, s.coradeschi, a.loutfi, B.A method for effectively identifying pathogenic bacteria in a blood culture specimen is provided in P.thunberg, Direct identification of bacteria in blood culture using an electronic nose device, IEEE Trans biological Engineering57(12), 2884. sub.2890, 2010. the method comprises the steps of firstly obtaining sample data by sampling through an electronic nose device, then carrying out feature extraction and dimension reduction, and finally completing classification by using a support vector machine, wherein in a feature extraction part, two feature extraction methods of steady-state response and response derivative are adopted aiming at a signal overall waveform.
To obtain higher recognition accuracy on a complicated problem, it is sometimes necessary to analyze the signal waveform more finely and extract higher-dimensional features. The literature "A. Vergara, S. Vembu, T. Ayhan, M. A. Ryan, M. L. Homer, and R. Huerta, Chemical gas sensor drift compensation using classifier ensembles, Sensors and Actuators B: Chemical, vol. 166-167, pp. 320-329, May 2012" studies how to improve the recognition accuracy of gases such as ethanol under drift, and designs 8 different features.
Once the classification algorithm is fixed, the recognition accuracy for a gas depends only on the quality of the features. Compared with the raw frequency response values, well-designed features can greatly reduce dimensional redundancy while accentuating the differences between classes, and generally yield better recognition accuracy.
However, manually designed features are usually tied to specific applications (gas type, sensor type, external environment, and so on), so they are highly task-specific and generalize poorly. Moreover, because of sensor cross-sensitivity, the extracted features may still be very high-dimensional, so an efficient dimensionality reduction algorithm, such as PCA or LDA, usually has to be found. If the recognition accuracy of existing features is unsatisfactory in a new application, better features must be designed, which further increases the complexity of the task.
The most effective way to suppress drift is periodic recalibration. The general idea is to find a linear transformation that normalizes the sensor response so that the classifier can be applied directly to the transformed data.
CN1514239A discloses a method for detecting and correcting gas sensor drift. By combining principal component analysis with the wavelet transform, the method improves the sensitivity and accuracy of drift detection. A detected drifting sensor is corrected online using a correction method based on an adaptive drift model, and the drift model can be updated online, improving the reliability of the sensor system and extending its service life.
In the literature "T.I.P.M.Holmberg, "Drift correction for gas sensors using multivariable methods," j.chemim., vol.14, No. 5-6, pp.711-723,2000, "approximate the Drift direction with a reference gas, and then modify the response of the gas to be analyzed as follows.
However, these methods assume that the sensor's drift law is linear, which has not been proven, and they often require a reference gas whose chemical properties are stable over time and whose sensor behavior is highly similar to that of the gas to be analyzed; this is quite demanding in practice. In addition, these methods are complicated and inefficient in practical applications.
Disclosure of Invention
The aim of the invention is to reduce the complexity of traditional gas detection and identification methods and to develop a gas detection and identification method that is simpler, more efficient, and more robust to sensor drift.
The method takes the raw frequency response signal, applies a simple normalization, and feeds the result into a stacked autoencoder network; through layer-by-layer extraction, the network learns an abstract representation of the original data, hiding steps such as feature extraction, dimensionality reduction, and drift suppression from the outside, while a classification layer appended to the end of the network allows these features to be passed directly to a classifier. Training is divided into two phases, pre-training and fine-tuning, which effectively improves the learning capacity of the network; once training is complete, a new sample fed into the network directly yields the predicted class.
The technical scheme of the invention is as follows: a medical gas identification method based on deep learning comprises the following specific steps:
Step 1. Data normalization. Suppose there are m samples, each organized as v = [s_1, s_2, ..., s_t], where s_i is the i-th frequency response value, for a total of t response values. The entire gas data set and the corresponding labels can then be represented as:

$$V = [v_1^T, v_2^T, \ldots, v_i^T, \ldots, v_m^T]$$

$$Y = [y_1, y_2, \ldots, y_i, \ldots, y_m]^T$$

where T denotes the transpose of a vector, the i-th column v_i of the matrix V represents the i-th sample, and the i-th element of Y is the class label of the corresponding sample.

Normalize the data set to [0,1] using

$$V_{i,j} = L + (U - L)\,\frac{V_{i,j} - \min_i}{\max_i - \min_i}$$

where V_{i,j} denotes the i-th frequency response value of the j-th sample, L is the normalization lower bound 0, U is the normalization upper bound 1, and max_i and min_i are the maximum and minimum of each row of the matrix. The normalized data set is denoted a^{(0)}.
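As an illustration, the following is a minimal sketch of this per-row min-max normalization in Python/NumPy. The matrix shape (t rows of response values, one column per sample) and the small eps guard against division by zero are assumptions of the sketch, not part of the patented method.

```python
import numpy as np

def normalize(V, L=0.0, U=1.0, eps=1e-12):
    """Min-max normalize each row of V (t x m, one column per sample) to [L, U]."""
    row_min = V.min(axis=1, keepdims=True)   # min_i of each row
    row_max = V.max(axis=1, keepdims=True)   # max_i of each row
    return L + (U - L) * (V - row_min) / (row_max - row_min + eps)

# a0 = normalize(V)  # a(0): the normalized data set fed to the first autoencoder
```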
Step 2. Pre-train the stacked autoencoder network, where v, h, and y denote the input layer, hidden layer, and output layer respectively, W^{(i)} is the weight matrix connecting adjacent layers, and b^{(i)} is the bias vector of the hidden layer.
Step 2.1. Train the first layer, i.e., the first autoencoder, whose objective function is:

$$J = \frac{1}{2m}\sum_i \lVert v_i - \hat{v}_i\rVert^2 + \frac{\lambda}{2}\sum_i\sum_j W_{ij}^2 + \beta\sum_j\Big[\rho\log\frac{\rho}{p_j} + (1-\rho)\log\frac{1-\rho}{1-p_j}\Big]$$

The first term is the reconstruction error, measuring the difference between input and output, where v_i is the i-th input sample normalized in step 1 and \hat{v}_i is the output of the network's output layer for sample v_i. The second term, the weight decay term, shrinks the magnitude of the weights to prevent overfitting, where W_{ij} is the weight between unit j of the current layer and unit i of the next layer. The third term is a sparsity penalty, where p_j is the average activation of hidden unit j; λ, β, and ρ are preset parameters.
The objective function is optimized as follows; for an n-layer stack of autoencoders, the specific optimization steps are:

Step 2.1.1. Randomly initialize the parameters W^{(i)}, b^{(i)}, and initialize the gradient accumulators as all-zero matrices and vectors, i.e., ΔW^{(i)} = 0, Δb^{(i)} = 0.

Step 2.1.2. For each sample, compute the partial derivatives ∇_{W^{(i)}} J and ∇_{b^{(i)}} J with the backpropagation algorithm, as follows:

Feedforward to obtain each layer's activation a^{(i)} via a^{(i)} = σ(W^{(i)} a^{(i-1)} + b^{(i)}), where σ(x) = 1/(1 + e^{-x}) is the sigmoid function, with output range [0,1].

For the output layer, compute the residual δ^{(n)} = -(v - a^{(n)}) · σ'(z^{(n)}), where "·" denotes the element-wise product, z^{(n)} = W^{(n-1)} a^{(n-1)} + b^{(n-1)}, and σ' denotes the derivative of σ(x).

For each layer l = n-1, n-2, ..., 2, compute δ^{(l)} = ((W^{(l)})^T δ^{(l+1)}) · σ'(z^{(l)}).

Compute the partial derivatives ∇_{W^{(l)}} J = δ^{(l+1)} (a^{(l)})^T and ∇_{b^{(l)}} J = δ^{(l+1)}, where ∇_{W^{(i)}} J denotes the partial derivative of J(W,b) with respect to W^{(i)} and ∇_{b^{(i)}} J the partial derivative with respect to b^{(i)}.

Step 2.1.3. Add the obtained partial derivatives to ΔW^{(i)} and Δb^{(i)} respectively, i.e., ΔW^{(i)} = ΔW^{(i)} + ∇_{W^{(i)}} J, Δb^{(i)} = Δb^{(i)} + ∇_{b^{(i)}} J.

Step 2.1.4. Update the parameters W^{(i)} = W^{(i)} - α(ΔW^{(i)}/m + λW^{(i)}) and b^{(i)} = b^{(i)} - α Δb^{(i)}/m, where α is the learning rate.

Step 2.1.5. Repeat steps 2.1.2 to 2.1.4, gradually reducing the value of the objective function until it reaches a set threshold, obtaining the encoding-layer parameters (W, b) and the decoding-layer parameters (W̃, b̃).
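To make steps 2.1.1 to 2.1.5 concrete, here is a hedged sketch of training one sparse autoencoder by batch gradient descent in Python/NumPy. The layer sizes, learning rate, iteration count, and random seed are illustrative assumptions, and the sparsity penalty enters the hidden-layer residual in the standard way; this is a sketch of the procedure described above, not a verbatim reference implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(A0, n_hidden, lam=1e-4, beta=3.0, rho=0.05,
                      alpha=0.5, n_iter=400):
    """Train one sparse autoencoder on A0 (t x m); return the encoder (W, b)."""
    t, m = A0.shape
    rng = np.random.default_rng(0)
    W1 = rng.normal(0, 0.01, (n_hidden, t)); b1 = np.zeros((n_hidden, 1))
    W2 = rng.normal(0, 0.01, (t, n_hidden)); b2 = np.zeros((t, 1))  # decoder, discarded later
    for _ in range(n_iter):
        # feedforward: a(1) = sigma(W1 a(0) + b1), reconstruction v^ = sigma(W2 a(1) + b2)
        A1 = sigmoid(W1 @ A0 + b1)
        V_hat = sigmoid(W2 @ A1 + b2)
        p = A1.mean(axis=1, keepdims=True)          # average activation p_j of each hidden unit
        # output-layer residual: delta(n) = -(v - v^) . sigma'(z(n))
        d2 = -(A0 - V_hat) * V_hat * (1 - V_hat)
        # hidden-layer residual, with the sparsity penalty folded in
        sparse = beta * (-(rho / p) + (1 - rho) / (1 - p))
        d1 = (W2.T @ d2 + sparse) * A1 * (1 - A1)
        # gradient-descent update with weight decay (step 2.1.4)
        W2 -= alpha * (d2 @ A1.T / m + lam * W2); b2 -= alpha * d2.mean(axis=1, keepdims=True)
        W1 -= alpha * (d1 @ A0.T / m + lam * W1); b1 -= alpha * d1.mean(axis=1, keepdims=True)
    return W1, b1   # encoding-layer parameters; the decoding layer (W2, b2) is discarded
```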
Step 2.2. After training, discard the decoding layer (W̃, b̃) and take the encoding-layer parameters (W, b) as the initial parameters of the corresponding layer in the stacked autoencoder network, i.e., W^{(1)} = W, b^{(1)} = b.

Step 2.3. Compute the hidden-layer activation of the current autoencoder: a^{(1)} = σ(W^{(1)} a^{(0)} + b^{(1)}).

Step 2.4. Train the second layer, i.e., the second autoencoder, on the activation a^{(1)}: the hidden layer of the first autoencoder serves as the input layer of the second, and the training process is the same as for the first layer except that the input becomes a^{(1)}. After training, this yields the initial parameters W^{(2)}, b^{(2)} of the second network layer and the hidden-layer activation a^{(2)}.

Step 2.5. For the third through n-th layers, repeat steps 2.1 to 2.4 to obtain the initial parameters of each hidden layer, and finally the activation a^{(n)} of the n-th hidden layer; this activation also serves as the input of the softmax layer and is denoted a_S.

Step 2.6. Using the a_S obtained in step 2.5 and the labels Y, train the last layer of the network, the softmax classifier, to obtain its initial parameter W_S.
Denote a_S by x and W_S by θ, and suppose there are k classes in total. For the i-th sample, the probability that the predicted class label is j is:

$$P(y_i = j \mid x_i; \theta) = \frac{\exp(\theta_j^T x_i)}{\sum_{l=1}^{k} \exp(\theta_l^T x_i)}$$

where θ_j^T denotes the j-th row of θ, a row vector of the weights connecting the j-th output unit to all input units; l is an index variable with 1 ≤ l ≤ k; k is the number of classes; and x_i is the softmax-layer input for the i-th sample. The final output is a probability column vector P whose j-th component is the probability that the sample is predicted to belong to class j. The weight matrix θ is trained by minimizing the loss function:

$$J(\theta) = -\frac{1}{m}\Big[\sum_{i=1}^{m}\sum_{j=1}^{k} 1\{y_i = j\} \log P(y_i = j)\Big] + \frac{\lambda}{2}\sum_{i=1}^{m}\sum_{j=1}^{n} \theta_{ij}^2$$

where log P(y_i = j) is the natural logarithm of the probability value P(y_i = j); 1{·} is the indicator function, equal to 1 when the condition in braces holds and 0 otherwise; m is the number of samples; and n is the number of autoencoder layers.
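A brief sketch of the probability and loss computations above, assuming labels are coded as integers 0..k-1 and θ is stored as a k × dim matrix (one row per class); the max-subtraction for numerical stability is a standard implementation detail, not part of the formula itself.

```python
import numpy as np

def softmax_probs(theta, X):
    """P(y = j | x) for each column of X (dim x m); returns a k x m matrix."""
    Z = theta @ X                       # k x m matrix of theta_j^T x_i
    Z -= Z.max(axis=0, keepdims=True)   # subtract column max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=0, keepdims=True)

def softmax_loss(theta, X, y, lam=1e-4):
    """Negative log-likelihood with weight decay (the J(theta) above)."""
    m = X.shape[1]
    P = softmax_probs(theta, X)
    logp = np.log(P[y, np.arange(m)])   # log P(y_i = j) picked out by the true labels
    return -logp.mean() + 0.5 * lam * np.sum(theta ** 2)
```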
Step 3. Fine-tuning. Treat the network as a whole: compute the partial derivatives of each layer's parameters by backpropagation, then optimize iteratively by gradient descent. The specific process is as follows:

Step 3.1. Feedforward with a^{(i)} = σ(W^{(i)} a^{(i-1)} + b^{(i)}) to obtain each layer's activation a^{(i)}.

Step 3.2. Compute the partial derivative ∇_{W_S} J of the loss with respect to the softmax-layer parameter W_S, using the gradient of the loss function in step 2.6, where P is the conditional probability vector computed in step 2.6.

Step 3.3. Compute the residual of the last hidden layer: δ^{(n)} = (∂J/∂a^{(n)}) · σ'(z^{(n)}), where ∂J/∂a^{(n)} denotes the partial derivative of J(W,b) with respect to a^{(n)}, and a^{(n)} is the activation of the n-th hidden layer.

Step 3.4. For each layer l = n-1, n-2, ..., 2, compute δ^{(l)} = ((W^{(l)})^T δ^{(l+1)}) · σ'(z^{(l)}).

Step 3.5. Compute the partial derivatives ∇_{W^{(i)}} J and ∇_{b^{(i)}} J of all hidden layers, as in step 2.1.2.

Step 3.6. Update the parameters of each layer with the partial derivatives obtained above:

$$W_S' = W_S - \alpha\Big(\frac{1}{m}\sum \nabla_{W_S} J + \lambda\theta\Big),\qquad W^{(i)\prime} = W^{(i)} - \alpha\,\frac{1}{m}\sum \nabla_{W^{(i)}} J,\qquad b^{(i)\prime} = b^{(i)} - \alpha\,\frac{1}{m}\sum \nabla_{b^{(i)}} J$$

where ∇_{W_S} J denotes the partial derivative of J(W,b) with respect to W_S (W_S being the initial parameter of the softmax classifier), ∇_{W^{(i)}} J the partial derivative with respect to W^{(i)}, and ∇_{b^{(i)}} J the partial derivative with respect to b^{(i)}.

Step 3.7. Repeat the above steps, reducing the value of the objective function by iteration until it reaches a set threshold.
Step 4. Predict the class of a new sample, as follows:

Step 4.1. Normalize the prediction sample v_p to [0,1].

Step 4.2. For the hidden layers, feedforward layer by layer with a^{(i)} = σ(W^{(i)} a^{(i-1)} + b^{(i)}) to obtain the softmax-layer input a_S.

Step 4.3. Compute the conditional probability vector P with the probability formula of step 2.6; the class corresponding to its largest component is the predicted class of the sample.

In the above formulas, i and j are integer indices.
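Putting step 4 together, a sketch of the prediction pass: `weights`, `biases`, and `Ws` stand for the parameters produced by steps 2 and 3, `softmax_probs` is the helper sketched above, and reusing the training-set row minima and maxima for normalization is an assumption of the sketch.

```python
import numpy as np

def predict(v_p, weights, biases, Ws, row_min, row_max):
    """Predict class labels for test samples v_p (t x m_test)."""
    # step 4.1: normalize with the training-set row minima/maxima
    a = (v_p - row_min) / (row_max - row_min + 1e-12)
    # step 4.2: layer-by-layer feedforward through the hidden layers
    for W, b in zip(weights, biases):
        a = 1.0 / (1.0 + np.exp(-(W @ a + b)))
    # step 4.3: conditional probabilities; the largest component gives the class
    P = softmax_probs(Ws, a)
    return P.argmax(axis=0)
```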
The invention has the following beneficial effects. The invention designs a network structure suited to medical gas signal processing and extracts features from the input sample layer by layer, so the feature that finally enters the classification layer has low dimensionality and good robustness to drift. Compared with traditional feature extraction, the method automatically extracts features that effectively discriminate between medical gases and merges feature extraction, feature selection, drift suppression, and related steps into one, greatly simplifying traditional methods and improving the efficiency of gas detection and identification. Specifically:

First, apart from training the softmax classifier in step 2, no class labels are needed, so feature extraction is unsupervised; when labeled samples are scarce, a large number of unlabeled samples can be used to train all layers before the classification layer, with a small number of labeled samples used for the final fine-tuning.

Second, as the network structure shows, each layer has fewer units than the one before, so the input that finally reaches the classifier has a dimension far smaller than the original input; this can be regarded as dimensionality reduction.

Third, feature extraction is fully automatic, with no manual intervention, eliminating the complexity of hand-crafted feature design and giving the method wide applicability.

Fourth, the extracted features are robust to drift, effectively improving gas detection and identification accuracy under drift and extending the service life of the device.
Drawings
Fig. 1 is a flow chart of a medical gas identification method according to an embodiment of the invention.
Fig. 2 is the stacked autoencoder network for medical gas identification according to an embodiment of the present invention.
Fig. 3 is a diagram of a single-hidden-layer autoencoder according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be further described with reference to the accompanying drawings.
The general flow of the gas identification method of the present invention is shown in Fig. 1:
Step 1. Data normalization. Suppose there are m samples, each organized as v = [s_1, s_2, ..., s_t], where s_i is the i-th frequency response value, for a total of t response values. The entire gas data set and the corresponding labels can then be represented as:

$$V = [v_1^T, v_2^T, \ldots, v_i^T, \ldots, v_m^T]$$

$$Y = [y_1, y_2, \ldots, y_i, \ldots, y_m]^T$$

where T denotes the transpose of a vector, the i-th column v_i of the matrix V represents the i-th sample, and the i-th element of Y is the class label of the corresponding sample.

Normalize the data set to [0,1] using

$$V_{i,j} = L + (U - L)\,\frac{V_{i,j} - \min_i}{\max_i - \min_i}$$

where V_{i,j} denotes the i-th frequency response value of the j-th sample, L is the normalization lower bound 0, U is the normalization upper bound 1, and max_i and min_i are the maximum and minimum of each row of the matrix. The normalized data set is denoted a^{(0)}.
Step 2. Pre-train the stacked autoencoder network, where v, h, and y denote the input layer, hidden layer, and output layer respectively, W^{(i)} is the weight matrix connecting adjacent layers, and b^{(i)} is the bias vector of the hidden layer.
The present invention uses a network structure similar to that shown in Fig. 2; the number of layers and the number of units per layer can be changed for different tasks, and the shapes of the corresponding parameters change accordingly.

Such networks are often very deep, have many parameters, and are difficult to train directly, so a pre-training method is first adopted to train the parameters layer by layer. Compared with random initialization, pre-training places each layer's parameters at a better position in the parameter space.
Apart from the softmax layer used for classification, the rest of the network can be seen as a stack of single-hidden-layer autoencoders, with the output of each layer connected to the input of the next. Such an autoencoder obtains the activations of its hidden units by reconstructing the input (reconstruction denoted by the symbol ^) and uses them as features of the original input, as shown in Fig. 3; a small sketch of this structure follows.
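A minimal sketch of the single autoencoder of Fig. 3, under the same sigmoid assumption as the rest of the method; the function and variable names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencoder_forward(v, W, b, W_dec, b_dec):
    """Encode input v to the hidden activation h, then reconstruct it as v_hat."""
    h = sigmoid(W @ v + b)               # hidden activation: the learned feature
    v_hat = sigmoid(W_dec @ h + b_dec)   # reconstruction (the ^ output) of the input
    return h, v_hat
```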
After each autoencoder is trained, only the encoding-layer parameters W and b are retained as the initial parameters of the corresponding layer of the stacked autoencoder network. The specific process is as follows:
Step 2.1. Train the first layer, i.e., the first autoencoder, whose objective function is:

$$J = \frac{1}{2m}\sum_i \lVert v_i - \hat{v}_i\rVert^2 + \frac{\lambda}{2}\sum_i\sum_j W_{ij}^2 + \beta\sum_j\Big[\rho\log\frac{\rho}{p_j} + (1-\rho)\log\frac{1-\rho}{1-p_j}\Big]$$

The first term is the reconstruction error, measuring the difference between input and output, where v_i is the i-th input sample normalized in step 1 and \hat{v}_i is the output of the network's output layer for sample v_i. The second term, the weight decay term, shrinks the magnitude of the weights to prevent overfitting, where W_{ij} is the weight between unit j of the current layer and unit i of the next layer. The third term is a sparsity penalty, where p_j is the average activation of hidden unit j and λ, β, and ρ are preset parameters; it drives the average activation of every hidden unit toward the small value ρ, so that only a few hidden units are activated at a time.
The objective function is optimized by gradient descent; the partial derivatives of the parameters required at each iteration are computed by the backpropagation algorithm.
Optimizing an objective function, wherein for an automatic coding machine with n layers, the specific optimization steps are as follows:
step 2.1.1. random initialization parameter W(i)、b(i)Initializing a matrix or vector of all zeros, i.e. AW(i)=0,Δb(i)=0;
Step 2.1.2. orderFor each sample, the partial derivatives are calculated using a back propagation algorithmThe specific process is as follows:
feedforward calculation to obtain each layer excitation a(i)The calculation formula is a(i)=σ(W(i)a(i-1)+b(i)) Whereinis sigmoid function with output range of [0,1 ]];
For the output layer, the residual is calculated:(n)=-(v-a(n))·σ′(z(n)) Wherein "·" represents a vector dot product, wherein z is(n)=W(n-1)a(n-1)+b(n-1)σ' denotes the derivative of σ (x);
for each layer of l ═ n-1, n-2,.., 2, the following is calculated:
calculating a partial derivative value:
whereinAndboth represent J (W, b) to W(i)The partial derivatives of (a) are,andall represent J (W, b) to b(i)The partial derivatives of (1).
Step 2.1.3, respectively adding the obtained partial derivatives to delta W(i),Δb(i)To do so, i.e.
Step 2.1.4, update parameter W(i),b(i)Where α is the learning rate.
Step 2.1.5. repeat stepStep 2.1.2 to step 2.1.4, the value of the objective function is gradually decreased until a set threshold value. At this time, parameters (W, b) of the coding layer and parameters of the decoding layer can be obtained
Step 2.2. After training, discard the decoding layer (W̃, b̃) and take the encoding-layer parameters (W, b) as the initial parameters of the corresponding layer in the stacked autoencoder network, i.e., W^{(1)} = W, b^{(1)} = b.

Step 2.3. Compute the hidden-layer activation of the current autoencoder: a^{(1)} = σ(W^{(1)} a^{(0)} + b^{(1)}).

Step 2.4. Train the second layer, i.e., the second autoencoder, on the activation a^{(1)}: the hidden layer of the first autoencoder serves as the input layer of the second, and the training process is the same as for the first layer except that the input becomes a^{(1)}. After training, this yields the initial parameters W^{(2)}, b^{(2)} of the second network layer and the hidden-layer activation a^{(2)}.

Step 2.5. For the third through n-th layers, repeat steps 2.1 to 2.4 to obtain the initial parameters of each hidden layer, and finally the activation a^{(n)} of the n-th hidden layer; this activation also serves as the input of the softmax layer and is denoted a_S. In this embodiment, the autoencoder stack has 3 layers.

Step 2.6. Using the a_S obtained in step 2.5 and the labels Y, train the last layer of the network, the softmax classifier, to obtain its initial parameter W_S.
Softmax regression is a generalization of logistic regression to multi-class problems. For convenience, denote a_S by x and W_S by θ, and suppose there are k classes in total. For the i-th sample, the probability that the predicted class label is j is:

$$P(y_i = j \mid x_i; \theta) = \frac{\exp(\theta_j^T x_i)}{\sum_{l=1}^{k} \exp(\theta_l^T x_i)}$$

where θ_j^T denotes the j-th row of θ, a row vector of the weights connecting the j-th output unit to all input units; l is an index variable with 1 ≤ l ≤ k; k is the number of classes; and x_i is the softmax-layer input for the i-th sample. The final output is a probability column vector P whose j-th component is the probability that the sample is predicted to belong to class j. The weight matrix θ is trained by minimizing the loss function:

$$J(\theta) = -\frac{1}{m}\Big[\sum_{i=1}^{m}\sum_{j=1}^{k} 1\{y_i = j\} \log P(y_i = j)\Big] + \frac{\lambda}{2}\sum_{i=1}^{m}\sum_{j=1}^{n} \theta_{ij}^2$$

where log P(y_i = j) is the natural logarithm of the probability value; 1{·} is the indicator function, equal to 1 when the condition in braces holds and 0 otherwise. The loss function is strictly convex, so a global optimum can be obtained with an optimization algorithm such as gradient descent or L-BFGS; m is the number of samples and n is the number of autoencoder layers.

Here all parameters connecting two layers are weight matrices; W_S, i.e., θ, is the weight matrix connecting the last two layers.
In this embodiment, the specific process of training the weight matrix θ by minimizing the loss function is as follows:

Step 2.6.1. Randomly initialize the parameter matrix θ.

Step 2.6.2. Compute the derivative of J(θ) directly, where θ_j denotes the j-th row of the matrix:

$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\big[x_i\big(1\{y_i = j\} - P(y_i = j \mid x_i; \theta)\big)\big] + \lambda\theta_j$$

Step 2.6.3. Update the parameter θ: θ_j = θ_j - α ∇_{θ_j} J(θ), where α is the learning rate and ∇_{θ_j} J(θ) denotes the partial derivative of J(θ) with respect to θ_j.

Step 2.6.4. Repeat steps 2.6.2 to 2.6.3, gradually reducing J(θ) until it reaches a set threshold; the resulting θ is the final weight matrix, i.e., W_S.
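A sketch of steps 2.6.1 to 2.6.4 as batch gradient descent, reusing the `softmax_probs` helper above; the learning rate, iteration count, and random seed are illustrative assumptions rather than values prescribed by the method.

```python
import numpy as np

def train_softmax(X, y, k, lam=1e-4, alpha=0.5, n_iter=500):
    """Steps 2.6.1-2.6.4: batch gradient descent on J(theta)."""
    dim, m = X.shape
    theta = np.random.default_rng(0).normal(0, 0.01, (k, dim))  # 2.6.1 random init
    Y1 = np.zeros((k, m)); Y1[y, np.arange(m)] = 1.0            # 1{y_i = j} as a matrix
    for _ in range(n_iter):
        P = softmax_probs(theta, X)                   # 2.6.2 class probabilities
        grad = -(Y1 - P) @ X.T / m + lam * theta      # gradient of J(theta)
        theta -= alpha * grad                         # 2.6.3 update
    return theta                                      # Ws, the softmax weight matrix
```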
Step 3. Fine-tuning: treat the network as a whole, compute the partial derivatives of each layer's parameters by backpropagation, and then optimize iteratively by gradient descent.

After pre-training, the initial parameters of every layer of the network are determined; all parameters are then fine-tuned once to improve the classification ability of the network. Fine-tuning treats the network as a whole: the partial derivative of each layer's parameters is computed by backpropagation and then iteratively optimized by gradient descent. The network's goal is no longer reconstruction, so the objective function is that of the softmax layer; the softmax layer is treated as an additional layer and handled separately, and the optimization of each hidden layer is essentially the same as described in step 2.1.

The specific process is as follows:

Step 3.1. Feedforward with a^{(i)} = σ(W^{(i)} a^{(i-1)} + b^{(i)}) to obtain each layer's activation a^{(i)}.

Step 3.2. Compute the partial derivative ∇_{W_S} J of the loss with respect to the softmax-layer parameter W_S, using the gradient of the loss function in step 2.6, where P is the conditional probability vector computed in step 2.6.

Step 3.3. Compute the residual of the last hidden layer: δ^{(n)} = (∂J/∂a^{(n)}) · σ'(z^{(n)}), where ∂J/∂a^{(n)} denotes the partial derivative of J(W,b) with respect to a^{(n)}, and a^{(n)} is the activation of the n-th hidden layer.

Step 3.4. For each layer l = n-1, n-2, ..., 2, compute δ^{(l)} = ((W^{(l)})^T δ^{(l+1)}) · σ'(z^{(l)}).

Step 3.5. Compute the partial derivatives ∇_{W^{(i)}} J and ∇_{b^{(i)}} J of all hidden layers, as in step 2.1.2.

Step 3.6. Update the parameters of each layer with the partial derivatives obtained above:

$$W_S' = W_S - \alpha\Big(\frac{1}{m}\sum \nabla_{W_S} J + \lambda\theta\Big),\qquad W^{(i)\prime} = W^{(i)} - \alpha\,\frac{1}{m}\sum \nabla_{W^{(i)}} J,\qquad b^{(i)\prime} = b^{(i)} - \alpha\,\frac{1}{m}\sum \nabla_{b^{(i)}} J$$

where ∇_{W_S} J denotes the partial derivative of J(W,b) with respect to W_S (W_S being the initial parameter of the softmax classifier), ∇_{W^{(i)}} J the partial derivative with respect to W^{(i)}, and ∇_{b^{(i)}} J the partial derivative with respect to b^{(i)}.

Step 3.7. Repeat the above steps, reducing the value of the objective function by iteration until it reaches a set threshold.
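As an illustration, one fine-tuning pass over the whole stack (steps 3.1 to 3.6) might look like the following sketch. It reuses the helpers above; using the standard softmax cross-entropy gradient for the residual entering the last hidden layer is an assumption consistent with step 2.6, and weight decay is applied only to W_S, matching the update formulas in step 3.6.

```python
import numpy as np

def finetune_step(A0, y, weights, biases, Ws, alpha=0.1, lam=1e-4):
    """One pass of steps 3.1-3.6 over the batch A0 (t x m); updates in place."""
    m = A0.shape[1]
    acts = [A0]
    for W, b in zip(weights, biases):              # 3.1 feedforward
        acts.append(1.0 / (1.0 + np.exp(-(W @ acts[-1] + b))))
    aS = acts[-1]
    P = softmax_probs(Ws, aS)
    Y1 = np.zeros_like(P); Y1[y, np.arange(m)] = 1.0
    gWs = (P - Y1) @ aS.T / m + lam * Ws           # 3.2 softmax-layer gradient
    d = (Ws.T @ (P - Y1)) * aS * (1 - aS)          # 3.3 residual of last hidden layer
    for l in range(len(weights) - 1, -1, -1):      # 3.4-3.6 backprop and update
        gW = d @ acts[l].T / m
        gb = d.mean(axis=1, keepdims=True)
        if l > 0:                                  # propagate the residual downward
            d = (weights[l].T @ d) * acts[l] * (1 - acts[l])
        weights[l] -= alpha * gW
        biases[l]  -= alpha * gb
    Ws -= alpha * gWs
    return Ws
```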
Step 4. Predict the class of a new sample, as follows:

Step 4.1. Normalize the prediction sample v_p to [0,1].

Step 4.2. For the hidden layers, feedforward layer by layer with a^{(i)} = σ(W^{(i)} a^{(i-1)} + b^{(i)}) to obtain the softmax-layer input a_S.

Step 4.3. Compute the conditional probability vector P with the probability formula of step 2.6; the class corresponding to its largest component is the predicted class of the sample.

In the above formulas, i and j are integer indices.
The core of the invention is to design a network structure suited to medical gas signal processing and to process patient gas data sampled by an electronic nose with deep learning, automatically extracting features that are more universal and more robust to sensor drift, and thereby completing the gas detection and identification task simply and effectively. The invention has great application value in medical settings that demand both accuracy and real-time performance.

Claims (2)

1. A medical gas identification method based on deep learning, comprising the following specific steps:
step 1, data normalization: suppose there are m samples, each organized as v = [s_1, s_2, ..., s_t], where s_i is the i-th frequency response value, for a total of t response values; the entire gas data set and the corresponding labels can be represented as:

$$V = [v_1^T, v_2^T, \ldots, v_i^T, \ldots, v_m^T]$$

$$Y = [y_1, y_2, \ldots, y_i, \ldots, y_m]^T$$

where T denotes the transpose of a vector, the i-th column v_i of the matrix V represents the i-th sample, and the i-th element of Y is the class label of the corresponding sample;

normalize the data set to [0,1] using

$$V_{i,j} = L + (U - L)\,\frac{V_{i,j} - \min_i}{\max_i - \min_i}$$

where V_{i,j} denotes the i-th frequency response value of the j-th sample, L is the normalization lower bound 0, U is the normalization upper bound 1, and max_i and min_i are the maximum and minimum of each row of the matrix; the normalized data set is denoted a^{(0)};
step 2, pre-train the stacked autoencoder network, where v, h, and y denote the input layer, hidden layer, and output layer respectively, W^{(i)} is the weight matrix connecting adjacent layers, and b^{(i)} is the bias vector of the hidden layer;
step 2.1, train the first layer, i.e., the first autoencoder, whose objective function is:

$$J = \frac{1}{2m}\sum_i \lVert v_i - \hat{v}_i\rVert^2 + \frac{\lambda}{2}\sum_i\sum_j W_{ij}^2 + \beta\sum_j\Big[\rho\log\frac{\rho}{p_j} + (1-\rho)\log\frac{1-\rho}{1-p_j}\Big]$$

where the first term is the reconstruction error, measuring the difference between input and output, v_i being the i-th input sample normalized in step 1 and \hat{v}_i the output of the network's output layer for sample v_i; the second term, the weight decay term, shrinks the magnitude of the weights to prevent overfitting, W_{ij} being the weight between unit j of the current layer and unit i of the next layer; the third term is a sparsity penalty, p_j being the average activation of hidden unit j; λ, β, and ρ are preset parameters, m is the number of samples, and J is the objective function of the first autoencoder;
optimize the objective function; for an n-layer stack of autoencoders, the specific optimization steps are:

step 2.1.1, randomly initialize the parameters W^{(i)}, b^{(i)}, and initialize the gradient accumulators as all-zero matrices and vectors, i.e., ΔW^{(i)} = 0, Δb^{(i)} = 0;

step 2.1.2, for each sample, compute the partial derivatives ∇_{W^{(i)}} J and ∇_{b^{(i)}} J with the backpropagation algorithm, as follows:

feedforward to obtain each layer's activation a^{(i)} via a^{(i)} = σ(W^{(i)} a^{(i-1)} + b^{(i)}), where σ(x) = 1/(1 + e^{-x}) is the sigmoid function, with output range [0,1];

for the output layer, compute the residual δ^{(n)} = -(v - a^{(n)}) · σ'(z^{(n)}), where "·" denotes the element-wise product, z^{(n)} = W^{(n-1)} a^{(n-1)} + b^{(n-1)}, and σ' denotes the derivative of σ(x);

for each layer l = n-1, n-2, ..., 2, compute δ^{(l)} = ((W^{(l)})^T δ^{(l+1)}) · σ'(z^{(l)});

compute the partial derivatives ∇_{W^{(l)}} J = δ^{(l+1)} (a^{(l)})^T and ∇_{b^{(l)}} J = δ^{(l+1)}, where ∇_{W^{(i)}} J denotes the partial derivative of J(W,b) with respect to W^{(i)} and ∇_{b^{(i)}} J the partial derivative with respect to b^{(i)};

step 2.1.3, add the obtained partial derivatives to ΔW^{(i)} and Δb^{(i)} respectively, i.e., ΔW^{(i)} = ΔW^{(i)} + ∇_{W^{(i)}} J, Δb^{(i)} = Δb^{(i)} + ∇_{b^{(i)}} J;

step 2.1.4, update the parameters W^{(i)} = W^{(i)} - α(ΔW^{(i)}/m + λW^{(i)}) and b^{(i)} = b^{(i)} - α Δb^{(i)}/m, where α is the learning rate;

step 2.1.5, repeat steps 2.1.2 to 2.1.4, gradually reducing the value of the objective function until it reaches a set threshold, obtaining the encoding-layer parameters (W, b) and the decoding-layer parameters (W̃, b̃);
step 2.2, after training, discard the decoding layer (W̃, b̃) and take the encoding-layer parameters (W, b) as the initial parameters of the corresponding layer in the stacked autoencoder network, i.e., W^{(1)} = W, b^{(1)} = b;

step 2.3, compute the hidden-layer activation of the current autoencoder: a^{(1)} = σ(W^{(1)} a^{(0)} + b^{(1)});

step 2.4, train the second layer, i.e., the second autoencoder, on the activation a^{(1)}: the hidden layer of the first autoencoder serves as the input layer of the second, and the training process is the same as for the first layer except that the input becomes a^{(1)}; after training, this yields the initial parameters W^{(2)}, b^{(2)} of the second network layer and the hidden-layer activation a^{(2)};

step 2.5, for the third through n-th layers, repeat steps 2.1 to 2.4 to obtain the initial parameters of each hidden layer, and finally the activation a^{(n)} of the n-th hidden layer; this activation also serves as the input of the softmax layer and is denoted a_S;

step 2.6, using the a_S obtained in step 2.5 and the labels Y, train the last layer of the network, the softmax classifier, to obtain its initial parameter W_S;
denote a_S by x and W_S by θ, and suppose there are k classes in total; for the i-th sample, the probability that the predicted class label is j is:

$$P(y_i = j \mid x_i; \theta) = \frac{\exp(\theta_j^T x_i)}{\sum_{l=1}^{k} \exp(\theta_l^T x_i)}$$

where θ_j^T denotes the j-th row of θ, a row vector of the weights connecting the j-th output unit to all input units, and k is the number of classes; the final output is a probability column vector P whose j-th component is the probability that the sample is predicted to belong to class j; the weight matrix θ is trained by minimizing the loss function:

$$J(\theta) = -\frac{1}{m}\Big[\sum_{i=1}^{m}\sum_{j=1}^{k} 1\{y_i = j\} \log P(y_i = j)\Big] + \frac{\lambda}{2}\sum_{i=1}^{m}\sum_{j=1}^{n} \theta_{ij}^2$$

where log P(y_i = j) is the natural logarithm of the probability value P(y_i = j); 1{·} is the indicator function, equal to 1 when the condition in braces holds and 0 otherwise; m is the number of samples, and n is the number of autoencoder layers;
step 3, fine-tuning: treat the network as a whole, compute the partial derivatives of each layer's parameters by backpropagation, and then optimize iteratively by gradient descent, as follows:

step 3.1, feedforward with a^{(i)} = σ(W^{(i)} a^{(i-1)} + b^{(i)}) to obtain each layer's activation a^{(i)};

step 3.2, compute the partial derivative ∇_{W_S} J of the loss with respect to the softmax-layer parameter W_S, using the gradient of the loss function in step 2.6, where P is the conditional probability vector computed in step 2.6;

step 3.3, compute the residual of the last hidden layer: δ^{(n)} = (∂J/∂a^{(n)}) · σ'(z^{(n)}), where ∂J/∂a^{(n)} denotes the partial derivative of J(W,b) with respect to a^{(n)}, and a^{(n)} is the activation of the n-th hidden layer;

step 3.4, for each layer l = n-1, n-2, ..., 2, compute δ^{(l)} = ((W^{(l)})^T δ^{(l+1)}) · σ'(z^{(l)});

step 3.5, compute the partial derivatives ∇_{W^{(i)}} J and ∇_{b^{(i)}} J of all hidden layers, as in step 2.1.2;

step 3.6, update the parameters of each layer with the partial derivatives obtained above:

$$W_S' = W_S - \alpha\Big(\frac{1}{m}\sum \nabla_{W_S} J + \lambda\theta\Big),\qquad W^{(i)\prime} = W^{(i)} - \alpha\,\frac{1}{m}\sum \nabla_{W^{(i)}} J,\qquad b^{(i)\prime} = b^{(i)} - \alpha\,\frac{1}{m}\sum \nabla_{b^{(i)}} J$$

where ∇_{W_S} J denotes the partial derivative of J(W,b) with respect to W_S, W_S being the initial parameter of the softmax classifier, ∇_{W^{(i)}} J the partial derivative with respect to W^{(i)}, and ∇_{b^{(i)}} J the partial derivative with respect to b^{(i)};

step 3.7, repeat the above steps, reducing the value of the objective function by iteration until it reaches a set threshold;
step 4, predict the class of a new sample, as follows:

step 4.1, normalize the prediction sample v_p to [0,1];

step 4.2, for the hidden layers, feedforward layer by layer with a^{(i)} = σ(W^{(i)} a^{(i-1)} + b^{(i)}) to obtain the softmax-layer input a_S;

step 4.3, compute the conditional probability vector P with the probability formula of step 2.6; the class corresponding to its largest component is the predicted class of the sample.
2. The deep-learning-based medical gas identification method according to claim 1, wherein the process of training the weight matrix θ in step 2.6 by minimizing the loss function is as follows:

step 2.6.1, randomly initialize the parameter matrix θ;

step 2.6.2, compute the derivative of J(θ) directly, where θ_j denotes the j-th row of the matrix:

$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\big[x_i\big(1\{y_i = j\} - P(y_i = j \mid x_i; \theta)\big)\big] + \lambda\theta_j$$

step 2.6.3, update the parameter θ: θ_j = θ_j - α ∇_{θ_j} J(θ), where α is the learning rate and ∇_{θ_j} J(θ) denotes the partial derivative of J(θ) with respect to θ_j;

step 2.6.4, repeat steps 2.6.2 to 2.6.3, gradually reducing J(θ) until it reaches a set threshold; the resulting θ is the final weight matrix, i.e., W_S.
CN201310503402.9A 2013-10-23 2013-10-23 Medical gas identification method based on deep learning Expired - Fee Related CN103544392B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310503402.9A CN103544392B (en) 2013-10-23 2013-10-23 Medical gas identification method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310503402.9A CN103544392B (en) 2013-10-23 2013-10-23 Medical gas identification method based on deep learning

Publications (2)

Publication Number Publication Date
CN103544392A CN103544392A (en) 2014-01-29
CN103544392B (en) 2016-08-24

Family

ID=49967837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310503402.9A Expired - Fee Related CN103544392B (en) 2013-10-23 2013-10-23 Medical gas identification method based on deep learning

Country Status (1)

Country Link
CN (1) CN103544392B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103996056B (en) * 2014-04-08 2017-05-24 浙江工业大学 Tattoo image classification method based on deep learning
CN104021224A (en) * 2014-06-25 2014-09-03 中国科学院自动化研究所 Image labeling method based on layer-by-layer label fusing deep network
CN104484684B (en) * 2015-01-05 2018-11-02 苏州大学 A kind of Manuscripted Characters Identification Method and system
CN105844331B (en) * 2015-01-15 2018-05-25 富士通株式会社 The training method of nerve network system and the nerve network system
CN104866727A (en) * 2015-06-02 2015-08-26 陈宽 Deep learning-based method for analyzing medical data and intelligent analyzer thereof
CN105913079B (en) * 2016-04-08 2019-04-23 重庆大学 Electronic nose isomeric data recognition methods based on the study of the target domain migration limit
CN106202054B (en) * 2016-07-25 2018-12-14 哈尔滨工业大学 A kind of name entity recognition method towards medical field based on deep learning
CN106264460B (en) * 2016-07-29 2019-11-19 北京医拍智能科技有限公司 The coding/decoding method and device of cerebration multidimensional time-series signal based on self study
CN106156530A (en) * 2016-08-03 2016-11-23 北京好运到信息科技有限公司 Health check-up data analysing method based on stack own coding device and device
CN108122035B (en) * 2016-11-29 2019-10-18 科大讯飞股份有限公司 End-to-end modeling method and system
CN107368670A (en) * 2017-06-07 2017-11-21 万香波 Stomach cancer pathology diagnostic support system and method based on big data deep learning
CN107368671A (en) * 2017-06-07 2017-11-21 万香波 System and method are supported in benign gastritis pathological diagnosis based on big data deep learning
WO2019005611A1 (en) * 2017-06-26 2019-01-03 D5Ai Llc Selective training for decorrelation of errors
CN108416439B (en) * 2018-02-09 2020-01-03 中南大学 Oil refining process product prediction method and system based on variable weighted deep learning
CN109472303A (en) * 2018-10-30 2019-03-15 浙江工商大学 A kind of gas sensor drift compensation method based on autoencoder network decision
DE102019220455A1 (en) * 2019-12-20 2021-06-24 Robert Bosch Gesellschaft mit beschränkter Haftung Method and device for operating a gas sensor
CN111474297B (en) * 2020-03-09 2022-05-03 重庆邮电大学 Online drift compensation method for sensor in bionic olfaction system
CN111340132B (en) * 2020-03-10 2024-02-02 南京工业大学 Machine olfaction mode identification method based on DA-SVM
CN111915069B (en) * 2020-07-17 2021-12-07 天津理工大学 Deep learning-based detection method for distribution of lightweight toxic and harmful gases

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101135639A (en) * 2007-09-27 2008-03-05 中国人民解放军空军工程大学 Mixture gas component concentration infrared spectrum analysis method based on supporting vector quantities machine correct model
CN102411687A (en) * 2011-11-22 2012-04-11 华北电力大学 Deep learning detection method for unknown malicious codes
CN103267793A (en) * 2013-05-03 2013-08-28 浙江工商大学 Carbon nano-tube ionization self-resonance type gas sensitive sensor

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101042339B (en) * 2006-03-21 2012-05-30 深圳迈瑞生物医疗电子股份有限公司 Device for recognizing zone classification of anesthetic gas type and method thereof

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101135639A (en) * 2007-09-27 2008-03-05 中国人民解放军空军工程大学 Mixture gas component concentration infrared spectrum analysis method based on supporting vector quantities machine correct model
CN102411687A (en) * 2011-11-22 2012-04-11 华北电力大学 Deep learning detection method for unknown malicious codes
CN103267793A (en) * 2013-05-03 2013-08-28 浙江工商大学 Carbon nano-tube ionization self-resonance type gas sensitive sensor

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A comparative study of SVM and BP algorithms in gas identification; Wang Dan et al.; Chinese Journal of Sensors and Actuators; 2005-03-26; vol. 18, no. 1; full text *
Gas identification based on support vector machines and wavelet decomposition; Ge Haifeng; Chinese Journal of Scientific Instrument; 2006-06-30; vol. 27, no. 6; full text *
Gas identification based on the support vector machine algorithm; Wang Dan; Sensor Technology; 2005-02-20; vol. 24, no. 2; full text *

Also Published As

Publication number Publication date
CN103544392A (en) 2014-01-29

Similar Documents

Publication Publication Date Title
CN103544392B (en) Medical gas identification method based on deep learning
CN108231201B (en) Construction method, system and application method of disease data analysis processing model
CN111443165B (en) Odor identification method based on gas sensor and deep learning
CN109543763B (en) Raman spectrum analysis method based on convolutional neural network
CN103412003B (en) Gas detection method based on self-adaption of semi-supervised domain
Wang et al. Research on Healthy Anomaly Detection Model Based on Deep Learning from Multiple Time‐Series Physiological Signals
CN110298264B (en) Human body daily behavior activity recognition optimization method based on stacked noise reduction self-encoder
CN113673346B (en) Motor vibration data processing and state identification method based on multiscale SE-Resnet
CN112699960A (en) Semi-supervised classification method and equipment based on deep learning and storage medium
CN110895705B (en) Abnormal sample detection device, training device and training method thereof
CN111340132B (en) Machine olfaction mode identification method based on DA-SVM
Gokhale et al. GeneViT: Gene vision transformer with improved DeepInsight for cancer classification
CN110880369A (en) Gas marker detection method based on radial basis function neural network and application
CN114372493B (en) Computer cable electromagnetic leakage characteristic analysis method
WO2020247200A1 (en) Likelihood ratios for out-of-distribution detection
CN115659174A (en) Multi-sensor fault diagnosis method, medium and equipment based on graph regularization CNN-BilSTM
CN111105877A (en) Chronic disease accurate intervention method and system based on deep belief network
CN117520914A (en) Single cell classification method, system, equipment and computer readable storage medium
Eluri et al. Cancer data classification by quantum-inspired immune clone optimization-based optimal feature selection using gene expression data: deep learning approach
Saha et al. Graph convolutional network-based approach for parkinson’s disease classification using euclidean distance graphs
Jia et al. Study on optimized Elman neural network classification algorithm based on PLS and CA
CN112580539A (en) Long-term drift suppression method for electronic nose signals based on PSVM-LSTM
CN116738330A (en) Semi-supervision domain self-adaptive electroencephalogram signal classification method
CN113673323B (en) Aquatic target identification method based on multi-deep learning model joint judgment system
CN114781484A (en) Cancer serum SERS spectrum classification method based on convolutional neural network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160824

Termination date: 20171023

CF01 Termination of patent right due to non-payment of annual fee