Disclosure of Invention
To solve the above problems, the invention provides a heart sound diagnosis system that combines deep learning with a low-discrepancy forest.
According to some embodiments, the invention adopts the following technical scheme:
a heart sound diagnosis system combining deep learning and a low-discrepancy forest, comprising:
a preprocessing module configured to preprocess the acquired heart sound signals by sequentially normalizing, filtering and downsampling them;
a data conversion module configured to extract audio features from the downsampled data and generate second-order spectral data;
a deep learning module configured to extract features from the second-order spectral data using a trained deep learning model;
and a feature classification module configured to classify the extracted features using a trained low-discrepancy forest classifier to obtain a diagnosis result.
As an alternative embodiment, the system further comprises an acquisition device for acquiring and storing the heart sound signals.
As a further limitation, the acquisition device is an electronic stethoscope having a recording module, or an ordinary stethoscope fitted with a sound pick-up.
As an alternative embodiment, the deep learning model is a lightweight AOCT (Automated Optical Coherence Tomography) convolutional neural network, which comprises four convolutional blocks, each integrating a convolutional layer, a BN (Batch Normalization) layer and an activation function, followed by a fully connected layer, all connected in sequence.
As an alternative implementation, the training process of the deep learning model includes: acquiring a training data set; sequentially normalizing, filtering and downsampling each sample; extracting audio features from the downsampled audio data; inputting the second-order spectral data of all samples into the deep learning model; storing the training-set classification labels to form a storage file; reducing all feature columns to a single column T and appending it as the last column of the storage file; rearranging all samples in the storage file in order according to the size of the T-column values; and inputting the rearranged data, with the T column deleted, to the low-discrepancy forest classifier.
As an alternative embodiment, the deep learning model and the low-discrepancy forest classifier are cascaded, with the low-discrepancy forest classifier serving as the last layer of the deep learning model.
As an alternative implementation, the low-discrepancy forest classifier takes K decision trees as base classifiers and is obtained as a combined classifier after ensemble learning; given a sample to be classified, the classification result output by the low-discrepancy forest classifier is decided by simple voting over the classification results of the individual decision trees.
As an alternative embodiment, the training process of the low-discrepancy forest classifier includes:
(a) generating a low-discrepancy sequence, based on the number of samples in the training data set, using a low-discrepancy sequence sampling method;
(b) obtaining the rank of every element of the low-discrepancy sequence in ascending or descending order, and generating a rank sequence;
(c) taking a set fraction of the training data set samples as the training set of each decision tree;
(d) setting the number of decision trees in the low-discrepancy forest;
(e) generating a random integer within the sample-number range as the initial sample index of a decision tree;
(f) starting from the random integer, taking a set number of consecutive elements from the rank sequence;
(g) using the taken elements as sample serial numbers, taking the corresponding samples from the input rearranged data (with the T column deleted) to form the training samples of one decision tree;
(h) constructing a decision tree and training it with the training samples;
(i) repeating steps (e)-(h) until the set number of decision trees have been constructed and trained.
As an alternative, the filtering process is performed by Butterworth band-pass filtering.
As an alternative embodiment, the downsampling process may be performed using the Nyquist sampling method.
Compared with the prior art, the invention has the beneficial effects that:
the invention uses heart sounds as the diagnostic basis together with a lightweight, high-accuracy, deployable model; the system is portable, diagnoses in real time, and lets a patient receive accurate medical service without leaving home; the operating procedure is simple and requires no special professional knowledge, which brings great convenience to patients.
The invention adopts a higher-order spectral analysis method for feature processing; the extracted features are clearly superior to the results of low-order feature extraction methods such as the short-time Fourier transform and the wavelet transform. Higher-order spectra retain the phase relations within the signal and can detect and quantify the phase coupling of non-Gaussian signals, so they are often used for non-stationary medical signals.
The deep learning method combined with the low-discrepancy sequence forest requires little computing resource yet achieves high accuracy, and can be deployed on embedded devices to provide health protection for patients.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
In this embodiment, features are extracted using an AOCT convolutional neural network, and a low-discrepancy sequence Forest (BDSForest) is used as the final classifier in the last layer of the hybrid model to diagnose the heart sound signal.
The specific scheme comprises the following steps:
a heart sound diagnosis system combining deep learning and a low-discrepancy forest, comprising:
a preprocessing module configured to preprocess the acquired heart sound signals by sequentially normalizing, filtering and downsampling them;
a data conversion module configured to extract audio features from the downsampled data and generate second-order spectral data;
a deep learning module configured to extract features from the second-order spectral data using a trained deep learning model;
and a feature classification module configured to classify the extracted features using a trained low-discrepancy forest classifier to obtain a diagnosis result.
Wherein the deep learning module and the feature classification module form a hybrid model.
The system implementation, as shown in fig. 1(a) and fig. 1(b), mainly includes two parts: model training and model deployment. In this embodiment, the system can be deployed on a Raspberry Pi single-board computer to make a portable heart disease diagnosis device. However, in other embodiments, the system may be deployed on other terminal devices and is not limited to the above-mentioned device.
In this embodiment, the training part includes using one public data set, performing model training using a five-fold cross validation method, and after training, performing validation using three additional data sets to obtain an optimal model structure and coefficients.
Specifically, in the model training part, this embodiment uses a published heart sound data set collected by Yaseen, Gui-Young Son, and Soonil Kwon in Korea; the sample statistics are shown in Table 1.
TABLE 1 Heart Sound data set used
Training adopts a five-fold cross validation method: the samples of the original data set are randomly shuffled and then evenly divided into five parts. In each round, one part (200 samples) is selected as the test set and the remaining four parts (800 samples in total) are used as the training data set for neural-network learning and low-discrepancy forest training; after training, the test set is fed in to obtain the test accuracy. This cycle runs five times in total, and finally the average accuracy is taken as the prediction accuracy of the model.
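The five-fold protocol above can be sketched as follows; this is a minimal illustration in plain NumPy (the function name and the fixed seed are assumptions, not part of the patent):

```python
import numpy as np

def five_fold_split(n_samples, seed=0):
    """Shuffle sample indices, split them into 5 equal parts, and yield
    (train_idx, test_idx) pairs: each part serves once as the test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, 5)
    for k in range(5):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(5) if j != k])
        yield train, test
```

With N = 1000 as in this embodiment, each round trains on 800 samples and tests on 200; the five test-set accuracies would then be averaged.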
The flow of model training is shown in fig. 3 (a):
training:
step (1): a training data set D is read in and the number of samples of this data set is calculated to be N (i.e. 800).
Step (2): the first sample (i.e., a heart sound audio file) is normalized so that each audio data point is within the interval [ -1,1] (see fig. 4).
And (3): the sample normalized audio is filtered. This is because the audio signal collected by the stethoscope contains noise, which interferes with the final output of the model. For this purpose, it is necessary to filter out the heart sound part and the noise part in the audio file. In this embodiment, according to the frequency of the found heart sound signal of the human body, butterworth band-pass filtering is applied to the collected audio data, so as to fully filter the interference between the direct current signal and the high-frequency noise and retain the heart sound part (see fig. 5). The filter coefficients are shown in table 2.
TABLE 2 Butterworth band-pass filter coefficients
And (4): and downsampling the filtered audio data. This is because the model needs to satisfy the requirement of real-time calculation to obtain the diagnosis result, and therefore, the algorithm complexity needs to be reduced. The present embodiment downsamples the filtered heart sound data using the nyquist sampling method (as shown in fig. 6).
And (5): and extracting audio features from the down-sampled audio data. In the embodiment, a high-order spectral analysis method in the field of modern digital signal processing is adopted to extract audio features of the audio data after down-sampling. The features extracted by the high-order spectrum analysis method are obviously superior to the results of low-order feature extraction methods such as short-time Fourier transform, wavelet transform and the like. This embodiment employs a widely used higher-order spectral analysis method, i.e., a second-order spectral analysis method. It can well suppress the phase relation in the signal, detect and quantify the phase coupling of non-Gaussian signal, often used for non-stationary medical signals such as EEG, ECG, EMG. In other words, experiments prove that the analysis method is suitable for the heart sound signals, and in the process of feature extraction, useful features in the signals are kept as much as possible, and noise is reduced.
The second-order spectrum can be expressed as:

B(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)]

where X(f) is the Fourier transform of the signal and X* its complex conjugate; equivalently, B(f1, f2) is the two-dimensional Fourier transform of the third-order cumulant c3(tau1, tau2).
It can be seen from the formula that the generated second order spectrum is a two-dimensional matrix, which is used as input for the underlying neural network.
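A minimal direct estimator of this second-order spectrum, averaging X(f1)X(f2)X*(f1+f2) over fixed-length segments, might look as follows (segment length and function name are assumptions for illustration):

```python
import numpy as np

def bispectrum(x, nfft=64):
    """Return an nfft x nfft complex matrix estimating B(f1, f2) by
    averaging the triple product of segment FFTs over all segments."""
    segs = len(x) // nfft
    B = np.zeros((nfft, nfft), dtype=complex)
    f = np.arange(nfft)
    f1, f2 = np.meshgrid(f, f, indexing="ij")
    for s in range(segs):
        X = np.fft.fft(x[s * nfft:(s + 1) * nfft])
        # triple product; frequency index wraps because the FFT is periodic
        B += X[f1] * X[f2] * np.conj(X[(f1 + f2) % nfft])
    return B / segs
```

The resulting two-dimensional matrix (magnitude, typically) is what would be fed to the neural network as input.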
And (6): and (5) repeating the steps (2) to (5) to complete second-order spectrum conversion of all samples of the training data set, and storing the second-order spectrum conversion into a temporary file D1.
And (7): read in D1, extract features using AOCT convolutional neural network, the structure of which is shown in fig. 2. The present embodiment improves the original AOCT convolutional neural network, i.e. uses 4 convolutional blocks with convolutional layer, BN layer and activation function as a whole (see table 3 for parameters). Finally, the feature results are passed into the fully connected layer (i.e., 16384 neurons in one dimension).
TABLE 3 parameters of AOCT convolutional neural network
And (8): the N × 16384 neuron data of the previous step, and the classification label of the original training set data are stored in the temporary file D2.
And (9): all 16384 feature columns are reduced to a column T using a dimension reduction technique (e.g., PCA, FA, kPCA, tSVD, etc.). T is appended to the last column of D2.
Step (10): all samples of D2 are rearranged in ascending or descending order, depending on the size of the T column values.
Step (11): the T column is deleted and stored as a new temporary file D3.
Step (12): and using a low-diversity Forest (BDS Forest) as a mixed model of a final classifier at the last layer. The structure is shown in fig. 2. The embodiment improves the original AOCT convolutional neural network, i.e. uses 4 convolutional blocks with convolutional layer, BN layer and activation function as a whole (see table 2 for parameters). Then, the result is transmitted into a full connection layer (i.e. 16384 neurons in one dimension), and a final result is obtained by combining a low-diversity forest algorithm.
In this embodiment, a low-discrepancy forest model is used as the final classifier. The low-discrepancy forest (Best Discrepancy Sequence Forest, BDS Forest) is a combined classifier obtained by ensemble learning with K decision trees h(X, θk), k = 1, …, K, as base classifiers, where {θk} is a sequence of independent random variables. Given a sample to be classified, the classification result output by the low-discrepancy forest is decided by simple voting over the classification results of the individual decision trees. The principle is shown in fig. 7.
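The simple-voting rule can be sketched as below; the tree objects stand for any trained predictors with a `predict` method, and all names here are illustrative:

```python
from collections import Counter

def forest_predict(trees, sample):
    """Each trained tree casts one vote; the forest outputs the
    majority label among the K votes."""
    votes = [tree.predict(sample) for tree in trees]
    return Counter(votes).most_common(1)[0][0]
```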
The low-diversity forest algorithm flow is as follows:
step (a): from the number of samples N of the training data set D, a low disparity sequence BDS is generated with the following formula, with N elements:
i.e., the elements in each low disparity sequence are equal to the fractional part of the product of a natural number and the circumference ratio (i.e., 3.141592653589793238462), leaving 21 bits after the decimal point. N is a continuous natural number starting from 1 and is taken to the number N of samples in the training data set D. For example, BDS = {0.142, 0.283, 0.425, … } (note: only the last 3 decimal places are reserved here for convenience of illustration).
Step (b): and acquiring the sequence numbers of all elements of the BDS sequence according to the ascending or descending principle, and generating a sequence number sequence R. For example, the first number in the low disparity sequence is 0.142, in the N (i.e., 800) elements, in ascending order, at 114 bits, the second number is 0.283, at 228 bits, and the third number is 0.425, at 341 bits. Therefore, the generated low difference degree number series R is [114, 228, 341, … ].
Step (c): as is customary with conventional integration methods, each decision tree uses 65% of the sample of the dataset as a training set for the tree. In the training data set of the present embodiment, the number of samples with d =65% × N is 520.
Step (d): the number K of the low-diversity forests including the decision trees is set, where K =30 in this embodiment, that is, 30 decision trees are used to form one low-diversity forest.
A step (e): a random integer x is generated between 1 and N as an initial sample index of a decision tree.
Step (f): starting from x, d consecutive elements are taken (i.e. 520 in total) in R generated in step (b).
Step (g): and (4) taking D samples from the D3 data set generated in the step (11) according to the taken elements as sample serial numbers to form a temporary data set serving as a training sample of a decision tree.
A step (h): a decision tree is constructed (ID 3, CART, C4.5, etc. can be used). When splitting each node of the decision tree, a feature column subset is extracted randomly from all the features with equal probability, and is usually taken
Where m is the total number of features (i.e., 16384), and then an optimal attribute is selected from this subset to split the node. Training with the temporary data set of step (g) to obtain the trained parameters of the decision treeThe number is stored in a temporary container P.
Step (i): and (e) circularly executing the steps (e) to (h), constructing and training K decision trees, and storing the P and the parameters of the AOCT convolutional neural network in the step (7) into an ONNX pre-training model file.
At this point one round of training is finished and the reserved test data set is used for a classification test; the remaining four rounds of training and testing then proceed in the five-fold cross validation manner. The experimental results show that the classification accuracy for heart sound diseases of the model constructed in this embodiment is about 97%.
To further determine the predictive power of the above model, this embodiment was tested with the PhysioNet-CinC 2016 Challenge dataset (665 heart sound normal samples, 2575 heart sound abnormal samples), the Kaggle Kinguistics 2016 dataset (231 heart sound normal samples, 100 heart sound abnormal samples), and the Peter J Bentley dataset (351 heart sound normal samples, 129 heart sound abnormal samples). The test procedure is shown in fig. 3(b). The audio data of each sample is converted into second-order spectral data after audio-signal normalization, filtering and downsampling, and then enters the pre-trained hybrid model; the accuracy of predicting heart abnormality exceeds 96%.
System deployment follows. An electronic stethoscope (a common commercially available electronic stethoscope with a 3.5 mm earphone jack can be used, or a sound pick-up added to the head of an ordinary stethoscope) collects the heart sound signal of the human body and converts it into an electrical signal;
the electronic stethoscope is connected to a Raspberry Pi 4B single-board computer, which digitally filters the signal to remove the noise and retain the heart sound part, and then downsamples the denoised audio signal to reduce the computational load of the model and speed up diagnosis.
Features are then extracted from the preprocessed audio signal using the higher-order spectral method, the feature map is fed into the trained model combining the low-discrepancy forest with deep learning, and the final recognition result is computed, completing a preliminary self-diagnosis.
Specifically, a shell is added to the Raspberry Pi 4B single-board computer to make a portable heart sound auscultation device. The shell carries 2 buttons (a recording button and a detection button) and 1 LED display screen that can show the recorded audio waveform and the diagnosis result (i.e., normal, mitral valve prolapse, aortic stenosis, mitral stenosis or mitral regurgitation).
The pre-trained ONNX model file is loaded onto the Raspberry Pi 4B single-board computer, and an electronic stethoscope with a 3.5 mm connector is plugged into it (a self-made stethoscope head with an electronic sound pick-up can also be used). In a quiet environment, the user places the electronic stethoscope on the chest and presses the recording button; the system automatically records for a period of time (15 seconds in this embodiment), then stops automatically or manually and stores the recording as a temporary wav file. If the user judges the recording valid, the detection button may be pressed. The system automatically normalizes, filters and downsamples the audio signal of the temporary wav file, converts it into second-order spectral data, loads the pre-trained ONNX model, extracts features with the AOCT convolutional neural network, makes the final classification at the last layer with the low-discrepancy sequence Forest (BDS Forest), and displays the result on the LED screen. The deployment flow is shown in fig. 8. In actual tests, the model's diagnosis results were fully consistent with those of a professional physician, an accuracy rate of 100%.
Of course, the parameters in the above embodiments may be replaced or modified according to specific situations, which are easily conceivable by those skilled in the art and are considered to fall within the scope of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive efforts by those skilled in the art based on the technical solution of the present invention.