CN111242038B - Dynamic tongue fibrillation detection method based on frame prediction network
- Publication number
- CN111242038B (application number CN202010040375.6A)
- Authority
- CN
- China
- Prior art keywords
- size
- input
- output
- convolution
- channels
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
A dynamic tongue fibrillation detection method based on a frame prediction network relates to the fields of computer vision, pattern recognition and medical engineering. The invention proposes a new P-net network for processing data with strong spatio-temporal correlation, such as dynamic video: it considers features at multiple scales and adds a ConvGRU module for temporal modeling, thereby fusing spatial and temporal information. To improve the accuracy of tongue tremor judgment and the generalization ability of the network, an optical flow extraction step is performed before the network input so as to capture inter-frame tremor information. Following a prediction-based approach, the invention trains the network jointly with a generative adversarial model for discrimination, applies a spatio-temporal sliding window operation when calculating the tremor score, and finally realizes a high-accuracy dynamic tongue tremor detection algorithm based on the P-net network.
Description
Technical Field
The invention belongs to the fields of computer vision, pattern recognition and medical engineering, and relates to a dynamic tongue fibrillation detection method based on a frame prediction network.
Background
With the improvement of living standards and the development of technology, big data analysis and artificial intelligence have attracted wide attention, and the combination of medicine and artificial intelligence is an important direction. Medicine can provide powerful assistance to people, while artificial intelligence can take on analysis tasks for medical data; blending the two holds great promise, and many enterprises and institutions have invested heavily in this direction. Among these techniques, machine learning is an important method for realizing artificial intelligence: collected data are analyzed and organized, and corresponding models are built with machine learning algorithms to automate the analysis process. Systems that assist doctors in diagnosing a patient's condition would greatly facilitate diagnosis and, to a certain extent, avoid missed detection of disease.
Tongue diagnosis is one of the main components of the four diagnostic methods in traditional Chinese medicine (TCM); tongue tremor refers to involuntary, uncontrollable trembling of the tongue. In classical TCM theory, observing the degree of abnormality in tongue tremor yields useful clinical signs (it is traditionally associated with disorders of the spleen) and provides additional data references for doctors. However, this field still faces many difficulties: tongue tremor is inherently dynamic information, and how to better model its dynamic characteristics is a problem to be solved urgently. The invention therefore designs an intelligent tongue tremor detection method to help patients and doctors, providing powerful objective data for faster and better diagnosis.
Disclosure of Invention
Related work in this field is nearly blank, and the invention therefore proposes a high-accuracy dynamic tongue tremor detection algorithm based on a frame prediction network (P-net). Context and ConvGRU structures are added to a U-net backbone: the Context structure strengthens multi-scale modeling of the input data, while the ConvGRU module models the input data in time. The proposed P-net network models data with strong spatio-temporal characteristics, such as video, well and can better distinguish whether the tongue trembles.
The general idea of the high-accuracy dynamic tongue tremor detection algorithm based on the P-net network is as follows: after preprocessing, normal dynamic tongue data are input into the designed network for training, with the aim of establishing a model that predicts the next frame state of a normal tongue well. During testing, because a trembling tongue moves differently from a normal tongue, the trained model cannot correctly predict the state of a trembling tongue; through the ST-Pscore scoring mechanism designed by us, tongue tremor can therefore be detected automatically when it occurs. The method mainly comprises the following characteristics:
(1) Optical flow extraction preprocessing based on tongue tremor information
After the originally acquired dynamic tongue data are split frame by frame into separate pictures, each pair of adjacent pictures is input into an optical flow network pre-trained on a large-scale data set to extract optical flow information, which reduces the interference of static information and better captures tongue tremor information.
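As an illustration of this preprocessing step, the sketch below splits a video into frames and runs a pretrained flow model over adjacent pairs. The patent does not name its optical flow network; torchvision's pretrained RAFT model is used here purely as a stand-in, and the 256×256 resize matches the network input size described later.

```python
# Sketch only: RAFT stands in for "an optical flow network pre-trained
# on a large-scale data set"; the patent does not specify which network.
import cv2
import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

weights = Raft_Large_Weights.DEFAULT
flow_net = raft_large(weights=weights).eval()
preprocess = weights.transforms()  # normalization expected by RAFT

def video_to_flows(path):
    """Split a tongue video into frames and return per-pair optical flow."""
    cap = cv2.VideoCapture(path)
    frames = []
    ok, frame = cap.read()
    while ok:
        rgb = cv2.cvtColor(cv2.resize(frame, (256, 256)), cv2.COLOR_BGR2RGB)
        frames.append(torch.from_numpy(rgb).permute(2, 0, 1))  # CHW uint8
        ok, frame = cap.read()
    cap.release()

    flows = []
    with torch.no_grad():
        for f1, f2 in zip(frames[:-1], frames[1:]):
            img1, img2 = preprocess(f1.unsqueeze(0), f2.unsqueeze(0))
            flow = flow_net(img1, img2)[-1]  # last refinement, (1, 2, H, W)
            flows.append(flow.squeeze(0))
    return flows
```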
(2) New design of P-net network
The network construction structure specifically proposed is as follows:
P-net consists of three parts: an encoding stage, a multi-scale feature and temporal information fusion stage, and a decoding stage. The encoding stage completes the initial encoding of the input dynamic tongue information through three "2 convolution layers + 1 downsampling layer" structures; the multi-scale feature and temporal information fusion stage applies the jointly designed Context module and ConvGRU module to spatio-temporally fuse the features extracted in the previous stage, providing better feature expression for the next stage; the decoding stage restores the low-level features through three "2 convolution layers + 1 upsampling layer" structures, preparing for detection of the tongue tremor state. In summary, the designed P-net is a network structure that encodes, and then fuses, multi-scale features and temporal features.
The Context module consists of four branches, three of which consist of 3 convolution layers and one of 2 convolution layers; the front and back convolution layers expand and reduce the channel dimension, while the middle convolution layers achieve multi-scale feature extraction by using different atrous (dilation) rates. The ConvGRU module consists of two ConvGRU cells; T consecutive feature maps are input and two hidden states h1 and h2 are updated to extract the temporal characteristics of the input dynamic tongue information, the last output being taken as the extracted feature.
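A minimal PyTorch sketch of such a Context module is given below, under stated assumptions: each three-layer branch is 1×1 → dilated 3×3 → 1×1 with dilation rates 5, 3 and 1 (matching the branch parameters listed in the detailed description), the two-layer branch is two 1×1 convolutions, the branch outputs are summed, and the internal channel width is illustrative.

```python
import torch
import torch.nn as nn

class ContextModule(nn.Module):
    """Four-branch multi-scale context block (sketch, not the exact patent code)."""
    def __init__(self, in_ch=256, out_ch=512, mid_ch=256):
        super().__init__()
        act = nn.LeakyReLU(0.2)

        def branch3(dilation):
            # 1x1 (reduce) -> dilated 3x3 (multi-scale) -> 1x1 (expand)
            return nn.Sequential(
                nn.Conv2d(in_ch, mid_ch, 1), act,
                nn.Conv2d(mid_ch, mid_ch, 3, padding=dilation, dilation=dilation), act,
                nn.Conv2d(mid_ch, out_ch, 1), act,
            )

        self.b1, self.b2, self.b3 = branch3(5), branch3(3), branch3(1)
        self.b4 = nn.Sequential(  # two-layer branch: two 1x1 convolutions
            nn.Conv2d(in_ch, mid_ch, 1), act,
            nn.Conv2d(mid_ch, out_ch, 1), act,
        )

    def forward(self, x):  # x: (B, 256, 32, 32) -> (B, 512, 32, 32)
        return self.b1(x) + self.b2(x) + self.b3(x) + self.b4(x)
```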
(3) Process for predicting tongue movement state and detecting tongue tremors by using P-net network
Characteristics during training:
P-net takes T consecutive optical flow maps as input; each map is encoded independently, and the (T+1)-th optical flow map is obtained by prediction. The actual (T+1)-th optical flow map serves as ground truth; the ground truth and the predicted pictures are input into several loss functions for network optimization. A generative adversarial network model is used: the predicted images and the corresponding ground truth are simultaneously input into a discriminator for joint discrimination, until the discrimination model cannot distinguish whether its input is a predicted image or the original ground truth, which further optimizes the prediction process and improves the detection result.
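As a sketch, the training pairs can be arranged as sliding windows over the extracted flow sequence; T = 4 matches the four-frame windows used in the detailed description, and the Dataset wrapper itself is an illustrative assumption, not part of the patent.

```python
import torch
from torch.utils.data import Dataset

class FlowWindowDataset(Dataset):
    """T consecutive flow maps as input, the (T+1)-th as prediction target."""
    def __init__(self, flows, T=4):
        self.flows = flows  # list of (C, H, W) flow tensors from preprocessing
        self.T = T

    def __len__(self):
        return len(self.flows) - self.T

    def __getitem__(self, i):
        window = torch.stack(self.flows[i:i + self.T])  # (T, C, H, W)
        target = self.flows[i + self.T]                 # ground-truth frame
        return window, target
```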
Characteristics at the time of prediction:
Optical flow is likewise extracted from the test set; groups of T consecutive frames are input into the trained network, finally yielding a predicted picture for each frame.
Characteristics of tongue tremor detection:
The predicted picture and the original picture are input into the ST-Pscore scoring framework designed by us. This evaluation method combines sliding window operations in time and in space to further fuse spatio-temporal content and obtain a final tremor score; whether the tongue in the input dynamic data is in a tremor state is judged by comparing the tremor score with a set threshold (taken as 0.432).
Advantageous effects
1. The invention realizes prediction-based dynamic tongue tremor detection using the P-net network
To our knowledge, this is the first work to combine deep learning techniques with dynamic tongue tremor detection. The invention uses the P-net network to predict dynamic information and, on that basis, to judge whether the tongue trembles.
2. The invention extracts the optical flow of the input dynamic data through the pre-training network
Because tongue tremor is dynamic in nature, temporal information between different frames must be analyzed. Obtaining inter-frame tremor information through optical flow extraction both isolates the key temporal characteristics for processing and reduces the interference of static information on the results; together, these two aspects improve the accuracy of the algorithm.
3. The invention provides a P-net network
A standard U-net fuses, at each upsampling step, the output of the feature extraction part (i.e., the downsampling path) at the corresponding scale, so it models the spatial structure of the input image well. However, it considers only a single feature size and adds no temporal modeling, which is a problem for input data with strong temporal correlation, such as video; good results can only be obtained by adding a temporal module, which the proposed P-net does.
4. The invention provides a method for calculating a tremor score (ST-Pscore) based on space-time sliding window operation
To further improve accuracy, after the tremor score of each frame is obtained, the invention performs a spatio-temporal sliding window operation: temporally, the preceding and following frames are weighted; spatially, the tremor scores of small sub-region images within each frame are averaged. Combining the two further establishes the spatio-temporal connection and yields a better detection effect.
Description of the drawings:
FIG. 1 is the optical flow extraction diagram of the present invention.
Fig. 2 is a P-net network diagram of the present invention.
Fig. 3 is a Context structure diagram in the present invention.
Fig. 4 is a diagram showing the structure of ConvGRU in the present invention.
Fig. 5 is a training flow for implementing tongue fibrillation detection by using P-net according to the present invention.
FIG. 6 is a test flow chart of the present invention.
Detailed Description
The following describes the embodiments of the present invention in detail with reference to the accompanying drawings.
1. Preprocessing
After the originally collected dynamic tongue data are split frame by frame into separate pictures, each pair of adjacent pictures is input into an optical flow network pre-trained on a large-scale data set to extract optical flow information; the specific network structure is shown in FIG. 1.
2. Construction of P-net network
As shown in fig. 2. The specific parameters of each layer of the P-net network constructed by the invention are as follows:
① C1 and C2 convolution layers: the input size is 256×256, the number of input channels is 3, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 256×256, and the number of output channels is 64.
② P1 downsampling layer: the input size is 256×256, the number of input channels is 64, the pooling kernel is 2×2, the step size is 1, the edge filling mode is 'same', and the output size is 128×128.
③ C3 and C4 convolution layers: the input size is 128×128, the number of input channels is 64, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 128×128, and the number of output channels is 128.
④ P2 downsampling layer: the input size is 128×128, the number of input channels is 128, the pooling kernel is 2×2, the step size is 1, the edge filling mode is 'same', and the output size is 64×64.
⑤ C5 and C6 convolution layers: the input size is 64×64, the number of input channels is 128, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 64×64, and the number of output channels is 256.
⑥ P3 downsampling layer: the input size is 64×64, the number of input channels is 256, the pooling kernel is 2×2, the step size is 1, the edge filling mode is 'same', and the output size is 32×32.
⑦ CT1 Context layer: the input size is 32×32, the number of input channels is 256, the convolution kernel is 3×3, the output size is 32×32, and the number of output channels is 512 after the four branches are added together.
⑧ L1 and L2 ConvGRU layers: the input size is 32×32, the number of input channels is 512, the convolution kernel is 3×3, the output size is 32×32, and the number of output channels is 512.
⑨ U1 deconvolution layer: the input size is 32×32, the number of input channels is 512, the convolution kernel is 2×2, the step size is 2, the edge filling mode is 'same', and the output size is 64×64.
⑩ The result of the U1 deconvolution is spliced with the C6 convolution result of the corresponding size; the splicing dimension is 3.
⑪ C7 and C8 convolution layers: the input size is 64×64, the number of input channels is 512, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 64×64, and the number of output channels is 256.
⑫ U2 deconvolution layer: the input size is 64×64, the number of input channels is 256, the convolution kernel is 2×2, the step size is 2, the edge filling mode is 'same', and the output size is 128×128.
⑬ The result of the U2 deconvolution is spliced with the C4 convolution result of the corresponding size; the splicing dimension is 3.
⑭ C9 and C10 convolution layers: the input size is 128×128, the number of input channels is 256, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 128×128, and the number of output channels is 128.
⑮ U3 deconvolution layer: the input size is 128×128, the number of input channels is 128, the convolution kernel is 2×2, the step size is 2, the edge filling mode is 'same', and the output size is 256×256.
⑯ The result of the U3 deconvolution is spliced with the C2 convolution result of the corresponding size; the splicing dimension is 3.
⑰ C11 and C12 convolution layers: the input size is 256×256, the number of input channels is 128, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 256×256, and the number of output channels is 64.
⑱ Output layer: the input size is 256×256, the number of input channels is 64, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'same', the activation function is tanh, the output size is 256×256, and the number of output channels is 3.
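To make the layer plan concrete, the following condensed PyTorch skeleton follows the 256 → 128 → 64 → 32 → 64 → 128 → 256 size plan with the three skip connections. It is a sketch under assumptions: size-preserving padding is used for the 3×3 convolutions and stride-2 pooling for the downsampling layers (so the listed sizes work out), and the Context + ConvGRU stage is stubbed with one convolution so the skeleton runs standalone; sections 3 and 4 below describe the real modules.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    # "2 convolution layers" unit: two 3x3 convs + LeakyReLU, size-preserving
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(cout, cout, 3, padding=1), nn.LeakyReLU(0.2),
    )

class PNetSketch(nn.Module):
    """Condensed P-net skeleton: encoder / (stubbed) fusion / decoder."""
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(3, 64)     # C1-C2
        self.enc2 = conv_block(64, 128)   # C3-C4
        self.enc3 = conv_block(128, 256)  # C5-C6
        self.pool = nn.MaxPool2d(2)       # P1-P3 (stride-2 assumed)
        self.fuse = nn.Conv2d(256, 512, 3, padding=1)  # stand-in for Context+ConvGRU
        self.up1 = nn.ConvTranspose2d(512, 256, 2, stride=2)  # U1
        self.dec1 = conv_block(512, 256)  # C7-C8 (after concat with C6)
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)  # U2
        self.dec2 = conv_block(256, 128)  # C9-C10
        self.up3 = nn.ConvTranspose2d(128, 64, 2, stride=2)   # U3
        self.dec3 = conv_block(128, 64)   # C11-C12
        self.out = nn.Conv2d(64, 3, 3, padding=1)              # output layer

    def forward(self, x):                  # x: (B, 3, 256, 256)
        e1 = self.enc1(x)                  # 256x256, 64 ch
        e2 = self.enc2(self.pool(e1))      # 128x128, 128 ch
        e3 = self.enc3(self.pool(e2))      # 64x64, 256 ch
        f = self.fuse(self.pool(e3))       # 32x32, 512 ch
        d1 = self.dec1(torch.cat([self.up1(f), e3], dim=1))   # 64x64, 256 ch
        d2 = self.dec2(torch.cat([self.up2(d1), e2], dim=1))  # 128x128, 128 ch
        d3 = self.dec3(torch.cat([self.up3(d2), e1], dim=1))  # 256x256, 64 ch
        return torch.tanh(self.out(d3))    # 256x256, 3 ch
```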
3. Context build
As shown in fig. 3. The specific parameters of each layer of the Context structure constructed by the invention are as follows:
① L1 branch: the three convolution kernels are 1×1, 3×3 and 1×1 respectively, the step size is 1, the edge filling mode is 'valid', the dilation (atrous) rates are 1 and 5, the activation function is LeakyReLU, the output size is 32×32, and the number of output channels is 512.
② L2 branch: the three convolution kernels are 1×1, 3×3 and 1×1 respectively, the step size is 1, the edge filling mode is 'valid', the dilation (atrous) rates are 1 and 3, the activation function is LeakyReLU, the output size is 32×32, and the number of output channels is 512.
③ L3 branch: the three convolution kernels are 1×1, 3×3 and 1×1 respectively, the step size is 1, the edge filling mode is 'valid', the dilation (atrous) rates are 1 and 1, the activation function is LeakyReLU, the output size is 32×32, and the number of output channels is 512.
④ L4 branch: it consists of two convolution structures; the input size is 32×32, the number of input channels is 256, the two convolution kernels are both 1×1, the step size is 1, the edge filling mode is 'valid', the dilation rate is 1, the activation function is LeakyReLU, the output size is 32×32, and the number of output channels is 512.
4. ConvGRU construction
As shown in fig. 4. The concrete arrangement of ConvGRU structure constructed by the invention is as follows:
The input of the module is the T feature maps output by the Context module. The module consists of two ConvGRU cells; for each layer, the convolution input size is 32×32, the number of input channels is 512, the convolution kernel size is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 32×32, and the number of output channels is 512. The last output is selected as the input feature for the next stage.
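A sketch of this two-cell stage is shown below, using the standard ConvGRU gate equations; the patent fixes only the tensor sizes (512 channels at 32×32, 3×3 kernels), so the gate layout and zero-initialized hidden states h1 and h2 are assumptions.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """One ConvGRU cell: GRU gates computed with 3x3 convolutions (sketch)."""
    def __init__(self, ch=512, k=3):
        super().__init__()
        self.gates = nn.Conv2d(2 * ch, 2 * ch, k, padding=k // 2)  # update z, reset r
        self.cand = nn.Conv2d(2 * ch, ch, k, padding=k // 2)       # candidate state

    def forward(self, x, h):
        z, r = torch.chunk(torch.sigmoid(self.gates(torch.cat([x, h], 1))), 2, 1)
        h_new = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_new

def run_convgru(features):
    """Run two stacked cells over the T Context feature maps; keep the last output."""
    cell1, cell2 = ConvGRUCell(), ConvGRUCell()
    h1 = torch.zeros_like(features[0])  # state h1 of the first cell
    h2 = torch.zeros_like(features[0])  # state h2 of the second cell
    for x in features:                  # features: list of (B, 512, 32, 32)
        h1 = cell1(x, h1)
        h2 = cell2(h1, h2)
    return h2                           # fused spatio-temporal feature
```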
5. Training process for tongue tremor detection
As shown in fig. 5, the specific training process of the prediction-based dynamic tongue tremor detection algorithm implemented with the P-net network is as follows: ① Cut the long videos in the training set into single-frame images, and input each pair of adjacent frames into the optical flow network to extract optical flow.
② Four frames of optical flow images are input into the P-net network in sequence. The four images are first encoded; after passing through the Context module, four feature maps are obtained, which then serve as the input of the ConvGRU module, yielding one feature map containing the information of the previous four frames. All decoding operations in the network are then applied to this feature map to obtain the final predicted optical flow image.
③ Network optimization is performed by comparing the difference between the predicted optical flow and the true optical flow. The invention adopts an intensity loss function, a gradient loss function and an msssim structural loss function for this comparison, and adjusts the network parameters during training by minimizing the global loss function with the Adam algorithm. With $I$ the true optical flow, $I^*$ the predicted optical flow and $g_d$ the gradient operator, the gradient and structural losses are:

$$L_{grad}(I, I^*) = \big\| \, |g_d(I)| - |g_d(I^*)| \, \big\|_1$$

$$L_{msssim}(I, I^*) = \big\| \, |msssim(I)| - |msssim(I^*)| \, \big\|_1$$
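A sketch of these loss terms follows. The gradient operator $g_d$ is implemented with finite differences; the intensity loss, whose formula appeared only as an image in the original, is assumed here to be a pixel-wise L2 difference.

```python
import torch
import torch.nn.functional as F

def intensity_loss(pred, gt):
    # Assumed form of L_int: pixel-wise L2 difference (original formula not recoverable)
    return F.mse_loss(pred, gt)

def gradient_loss(pred, gt):
    # L_grad = || |g_d(I)| - |g_d(I*)| ||_1, with finite differences along
    # height and width standing in for the gradient operator g_d
    def grad(x):
        dh = torch.abs(x[..., 1:, :] - x[..., :-1, :])
        dw = torch.abs(x[..., :, 1:] - x[..., :, :-1])
        return dh, dw
    ph, pw = grad(pred)
    gh, gw = grad(gt)
    return F.l1_loss(ph, gh) + F.l1_loss(pw, gw)
```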
④ A generative adversarial network (GAN), consisting of a discriminator and a generator, is introduced. The predicted optical flow and the original optical flow are input into the same discrimination model (the discriminator D), which extracts features and judges whether its input is the true or the predicted optical flow. The discriminator is trained jointly with P-net (which serves as the generator G) until the discriminator cannot distinguish images predicted by P-net from the original ground-truth images, further improving the predictive ability of the network. The discriminator consists of three convolution layers and one output layer. In the corresponding GAN training losses, where D is the discriminator and G is the P-net generator, D is penalized for misclassifying true and predicted flow, while G is rewarded for fooling D.
⑤ In summary, the loss function when training P-net is the weighted combination $L_G = \lambda_a L_{adv} + \lambda_s L_{msssim} + \lambda_g L_{grad} + \lambda_i L_{int}$; in the experiments, $\lambda_a$ was set to 2e-4, $\lambda_s$ to 0.5, $\lambda_g$ to 1, and $\lambda_i$ to 1.
The loss function when training the discriminator is weighted by $\lambda_d$, which was set to 2e-5 in the experiments.
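The combined objective can be sketched as below, reusing intensity_loss and gradient_loss from the previous sketch. Assumptions are labeled: a least-squares GAN form is used for the adversarial terms, and the msssim term comes from the third-party pytorch-msssim package; in training, the generator and discriminator losses would be minimized alternately with Adam, as described above.

```python
import torch
from pytorch_msssim import ms_ssim  # third-party: pip install pytorch-msssim

def generator_loss(pred, gt, d_fake,
                   lam_a=2e-4, lam_s=0.5, lam_g=1.0, lam_i=1.0):
    # d_fake: discriminator output on the predicted flow map
    l_adv = torch.mean((d_fake - 1) ** 2)           # least-squares form (assumed)
    l_ssim = 1 - ms_ssim(pred, gt, data_range=1.0)  # msssim structure term
    return (lam_a * l_adv + lam_s * l_ssim
            + lam_g * gradient_loss(pred, gt)
            + lam_i * intensity_loss(pred, gt))

def discriminator_loss(d_real, d_fake, lam_d=2e-5):
    # D is pushed toward 1 on true optical flow and 0 on predicted flow
    return lam_d * (torch.mean((d_real - 1) ** 2) + torch.mean(d_fake ** 2))
```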
6. Test process based on P-net tongue fibrillation detection algorithm
As shown in fig. 6, the specific test process of the prediction-based dynamic tongue tremor anomaly detection algorithm implemented with the P-net network is as follows:
① Cut the long videos in the test set into single-frame images, and input each pair of adjacent frames into the optical flow network to extract optical flow.
② For a test optical flow frame, taking frame $I_t$ as an example, the four consecutive frames consisting of $I_t$ and $I_{t-1}$, $I_{t-2}$, $I_{t-3}$ are input into the trained P-net prediction network to obtain the predicted future frame $\hat{I}_{t+1}$. $\hat{I}_{t+1}$ is compared with the input original ground truth $I_{t+1}$: the spatio-temporal sliding window evaluation method (ST-Pscore) designed by us computes the predicted peak signal-to-noise ratio (PSNR) between the two images and, from it, the predicted tongue tremor score of the dynamic frame. In the ST-Pscore calculation, k is the number of smallest PSNR values kept among all selected small-region images, M, N is the size of a small-region image, λ is the weight of the current frame, and p is the number of preceding and following frames used. During detection, k is set to 5, M, N is set to 32, λ is set to 5, and p is set to 3 (a code sketch of this scoring follows step ④ below).
③ After the spatio-temporal tremor score (ST-Pscore) of test frame $I_t$ is obtained, the value is compared with a set threshold. If ST-Pscore ≥ threshold, test frame $I_t$ is a tongue tremor frame; if ST-Pscore < threshold, it is a non-tremor frame. The present invention takes threshold = 0.432.
④ Steps ② and ③ are repeated for all extracted optical flow frames, all tongue tremor frames in the test data are detected, and the tremor score of each frame is output, realizing real-time dynamic tongue tremor detection.
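Since the ST-Pscore formula itself appeared only as an image, the sketch below encodes one plausible reading of the described procedure: spatially, the k smallest PSNRs over M×N sub-regions of each frame are averaged; temporally, the current frame is weighted by λ against its p preceding and following frames; finally the scores are min-max normalized over the sequence and inverted so that poorly predicted (low-PSNR) frames receive high tremor scores. The normalization and inversion are assumptions made so the score is comparable with the 0.432 threshold.

```python
import numpy as np

def region_psnrs(pred, gt, M=32, N=32):
    """PSNR of every MxN sub-region between predicted and true frames
    (float arrays of shape (H, W, C) with values in [0, 1])."""
    H, W = pred.shape[:2]
    vals = []
    for i in range(0, H - M + 1, M):
        for j in range(0, W - N + 1, N):
            mse = np.mean((pred[i:i+M, j:j+N] - gt[i:i+M, j:j+N]) ** 2)
            vals.append(10 * np.log10(1.0 / max(mse, 1e-10)))
    return np.array(vals)

def st_pscores(preds, gts, k=5, lam=5.0, p=3):
    """Assumed reading of ST-Pscore; returns one tremor score per frame."""
    # Spatial step: mean of the k smallest region PSNRs of each frame
    s = np.array([np.sort(region_psnrs(pr, gt))[:k].mean()
                  for pr, gt in zip(preds, gts)])
    # Temporal step: weight the current frame by lam against its neighbors
    out = np.empty_like(s)
    for t in range(len(s)):
        lo, hi = max(0, t - p), min(len(s) - 1, t + p)
        nb = [s[i] for i in range(lo, hi + 1) if i != t]
        out[t] = (lam * s[t] + sum(nb)) / (lam + len(nb))
    # Normalize and invert: low PSNR (poor prediction) -> high tremor score
    return 1.0 - (out - out.min()) / (out.max() - out.min() + 1e-10)

# Frames with st_pscores(preds, gts) >= 0.432 would be flagged as tremor frames.
```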
Claims (4)
1. A dynamic tongue fibrillation detection method based on P-net is characterized in that:
(1) Optical flow extraction preprocessing based on tongue tremor information
After the originally collected dynamic tongue data are split frame by frame into separate pictures, each pair of adjacent pictures is input into an optical flow network pre-trained on a large-scale data set to extract optical flow information;
(2) Designing P-net network
The network construction structure specifically proposed is as follows:
P-net consists of three parts: an encoding module, a multi-scale feature and temporal information fusion module, and a decoding module; the encoding module completes the initial encoding of the input dynamic tongue information through three '2 convolution layers + 1 downsampling layer' structures; the multi-scale feature and temporal information fusion module spatio-temporally fuses the features extracted in the previous stage through the jointly designed Context module and ConvGRU module, providing better feature expression for the next stage; the decoding module restores the low-level features through three '2 convolution layers + 1 upsampling layer' structures, preparing for detection of the tongue tremor state; the Context module consists of four branches, three of which consist of 3 convolution layers and one of 2 convolution layers; the front and back convolution layers expand and reduce the channel dimension, while the middle convolution layers achieve multi-scale feature extraction by using different atrous (dilation) rates; the ConvGRU module consists of two ConvGRU cells, and extracts the temporal characteristics of the input dynamic tongue information by inputting T consecutive feature maps and updating the two hidden states h1 and h2;
(3) Training a P-net network
The input to P-net consists of T consecutive optical flow maps; each map is encoded independently, and the (T+1)-th optical flow map is obtained by prediction; the actual (T+1)-th optical flow map is taken as ground truth, and the ground truth and the predicted pictures are input into several loss functions for network optimization; a generative adversarial network model is used, the predicted images and the corresponding ground truth are simultaneously input into a discriminator for joint discrimination, until the discrimination model cannot distinguish whether its input is a predicted image or the original ground truth;
(4) Prediction of tongue movement state using P-net network
For a test optical flow frame $I_t$, the four consecutive frames consisting of $I_t$ and $I_{t-1}$, $I_{t-2}$, $I_{t-3}$ are input into the trained P-net prediction network to obtain the predicted future frame $\hat{I}_{t+1}$; $\hat{I}_{t+1}$ is compared with the input original ground truth $I_{t+1}$, the predicted peak signal-to-noise ratio PSNR between the two images is obtained through the spatio-temporal sliding window evaluation method ST-Pscore, and the predicted tongue tremor score of the dynamic frame is calculated from the predicted PSNR, where:
Wherein k is the number of smallest PSNR values kept after the PSNR values of all selected small-region images are sorted in ascending order; M, N is the size of a small-region image, λ is the weight of the current frame, and p is the number of preceding and following frames used; k is set to 5, M, N is set to 32, λ is set to 5, and p is set to 3;
The tremor score is compared with a set threshold to determine whether the tongue in the input dynamic data is in a tremor state; if the score is greater than or equal to the threshold, the tongue is considered to be in the tremor state; the threshold is taken to be 0.432.
2. The method of claim 1, wherein the specific parameters of each layer of the P-net network are as follows:
① C1 and C2 convolution layers: the input size is 256×256, the number of input channels is 3, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 256×256, and the number of output channels is 64;
② P1 downsampling layer: the input size is 256×256, the number of input channels is 64, the pooling kernel is 2×2, the step size is 1, the edge filling mode is 'same', and the output size is 128×128;
③ C3 and C4 convolution layers: the input size is 128×128, the number of input channels is 64, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 128×128, and the number of output channels is 128;
④ P2 downsampling layer: the input size is 128×128, the number of input channels is 128, the pooling kernel is 2×2, the step size is 1, the edge filling mode is 'same', and the output size is 64×64;
⑤ C5 and C6 convolution layers: the input size is 64×64, the number of input channels is 128, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 64×64, and the number of output channels is 256;
⑥ P3 downsampling layer: the input size is 64×64, the number of input channels is 256, the pooling kernel is 2×2, the step size is 1, the edge filling mode is 'same', and the output size is 32×32;
⑦ CT1 Context layer: the input size is 32×32, the number of input channels is 256, the convolution kernel is 3×3, the output size is 32×32, and the number of output channels is 512 after the four branches are added together;
⑧ L1 and L2 ConvGRU layers: the input size is 32×32, the number of input channels is 512, the convolution kernel is 3×3, the output size is 32×32, and the number of output channels is 512;
⑨ U1 deconvolution layer: the input size is 32×32, the number of input channels is 512, the convolution kernel is 2×2, the step size is 2, the edge filling mode is 'same', and the output size is 64×64;
⑩ the result of the U1 deconvolution is spliced with the C6 convolution result of the corresponding size, the splicing dimension being 3;
⑪ C7 and C8 convolution layers: the input size is 64×64, the number of input channels is 512, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 64×64, and the number of output channels is 256;
⑫ U2 deconvolution layer: the input size is 64×64, the number of input channels is 256, the convolution kernel is 2×2, the step size is 2, the edge filling mode is 'same', and the output size is 128×128;
⑬ the result of the U2 deconvolution is spliced with the C4 convolution result of the corresponding size, the splicing dimension being 3;
⑭ C9 and C10 convolution layers: the input size is 128×128, the number of input channels is 256, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 128×128, and the number of output channels is 128;
⑮ U3 deconvolution layer: the input size is 128×128, the number of input channels is 128, the convolution kernel is 2×2, the step size is 2, the edge filling mode is 'same', and the output size is 256×256;
⑯ the result of the U3 deconvolution is spliced with the C2 convolution result of the corresponding size, the splicing dimension being 3;
⑰ C11 and C12 convolution layers: the input size is 256×256, the number of input channels is 128, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 256×256, and the number of output channels is 64;
⑱ Output layer: the input size is 256×256, the number of input channels is 64, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'same', the activation function is tanh, the output size is 256×256, and the number of output channels is 3.
3. The method according to claim 1, wherein the specific parameters of each layer of the Context structure are as follows:
① L1 branch: the three convolution kernels are 1×1, 3×3 and 1×1 respectively, the step size is 1, the edge filling mode is 'valid', the dilation (atrous) rates are 1 and 5, the activation function is LeakyReLU, the output size is 32×32, and the number of output channels is 512;
② L2 branch: the three convolution kernels are 1×1, 3×3 and 1×1 respectively, the step size is 1, the edge filling mode is 'valid', the dilation (atrous) rates are 1 and 3, the activation function is LeakyReLU, the output size is 32×32, and the number of output channels is 512;
③ L3 branch: the three convolution kernels are 1×1, 3×3 and 1×1 respectively, the step size is 1, the edge filling mode is 'valid', the dilation (atrous) rates are 1 and 1, the activation function is LeakyReLU, the output size is 32×32, and the number of output channels is 512;
④ L4 branch: it consists of two convolution structures; the input size is 32×32, the number of input channels is 256, the two convolution kernels are both 1×1, the step size is 1, the edge filling mode is 'valid', the dilation rate is 1, the activation function is LeakyReLU, the output size is 32×32, and the number of output channels is 512.
4. The method of claim 1, wherein the concrete arrangement of the ConvGRU structures constructed is as follows:
The input of the module is the T feature maps output by the Context module; the module consists of two ConvGRU cells, and for each layer the convolution input size is 32×32, the number of input channels is 512, the convolution kernel size is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 32×32, and the number of output channels is 512; the last output is selected as the input feature for the next stage.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010040375.6A CN111242038B (en) | 2020-01-15 | 2020-01-15 | Dynamic tongue fibrillation detection method based on frame prediction network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111242038A CN111242038A (en) | 2020-06-05 |
CN111242038B true CN111242038B (en) | 2024-06-07 |
Family
ID=70877922
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010040375.6A Active CN111242038B (en) | 2020-01-15 | 2020-01-15 | Dynamic tongue fibrillation detection method based on frame prediction network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111242038B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112053375A (en) * | 2020-08-26 | 2020-12-08 | 上海眼控科技股份有限公司 | Method and equipment for predicting prediction based on improved network convolution model |
CN113129226B (en) * | 2021-03-24 | 2023-06-23 | 西安理工大学 | ConvGRU-U-Net-based computing ghost imaging reconstruction algorithm |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018098986A1 (en) * | 2016-11-29 | 2018-06-07 | 深圳市易特科信息技术有限公司 | Automatic detection system and method for tongue images in traditional chinese medicine |
CN109543606A (en) * | 2018-11-22 | 2019-03-29 | 中山大学 | A kind of face identification method that attention mechanism is added |
CN109740654A (en) * | 2018-12-26 | 2019-05-10 | 华东师范大学 | A kind of tongue body automatic testing method based on deep learning |
CN110033002A (en) * | 2019-04-19 | 2019-07-19 | 福州大学 | Detection method of license plate based on multitask concatenated convolutional neural network |
CN110251084A (en) * | 2019-06-21 | 2019-09-20 | 福州数据技术研究院有限公司 | A kind of detection of tongue picture and recognition methods based on artificial intelligence |
CN110599463A (en) * | 2019-08-26 | 2019-12-20 | 依脉人工智能医疗科技(天津)有限公司 | Tongue image detection and positioning algorithm based on lightweight cascade neural network |
CN110619319A (en) * | 2019-09-27 | 2019-12-27 | 北京紫睛科技有限公司 | Improved MTCNN model-based face detection method and system |
Also Published As
Publication number | Publication date |
---|---|
CN111242038A (en) | 2020-06-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant