
CN111242038B - Dynamic tongue fibrillation detection method based on frame prediction network - Google Patents


Info

Publication number
CN111242038B
Authority
CN
China
Prior art keywords
size
input
output
convolution
channels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010040375.6A
Other languages
Chinese (zh)
Other versions
CN111242038A (en)
Inventor
蔡轶珩
刘嘉琦
郭雅君
胡绍斌
张新峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202010040375.6A priority Critical patent/CN111242038B/en
Publication of CN111242038A publication Critical patent/CN111242038A/en
Application granted granted Critical
Publication of CN111242038B publication Critical patent/CN111242038B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A dynamic tongue fibrillation detection method based on a frame prediction network, relating to the fields of computer vision, pattern recognition and medical engineering. The invention proposes a new network, P-net, for data with strong spatio-temporal correlation such as dynamic video: it considers features at multiple scales and adds ConvGRU modules for temporal modeling, realizing fused processing of spatio-temporal information. To improve the accuracy of tongue tremor judgment and the generalization ability of the network, optical flow is extracted before the network input so as to capture the tremor information between frames. The invention adopts a prediction strategy, trains the network jointly with a generative adversarial model for discrimination, applies a spatio-temporal sliding-window operation when computing the tremor score, and finally realizes a high-precision dynamic tongue tremor detection algorithm based on the P-net network.

Description

Dynamic tongue fibrillation detection method based on frame prediction network
Technical Field
The invention belongs to the fields of computer vision, pattern recognition and medical engineering, and relates to a dynamic tongue fibrillation detection method based on a frame prediction network.
Background
With the improvement of living standards and the development of technology, big data analysis and artificial intelligence have attracted wide attention, and the combination of medicine and artificial intelligence is an important direction. Medicine provides powerful assistance to people, and artificial intelligence can in turn support the analysis of medical data. The integration of the two has very good prospects, and many enterprises and institutions have invested heavily in this direction. Machine learning is an important method for realizing artificial intelligence: collected data are analyzed and organized, and corresponding models are built with machine learning algorithms to realize automatic analysis. Systems that can assist doctors in diagnosing a patient's condition would greatly facilitate diagnosis and, at the same time, reduce missed detections to a certain extent.
Tongue diagnosis is one of the main contents of the four diagnostic methods in traditional Chinese medicine (TCM), where tongue tremor refers to involuntary, uncontrollable trembling of the tongue. By observing the degree of abnormality of tongue tremor, useful diagnostic information can be obtained (classical TCM texts relate abnormal tongue tremor to the condition of internal organs such as the spleen), providing more data reference for doctors. However, this field also faces many difficulties: tongue tremor is dynamic information, and how to model its dynamic characteristics well is a problem to be solved urgently. Therefore, an intelligent tongue tremor detection method is designed to help patients and doctors, providing powerful objective data for faster and better diagnosis.
Disclosure of Invention
Related work in this field is almost blank, and the invention therefore provides a high-precision dynamic tongue tremor detection algorithm based on a frame prediction network (P-net). Context and ConvGRU structures are added to a U-net backbone: the Context structure strengthens multi-scale modeling of the input data, while the ConvGRU module models the input data in time. The proposed P-net network models data with strong spatio-temporal characteristics, such as video, well and can better distinguish whether the tongue trembles.
The invention provides a high-precision dynamic tongue tremor detection algorithm based on the P-net network. The general idea is as follows: after preprocessing, dynamic videos of normal tongues are input into the designed network for training, with the aim of establishing a model that predicts the next frame of a normal tongue well. During testing, because the movement of a trembling tongue differs from that of a normal tongue, the trained model cannot correctly predict the state of a trembling tongue, so tongue tremor can be detected automatically through the ST-Pscore scoring mechanism designed by us. The method mainly comprises the following features:
(1) Optical flow extraction preprocessing based on tongue tremor information
After the originally acquired dynamic tongue data are decomposed frame by frame into individual images, every two adjacent images are input into an optical flow network pre-trained on a large-scale dataset to extract optical flow information, which reduces the interference of static information and better captures the tongue tremor information.
(2) New design of P-net network
The proposed network structure is as follows:
P-net consists of three parts: an encoding stage, a multi-scale feature and temporal information fusion stage, and a decoding stage. The encoding stage completes the initial encoding of the input dynamic tongue information through three blocks of "2 convolution layers + 1 downsampling layer". The fusion stage applies the jointly designed Context and ConvGRU modules to fuse the features extracted in the previous stage in space and time, providing better feature expression for the next stage. The decoding stage restores the low-level features through three blocks of "2 convolution layers + 1 upsampling layer", preparing for the detection of the tongue tremor state. In summary, the designed P-net is a network structure that encodes the input and fuses multi-scale features with temporal features.
The Context module consists of four branches: three branches consist of 3 convolution layers, and one branch consists of 2 convolution layers. The first and last convolution layers of each three-layer branch raise and reduce the feature dimension, while the middle convolution layers adopt different atrous (dilation) rates to extract features at multiple scales. The ConvGRU module consists of two ConvGRU cells; by feeding in T consecutive feature maps and updating the two state tensors h1 and h2, it extracts the temporal characteristics of the input dynamic tongue information, and the last output is taken as the extracted feature.
(3) Process for predicting tongue movement state and detecting tongue tremors by using P-net network
Characteristics during training:
The input to P-net is a sequence of T consecutive optical flow maps; each map is encoded independently, and the network predicts the (T+1)-th optical flow map. The actual (T+1)-th optical flow map is taken as ground truth, and the predicted and ground-truth maps are fed into several loss functions to optimize the network. A generative adversarial model is used: the predicted image and the corresponding ground truth are input into a discriminator for joint discrimination, until the discriminator cannot tell whether its input is a predicted image or the original ground truth, which further optimizes the prediction process and improves the detection result.
Characteristics at the time of prediction:
Optical flow is also extracted from the test set; consecutive groups of T frames are input into the trained network to obtain a predicted image for each frame.
Characteristics of tongue tremor detection:
The predicted image and the original image are input into the ST-Pscore scoring framework designed by us. This evaluation method combines sliding-window operations in time and in space to further fuse spatio-temporal content, yielding a final tremor score; whether the tongue in the input dynamic data is in a tremor state is judged by comparing the tremor score with a set threshold (taken as 0.432).
Advantageous effects
1. The invention realizes prediction-based dynamic tongue tremor detection using the P-net network
To our knowledge, this is the first work to combine deep learning techniques with dynamic tongue tremor detection. The invention uses the P-net network to predict dynamic information and, on that basis, to judge whether the tongue trembles.
2. The invention extracts optical flow from the input dynamic data through a pre-trained network
Because tongue tremor is dynamic, temporal information between frames must be analyzed. Extracting optical flow captures the tremor information between tongue frames, so that the key temporal characteristics are available for processing while the interference of static information on the result is reduced; together, these two aspects improve the accuracy of the algorithm.
3. The invention provides a P-net network
In an ordinary U-net, each upsampling step fuses the output of the corresponding scale from the feature extraction part (i.e., the downsampling path). The U-net therefore models the input image well in space, but the feature scales it considers are limited and no temporal information is taken into account; for input data with strong temporal correlation, such as video, a good result can only be obtained by adding a temporal module, which the proposed P-net provides.
4. The invention provides a method for calculating a tremor score (ST-Pscore) based on space-time sliding window operation
To further improve accuracy, the invention performs a spatio-temporal sliding-window operation after the tremor score of each frame is obtained: in time, the preceding and following frames are weighted; in space, the scores of small sub-regions of each frame are averaged. Combining the two establishes the spatio-temporal connection further and yields a better detection result.
Description of the drawings:
FIG. 1 is the optical flow extraction diagram of the present invention.
Fig. 2 is a P-net network diagram of the present invention.
Fig. 3 is a Context structure diagram in the present invention.
Fig. 4 is a diagram showing the structure of ConvGRU in the present invention.
Fig. 5 is the training flow chart for implementing tongue tremor detection with P-net according to the present invention.
FIG. 6 is a test flow chart of the present invention.
Detailed Description
The following describes the embodiments of the present invention in detail with reference to the accompanying drawings.
1. Preprocessing
After the originally collected dynamic tongue data are decomposed frame by frame into individual images, every two adjacent images are input into an optical flow network pre-trained on a large-scale dataset to extract optical flow information; the specific network structure is shown in figure 1.
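A minimal sketch of this preprocessing step, assuming recent PyTorch/torchvision. The patent does not name the pre-trained optical flow network; RAFT, pre-trained on large-scale synthetic data, is used here as a stand-in, and the video path is hypothetical.

```python
import torch
import torchvision.transforms.functional as TF
from torchvision.io import read_video
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

model = raft_large(weights=Raft_Large_Weights.DEFAULT).eval()

# "tongue_clip.mp4" is a hypothetical path to one acquired tongue video
frames, _, _ = read_video("tongue_clip.mp4", output_format="TCHW")
frames = TF.resize(frames, [256, 256]).float() / 255.0
frames = frames * 2 - 1                      # RAFT expects inputs in [-1, 1]

flows = []
with torch.no_grad():
    for t in range(len(frames) - 1):
        # flow between each pair of adjacent frames; RAFT returns a list of
        # iterative refinements, the last being the final estimate
        flow = model(frames[t].unsqueeze(0), frames[t + 1].unsqueeze(0))[-1]
        flows.append(flow.squeeze(0))        # 2 x 256 x 256 flow field
```

The resulting per-pair flow fields are the "optical flow maps" that the following sections feed into P-net.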
2. Construction of P-net network
As shown in fig. 2, the specific parameters of each layer of the P-net network constructed by the invention are as follows:
① Two convolution layers C1 and C2: the input size is 256×256, the number of input channels is 3, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 256×256, and the number of output channels is 64.
② P1 downsampling layer: the input size is 256×256, the number of input channels is 64, the pooling kernel is 2×2, the step size is 1, the edge filling mode is 'same', and the output size is 128×128.
③ Two convolution layers C3 and C4: the input size is 128×128, the number of input channels is 64, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 128×128, and the number of output channels is 128.
④ P2 downsampling layer: the input size is 128×128, the number of input channels is 128, the pooling kernel is 2×2, the step size is 1, the edge filling mode is 'same', and the output size is 64×64.
⑤ Two convolution layers C5 and C6: the input size is 64×64, the number of input channels is 128, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 64×64, and the number of output channels is 256.
⑥ P3 downsampling layer: the input size is 64×64, the number of input channels is 256, the pooling kernel is 2×2, the step size is 1, the edge filling mode is 'same', and the output size is 32×32.
⑦ CT1 Context layer: the input size is 32×32, the number of input channels is 256, the convolution kernel is 3×3, the output size is 32×32, and the number of output channels is 512 after the four branches are added.
⑧ Two ConvGRU layers L1 and L2: the input size is 32×32, the number of input channels is 512, the convolution kernel is 3×3, the output size is 32×32, and the number of output channels is 512.
⑨ U1 deconvolution layer: the input size is 32×32, the number of input channels is 512, the convolution kernel is 2×2, the step size is 2, the edge filling mode is 'same', and the output size is 64×64.
⑩ The result of the U1 deconvolution is spliced with the C6 convolution result of the corresponding size; the splicing dimension is the channel axis (dimension 3).
⑪ Two convolution layers C7 and C8: the input size is 64×64, the number of input channels is 512, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 64×64, and the number of output channels is 256.
⑫ U2 deconvolution layer: the input size is 64×64, the number of input channels is 256, the convolution kernel is 2×2, the step size is 2, the edge filling mode is 'same', and the output size is 128×128.
⑬ The result of the U2 deconvolution is spliced with the C4 convolution result of the corresponding size; the splicing dimension is the channel axis (dimension 3).
⑭ Two convolution layers C9 and C10: the input size is 128×128, the number of input channels is 256, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 128×128, and the number of output channels is 128.
⑮ U3 deconvolution layer: the input size is 128×128, the number of input channels is 128, the convolution kernel is 2×2, the step size is 2, the edge filling mode is 'same', and the output size is 256×256.
⑯ The result of the U3 deconvolution is spliced with the C2 convolution result of the corresponding size; the splicing dimension is the channel axis (dimension 3).
⑰ Two convolution layers C11 and C12: the input size is 256×256, the number of input channels is 128, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 256×256, and the number of output channels is 64.
⑱ Output layer: the input size is 256×256, the number of input channels is 64, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'same', the activation function is tanh, the output size is 256×256, and the number of output channels is 3.
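A condensed PyTorch sketch of the layer layout above, as one possible reading. Three details are assumptions forced by consistency: the 3×3 convolutions are zero-padded (the listed 'valid' mode with unchanged output sizes is otherwise impossible), the pooling layers use stride 2 (the listed outputs halve), and the encoder skip connections come from the most recent input frame (the text does not say which frame feeds them). The "splicing dimension 3" corresponds to the channel axis, i.e., dim=1 in PyTorch's NCHW layout. ContextModule and ConvGRU are sketched in sections 3 and 4 below.

```python
import torch
import torch.nn as nn

def double_conv(cin, cout):
    # two 3x3 convolutions with LeakyReLU; padding=1 keeps the listed sizes
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(cout, cout, 3, padding=1), nn.LeakyReLU(0.2))

class PNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = double_conv(3, 64)        # C1, C2
        self.enc2 = double_conv(64, 128)      # C3, C4
        self.enc3 = double_conv(128, 256)     # C5, C6
        self.pool = nn.MaxPool2d(2)           # P1-P3 (stride 2 assumed: sizes halve)
        self.context = ContextModule(256, 512)      # CT1, sketched in section 3
        self.gru = ConvGRU(512, 512)                # L1, L2, sketched in section 4
        self.up1 = nn.ConvTranspose2d(512, 256, 2, stride=2)   # U1
        self.dec1 = double_conv(512, 256)     # C7, C8 (after channel concat)
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)   # U2
        self.dec2 = double_conv(256, 128)     # C9, C10
        self.up3 = nn.ConvTranspose2d(128, 64, 2, stride=2)    # U3
        self.dec3 = double_conv(128, 64)      # C11, C12
        self.out = nn.Conv2d(64, 3, 3, padding=1)   # output layer (tanh)

    def forward(self, flows):                 # flows: B x T x 3 x 256 x 256
        feats = []
        for t in range(flows.shape[1]):       # encode each optical flow map separately
            s1 = self.enc1(flows[:, t])
            s2 = self.enc2(self.pool(s1))
            s3 = self.enc3(self.pool(s2))
            feats.append(self.context(self.pool(s3)))
        h = self.gru(torch.stack(feats, dim=1))     # fuse the T feature maps in time
        d = self.dec1(torch.cat([self.up1(h), s3], dim=1))  # skips from the last frame
        d = self.dec2(torch.cat([self.up2(d), s2], dim=1))
        d = self.dec3(torch.cat([self.up3(d), s1], dim=1))
        return torch.tanh(self.out(d))        # predicted (T+1)-th flow map
```

In this reading, one forward pass over T consecutive flow maps yields the single predicted (T+1)-th map that the losses in section 5 compare against ground truth.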
3. Construction of the Context module
As shown in fig. 3, the specific parameters of each layer of the Context structure constructed by the invention are as follows:
① L1 branch: the input size is 32×32 and the number of input channels is 256; the three convolution kernels are 1×1, 3×3 and 1×1 with step size 1 and edge filling mode 'valid', the atrous (dilation) rates are 1 for the 1×1 convolutions and 5 for the 3×3 convolution, the activation function is LeakyReLU, the output size is 32×32, and the number of output channels is 512.
② L2 branch: the input size is 32×32 and the number of input channels is 256; the three convolution kernels are 1×1, 3×3 and 1×1 with step size 1 and edge filling mode 'valid', the atrous rates are 1 for the 1×1 convolutions and 3 for the 3×3 convolution, the activation function is LeakyReLU, the output size is 32×32, and the number of output channels is 512.
③ L3 branch: the input size is 32×32 and the number of input channels is 256; the three convolution kernels are 1×1, 3×3 and 1×1 with step size 1 and edge filling mode 'valid', the atrous rate is 1 throughout, the activation function is LeakyReLU, the output size is 32×32, and the number of output channels is 512.
④ L4 branch: this branch consists of two convolution layers; the input size is 32×32, the number of input channels is 256, both convolution kernels are 1×1 with step size 1, edge filling mode 'valid' and atrous rate 1, the activation function is LeakyReLU, the output size is 32×32, and the number of output channels is 512.
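A sketch of the Context module under the parameters above. Two details are assumptions: the hidden width of each bottleneck (kept at the 256 input channels here, since only input and output widths are stated) and zero padding on the dilated 3×3 convolutions (needed for the stated 32×32 outputs); "four channel addition" is read as an element-wise sum of the four 512-channel branch outputs.

```python
import torch.nn as nn

class ContextModule(nn.Module):
    def __init__(self, cin=256, cout=512):
        super().__init__()
        def branch(rate):
            # 1x1 / dilated 3x3 / 1x1 bottleneck; the 3x3 dilation sets the scale
            return nn.Sequential(
                nn.Conv2d(cin, cin, 1), nn.LeakyReLU(0.2),
                nn.Conv2d(cin, cin, 3, padding=rate, dilation=rate), nn.LeakyReLU(0.2),
                nn.Conv2d(cin, cout, 1), nn.LeakyReLU(0.2))
        self.b1, self.b2, self.b3 = branch(5), branch(3), branch(1)  # L1, L2, L3
        self.b4 = nn.Sequential(                                     # L4: two 1x1 layers
            nn.Conv2d(cin, cin, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(cin, cout, 1), nn.LeakyReLU(0.2))

    def forward(self, x):                      # x: B x 256 x 32 x 32
        # the four 512-channel branch outputs are summed element-wise
        return self.b1(x) + self.b2(x) + self.b3(x) + self.b4(x)
```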
4. Construction of the ConvGRU module
As shown in fig. 4, the specific configuration of the ConvGRU structure constructed by the invention is as follows:
The input of this module is the T feature maps output by the Context module. The module consists of two ConvGRU cells; for each cell, the convolution input size is 32×32, the number of input channels is 512, the convolution kernel size is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 32×32, and the number of output channels is 512. The last output is selected as the input feature of the next stage.
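A minimal sketch of the two-cell ConvGRU stack. The text fixes the cell count, the tensor sizes and the h1/h2 update scheme but not the gate equations; the standard ConvGRU gates are assumed, with zero padding to keep the 32×32 size.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    def __init__(self, cin, chid, k=3):
        super().__init__()
        self.gates = nn.Conv2d(cin + chid, 2 * chid, k, padding=k // 2)  # update z, reset r
        self.cand = nn.Conv2d(cin + chid, chid, k, padding=k // 2)       # candidate state

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        n = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * n             # new hidden state

class ConvGRU(nn.Module):
    def __init__(self, cin=512, chid=512):
        super().__init__()
        self.chid = chid
        self.cell1 = ConvGRUCell(cin, chid)    # updates state h1
        self.cell2 = ConvGRUCell(chid, chid)   # updates state h2

    def forward(self, seq):                    # seq: B x T x C x H x W
        b, t, _, hgt, w = seq.shape
        h1 = seq.new_zeros(b, self.chid, hgt, w)
        h2 = seq.new_zeros(b, self.chid, hgt, w)
        for i in range(t):                     # feed the T feature maps in order
            h1 = self.cell1(seq[:, i], h1)
            h2 = self.cell2(h1, h2)
        return h2                              # the last output feeds the decoder
```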
5. Training process for tongue tremor detection
As shown in fig. 5, the specific training process of the prediction-based dynamic tongue tremor detection algorithm implemented with the P-net network is as follows: ① Cut the long videos in the training set into single-frame images, and input each pair of adjacent frames into the optical flow network to extract optical flow.
② Four consecutive optical flow frames are input into the P-net network in order. The four frames are first encoded; after the Context module of the network, four feature maps are obtained, which are then used as the input of the ConvGRU module to obtain one feature map containing the information of the four preceding frames. All decoding operations of the network are applied to this feature map, giving the final predicted optical flow map.
③ The difference between the predicted optical flow and the true optical flow is compared to optimize the network. The invention adopts an intensity loss function, a gradient loss function and an msssim structural loss function to compare the predicted and true optical flow, and adjusts the network parameters during training by minimizing the global loss function with the Adam algorithm. The gradient and structural terms are calculated as follows (where I is the true optical flow and I* is the predicted optical flow):
$$L_{grad}(I, I^{*}) = \big\| \, |g_d(I)| - |g_d(I^{*})| \, \big\|_{1}$$

$$L_{msssim}(I, I^{*}) = \big\| \, |msssim(I)| - |msssim(I^{*})| \, \big\|_{1}$$
④ We introduce a generative adversarial network (GAN) consisting of a discriminator and a generator. The predicted optical flow and the original optical flow are input into the same discrimination model (the discriminator D), which extracts features and judges whether its input is the true or the predicted optical flow. The discrimination model is trained jointly with P-net (serving as the generator G) until it cannot distinguish images predicted by P-net from the original ground-truth images, which further improves the prediction ability of the network. The discrimination model consists of three convolution layers and one output layer. In the corresponding GAN training losses, D is the discriminator and G is the P-net generator.
⑤ In summary, when training P-net the overall loss weights the intensity, gradient, msssim and adversarial terms (in the experiments, λa was set to 2e-4, λs to 0.5, λg to 1, and λi to 1).
The loss function used when training the discriminator is the adversarial term weighted by λd (set to 2e-5 in the experiments).
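A hedged sketch of the adversarial part and the combined objective. The loss formulas themselves were rendered as images in the original and are not reproduced above, so several choices here are assumptions: the intensity term as mean squared error, a least-squares GAN form for the adversarial terms (common in frame-prediction work), the discriminator's kernel sizes, strides and channel widths, and the third-party pytorch-msssim package for the msssim term. The weights match the values stated above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from pytorch_msssim import ms_ssim  # third-party MS-SSIM implementation

class Discriminator(nn.Module):
    """Three convolution layers plus an output layer, as stated above; the
    kernel sizes, strides and channel widths are not given and are assumptions."""
    def __init__(self, cin=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(cin, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(256, 1, 4, padding=1))       # patch-level real/fake map

    def forward(self, x):
        return self.net(x)

def gradient_loss(pred, gt):
    # L_grad: L1 distance between the absolute spatial gradients of the flows
    dxp = (pred[..., :, 1:] - pred[..., :, :-1]).abs()
    dyp = (pred[..., 1:, :] - pred[..., :-1, :]).abs()
    dxg = (gt[..., :, 1:] - gt[..., :, :-1]).abs()
    dyg = (gt[..., 1:, :] - gt[..., :-1, :]).abs()
    return (dxp - dxg).abs().mean() + (dyp - dyg).abs().mean()

def generator_loss(pred, gt, d_on_pred, li=1.0, lg=1.0, ls=0.5, la=2e-4):
    l_int = F.mse_loss(pred, gt)                                        # intensity term
    l_grad = gradient_loss(pred, gt)                                    # gradient term
    l_ssim = 1 - ms_ssim((pred + 1) / 2, (gt + 1) / 2, data_range=1.0)  # msssim term
    l_adv = F.mse_loss(d_on_pred, torch.ones_like(d_on_pred))           # assumed LSGAN form
    return li * l_int + lg * l_grad + ls * l_ssim + la * l_adv

def discriminator_loss(d_on_gt, d_on_pred, ld=2e-5):
    # push D(ground truth) toward 1 and D(prediction) toward 0 (assumed LSGAN form)
    return ld * (F.mse_loss(d_on_gt, torch.ones_like(d_on_gt)) +
                 F.mse_loss(d_on_pred, torch.zeros_like(d_on_pred)))
```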
6. Test process of the P-net-based tongue tremor detection algorithm
As shown in fig. 6, the specific test process of the prediction-based dynamic tongue tremor anomaly detection algorithm implemented with the P-net network is as follows:
① Cut the long videos in the test set into single-frame images, and input each pair of adjacent frames into the optical flow network to extract optical flow.
② For a test optical flow frame, take frame I_t as an example: the four consecutive frames consisting of I_t and I_{t-1}, I_{t-2}, I_{t-3} are input into the trained P-net prediction network to obtain the predicted future frame Î_{t+1}. Î_{t+1} is compared with the input ground truth I_{t+1}: the predicted peak signal-to-noise ratio (PSNR) between the two images is computed with our spatio-temporal sliding-window evaluation method (ST-Pscore), and from it the predicted tongue tremor score of the dynamic frame is calculated. In the ST-Pscore calculation, k is the number of smallest PSNR values kept among all selected small-region images, M, N is the size of each small-region image, λ is the weight of the current frame, and p is the number of preceding and following frames selected. At detection time, k is set to 5, M, N to 32, λ to 5, and p to 3.
③ After the spatio-temporal tremor score (ST-Pscore) of test frame I_t is obtained, it is compared with a set threshold. If ST-Pscore ≥ threshold, test frame I_t is a tongue tremor frame; if ST-Pscore < threshold, test frame I_t is a non-tremor frame. The invention takes threshold = 0.432.
④ Steps ② and ③ are repeated for all extracted optical flow frames, detecting all tongue tremor frames in the test data and outputting the tremor score of each frame, thereby realizing real-time dynamic tongue tremor detection.
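The closed-form ST-Pscore expression was likewise rendered as an image and is not reproduced above, so the NumPy sketch below implements only the steps the text describes: per-region PSNR (10·log10(max²/MSE)) over M×N sub-regions, the mean of the k smallest region PSNRs per frame, a temporal window of p frames on each side with the current frame weighted by λ, and, as an assumption, a min-max normalization inverted so that poorly predicted frames score higher, making the result comparable with the 0.432 threshold.

```python
import numpy as np

def region_psnr_floor(pred, gt, m=32, n=32, k=5):
    # split the frame pair into m x n sub-regions and keep the k lowest PSNRs;
    # PSNR = 10 * log10(max^2 / MSE), with frames scaled to [0, 1] so max = 1
    h, w = gt.shape[:2]
    psnrs = []
    for y in range(0, h - m + 1, m):
        for x in range(0, w - n + 1, n):
            mse = np.mean((pred[y:y+m, x:x+n] - gt[y:y+m, x:x+n]) ** 2)
            psnrs.append(10 * np.log10(1.0 / max(mse, 1e-10)))
    return float(np.mean(sorted(psnrs)[:k]))   # worst-predicted regions dominate

def st_pscore(preds, gts, lam=5.0, p=3):
    # preds/gts: aligned lists of predicted and ground-truth frames ([0, 1] floats)
    base = np.array([region_psnr_floor(pr, gt) for pr, gt in zip(preds, gts)])
    scores = np.empty_like(base)
    for t in range(len(base)):                 # temporal sliding window
        idx = list(range(max(0, t - p), min(len(base), t + p + 1)))
        wts = [lam if i == t else 1.0 for i in idx]  # current frame weighted by lambda
        scores[t] = np.average(base[idx], weights=wts)
    # assumed normalization: low windowed PSNR (poor prediction) -> high tremor score
    return (scores.max() - scores) / (scores.max() - scores.min() + 1e-10)

# frames whose score is >= 0.432 would be flagged as tongue tremor frames
```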

Claims (4)

1. A dynamic tongue fibrillation detection method based on P-net, characterized in that:
(1) Optical flow extraction preprocessing based on tongue tremor information
after the originally collected dynamic tongue data are decomposed frame by frame into individual images, every two adjacent images are input into an optical flow network pre-trained on a large-scale dataset to extract optical flow information;
(2) Designing P-net network
The proposed network structure is as follows:
P-net consists of three parts: an encoding module, a multi-scale feature and temporal information fusion module, and a decoding module; the encoding module completes the initial encoding of the input dynamic tongue information through three blocks of "2 convolution layers + 1 downsampling layer"; the fusion module performs spatio-temporal fusion of the features extracted in the previous stage through the jointly designed Context and ConvGRU modules, providing better feature expression for the next stage; the decoding module restores the low-level features through three blocks of "2 convolution layers + 1 upsampling layer", preparing for the detection of the tongue tremor state; the Context module consists of four branches, three of which consist of 3 convolution layers and one of 2 convolution layers; the first and last convolution layers raise and reduce the feature dimension, while the middle convolution layers adopt different atrous rates for multi-scale feature extraction; the ConvGRU module consists of two ConvGRU cells and extracts the temporal characteristics of the input dynamic tongue information by feeding in T consecutive feature maps and updating the two state tensors h1 and h2;
(3) Training a P-net network
the input to P-net is a sequence of T consecutive optical flow maps; each map is encoded independently, and the network predicts the (T+1)-th optical flow map; the actual (T+1)-th optical flow map is taken as ground truth, and the predicted and ground-truth maps are fed into several loss functions to optimize the network; a generative adversarial network model is used, in which the predicted image and the corresponding ground truth are input into a discriminator for joint discrimination until the discrimination model cannot distinguish whether its input is a predicted image or the original ground truth;
(4) Prediction of tongue movement state using P-net network
for a test optical flow frame I_t, the four consecutive frames consisting of I_t and I_{t-1}, I_{t-2}, I_{t-3} are input into the trained P-net prediction network to obtain the predicted future frame Î_{t+1}; Î_{t+1} is compared with the input ground truth I_{t+1}, the predicted peak signal-to-noise ratio PSNR between the two images is computed with the spatio-temporal sliding-window evaluation method ST-Pscore, and the predicted tongue tremor score of the dynamic frame is calculated from it;
wherein k is the number of smallest PSNR values retained after the PSNR values of all selected small-region images are sorted in ascending order, M, N is the size of each small-region image, λ is the weight of the current frame, and p is the number of preceding and following frames selected; k is set to 5, M, N to 32, λ to 5, and p to 3;
the tremor score is compared with a set threshold to determine whether the tongue in the input dynamic data is in a tremor state; if the score is greater than or equal to the threshold, the tongue is considered to be in the tremor state; the threshold is taken to be 0.432.
2. The method of claim 1, wherein the specific parameters of each layer of the P-net network are as follows:
① Two convolution layers C1 and C2: the input size is 256×256, the number of input channels is 3, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 256×256, and the number of output channels is 64;
② P1 downsampling layer: the input size is 256×256, the number of input channels is 64, the pooling kernel is 2×2, the step size is 1, the edge filling mode is 'same', and the output size is 128×128;
③ Two convolution layers C3 and C4: the input size is 128×128, the number of input channels is 64, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 128×128, and the number of output channels is 128;
④ P2 downsampling layer: the input size is 128×128, the number of input channels is 128, the pooling kernel is 2×2, the step size is 1, the edge filling mode is 'same', and the output size is 64×64;
⑤ Two convolution layers C5 and C6: the input size is 64×64, the number of input channels is 128, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 64×64, and the number of output channels is 256;
⑥ P3 downsampling layer: the input size is 64×64, the number of input channels is 256, the pooling kernel is 2×2, the step size is 1, the edge filling mode is 'same', and the output size is 32×32;
⑦ CT1 Context layer: the input size is 32×32, the number of input channels is 256, the convolution kernel is 3×3, the output size is 32×32, and the number of output channels is 512 after the four branches are added;
⑧ Two ConvGRU layers L1 and L2: the input size is 32×32, the number of input channels is 512, the convolution kernel is 3×3, the output size is 32×32, and the number of output channels is 512;
⑨ U1 deconvolution layer: the input size is 32×32, the number of input channels is 512, the convolution kernel is 2×2, the step size is 2, the edge filling mode is 'same', and the output size is 64×64;
⑩ the result of the U1 deconvolution is spliced with the C6 convolution result of the corresponding size, the splicing dimension being the channel axis (dimension 3);
⑪ Two convolution layers C7 and C8: the input size is 64×64, the number of input channels is 512, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 64×64, and the number of output channels is 256;
⑫ U2 deconvolution layer: the input size is 64×64, the number of input channels is 256, the convolution kernel is 2×2, the step size is 2, the edge filling mode is 'same', and the output size is 128×128;
⑬ the result of the U2 deconvolution is spliced with the C4 convolution result of the corresponding size, the splicing dimension being the channel axis (dimension 3);
⑭ Two convolution layers C9 and C10: the input size is 128×128, the number of input channels is 256, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 128×128, and the number of output channels is 128;
⑮ U3 deconvolution layer: the input size is 128×128, the number of input channels is 128, the convolution kernel is 2×2, the step size is 2, the edge filling mode is 'same', and the output size is 256×256;
⑯ the result of the U3 deconvolution is spliced with the C2 convolution result of the corresponding size, the splicing dimension being the channel axis (dimension 3);
⑰ Two convolution layers C11 and C12: the input size is 256×256, the number of input channels is 128, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 256×256, and the number of output channels is 64;
⑱ Output layer: the input size is 256×256, the number of input channels is 64, the convolution kernel is 3×3, the step size is 1, the edge filling mode is 'same', the activation function is tanh, the output size is 256×256, and the number of output channels is 3.
3. The method according to claim 1, wherein the specific parameters of each layer of the Context structure are as follows:
① L1 branch: the input size is 32×32 and the number of input channels is 256; the three convolution kernels are 1×1, 3×3 and 1×1 with step size 1 and edge filling mode 'valid', the atrous rates are 1 for the 1×1 convolutions and 5 for the 3×3 convolution, the activation function is LeakyReLU, the output size is 32×32, and the number of output channels is 512;
② L2 branch: the input size is 32×32 and the number of input channels is 256; the three convolution kernels are 1×1, 3×3 and 1×1 with step size 1 and edge filling mode 'valid', the atrous rates are 1 for the 1×1 convolutions and 3 for the 3×3 convolution, the activation function is LeakyReLU, the output size is 32×32, and the number of output channels is 512;
③ L3 branch: the input size is 32×32 and the number of input channels is 256; the three convolution kernels are 1×1, 3×3 and 1×1 with step size 1 and edge filling mode 'valid', the atrous rate is 1 throughout, the activation function is LeakyReLU, the output size is 32×32, and the number of output channels is 512;
④ L4 branch: this branch consists of two convolution layers; the input size is 32×32, the number of input channels is 256, both convolution kernels are 1×1 with step size 1, edge filling mode 'valid' and atrous rate 1, the activation function is LeakyReLU, the output size is 32×32, and the number of output channels is 512.
4. The method of claim 1, wherein the concrete arrangement of the ConvGRU structures constructed is as follows:
The input of this module is the T feature maps output by the Context module; the module consists of two ConvGRU cells; for each cell, the convolution input size is 32×32, the number of input channels is 512, the convolution kernel size is 3×3, the step size is 1, the edge filling mode is 'valid', the activation function is LeakyReLU, the output size is 32×32, and the number of output channels is 512; the last output is selected as the input feature of the next stage.
CN202010040375.6A 2020-01-15 2020-01-15 Dynamic tongue fibrillation detection method based on frame prediction network Active CN111242038B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010040375.6A CN111242038B (en) 2020-01-15 2020-01-15 Dynamic tongue fibrillation detection method based on frame prediction network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010040375.6A CN111242038B (en) 2020-01-15 2020-01-15 Dynamic tongue fibrillation detection method based on frame prediction network

Publications (2)

Publication Number Publication Date
CN111242038A CN111242038A (en) 2020-06-05
CN111242038B (en) 2024-06-07

Family

ID=70877922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010040375.6A Active CN111242038B (en) 2020-01-15 2020-01-15 Dynamic tongue fibrillation detection method based on frame prediction network

Country Status (1)

Country Link
CN (1) CN111242038B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112053375A (en) * 2020-08-26 2020-12-08 上海眼控科技股份有限公司 Method and equipment for predicting prediction based on improved network convolution model
CN113129226B (en) * 2021-03-24 2023-06-23 西安理工大学 ConvGRU-U-Net-based computing ghost imaging reconstruction algorithm

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018098986A1 (en) * 2016-11-29 2018-06-07 深圳市易特科信息技术有限公司 Automatic detection system and method for tongue images in traditional chinese medicine
CN109543606A (en) * 2018-11-22 2019-03-29 中山大学 A kind of face identification method that attention mechanism is added
CN109740654A (en) * 2018-12-26 2019-05-10 华东师范大学 A kind of tongue body automatic testing method based on deep learning
CN110033002A (en) * 2019-04-19 2019-07-19 福州大学 Detection method of license plate based on multitask concatenated convolutional neural network
CN110251084A (en) * 2019-06-21 2019-09-20 福州数据技术研究院有限公司 A kind of detection of tongue picture and recognition methods based on artificial intelligence
CN110599463A (en) * 2019-08-26 2019-12-20 依脉人工智能医疗科技(天津)有限公司 Tongue image detection and positioning algorithm based on lightweight cascade neural network
CN110619319A (en) * 2019-09-27 2019-12-27 北京紫睛科技有限公司 Improved MTCNN model-based face detection method and system


Also Published As

Publication number Publication date
CN111242038A (en) 2020-06-05

Similar Documents

Publication Publication Date Title
Xie et al. Mlcvnet: Multi-level context votenet for 3d object detection
CN110276316B (en) Human body key point detection method based on deep learning
CN111242238B (en) RGB-D image saliency target acquisition method
CN113724880A (en) Abnormal brain connection prediction system, method and device and readable storage medium
CN111476805A (en) Cross-source unsupervised domain adaptive segmentation model based on multiple constraints
CN112329780B (en) Depth image semantic segmentation method based on deep learning
CN111242038B (en) Dynamic tongue fibrillation detection method based on frame prediction network
CN115861616A (en) Semantic segmentation system for medical image sequence
CN117351363A (en) Remote sensing image building extraction method based on transducer
CN117333908A (en) Cross-modal pedestrian re-recognition method based on attitude feature alignment
CN117238034A (en) Human body posture estimation method based on space-time transducer
Liu et al. MRL-Net: multi-scale representation learning network for COVID-19 lung CT image segmentation
CN109766918A (en) Conspicuousness object detecting method based on the fusion of multi-level contextual information
CN111008570B (en) Video understanding method based on compression-excitation pseudo-three-dimensional network
CN116091885A (en) RAU-GAN-based lung nodule data enhancement method
Liu et al. Adaptive Texture and Spectrum Clue Mining for Generalizable Face Forgery Detection
CN114693953B (en) RGB-D significance target detection method based on cross-mode bidirectional complementary network
CN117152630A (en) Optical remote sensing image change detection method based on deep learning
CN116912727A (en) Video human behavior recognition method based on space-time characteristic enhancement network
CN111881979B (en) Multi-modal data annotation device and computer-readable storage medium containing program
Shu et al. Aft-net: Active fusion-transduction for multi-stream medical image segmentation
CN115063352A (en) Salient object detection device and method based on multi-graph neural network collaborative learning architecture
CN114463844A (en) Fall detection method based on self-attention double-flow network
CN113706650A (en) Image generation method based on attention mechanism and flow model
CN113256497A (en) Image reconstruction method and system

Legal Events

Code: Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant