CN110599463B - Tongue image detection and positioning algorithm based on lightweight cascade neural network - Google Patents
- Publication number
- CN110599463B (application CN201910789517.6A)
- Authority
- CN
- China
- Prior art keywords
- sample
- tongue
- candidate
- picture
- positive
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/24 — Pattern recognition; classification techniques
- G06T7/0012 — Image analysis; biomedical image inspection
- G06T7/70 — Image analysis; determining position or orientation of objects or cameras
- G16H50/20 — ICT specially adapted for medical diagnosis; computer-aided diagnosis, e.g. based on medical expert systems
- G06T2207/20081 — Indexing scheme for image analysis; training/learning
- G06T2207/20084 — Indexing scheme for image analysis; artificial neural networks [ANN]
- G06T2207/20112 — Indexing scheme for image analysis; image segmentation details
- G06T2207/20132 — Indexing scheme for image analysis; image cropping
Abstract
The invention provides a tongue image detection and positioning algorithm based on a lightweight cascade neural network, comprising the following steps: inputting an acquired tongue image picture; randomly cutting the labeled tongue picture to obtain positive and negative samples with classification labels and putting them into the first-layer network; extracting features from the sample pictures input to the first-layer network, classifying them, storing the coordinate information of the candidate frames classified as positive, and cutting the original picture according to the output candidate frames to obtain the input samples of the second-layer network; training and classifying the sample pictures in the second-layer network, outputting the candidate frames classified as positive, and correspondingly cutting the original picture to obtain the input samples of the third-layer network; and training on the sample pictures in the third-layer network, the classification labels and the candidate-frame coordinate information being obtained after training. The invention can screen out candidate frames with low confidence and accurately position the tongue image.
Description
Technical Field
The invention relates to the technical field of image detection algorithms, in particular to a tongue image detection and positioning algorithm based on a lightweight cascade neural network.
Background
Tongue diagnosis is one of the important diagnostic methods of traditional Chinese medicine. In recent years, with the development of image processing and machine learning, research on computerized tongue diagnosis systems has received increasing attention. A complete computerized tongue diagnosis system is divided into three parts: tongue image acquisition, tongue body segmentation, and tongue image recognition and analysis. The recognition and classification of tongue image information are completed using image processing and machine learning techniques, and a diagnostic result is finally obtained.
Researchers have focused on the quantification and standardization of tongue diagnosis and have suggested capturing canonical tongue images with standard tongue imaging devices to reduce the effects of environmental factors such as differences in lighting conditions, tongue position, and image size. Constant illumination and a fixed tongue position do reduce the difficulty of subsequent tongue segmentation and tongue image recognition; however, since a computerized tongue diagnosis system should be able to run on more platforms, of which the internet is representative, algorithms previously proposed for standard tongue images acquired with stationary tongue instruments may face significant challenges. In addition, existing detection algorithms such as VGG-based detectors and Faster R-CNN have complex network structures and large models, which is not conducive to porting the algorithms to practical deployments.
In practice, traditional Chinese medicine diagnosis and treatment scenarios are complex and the backgrounds of the pictures to be analyzed are diverse. Accurately locating the position of the tongue in an image is the prerequisite for tongue image analysis, and tongue localization with fixed coordinates cannot meet the requirement of accurately locating the tongue region. Detecting whether a tongue image is present in a picture and accurately positioning it are therefore the practical functions provided by the proposed algorithm.
Disclosure of Invention
In view of this, the problem to be solved by the present invention is to provide a tongue image detection and positioning algorithm based on a lightweight cascade neural network.
In order to solve the above technical problems, the invention adopts the following technical scheme. A tongue image detection and positioning algorithm based on a lightweight cascade neural network comprises the following steps:
Step 1) labeling the coordinate position of a target area of a tongue picture sample picture;
Step 2) randomly cutting tongue photo sample pictures to obtain positive and negative samples with classification labels, and putting the positive and negative samples into a first layer network;
step 3) carrying out feature extraction, classification and judgment on the sample picture input in the step 2), outputting coordinate information of candidate frames classified as positive, correspondingly cutting an original picture to obtain a carefully selected sample, and scaling the sample to a specified size and inputting the sample into a second-layer network;
Step 4) training the sample pictures in the step 3) to obtain classified candidate frames, outputting coordinate information of the candidate frames classified as positive, correspondingly cutting the original picture to obtain carefully selected samples, scaling the samples to a specified size (larger than the size used in step 3), and putting the samples into a third-layer network;
and 5) training the sample pictures in the step 4), and obtaining the coordinate information of the classification labels and the candidate frames after training.
In the present invention, preferably, before the candidate frames output by step 3) enter the second-layer network and the candidate frames output by step 4) enter the third-layer network, the candidate frames are screened and frame regression is performed on them, so as to reject candidate frames with low confidence.
In the present invention, preferably, step 2) distinguishes positive from negative local samples by their IoU values.
In the present invention, preferably, the algorithm step of the training of the step 4) and the step 5) includes:
step S1), preparing a tongue picture sample picture scaled to a window size as a positive sample;
step S2) sampling negative samples from tongue picture sample pictures that contain no positive sample, i.e. no tongue;
Step S3) calculating the values of the 10 integral channels for the positive and negative samples;
step S4), randomly generating a feature pool F, and calculating the error rate of a sampled sample;
Step S5) initializing the sample set D and setting the maximum number of iterations k_max;
Step S6) initializing the iteration weights W_k(i) = 1/n, i = 1…n, and looping k from 1 to k_max;
Step S7) selecting features from the feature pool according to the sample-set weights W_k(i) and forming a two-layer decision tree C_k;
Step S8) calculating the training error, i.e. the error of C_k under the sample-set weights W_k(i);
step S9) calculating the weight of the weak classifier and updating the sample weights;
step S10) returning the weak classifiers C_k and their corresponding weights, and combining the weak classifiers into a strong classifier.
In the present invention, preferably, the method for screening candidate frames includes:
Step T1) extracting the candidate frame list W and sorting it by score;
Step T2) initializing the filtered candidate frame list W_l to empty;
step T3) judging whether len(W) > 0 holds; if so, executing the NMS algorithm to screen the candidate frames; if not, returning the filtered candidate frame list W_l built up from step T2) as the result.
In the present invention, preferably, the nms algorithm includes:
Step U1) selecting the first (highest-scoring) candidate frame w in the candidate frame list W, adding it and its score s to the filtered list W_l, and removing it from the candidate frame list W;
step U2) calculating the overlap ratio between the first candidate frame w and each remaining candidate frame w_i;
step U3) removing from the candidate frame list W all candidate frames whose overlap with the first candidate frame w exceeds the threshold of 0.3.
In the present invention, preferably, after candidate frames with low confidence are eliminated, the coordinate information of the remaining candidate frames is corrected as follows: a three-dimensional coordinate correction vector is defined; given a candidate frame (x, y, w, h), with upper-left corner coordinates (x, y) and width and height (w, h), the corrected candidate frame is obtained by applying the correction vector to (x, y, w, h).
For the confidence vector {c_1, c_2, …, c_n}, the indicator I(c_n > t) takes the value 1 when c_n > t and 0 otherwise; the mean of the corrections whose confidence exceeds the threshold t is taken as the coordinate information of the corrected candidate frame.
In the present invention, preferably, in step 2) the same sample is sent to the classification network for classification multiple times by means of random cropping.
In the present invention, preferably, the sample size of the second layer network is larger than the sample size of the first layer network, and the sample size of the third layer network is larger than the sample size of the second layer network.
In the invention, preferably, the input tongue picture sample pictures are scaled to multiple sizes; multi-scale sample testing effectively improves the robustness and accuracy of the classifier.
The invention has the following advantages and positive effects. The deep learning network in the algorithm is composed of three small convolutional neural networks connected in series; a classifier with a better classification effect is obtained by cascading weak classifiers to detect tongue images. First, the labeled tongue picture samples are randomly cropped to obtain positive and negative samples with classification labels, which are put into the first-layer network. The features of the input samples are extracted and classified, and the coordinates (in the original picture) of the samples recognized as positive are stored as candidate frame information. Through hard-example mining, candidate frames with low confidence are rejected before the remainder enter the second-layer network for further training. The second-layer samples are larger than the first-layer samples and carry more picture information for training, yielding more accurately classified candidate frames. Confidence screening further reduces the number of candidate frames entering the third layer; after training is completed, the classification labels and frame coordinates are obtained, realizing tongue picture detection and accurate positioning. For imported samples with complex backgrounds and complex picture structure, the method can judge whether a tongue image is present and frame the tongue region of interest; in actual detection the accuracy of color tongue image detection exceeds 90% and accurate positioning exceeds 80%.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a schematic diagram of a sample image of a tongue image detection and localization algorithm based on a lightweight cascaded neural network of the present invention;
Fig. 2 is a schematic diagram of feature extraction classification judgment of tongue image detection and positioning algorithm based on lightweight cascade neural network.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It will be understood that when an element is referred to as being "fixed to" another element, it can be directly on the other element or intervening elements may also be present. When a component is considered to be "connected" to another component, it can be directly connected to the other component or intervening components may also be present. When an element is referred to as being "disposed on" another element, it can be directly on the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like are used herein for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
As shown in fig. 1 and 2, the present invention provides a tongue image detection and positioning algorithm based on a lightweight cascade neural network, comprising:
Step 1) labeling the coordinate position of a target area of a tongue photo sample picture;
Step 2) randomly cutting tongue photo sample pictures to obtain positive and negative samples with classification labels, and putting the positive and negative samples into a first layer network;
step 3) carrying out feature extraction, classification and judgment on the sample picture input in the step 2), outputting coordinate information of candidate frames classified as positive, correspondingly cutting an original picture to obtain a carefully selected sample, and scaling the sample to a specified size and inputting the sample into a second-layer network;
Step 4) training the sample pictures in the step 3) to obtain classified candidate frames, outputting coordinate information of the candidate frames classified as positive, correspondingly cutting the original picture to obtain carefully selected samples, scaling the samples to a specified size (larger than the size of the step 3), and putting the samples into a third layer network;
and 5) training the sample pictures in the step 4), and obtaining the coordinate information of the classification labels and the candidate frames after training.
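As an illustration, the five steps of the embodiment can be sketched as a minimal cascade loop. This is a simplified sketch only: `net1`/`net2`/`net3` are hypothetical scoring stubs standing in for the three small convolutional networks, and all thresholds and sizes are illustrative, not patent values.

```python
import random
from dataclasses import dataclass

@dataclass
class Box:
    x: float
    y: float
    w: float
    h: float
    score: float = 0.0

def random_crops(img_w, img_h, n=20, seed=0):
    """Step 2: randomly crop candidate windows from the sample picture."""
    rng = random.Random(seed)
    boxes = []
    for _ in range(n):
        w = rng.uniform(0.2, 0.6) * img_w
        h = rng.uniform(0.2, 0.6) * img_h
        boxes.append(Box(rng.uniform(0, img_w - w), rng.uniform(0, img_h - h), w, h))
    return boxes

def stage(boxes, classify, keep_thresh):
    """Steps 3-5: score each crop; keep candidates classified as positive.
    Kept coordinates still refer to the original picture, so the next
    layer can re-crop the original at a larger input size."""
    kept = []
    for b in boxes:
        b.score = classify(b)
        if b.score >= keep_thresh:
            kept.append(b)
    return kept

# Hypothetical scorers (real ones would be CNNs with growing input sizes,
# first-layer < second-layer < third-layer).
def net1(b): return 1.0 if b.w * b.h > 1000 else 0.0
def net2(b): return net1(b)
def net3(b): return net1(b)

candidates = random_crops(640, 480)
for net, t in ((net1, 0.6), (net2, 0.7), (net3, 0.8)):
    candidates = stage(candidates, net, t)
```

Because each kept box stores original-picture coordinates, every layer re-crops the original image at a larger input size, which is what lets the later, stronger classifiers see more picture information.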
In this embodiment, further, before the candidate frames output by step 3) enter the second-layer network and the candidate frames output by step 4) enter the third-layer network, the candidate frames are screened and frame regression is performed on them, so as to reject candidate frames with low confidence.
In this embodiment, further, step 2) distinguishes positive from negative local samples by their IoU values. The IoU value is defined as the ratio of the intersection area of the segmented region and the target region to their union area; a larger IoU value corresponds to a better segmentation, and vice versa. Given a group of tongue picture sample pictures, the experiments use the mean IoU over all pictures in the group as the group's IoU value.
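For reference, the IoU criterion can be computed as in the following sketch; the positive-sample threshold value is an illustrative assumption, not taken from the patent.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# A crop whose IoU with the annotated tongue region exceeds a chosen
# threshold is labeled positive (the threshold value here is illustrative).
POS_THRESH = 0.65
overlap = iou((0, 0, 10, 10), (5, 5, 10, 10))   # intersection 25, union 175
is_positive = overlap > POS_THRESH
```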
In this embodiment, further, the algorithm steps of the training in the step 4) and the step 5) include:
step S1), preparing a tongue picture sample picture scaled to a window size as a positive sample;
step S2) sampling negative samples from tongue picture sample pictures that contain no positive sample, i.e. no tongue;
Step S3) calculating the values of the 10 integral channels for the positive and negative samples;
step S4), randomly generating a feature pool F, and calculating the error rate of a sampled sample;
Step S5) initializing the sample set D and setting the maximum number of iterations k_max;
Step S6) initializing the iteration weights W_k(i) = 1/n, i = 1…n, and looping k from 1 to k_max;
Step S7) selecting features from the feature pool according to the sample-set weights W_k(i) and forming a two-layer decision tree C_k;
Step S8) calculating the training error, i.e. the error of C_k under the sample-set weights W_k(i);
step S9) calculating the weight of the weak classifier and updating the sample weights;
step S10) returning the weak classifiers C_k and their corresponding weights, and combining the weak classifiers into a strong classifier.
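Steps S5)-S10) follow the classic AdaBoost pattern. The sketch below substitutes one-dimensional decision stumps for the two-layer decision trees C_k and a synthetic one-feature dataset for the 10-channel features, purely for illustration; no constant here comes from the patent.

```python
import math

def train_stump(xs, ys, w):
    """Pick the threshold/polarity with the lowest weighted error (step S7)."""
    best = (None, None, float("inf"))
    for t in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if (pol * (1 if xi >= t else -1)) != yi)
            if err < best[2]:
                best = (t, pol, err)
    return best

def adaboost(xs, ys, k_max=5):
    n = len(xs)
    w = [1.0 / n] * n                        # step S6: uniform initial weights
    ensemble = []
    for _ in range(k_max):
        t, pol, err = train_stump(xs, ys, w)
        err = max(err, 1e-10)                # step S8: weighted training error
        alpha = 0.5 * math.log((1 - err) / err)  # step S9: classifier weight
        ensemble.append((t, pol, alpha))
        # step S9 (cont.): up-weight misclassified samples, then renormalize
        w = [wi * math.exp(-alpha * yi * pol * (1 if xi >= t else -1))
             for xi, yi, wi in zip(xs, ys, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble                          # step S10: the strong classifier

def predict(ensemble, x):
    # g(x) is the weighted vote; 0 stands in for the decision threshold θ
    g = sum(a * pol * (1 if x >= t else -1) for t, pol, a in ensemble)
    return 1 if g >= 0 else -1

xs = [1, 2, 3, 8, 9, 10]      # synthetic one-feature samples
ys = [-1, -1, -1, 1, 1, 1]    # negative / positive labels
model = adaboost(xs, ys)
```

Each round up-weights the samples the previous weak classifier got wrong, so the next one focuses on the hard cases; the final score g(x) is thresholded to decide positive versus negative.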
In this embodiment, further, the method for screening the candidate frame includes:
Step T1) extracting the candidate frame list W and sorting it by score;
Step T2) initializing the filtered candidate frame list W_l to empty;
step T3) judging whether len(W) > 0 holds; if so, executing the NMS algorithm to screen the candidate frames; if not, returning the filtered candidate frame list W_l built up from step T2) as the result.
In this embodiment, further, the nms algorithm includes:
Step U1) selecting the first (highest-scoring) candidate frame w in the candidate frame list W, adding it and its score s to the filtered list W_l, and removing it from the candidate frame list W;
step U2) calculating the overlap ratio between the first candidate frame w and each remaining candidate frame w_i;
step U3) removing from the candidate frame list W all candidate frames whose overlap with the first candidate frame w exceeds the threshold of 0.3.
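Taken together, steps T1)-T3) and U1)-U3) amount to greedy non-maximum suppression, sketched below. The 0.3 overlap threshold comes from the text; the tuple box representation is an assumption for illustration.

```python
def iou_xywh(a, b):
    """Overlap ratio of two (x, y, w, h, ...) boxes."""
    ax, ay, aw, ah = a[:4]
    bx, by, bw, bh = b[:4]
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, overlap_thresh=0.3):
    """Greedy NMS over (x, y, w, h, score) tuples."""
    W = sorted(boxes, key=lambda b: b[4], reverse=True)  # T1: sort by score
    W_l = []                                             # T2: filtered list, empty
    while W:                                             # T3: while len(W) > 0
        w0 = W.pop(0)                                    # U1: take the top box
        W_l.append(w0)
        # U2/U3: drop every remaining box overlapping w0 above the threshold
        W = [b for b in W if iou_xywh(w0, b) <= overlap_thresh]
    return W_l

boxes = [(0, 0, 10, 10, 0.9), (1, 1, 10, 10, 0.8), (50, 50, 10, 10, 0.7)]
kept = nms(boxes)   # the two heavily overlapping boxes collapse to one
```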
In this embodiment, further, after candidate frames with low confidence are eliminated, the coordinate information of the remaining candidate frames is corrected as follows: a three-dimensional coordinate correction vector is defined; given a candidate frame (x, y, w, h), with upper-left corner coordinates (x, y) and width and height (w, h), the corrected candidate frame is obtained by applying the correction vector to (x, y, w, h).
For the confidence vector {c_1, c_2, …, c_n}, the indicator I(c_n > t) takes the value 1 when c_n > t and 0 otherwise; the mean of the corrections whose confidence exceeds the threshold t is taken as the coordinate information of the corrected candidate frame.
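A sketch of this confidence-gated averaging follows. The patent's exact correction-vector formula appears only in a figure not reproduced here, so the (dx, dy, ds) parameterization below (shift proportional to box size plus a common scale factor) is purely an illustrative assumption.

```python
def refine(box, corrections, confidences, t=0.7):
    """Average the corrected boxes whose confidence exceeds threshold t,
    per the indicator I(c_n > t); falls back to the input box if none pass."""
    x, y, w, h = box
    kept = [(x + dx * w, y + dy * h, w * ds, h * ds)   # assumed parameterization
            for (dx, dy, ds), c in zip(corrections, confidences) if c > t]
    if not kept:
        return box
    n = len(kept)
    return tuple(sum(v[i] for v in kept) / n for i in range(4))

box = (10.0, 10.0, 20.0, 20.0)
corrections = [(0.1, 0.0, 1.0), (-0.1, 0.0, 1.0), (0.5, 0.5, 2.0)]
confidences = [0.9, 0.8, 0.3]   # the third correction is gated out by I(c > t)
refined = refine(box, corrections, confidences)
```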
In this embodiment, further, in step 2) the same sample is sent to the classification network for classification multiple times by means of random cropping. Random cropping effectively improves the classification of difficult tongue picture samples, i.e. samples that are locally similar and can be distinguished only by attending to local information. The random cropping scheme therefore lets the classifier learn global information while attending to local information, which helps improve the classifier's effect.
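The random-cropping idea can be sketched as producing several jittered views of one labeled region; the jitter magnitude and view count below are illustrative, not patent values.

```python
import random

def random_crop_views(img_w, img_h, box, n=5, jitter=0.2, seed=0):
    """Produce n jittered crops of the same labeled region, so the classifier
    sees both global and local views of one sample."""
    rng = random.Random(seed)
    x, y, w, h = box
    crops = []
    for _ in range(n):
        dx = rng.uniform(-jitter, jitter) * w
        dy = rng.uniform(-jitter, jitter) * h
        nw = w * rng.uniform(1 - jitter, 1 + jitter)
        nh = h * rng.uniform(1 - jitter, 1 + jitter)
        nx = min(max(0.0, x + dx), img_w - nw)   # clamp crop inside the picture
        ny = min(max(0.0, y + dy), img_h - nh)
        crops.append((nx, ny, nw, nh))
    return crops

views = random_crop_views(640, 480, (100, 100, 200, 150))
```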
In this embodiment, further, the sample size of the second layer network is larger than the sample size of the first layer network, and the sample size of the third layer network is larger than the sample size of the second layer network.
In this embodiment, further, the input tongue picture samples are scaled to multiple sizes; multi-scale sample testing effectively improves the robustness and accuracy of the classifier.
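Multi-scale testing can be sketched as scanning an image pyramid; the minimum side length and downscale factor below are illustrative assumptions, not patent values.

```python
def scale_pyramid(img_w, img_h, min_side=48, factor=0.707):
    """Generate the (w, h) sizes at which the input picture is rescanned."""
    sizes = []
    w, h = float(img_w), float(img_h)
    while min(w, h) >= min_side:        # stop once the shorter side is too small
        sizes.append((round(w), round(h)))
        w *= factor
        h *= factor
    return sizes

sizes = scale_pyramid(640, 480)
```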
The working principle and process of the invention are as follows. In the algorithm, the weak classifiers obtained from three small convolutional neural networks are cascaded into a classifier with a better classification effect. First, the labeled tongue picture samples are randomly cropped to obtain positive and negative samples with classification labels, which are put into the first-layer network. The features of the input samples are extracted and classified, and the coordinates (in the original picture) of the samples recognized as positive are stored as candidate frame information. Through hard-example mining, candidate frames with low confidence are rejected before the remainder enter the second-layer network for further training. The second-layer samples are larger than the first-layer samples and carry more picture information for training, yielding more accurately classified candidate frames. Confidence screening reduces the number of candidate frames entering the third layer; after training, the classification labels and frame coordinates are obtained.
The idea behind the algorithm's feature selection is to obtain a strong classifier with a better effect by combining weak classifiers. Training is iterative: each round adaptively increases the weights of the samples misclassified in the previous iteration, trains a new weak classifier with these weights as reference, and adds it to the set of weak classifiers. At test time, g(x) is computed and compared against a given threshold θ to classify positive and negative samples. The training specifically comprises the following steps: preparing tongue picture samples scaled to the window size as positive samples; sampling negative samples from tongue picture samples that contain no positive sample (no tongue) and calculating the values of the 10 integral channels for the positive and negative samples; randomly generating a feature pool F and calculating the error rate of the sampled samples; initializing the sample set D and setting the maximum number of iterations k_max; initializing the iteration weights W_k(i) = 1/n, i = 1…n, and looping k from 1 to k_max; selecting features from the feature pool according to the sample-set weights W_k(i) and forming a two-layer decision tree C_k; calculating the training error, i.e. the error of C_k under the weights W_k(i); calculating the weight of the weak classifier and updating the sample weights; and returning the weak classifiers C_k and their corresponding weights, combining them into a strong classifier.
The algorithm is sensitive to outliers and noise: mislabeled annotation data can easily degrade the trained model. To mitigate this influence, the labeling of the entire dataset was rechecked and the annotations were corrected. In addition, because the weak classifiers obtained from the three small convolutional neural networks are cascaded into a classifier with a better classification effect, the convolutional networks used are small, and the candidate frames are filtered layer by layer through three cascaded networks of different input sizes, the model achieves high precision while maintaining speed.
The foregoing describes the embodiments of the present invention in detail, but the description is only a preferred embodiment of the present invention and should not be construed as limiting the scope of the invention. All equivalent changes and modifications within the scope of the present invention are intended to be covered by this patent.
Claims (7)
1. A tongue image detection and positioning algorithm based on a lightweight cascade neural network is characterized by comprising the following steps:
Step 1) labeling the coordinate position of the target area in a tongue photo sample picture;
Step 2) randomly cropping the tongue photo sample pictures to obtain positive and negative samples with classification labels, and feeding the positive and negative samples into a first-layer network;
Step 3) performing feature extraction, classification, and judgment on the sample pictures input in step 2), outputting the coordinate information of the candidate boxes classified as positive, cropping the original picture accordingly to obtain refined samples, and scaling the samples to a specified size as input to a second-layer network;
Step 4) training on the sample pictures from step 3) to obtain classified candidate boxes, outputting the coordinate information of the candidate boxes classified as positive, cropping the original picture accordingly to obtain refined samples, and scaling the samples to a size larger than that in step 3) as input to a third-layer network;
wherein the sample size of the second-layer network is larger than that of the first-layer network, and the sample size of the third-layer network is larger than that of the second-layer network;
Step 5) training on the sample pictures from step 4), and obtaining the classification labels and the coordinate information of the candidate boxes after training;
wherein the training in step 4) and step 5) comprises the following algorithm steps:
Step S1) preparing tongue picture sample pictures scaled to the window size as positive samples;
Step S2) sampling negative samples from regions of the tongue picture sample pictures that contain no positive sample, and from pictures without a tongue;
Step S3) computing the values of the 10 integral channels for the positive and negative samples;
Step S4) randomly generating a feature pool F, and computing the error rate on the sampled samples;
Step S5) initializing the sample set D and the maximum iteration count k_max;
Step S6) initializing the iteration weights W_k(i) = 1/n, i = 1…n, and iterating k from 1 to k_max;
Step S7) selecting features from the feature pool according to the weights W_k(i) of the sample set D, and forming a two-level decision tree C_k;
Step S8) computing the training error of C_k under the weights W_k(i) of the sample set D;
Step S9) computing the weight of the weak classifier, and updating the sample weights;
Step S10) returning the weak classifiers C_k and their corresponding weights, and combining the weak classifiers into a strong classifier.
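Steps S5) through S10) follow the standard AdaBoost scheme. The sketch below illustrates that loop in generic form; it is not the patent's implementation — the integral-channel features and the two-level decision trees are abstracted into arbitrary weak learners returning ±1, and the weight-update formula is the usual discrete AdaBoost one.

```python
import math

def adaboost_train(samples, labels, weak_learners, k_max):
    """labels are +1/-1; weak_learners are functions sample -> +1/-1."""
    n = len(samples)
    w = [1.0 / n] * n                       # S6) initial weights W_k(i) = 1/n
    strong = []                             # collected (alpha_k, C_k) pairs
    for _ in range(k_max):
        # S7)-S8) choose the weak learner with minimal weighted error
        best, best_err = None, float("inf")
        for h in weak_learners:
            err = sum(wi for wi, x, y in zip(w, samples, labels) if h(x) != y)
            if err < best_err:
                best, best_err = h, err
        if best_err >= 0.5:                 # no better than chance: stop early
            break
        eps = max(best_err, 1e-10)
        alpha = 0.5 * math.log((1 - eps) / eps)   # S9) weak-classifier weight
        # S9) re-weight samples: misclassified samples gain weight
        w = [wi * math.exp(-alpha * y * best(x))
             for wi, x, y in zip(w, samples, labels)]
        z = sum(w)
        w = [wi / z for wi in w]
        strong.append((alpha, best))        # S10) collect the weak classifier
    def classify(x):                        # S10) weighted vote = strong classifier
        return 1 if sum(a * h(x) for a, h in strong) >= 0 else -1
    return classify
```

The strong classifier is simply the sign of the alpha-weighted vote of the selected weak classifiers, which is what "combining the weak classifiers into a strong classifier" amounts to.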
2. The tongue image detection and positioning algorithm based on the lightweight cascade neural network according to claim 1, wherein before step 3) inputs samples to the second-layer network and before step 4) inputs samples to the third-layer network, bounding-box regression is performed on the candidate boxes, and candidate boxes with low confidence are removed.
3. The tongue image detection and positioning algorithm based on the lightweight cascade neural network according to claim 1, wherein in step 2) the local samples are classified as positive or negative according to their IoU values.
4. The tongue image detection and positioning algorithm based on the lightweight cascade neural network according to claim 2, wherein the method for screening the candidate boxes comprises:
Step T1) extracting a candidate box list W and sorting it by score;
Step T2) initializing the filtered candidate box list W_l to be empty;
Step T3) judging whether l(W) > 0 holds, where l(W) is the length of the list W; if so, executing the nms algorithm to screen the candidate boxes; if not, returning to step T2).
5. The tongue image detection and positioning algorithm based on the lightweight cascade neural network according to claim 4, wherein the nms algorithm comprises:
Step U1) selecting the first candidate box w in the candidate box list W, adding it together with its score s to the filtered list W_l, and removing it from the list W;
Step U2) computing the overlap ratio between the first candidate box w and each remaining candidate box w_i;
Step U3) removing from the list W all candidate boxes whose overlap with the first candidate box w exceeds the threshold of 0.3.
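Steps T1)-T3) and U1)-U3) together are greedy non-maximum suppression. The sketch below is a generic illustration with the 0.3 threshold from the claim; the box tuple layout and the use of IoU as the overlap ratio are assumptions, not taken from the patent text.

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def nms(boxes, thr=0.3):
    """boxes: list of (x1, y1, x2, y2, score)."""
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)  # T1) sort by score
    kept = []                                # T2) filtered list W_l starts empty
    while remaining:                         # T3) loop while l(W) > 0
        w = remaining.pop(0)                 # U1) take the highest-scoring box
        kept.append(w)
        # U2)-U3) drop boxes overlapping w by more than the threshold
        remaining = [b for b in remaining if iou(w[:4], b[:4]) <= thr]
    return kept
```

Each pass keeps the best-scoring box and discards its near-duplicates, so overlapping detections of the same tongue collapse to a single box.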
6. The tongue image detection and positioning algorithm based on the lightweight cascade neural network according to claim 1, wherein step 2) adopts random cropping so that multiple crops of the same picture are fed into the classification network for classification.
7. The tongue image detection and positioning algorithm based on the lightweight cascade neural network according to claim 1, wherein the input tongue image sample pictures are scaled to multiple scales, and testing with multi-scale samples effectively improves the robustness and accuracy of the classifier.
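Multi-scale scaling as in claim 7 is commonly realized as an image pyramid. The sketch below is one plausible construction, not the patent's: the minimum size of 12 and the scale factor of 0.709 are assumptions borrowed from common cascade detectors.

```python
# Hypothetical image pyramid for multi-scale testing: rescale the input by a
# fixed factor until the shorter side drops below the detector's window size.
def pyramid_sizes(width, height, min_size=12, factor=0.709):
    sizes = []
    scale = 1.0
    while min(width, height) * scale >= min_size:
        sizes.append((int(width * scale), int(height * scale)))
        scale *= factor
    return sizes
```

Running the classifier over every size in the pyramid lets a fixed-size window match tongues of varying apparent size, which is the robustness gain the claim refers to.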
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910789517.6A CN110599463B (en) | 2019-08-26 | 2019-08-26 | Tongue image detection and positioning algorithm based on lightweight cascade neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110599463A CN110599463A (en) | 2019-12-20 |
CN110599463B true CN110599463B (en) | 2024-09-03 |
Family
ID=68855693
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910789517.6A Active CN110599463B (en) | 2019-08-26 | 2019-08-26 | Tongue image detection and positioning algorithm based on lightweight cascade neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110599463B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111242038B (en) * | 2020-01-15 | 2024-06-07 | 北京工业大学 | Dynamic tongue fibrillation detection method based on frame prediction network |
CN111414995B (en) * | 2020-03-16 | 2023-05-19 | 北京君立康生物科技有限公司 | Detection processing method and device for micro-target colony, electronic equipment and medium |
CN111489332B (en) * | 2020-03-31 | 2023-03-17 | 成都数之联科技股份有限公司 | Multi-scale IOF random cutting data enhancement method for target detection |
CN111598833B (en) * | 2020-04-01 | 2023-05-26 | 江汉大学 | Method and device for detecting flaws of target sample and electronic equipment |
CN112860867B (en) * | 2021-02-25 | 2022-07-12 | 电子科技大学 | Attribute selecting method and storage medium for Chinese question-answering system based on convolution neural network |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109410168A (en) * | 2018-08-31 | 2019-03-01 | 清华大学 | For determining the modeling method of the convolutional neural networks model of the classification of the subgraph block in image |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104537379A (en) * | 2014-12-26 | 2015-04-22 | 上海大学 | High-precision automatic tongue partition method |
KR101809819B1 (en) * | 2016-02-23 | 2017-12-18 | 정종율 | Method and system for tongue diagnosis based on image of tongue |
CN105930798B (en) * | 2016-04-21 | 2019-05-03 | 厦门快商通科技股份有限公司 | The tongue picture towards mobile phone application based on study quickly detects dividing method |
US10997727B2 (en) * | 2017-11-07 | 2021-05-04 | Align Technology, Inc. | Deep learning for tooth detection and evaluation |
CN109637660B (en) * | 2018-12-19 | 2024-01-23 | 新绎健康科技有限公司 | Tongue diagnosis analysis method and system based on deep convolutional neural network |
Non-Patent Citations (2)
Title |
---|
"Research on the AdaBoost algorithm for regional recognition in TCM tongue diagnosis images" (「AdaBoost算法在中医舌诊图像分区识别中的研究」); Zhang Meng et al.; Journal of Chinese Computer Systems (《小型微型计算机系统》); Vol. 29, No. 6; pp. 1149-1153, sections 1.2-1.3 *
"Face detection with multi-cascaded convolutional neural networks" (「多级联卷积神经网络人脸检测」); Yu Fei et al.; Journal of Wuyi University (Natural Science Edition) (《五邑大学学报(自然科学版)》); Vol. 32, No. 3; pp. 49-56, 71, section 1.2 *
Also Published As
Publication number | Publication date |
---|---|
CN110599463A (en) | 2019-12-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110599463B (en) | Tongue image detection and positioning algorithm based on lightweight cascade neural network | |
CN110348319B (en) | Face anti-counterfeiting method based on face depth information and edge image fusion | |
Zhao et al. | Cloud shape classification system based on multi-channel cnn and improved fdm | |
CN110543837B (en) | Visible light airport airplane detection method based on potential target point | |
CN110334706B (en) | Image target identification method and device | |
CN107133616B (en) | Segmentation-free character positioning and identifying method based on deep learning | |
CN107316036B (en) | Insect pest identification method based on cascade classifier | |
JP6395481B2 (en) | Image recognition apparatus, method, and program | |
CN107273832B (en) | License plate recognition method and system based on integral channel characteristics and convolutional neural network | |
CN105844621A (en) | Method for detecting quality of printed matter | |
CN105574550A (en) | Vehicle identification method and device | |
CN106408030A (en) | SAR image classification method based on middle lamella semantic attribute and convolution neural network | |
CN110263712A (en) | A kind of coarse-fine pedestrian detection method based on region candidate | |
CN106203237A (en) | The recognition methods of container-trailer numbering and device | |
CN110334703B (en) | Ship detection and identification method in day and night image | |
CN108734200B (en) | Human target visual detection method and device based on BING (building information network) features | |
CN109902576B (en) | Training method and application of head and shoulder image classifier | |
CN112365497A (en) | High-speed target detection method and system based on Trident Net and Cascade-RCNN structures | |
CN110008899B (en) | Method for extracting and classifying candidate targets of visible light remote sensing image | |
CN114140665A (en) | Dense small target detection method based on improved YOLOv5 | |
US11741153B2 (en) | Training data acquisition apparatus, training apparatus, and training data acquiring method | |
CN104050460B (en) | The pedestrian detection method of multiple features fusion | |
CN113221956A (en) | Target identification method and device based on improved multi-scale depth model | |
CN111444816A (en) | Multi-scale dense pedestrian detection method based on fast RCNN | |
CN114898290A (en) | Real-time detection method and system for marine ship |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||