ORIGINAL RESEARCH
published: 12 April 2022
doi: 10.3389/fphys.2022.847267

Weakly Supervised Deep Learning for Tooth-Marked Tongue Recognition

Jianguo Zhou 1†, Shangxuan Li 1†, Xuesong Wang 2, Zizhu Yang 1, Xinyuan Hou 1, Wei Lai 3, Shifeng Zhao 2, Qingqiong Deng 2* and Wu Zhou 1*

1 School of Medical Information Engineering, Guangzhou University of Chinese Medicine, Guangzhou, China, 2 School of Artificial Intelligence, Beijing Normal University, Beijing, China, 3 Beijing Yikang Medical Technology Co., Ltd., Beijing, China

Edited by: Eun Bo Shim, Kangwon National University, South Korea
Reviewed by: Hee-Jeong Jin, Korea Institute of Oriental Medicine (KIOM), South Korea; Bob Zhang, University of Macau, China
*Correspondence: Qingqiong Deng, qqdeng@bnu.edu.cn; Wu Zhou, zhouwu@gzucm.edu.cn
† These authors have contributed equally to this work and share first authorship
Specialty section: This article was submitted to Computational Physiology and Medicine, a section of the journal Frontiers in Physiology
Received: 04 January 2022
Accepted: 08 March 2022
Published: 12 April 2022
Citation: Zhou J, Li S, Wang X, Yang Z, Hou X, Lai W, Zhao S, Deng Q and Zhou W (2022) Weakly Supervised Deep Learning for Tooth-Marked Tongue Recognition. Front. Physiol. 13:847267. doi: 10.3389/fphys.2022.847267

The recognition of tooth-marked tongues has important value for clinical diagnosis of traditional Chinese medicine. Tooth-marked tongue is often related to spleen deficiency, cold dampness, sputum, effusion, and blood stasis. The clinical manifestations of patients with tooth-marked tongue include loss of appetite, borborygmus, gastric distention, and loose stool. Traditional clinical tooth-marked tongue recognition is conducted subjectively based on the doctor’s visual observation, and its performance is affected by the doctor’s subjectivity, experience, and environmental lighting changes. In addition, the tooth marks typically have various shapes and colors on the tongue, which make it very challenging for doctors to identify tooth marks. The existing methods based on deep learning have made great progress for tooth-marked tongue recognition, but there are still shortcomings such as requiring a large amount of manual labeling of tooth marks, inability to detect and locate the tooth marks, and not conducive to clinical diagnosis and interpretation. In this study, we propose an end-to-end deep neural network for tooth-marked tongue recognition based on weakly supervised learning. Note that the deep neural network only requires image-level annotations of tooth-marked or non-tooth marked tongues. In this method, a deep neural network is trained to classify tooth-marked tongues with the image-level annotations. Then, a weakly supervised tooth-mark detection network (WSTDN) as an architecture variant of the pre-trained deep neural network is proposed for the tooth-marked region detection. Finally, the WSTDN is re-trained and fine-tuned using only the image-level annotations to simultaneously realize the classification of the tooth-marked tongue and the positioning of the tooth-marked region. Experimental results of clinical tongue images demonstrate the superiority of the proposed method compared with previously reported deep learning methods for tooth-marked tongue recognition. The proposed tooth-marked tongue recognition model may provide important syndrome diagnosis and efficacy evaluation methods, and contribute to the understanding of ethnopharmacological mechanisms.

Keywords: traditional Chinese medicine, tooth-marked tongue, deep learning, weakly supervised learning, tongue diagnosis, convolutional neural network
Frontiers in Physiology | www.frontiersin.org 1 April 2022 | Volume 13 | Article 847267

Zhou et al. Weakly Supervised Tooth-Marked Tongue Recognition
1 INTRODUCTION

Tongue diagnosis is one of the most important diagnostic methods in Chinese medicine. The characteristics of a tongue, such as shape and color, can reflect the internal health of the body, and the severity or progression of the disease. By observing the characteristics of a tongue, Chinese medicine can distinguish the clinical symptoms and choose appropriate treatment strategies (Zhang et al., 2017). As one of the most important tongue features, tooth marks are generally formed by the compression of the fatter tongue by adjacent teeth. Figure 1 shows some representative tooth-marked and non-tooth marked tongue images. Typically, tooth-marked tongue refers to a kind of abnormal tongue shape in which the tongue body is fat in different degrees and is compressed by the teeth, and the edge of the tongue body is formed with tooth marks that are serrated. Non-tooth-marked tongue tends to be moderately fat and thin, and the edges of the tongue are continuous and smooth. According to the theory of traditional Chinese medicine, tooth-marked tongue is often related to spleen deficiency, cold dampness, sputum, effusion, and blood stasis. The clinical manifestations of patients with tooth-marked tongue include loss of appetite, borborygmus, gastric distention, and loose stool (Li and Dong, 2017). Hence, the recognition of tooth-marked tongues has important value for clinical diagnosis of Chinese medicine. However, the routine clinical recognition of tooth-marked tongues relies on the doctor’s visual observation, and its performance is limited by the doctor’s subjectivity and experience, in addition to environmental lighting changes. In addition, there are different types of tooth marks, with different colors and varied shapes, which makes them challenging for doctors to identify. Therefore, the study of objective tooth-marked tongue recognition based on image data has important clinical value.

In recent years, researchers have been trying to establish an objective tooth-marked tongue recognition model based on digital image processing and analysis. Most studies are based on the local color and unevenness of the tooth scar area. Hsu et al. (2010) analyzed the RGB color composition based on the image of the tongue region and found that the G chromatogram of the tooth marks is lower than that of the tongue body and tongue surface. Lo et al. (2012) found that different imaging angles or the degree of tongue extrusion would affect the judgment of tooth marks. Li et al. (2019) used concavity information to generate suspicious tooth-marked areas for the following classification of the tooth-marked tongue. However, due to the great color difference of tongues and varied shape of tooth marks, the recognition based on color and shape of tooth marks usually has low robustness and stability.

With the continuous development of artificial intelligence and deep learning, the convolutional neural network (CNN) model has been applied to tongue analysis. Xu et al. (2020) used multi-task learning with deep learning to realize tongue segmentation and tongue coating classification, and achieved better results than a single task. Jiang et al. (2021) proposed to assess the tongue image quality based on a deep CNN. Hu et al. (2021) proposed a method for automatic construction of Chinese herbal prescriptions from tongue images using CNN and auxiliary latent therapy topics. Typically, current identification of the tooth-marked tongue extracts the entire tongue image, and then directly classifies the tooth-marked tongue based on CNN (Sun et al., 2019; Wang et al., 2020). Specifically, Sun et al. (2019) proposed a 7-layer CNN model that takes the tongue image as the input to identify the tooth-marked tongue, with an accuracy of 78.6%. Wang et al. (2020) classified the tooth-marked tongue based on a deeper CNN network, and showed that their method could achieve promising results for tooth-marked tongue recognition. However, since the center, tip, and base of the tongue are not informative for the identification of the tooth-marked tongue, incorporating these non-informative areas into the deep neural network for analysis may have a negative impact on the performance of the model (Chong Wang et al., 2015). In addition, such tooth-marked tongue classification only provides the image-level identification of the tooth-marked tongues, and does not provide the specific location of the tooth marks, which is not conducive to assisting clinical diagnosis and interpretability.

Since tooth marks are one of the symptoms of a tongue, the identification of tooth-marked tongues has been regarded as a fine-grained classification problem, and the classification of tooth-marked tongue can be conducted by multiple instance learning (Li et al., 2019). Specifically, multiple candidates of tooth-marked areas are generated on the tongue body. If all candidates are non-tooth marks, it is a non-tooth marked tongue. On the contrary, if one candidate on the tongue is a tooth mark, it is considered as a tooth-marked tongue. Li et al. (2019) first used concavity information to generate candidates of tooth-marked areas, followed by extracting CNN deep features from these areas, and finally classified the features based on a multi-instance support vector machine (MiSVM) to classify the tooth-marked tongue. Although such pioneer work has obtained promising results for the tooth-marked tongue recognition, it has several shortcomings. First, it uses concavity information to generate candidate areas of the tongue, but non-tooth marked tongues rarely have such concavity information, which makes it difficult to achieve a unified generation of candidate examples of tooth-marked and non-tooth marked tongues. Second, it requires a large amount of tooth mark examples for feature extraction, which brings a lot of labor costs. Furthermore, deep features are extracted through the CNN model for each candidate tooth-marked region, which also brings a lot of computational cost. Finally, this process is not an end-to-end deep neural network and cannot provide a discriminative location of the tooth-marked area, which is not conducive to assisting clinical diagnosis and interpretability. Recently, Tang et al. (2020) proposed a tongue region detection and tongue landmark detection via deep learning for tooth-marked tongue recognition. However, it requires a lot of image annotation including tongue landmark annotation and tongue region annotation, which is a huge burden and tedious work for the clinic. Weng et al. (2021) proposed a weakly supervised tooth-mark detection method using the YOLO object detection model. However, it requires full bounding-box level annotation of tooth marks in addition to coarse image-level annotation of tooth-marked tongue images. These are very tedious work for clinical application.

In this study, a weakly supervised object detection using deep learning is proposed for the tooth-marked tongue recognition, where only image-level labels are used for model training. The proposed method is motivated by the work of weakly supervised

deep detection network in computer vision (Bilen and Vedaldi, 2016), in which a CNN pre-trained for image classification on the large ImageNet dataset is modified to reason efficiently about regions, branching off into a recognition and a detection data stream. The resulting architecture can be fine-tuned on a target dataset to achieve state-of-the-art weakly supervised object detection using only image-level annotations. Based on this consideration, the tooth-marked tongue recognition can be naturally solved from the perspective of weakly supervised object detection for two reasons. First, the detection and localization of tooth-marked areas are conducive to assisting clinical diagnosis and interpretability. Second, only image-level annotation without tooth marks labeling can significantly reduce the cost of data annotation. In addition, Zhou et al. proposed that even when there is no supervision of the target position, the convolution unit of the convolution layer can be regarded as the target detector (Zhou et al., 2015). Therefore, the classification network of tooth-marked tongues with only image-level annotations makes it possible to locate tooth marks without providing tooth mark annotations.

To this end, we propose an end-to-end deep neural network for tooth-marked tongue recognition based on weakly supervised learning. To improve the efficiency and reliability of the generation of the candidate tooth-marked areas, we use the prior knowledge of the tooth marks distribution to generate the candidate tooth-marked areas through the position information. In addition, to avoid the labeling of a large number of examples of tooth marks for deep learning, we propose a weakly supervised learning method for tooth-marked tongue recognition. Specifically, we first train a deep neural network model to classify tooth-marked tongues with the image-level annotations, and then we propose a weakly supervised tooth-mark detection network (WSTDN) as an architecture variant of the pre-trained deep neural network for the tooth-marked region detection, followed by fine-tuning the WSTDN once again using only the image-level annotations to simultaneously realize the classification of the tooth-marked tongue and the positioning of the tooth-marked region. Compared to the existing works, the main contributions of the present work are summarized as follows: 1) we propose an end-to-end deep neural network for tooth-marked tongue recognition based on weakly supervised learning, avoiding manual labeling and screening of a large number of tooth-marked examples; 2) we propose a novel method for generating candidate regions based on prior knowledge of tooth mark distribution to improve the efficiency of tongue tooth-marked candidate region generation; 3) in the case of only image-level labels, we propose the WSTDN to realize the classification of the tooth-marked tongue and the positioning of the tooth-marked area at the same time, which is convenient for assisting clinical diagnosis and interpretation.

2 MATERIALS AND METHODS

2.1 Clinical Data
The study was approved by the local ethics committee, and the patient signed the informed consent form (IRB: 2019BZHYLL0101). We used standard equipment designed by Shanghai Daoshi Medical Technology Co., Ltd. (DS01-B) to obtain tongue images from patients in the local institute. Then, we transferred the images to a workstation for clinical evaluation. Three Traditional Chinese Medicine (TCM) physicians with two to five years of clinical experience classified the tongue images as tooth-marked or non-tooth marked tongues. All professionals are well-trained and have normal vision. The TCM clinical criteria for diagnosing tooth-marked tongues are as follows: first, observe whether there are jagged tooth marks caused by teeth pressing on the tongue on both sides of the tongue; second, for the tongue with inconspicuous jagged tooth marks, observe the color depth of the suspected area, in which the compressed tooth scar area typically has a darker color (Tang et al., 2020). The detailed evaluation procedure of this study consists of three steps. First, the professionals discussed and acknowledged the diagnostic criteria for tooth-marked tongue. Second, one professional classified all 330 tongue images for identifying tooth-marked tongues, and the two other professionals reviewed the classification results separately. Third, in the case of disagreement, the three professionals would discuss and make a final decision. Dividing the number of inconsistent samples reviewed by experts by the total number of samples gives an inconsistency rate among experts of 14.8%. The main reasons for the inconsistency are the shadows caused by the influence of the light and the inconspicuous tooth marks on the tongue. However, in the second judgment after the discussion, opinions often reach an agreement. The samples on which the experts were inconsistent are regarded as difficult samples. Removing these difficult samples from the training data reduces the generalization performance of the model because it then only recognizes samples with obvious tooth marks. The data set after clinical screening contains 130 tooth-marked tongue images and 200 non-tooth marked images. It should be noted that in this study, the clinic only needs to provide image-level labels for the tongue image samples as tooth-marked or non-tooth marked tongues, and there is no need to provide the specific location and bounding box annotations of tooth marks on the tongue. Figure 2 shows an overall pipeline for the proposed method.

2.2 Data Preprocessing
First, we used the Labelme software (http://labelme.csail.mit.edu/Release3.0/) to outline the tongue area, and then performed the AND operation on this area with the original image to extract the image of the entire tongue area. The purpose of extracting the tongue region was to shield the irrelevant face and the interference of the surrounding background of the tongue, so as not to affect the recognition performance of the model. The delineated tongue images were resized to 224 × 224 before entering the network and were used in training the deep neural network. We adopted a data augmentation method of random horizontal inversion, random rotation of 0–15°, and random vertical inversion for the tongue image in order to obtain more training data for training the deep neural network.

2.3 The Proposed Framework
The proposed method mainly includes three stages, as shown in Figure 3. First, in the pre-trained CNN module, we pre-train a

CNN model with the weight initialization of ImageNet using image-level annotations to distinguish between tooth-marked and non-tooth marked tongues. Subsequently, we propose the WSTDN that uses the pre-trained CNN model as the backbone and adds the spatial region proposal (SRP), spatial pyramid pool (SPP), Classification module, and Detection module to achieve the weakly supervised tooth-marked tongue recognition. Finally, we fine-tune the WSTDN with only the image-level annotations, simultaneously realizing the classification of the tooth-marked tongue and the positioning of the tooth-mark area. Each module will be introduced in the following subsections.

FIGURE 1 | Representative tongue images. (A) Tooth-marked tongue with a very obvious contour distortion along both sides of the tongue, accompanied by the color change of the extruded area of the tongue; (B) Non-tooth marked tongue. The tongue body is flat, the peripheral contour is regular, and there is no contour distortion and color change area; (C) Suspicious tooth-marked tongue. The identification is controversial because the tongue body is flat and the peripheral contour is not distorted. Finally, the tooth-marked tongue is determined by the color change of the extruded area of the tongue edge.

FIGURE 2 | Overview of the construction of the dataset and the main processing procedures of the proposed method. (A) Tongue images were captured with standard equipment. (B) Classification of tooth-marked and non-tooth-marked samples to construct the original tongue image dataset. (C) Tongue region was delineated to construct a tongue image dataset. (D) Pre-trained CNN model, fine-tuned WSTDN model with image-level labeled data, and image-level results output. (E) Performance with validation metrics.

2.3.1 Pre-Trained Network
Our study is based on the premise that a pre-trained CNN can be well generalized to a large number of tasks, as there is evidence that CNNs trained for image classification can bring proxies to object detection (Zhou et al., 2015). It is worth noting that these concepts are obtained implicitly without providing the network with information about the location of these structures in the image. Correspondingly, the CNN trained in tongue image classification may already implicitly contain most of the information needed to perform tooth-marked area detection. Therefore, we propose to train a CNN with the training data

of tongue images and only image-level supervision (no bounding box annotations) for further tooth-mark detection. Note that the CNN has been pre-trained on ImageNet ILSVRC 2012 data (Russakovsky et al., 2015). In this study, we use the ResNet34 network (He et al., 2016), which is consistent with the previous study showing that ResNet34 is superior to other typical CNN models in tongue image classification (Wang et al., 2020).

FIGURE 3 | Framework of the proposed method. It includes a pre-trained CNN module; an SRP module for generating tooth mark candidate regions; an SPP-layer for obtaining and normalizing the deep features of tooth mark candidate regions; two weakly supervised branches, a classification module and a detection module; the Hadamard operation of the branch results and summation yields the image-level classification results.

As shown in Figure 4, the structure of ResNet34 is shown in the green dotted box, in which the two convolution layers are a group, and the residual calculation is conducted in the shortcut connection block as shown in the red dotted box. The size of tongue images and deep features are expressed as (batch size, width, height, and channel). The solid line residual arrow indicates that the input and output have the same dimension, and the dotted line residual arrow indicates that the input and output have different dimensions. The solid line ⊕ is calculated as y = f(x) + x, and the dotted line ⊕ is calculated as y = f(x) + Wx, where f(x) represents the feature calculated by the yellow matrix, and W is the convolution operation, which is used to adjust the channel dimension of x. ReLU (Nair and Hinton, 2010) was used as the activation function. The pre-trained ResNet34 model will be used as the backbone to build the proposed WSTDN for the weakly supervised tooth-marked tongue recognition in the following section.

2.3.2 Weakly Supervised Tooth-Marked Tongue Detection Network
In order to achieve the objective of weakly supervised tooth-marked tongue recognition, we have made certain improvements based on the pre-trained ResNet34 model. First, we removed the avgpool layer and fc layer behind the last BN layer in ResNet34 (that is, the classifier layers, so that the backbone is only used for feature extraction), and replaced them with a spatial pyramid pool (SPP) (He et al., 2014). We implemented SPP as a network layer (Girshick, 2015) to allow the system to be trained end-to-end and improve efficiency. By introducing SPP as the network layer, we only need the original tongue image to pass through the CNN network once, and we can get a deep feature of (batchsize, 7, 7, 512). As shown in the following Figure 5 of the SPP layer, the candidate area is mapped to find the corresponding candidate feature area on the 7 × 7 feature map. If the size of the candidate area is 32 × 32, from the tongue image to the deep feature, a candidate area takes at least 1 × 1 grid feature and at most 2 × 2 grid features. The SPP network layer stretches candidate feature regions of different sizes to the same size, and then inputs them to the fully connected layer, so that feature maps are calculated first, and the results of the feature maps can be shared when each candidate region is represented, saving a lot of calculation time. At this point in the network structure, the regional-level features are further processed by two fully connected layers, and each layer contains a linear map and a ReLU activation function. Inspired by the previous research on the weakly supervised detection network (Bilen and Vedaldi, 2016), we branched out from the output of the last layer of the SPP layer into two modules, a classification module and a detection module.
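As a rough illustration of how the two branches can be combined (following Eqs. 1–3 of Sections 2.3.4–2.3.6), the NumPy sketch below applies a softmax over classes in one branch and over regions in the other, multiplies the results element-wise, and sums over regions. The function names, the random input scores, and the helper that applies the 1/|R| display threshold mentioned in Section 3.1 are illustrative assumptions and not the authors' released code.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def image_level_scores(x_c, x_d):
    """Combine branch scores of shape (C, R) into image-level scores of shape (C,).

    x_c: raw scores from the classification branch.
    x_d: raw scores from the detection branch.
    """
    sigma_class = softmax(x_c, axis=0)  # Eq. 1: normalise over classes, per region
    sigma_det = softmax(x_d, axis=1)    # Eq. 2: normalise over regions, per class
    x_r = sigma_class * sigma_det       # Hadamard product: region-level scores
    y = x_r.sum(axis=1)                 # Eq. 3: sum over the regions
    return y, sigma_det

def salient_regions(sigma_det, cls):
    """Indices of regions whose detection score exceeds 1 / (number of regions)."""
    n_regions = sigma_det.shape[1]
    return np.flatnonzero(sigma_det[cls] > 1.0 / n_regions)

# Hypothetical branch outputs: C = 2 classes, R = 10 candidate regions.
rng = np.random.default_rng(0)
x_c = rng.normal(size=(2, 10))
x_d = rng.normal(size=(2, 10))
y, sigma_det = image_level_scores(x_c, x_d)
```

Because each row of the detection softmax sums to one and every classification probability is below one, each entry of `y` stays strictly between 0 and 1, which is what allows it to be fed to a cross-entropy loss against the image-level label.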
FIGURE 4 | SPP layer. The green cuboid represents the deep feature of the tooth mark candidate area, and R represents the number of tooth mark candidate areas. It transforms the (h, w, 512) features of the R tooth mark candidate regions into a unified (R, 4096) size, where h and w represent the height and width of the tooth mark candidate region, respectively.
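The mapping from an image-space candidate box to the 7 × 7 feature grid, followed by pooling to a fixed size, can be sketched as below. This is a minimal NumPy sketch under stated assumptions: a total backbone stride of 32 (224 / 7), a single pooling level with a 2 × 2 output grid, and a (row, column, channel) layout for one image. The names `box_to_cells` and `pool_region` are hypothetical, and the sketch is not meant to reproduce the 4096-dimensional representation in the caption.

```python
import math
import numpy as np

STRIDE = 32  # assumed: 224-pixel input mapped to a 7 x 7 feature grid

def box_to_cells(xmin, ymin, w, h, grid=7):
    """Map an image-space box to half-open cell ranges (x0, y0, x1, y1) on the grid."""
    x0 = min(xmin // STRIDE, grid - 1)
    y0 = min(ymin // STRIDE, grid - 1)
    x1 = min(max(math.ceil((xmin + w) / STRIDE), x0 + 1), grid)
    y1 = min(max(math.ceil((ymin + h) / STRIDE), y0 + 1), grid)
    return x0, y0, x1, y1

def pool_region(feature, box, out=2):
    """Max-pool one region of a (grid, grid, channels) map to a flat vector of out*out*channels."""
    x0, y0, x1, y1 = box_to_cells(*box, grid=feature.shape[0])
    region = feature[y0:y1, x0:x1]
    rh, rw = region.shape[:2]
    pooled = np.empty((out, out, feature.shape[2]), dtype=feature.dtype)
    for i in range(out):
        r0, r1 = (i * rh) // out, math.ceil((i + 1) * rh / out)
        for j in range(out):
            c0, c1 = (j * rw) // out, math.ceil((j + 1) * rw / out)
            pooled[i, j] = region[r0:r1, c0:c1].max(axis=(0, 1))
    return pooled.reshape(-1)

# Hypothetical feature map and two 32 x 32 candidate boxes (xmin, ymin, width, height).
feature = np.random.default_rng(0).normal(size=(7, 7, 512))
boxes = [(0, 0, 32, 32), (16, 16, 32, 32)]
pooled = np.stack([pool_region(feature, b) for b in boxes])
```

A grid-aligned 32 × 32 box covers a single cell, while a box that straddles cell borders covers 2 × 2 cells, which matches the "at least 1 × 1, at most 2 × 2" statement in the text. Since the backbone runs once per image, only this cheap pooling step is repeated per region.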

FIGURE 5 | Architecture of the ResNet34 model. The size of tongue images and deep features are expressed as (batch size, width, height and channel). The solid
line residual arrow indicates that the input and output have the same dimension, and the dotted line residual arrow indicates that the input and output have different
dimensions.
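The two kinds of shortcut connection can be illustrated with a small NumPy sketch. It is a simplification under stated assumptions: the residual branch f is a hypothetical placeholder rather than the actual pair of convolutions, W is modelled as a matrix acting on the channel axis (a 1 × 1 convolution), and the spatial down-sampling that usually accompanies a projection shortcut is omitted.

```python
import numpy as np

def residual_identity(x, f):
    """Solid-line shortcut: y = f(x) + x, input and output have the same shape."""
    return f(x) + x

def residual_projection(x, f, w):
    """Dotted-line shortcut: y = f(x) + Wx, W adjusts the channel dimension of x."""
    return f(x) + x @ w

rng = np.random.default_rng(0)
x = rng.normal(size=(7, 7, 64))        # (height, width, channels), hypothetical sizes
w = rng.normal(size=(64, 128))         # projection from 64 to 128 channels
w_f = rng.normal(size=(64, 128))       # stand-in for the residual branch weights

y_same = residual_identity(x, lambda t: 0.5 * t)
y_wide = residual_projection(x, lambda t: t @ w_f, w)
```

The identity form only works when f preserves the shape of x; the projection form is what makes the addition well defined when the channel count changes between stages.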
2.3.3 Candidate Region Generation Method Based on Location Information (Spatial Region Proposal, SRP)
The selection of tooth-marked candidate areas is of great significance in tooth-marked tongue recognition. In order to generate candidate regions to use with our proposed network, we propose a novel method to select candidate areas with simple equidistant frames on both sides of the tongue. The proposed method comes from the doctor’s clinical observation. When the doctor judges whether it is a tooth-marked tongue, the main focus is on the areas of both sides of the tongue. This method avoids the large-area overlap of the candidate areas, and it is also simple and efficient to unify the selection of the candidate area of the tooth-marked tongue and the non-tooth marked tongue.

Figure 6 shows the process of the candidate region generation method based on location information. First, we convert the tongue image (Figure 6A) into a grayscale image (Figure 6B). Because we only need position information, we do not need to consider color information, and converting into a grayscale image can greatly reduce our candidate region generation time. Starting from the top of the left side of the tongue, we traverse from left to middle and take the first non-zero point, which is recorded as the midpoint. We save the minimum x, minimum y, width, and height of the candidate region according to the midpoint (Figure 6C). Then, we continue to traverse downward at equal intervals. This interval is set to 2/3 of the size of the candidate area. Since the tongue is generally curved, such a curvature also gives our candidate frame a certain horizontal displacement (Figure 6D). The right side of the tongue is traversed from top to bottom and from right to middle, using the same method to select candidate regions. We removed the first and last candidate regions on the left and right sides (Figures 6E,F), because these two candidate regions are generally the base and tip of the tongue, which are not in the range of the tooth-marked tongue detection area. Finally, we obtain the candidate tooth-marked areas on the color tongue image (Figure 6G).

2.3.4 Classification Module
As shown in the Classification module in Figure 3, in order to discriminate the tooth-marked category of each candidate area, we apply a linear mapping in the classification branch, and the output dimension of this mapping is the number of categories C. Its definition is as follows (Eq. 1):

$$[\sigma_{class}(x^c)]_{ij} = \frac{e^{x^c_{ij}}}{\sum_{k=1}^{C} e^{x^c_{kj}}} \tag{1}$$

where x^c ∈ R^{C×|R|} is the predicted scores on all classes in a certain area. Specifically, we calculate the sum of the exponentials over the C categories of the same region box, and then divide the exponential of the current element by this value (in our case, the number of categories is C = 2). Conducting the corresponding softmax transformation on the data in each column of x^c is equivalent to calculating the probability of tooth marks or non-tooth marks in a certain area.

2.3.5 Detection Module
As shown in the Detection module in Figure 3, in order to obtain the scores of a certain class in all candidate regions, we apply a linear mapping in the detection branch, and the output dimension of this mapping is also the number of classes C. Its definition is as follows (Eq. 2):

$$[\sigma_{det}(x^d)]_{ij} = \frac{e^{x^d_{ij}}}{\sum_{k=1}^{|R|} e^{x^d_{ik}}} \tag{2}$$

where x^d ∈ R^{C×|R|} is the predicted scores on all regions of a certain category. Specifically, for the same category, we calculate

TABLE 1 | Performance comparison of different methods for tooth-marked tongue recognition.
Backbone Methods Accuracy Precision Recall F1-score
Resnet34 Image (Wang et al., 2020) 0.7228 ± 0.0147 0.7344 ± 0.1025 0.5091 ± 0.1181 0.5863 ± 0.0678
Image (IPW) (Wang et al., 2020) 0.7940 ± 0.0442 0.8188 ± 0.8000 0.6275 ± 0.1574 0.6961 ± 0.1020
Instance (Li et al., 2019) 0.9034 ± 0.0227 0.9185 ± 0.0344 0.8294 ± 0.3922 0.8711 ± 0.0298
instance_MiSVM (Li et al., 2019) 0.9349 ± 0.0255 0.9332 ± 0.0349 0.9002 ± 0.0550 0.9156 ± 0.0340
WSTDN 0.9197 ± 0.0759 0.8745 ± 0.1087 0.9427 ± 0.1197 0.9026 ± 0.0954
the score of the current element relative to different region 3 EXPERIMENTAL RESULTS
boxes. Corresponding softmax transformation is performed on
the data of each row of xd , which is equivalent to the score 3.1 Experimental Settings, Evaluation
probability of a certain class in all regions. Metrics, and Comparison Methods
We divided 330 tongue images into a training set and a
2.3.6 Image-Level Classification Score validation set, using 5 times four folded cross validation.
Since there is no real tooth-marked area and instance-level The performance of the model is evaluated by calculating
category labels for supervision, in the two branches of our the average value and variance of the evaluation metrics.
model, the classification module predicts the tooth-marked The experimental results are evaluated by the following four
category in a certain area, and the detection module selects metrics: (Eq. 4) Accuracy, (Eq. 5) Precision, (Eq. 6) Recall,
which areas are more likely to contain the tooth-marked area. (Eq. 7) F1 score.
Therefore, the final score for each area is obtained by taking the
product of the two score matrices (Hadamard) TP + TN
Accuracy (4)
xRcr σclass (xc ) × σdet (xd ). The score xR is summed to obtain TP + TN + FP + FN
the final classification score yc, which can be defined as TP
Precision (5)
follows Eq. 3: TP + FP
TP
|R| Recall (6)
yc xRcr (3) TP + FN
Precision × Recall
r1
F1 Score 2 × (7)
Precision + Recall
It is worth noting that yc is the sum of the product of the
elements of the softmax standardized value of the area |R|, so it is in where TP, FP, TN, FN represent true positive, false positive,
the range of (0, 1). Finally, we use the cross-entropy loss to compute the loss between our predicted yc and the original image-level label.

2.4 Implementation
The proposed weakly supervised tooth-marked tongue detection model was implemented in PyTorch (pytorch.org), and the Adam algorithm was used to minimize the objective function. Data augmentation was implemented with torchvision and imgaug (github.com/aleju/imgaug). The initial learning rate was set to 1e-4, the weight decay to 1e-4, and the batch size to 32. The hardware was as follows: the CPU was an Intel(R) Xeon(R) Gold 5118, the RAM was 64.0 GB, and the GPU was an NVIDIA TITAN RTX graphics card with 24 GB of memory. Because the computation time for a single image is too short and inconsistent to measure reliably, we selected 20 tongue images (10 tooth-marked and 10 non-tooth-marked), measured the total time for generating the tooth-mark candidate areas, and from it obtained the time per image. To alleviate overfitting, we used early stopping: if the validation accuracy did not increase for 10 epochs, the optimization was stopped and the model weights were saved. The basic implementation code of this work is available on GitHub: https://github.com/Lsx0802/WSTMD.

true negative, and false negative, respectively. Accuracy is the proportion of correctly classified positive and negative cases among all samples. Precision is the proportion of correctly classified positive cases among all cases predicted as positive by the model. Recall is the proportion of positive cases correctly predicted by the model among all positive samples. The F1 score is a balanced index of the accuracy of a classification model: it takes both precision and recall into account and can be regarded as their harmonic mean. When the cost of a false negative (FN) is very high (that is, the consequences are serious), FN should be reduced as much as possible, which improves recall. Clinically, patients with tooth-marked tongues should be recognized for further treatment, so we want the model to have a higher recall at a similar accuracy.
In addition, for visual observation, we output the candidate boxes whose detection scores σdet are greater than 1/(number of candidate boxes). Finally, we compared the proposed method with tooth-marked tongue recognition models based on multi-instance SVM (Li et al., 2019) and on an end-to-end convolutional network (Wang et al., 2020). We also compared different candidate region generation methods (Uijlings et al., 2013; Li et al., 2019).
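The evaluation metrics and the box-selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `classification_metrics` and `keep_boxes` are hypothetical, the zero-division fallbacks are an assumption, and the example counts are made up purely to show the arithmetic.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def keep_boxes(det_scores: list) -> list:
    """Indices of candidate boxes whose detection score exceeds 1 / N,
    where N is the number of candidate boxes."""
    n = len(det_scores)
    if n == 0:
        return []
    threshold = 1.0 / n
    return [i for i, score in enumerate(det_scores) if score > threshold]


# Illustrative (made-up) counts and scores.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
kept = keep_boxes([0.1, 0.4, 0.3, 0.2])
```

If the detection scores of one image sum to 1 over its candidate boxes, as in a softmax over boxes, then 1/N is the score every box would receive under a uniform distribution, so the rule keeps only boxes scored above that uniform level.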

FIGURE 6 | Candidate region generation method based on location information. The arrows indicate the traversal direction. Taking the left side as an example, the image is traversed from top to bottom and from left to right; the right side is processed symmetrically. The red point is the first non-zero point encountered and serves as the midpoint of the tooth-mark candidate area. (xmin, ymin) is the upper-left corner of the tooth-mark candidate area, and h and w are its height and width, respectively.
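The traversal in Figure 6 can be sketched as follows. This is a hedged, pure-Python illustration and not the authors' implementation: the function name `side_proposals`, the parameters `box_h`, `box_w` and `row_step`, the row sampling at equal intervals, and the clamping of boxes to the image border are all assumptions made for the example. The input is assumed to be a segmented tongue mask in which background pixels are zero.

```python
def side_proposals(mask, box_h, box_w, row_step, side="left"):
    """Generate candidate boxes along one side of a segmented tongue.

    mask: 2D list of numbers, non-zero inside the tongue.
    Returns a list of (xmin, ymin, w, h) tuples.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    boxes = []
    # Visit rows from top to bottom at equal intervals (assumed sampling).
    for y in range(0, height, row_step):
        # Scan left-to-right for the left side, mirrored for the right side.
        cols = range(width) if side == "left" else range(width - 1, -1, -1)
        edge_x = next((x for x in cols if mask[y][x] != 0), None)
        if edge_x is None:
            continue  # this row does not cross the tongue
        # The first non-zero point is the midpoint of the candidate area;
        # the box is clamped so that it stays inside the image.
        xmin = max(0, min(edge_x - box_w // 2, width - box_w))
        ymin = max(0, min(y - box_h // 2, height - box_h))
        boxes.append((xmin, ymin, box_w, box_h))
    return boxes


# Toy 6 x 6 mask with the "tongue" occupying columns 2-3 of rows 1-4.
toy = [[0] * 6] + [[0, 0, 1, 1, 0, 0] for _ in range(4)] + [[0] * 6]
left = side_proposals(toy, box_h=2, box_w=2, row_step=2, side="left")
right = side_proposals(toy, box_h=2, box_w=2, row_step=2, side="right")
```

Because only the first non-zero pixel of each sampled row is needed, the scan stops at the tongue edge instead of visiting the whole image.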
TABLE 2 | Performance comparison of different region proposal methods.
Backbone   Methods                                    Accuracy          Precision         Recall            F1-score          Time per image
WSTMD      Selective search (Uijlings et al., 2013)   0.8708 ± 0.0979   0.8522 ± 0.1215   0.8068 ± 0.1495   0.8300 ± 0.1312   0.30
WSTMD      SPR (ours)                                 0.9197 ± 0.0759   0.8745 ± 0.1087   0.9427 ± 0.1197   0.9026 ± 0.0954   0.19
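The "Time per image" column of Table 2 is obtained by timing a batch of 20 images and deriving a per-image value from the total, as described in Section 2.4. A minimal sketch of that measurement is given below; the function name `mean_time_per_image` and the `propose` callback are hypothetical, and the use of `time.perf_counter` and of a simple division by the image count are assumptions, since the paper does not state which timer or averaging was used.

```python
import time


def mean_time_per_image(images, propose):
    """Time `propose` over all images and return the mean time per image."""
    if not images:
        raise ValueError("need at least one image")
    start = time.perf_counter()
    for image in images:
        propose(image)  # e.g. a candidate-region generator
    elapsed = time.perf_counter() - start
    return elapsed / len(images)


# Usage with a stand-in proposal function and 20 dummy "images".
dummy_images = list(range(20))
per_image = mean_time_per_image(dummy_images, lambda image: image)
```

Timing the whole batch once avoids the noise of measuring very short single-image runs.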
TABLE 3 | Performance of ablation study in the proposed WSTDN method.
Backbone   Methods   Accuracy          Precision         Recall            F1-score
WSTDN      IW        0.7460 ± 0.0569   0.8403 ± 0.1120   0.4795 ± 0.2397   0.5661 ± 0.1795
WSTDN      TL        0.8848 ± 0.0477   0.8777 ± 0.0610   0.8254 ± 0.0778   0.8506 ± 0.0607
WSTDN      IW + TL   0.9197 ± 0.0759   0.8745 ± 0.1087   0.9427 ± 0.1197   0.9026 ± 0.0954
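The rows of Table 3 differ only in where the detector's initial weights come from (ImageNet weights, a tongue classification model, or both in sequence, as defined in Section 3.4). The step they share, copying backbone weights from a trained classifier into the detection model, can be sketched as below. This is an assumption-laden illustration rather than the authors' code: `transfer_backbone` is a hypothetical helper, the parameter names in the toy example are invented, and skipping names that start with `fc.` assumes the classifier head is named as in torchvision's ResNet. The function works on any name-to-parameter mapping, such as a PyTorch `state_dict`.

```python
def transfer_backbone(src_state, dst_state, skip_prefixes=("fc.",)):
    """Copy parameters from src_state into a copy of dst_state.

    Only entries whose name exists in dst_state, does not start with a
    skipped prefix, and has a matching shape (when available) are copied.
    Returns the merged mapping and the list of copied names.
    """
    merged = dict(dst_state)
    copied = []
    for name, value in src_state.items():
        if name.startswith(tuple(skip_prefixes)):
            continue  # classifier head: not reused by the detector
        if name not in dst_state:
            continue
        src_shape = getattr(value, "shape", None)
        dst_shape = getattr(dst_state[name], "shape", None)
        if src_shape != dst_shape:
            continue
        merged[name] = value
        copied.append(name)
    return merged, copied


# Toy example with plain lists standing in for tensors.
classifier = {"conv1.weight": [1.0, 2.0], "fc.weight": [9.0]}
detector = {"conv1.weight": [0.0, 0.0], "fc.weight": [0.0], "det.weight": [0.0]}
merged, copied = transfer_backbone(classifier, detector)
```

With PyTorch, the merged mapping would typically be passed to the detector's `load_state_dict`; layers that exist only in the detector keep their own initialization.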
3.2 Performance Comparison of Different Methods
As tabulated in Table 1, the method of Wang et al. (2020) classifies tongue images directly with a ResNet34 network and reaches an accuracy of 72.28%. When this image-based method is initialized with ImageNet weights (IPW), the accuracy reaches 79.40%. The method of Li et al. (2019) achieves an accuracy of 90.34% by extracting instances and classifying them directly with ResNet34. Using ResNet34 to extract deep features and then classifying them with MiSVM, as in the instance MiSVM method (Li et al., 2019), further improves the accuracy to 93.49%. Compared with the method of Wang et al. (2020), both the proposed method and that of Li et al. (2019) achieve a large performance improvement, which may be due to the further extraction of the informative tooth-marked area of the tongue. However, the proposed method does not outperform the MiSVM classification of Li et al. (2019). One reason may be that, after the instances were generated, they were manually screened to achieve higher performance. In addition, when training data are scarce, an SVM may classify better than the softmax classifier used in the proposed end-to-end network (Tang, 2013; Girshick, 2015). However, the SVM-based method of Li et al. (2019) is not an end-to-end network, whereas optimizing a softmax classifier, as in the proposed method, simplifies the training and test process (Girshick, 2015).

FIGURE 7 | Generation of different candidate region proposal methods. (A) Selective search (Uijlings et al., 2013), (B) convex defect detection (Li et al., 2019), (C)
SPR (ours). Note that the yellow box is the candidate box of the tooth mark region generated by different methods.
3.3 Performance Assessment of Candidate Region Proposal
For candidate region generation, we compared selective search (Uijlings et al., 2013), edge boxes (Zitnick and Dollár, 2014), convex defect detection (Li et al., 2019), and our method SRP. Edge boxes rest on the observation that the number of edge contours completely contained in a box indicates how likely the box is to contain the target (Zitnick and Dollár, 2014). However, for tooth-marked tongue classification, it is difficult to frame the tooth-marked area with the method of Zitnick and Dollár (2014), because the tooth-marked area and the tongue are connected. The proposed method and that of Li et al. (2019) both use prior knowledge of tooth marks. According to observations, the tooth-marked tongue does have convex defects on the edge of the tongue. Convexity detection can be used to frame these areas, but the convexity areas of a non-tooth-marked tongue are not obvious, so it is difficult to treat tooth-marked and non-tooth-marked tongues in a uniform way.
Figure 7 compares the candidate regions produced by the different generation methods. Selective search (Uijlings et al., 2013), shown in Figure 7A, generates a large number of candidate frames, but many of them are invalid and they overlap heavily. The method of Li et al. (2019), shown in Figure 7B, selects the tooth-marked candidate areas very well, but it also detects the convex defect of the tongue tip. The tongue tip is not an informative area for identifying tooth marks, so these regions must be manually screened and removed later. Our method, shown in Figure 7C, efficiently selects the informative area for the identification of tooth marks.
From the results in Table 2, our SPR method is better than the method of Uijlings et al. (2013). This may be because our method selects the informative area for identifying tooth marks rather than invalid areas such as the tip and the base of the tongue, and because it does not suffer from the large-area overlap of candidate frames seen with selective search (Uijlings et al., 2013). In addition, SRP takes much less time than the method of Uijlings et al. (2013), probably because selective search uses the color information of the three RGB channels, whereas our method uses a gray-scale image, and traversing only the left and right sides greatly reduces the traversal time.

3.4 Ablation Study
As shown in Table 3, IW denotes initializing the WSTDN model with ImageNet weights, and TL denotes transfer learning, that is, directly training a model for tongue image classification and then initializing the WSTDN model with its weights. IW + TL is the method we propose: ImageNet weights are used to initialize the ResNet34 model, the ResNet34 model is then trained with the tongue images and image-level labels, and finally its weights are used to initialize the WSTMD model. Initializing our tooth-mark detection model with the IW method is less effective than using the TL method. The TL weights are learned from tongue images and have a certain ability to discriminate between them, so this transfer learning method achieves a better result. TL alone is nevertheless inferior to IW + TL, possibly because its classification model starts from randomly initialized weights with no discriminative ability, which may lead to relatively lower performance.

3.5 Visualization
Figure 8 shows some examples of tooth-marked tongue recognition by the proposed method. As shown in Figure 8(A1) and (C1), the edges on both sides of the tongue are flat without tooth marks, and there are physiological defects at the root and tip of the tongues, which are not tooth marks. The proposed model avoids these physiological defect areas well, and the areas on both sides of the tongue are identified correctly. Furthermore, as shown in Figure 8(B1), the tooth-marked tongue has distinctive characteristics, including tooth marks and color changes in the tooth pressure area, and the proposed model identifies them correctly. As shown in Figure 8(D1), some small color differences that are hard to recognize or easily ignored by the human eye are accurately identified by the model. The side indicated by the arrow in Figure 8(E2) is more obvious than that indicated by the arrow in Figure 8(E1), but the model focuses on a single area and attends to the identified region on only one side. Even though the typical tooth-mark area indicated by the arrow in Figure 8(E2) is not in focus, the tooth-marked tongue is still correctly identified from the identified area. In Figure 8(F1), the model's recognition is incorrect. The edge of the tongue is squeezed because of tension in the tongue when the patient extends it, and the model mistakenly recognizes the image as a tooth-marked tongue. Therefore, high-quality tongue body imaging is very important for tooth-marked tongue recognition.

FIGURE 8 | Representative cases of tooth-marked tongue recognition by the proposed method. A2, B2, C2, D2, E2, and F2 are original tongue images, while A1, B1, C1, D1, E1, and F1 are the corresponding prediction results of tooth marks with bounding boxes.

4 DISCUSSION
The characteristics of a tongue can reflect the internal health of the body and the severity or progression of the disease in traditional Chinese medicine. Traditional Chinese medicine can distinguish the clinical symptoms and choose appropriate treatment strategies. As one of the most important tongue features, tooth-marked tongue has been used as an effective signature of health in traditional Chinese medicine. Our model may provide an important research paradigm for

distinguishing tongue features, diagnosing syndromes of traditional Chinese medicine, tracking disease progression, and evaluating intervention effects, showing its unique potential in clinical applications. Potentially, the proposed method can also be used to evaluate the efficacy of a drug by detecting the tooth marks of the tongue for noninvasive ethnopharmacological evaluation (Wang et al., 2022). The pathological cause of tooth-marked tongue is the change in the microcirculation of the tongue caused by compression of the tongue by the teeth. For example, blood supply disorders, local hypoxia, insufficient nutrition, and tissue edema occur in the area of tooth compression, and eventually tooth marks are formed (Wang et al., 2020). Previous studies have shown that tooth-marked tongue is closely related to human health and disease. The tooth-marked tongue is related to gender and age: men have fewer tooth marks than women, and the reduction of tooth marks with increasing age is more pronounced (Hsu et al., 2019). In addition, there is a positive correlation between lung capacity and tooth-marked tongue. The occurrence rate of tooth-marked tongue is higher in patients with moderate or higher abdominal force. The occurrence of tooth-marked tongue in hypertensive patients without anemia is significantly related to an increase in hematocrit. Patients with hypoalbuminemia mostly have pale, tooth-marked tongues (Jing, 2002). Tongue features such as the number of tooth marks, the average, maximum, and minimum coverage areas, and the organs corresponding to the coverage area can be used as criteria for evaluating chronic kidney disease or breast cancer (Lo et al., 2013; Chen et al., 2021). Patients with subacute eczema have a higher incidence of tooth marks than patients with acute or chronic eczema (Yu et al., 2017).
The proposed SRP module is more in line with the observation rules of traditional Chinese medicine physicians. First, according to the results of Sun et al. (2019) and Wang et al. (2020), tooth marks exist on both sides of the tongue, and the tip and the center of the tongue are not the main discriminating areas. We use equidistant selection on both sides of the tongue, which efficiently extracts the candidate regions of tooth marks. In contrast, the convexity-based detection method of Li et al. (2019) can extract tooth-marked candidate areas on a tooth-marked tongue. However, a non-tooth-marked tongue carries little obvious concave or convex information, which makes it difficult to generate candidate regions for tooth-marked and non-tooth-marked tongues efficiently and uniformly. In addition, the method of Uijlings et al. (2013) does not use prior knowledge of tooth marks, and the candidate areas it generates include a large number of invalid frames and a great deal of overlap. Tables 2 and 3 show that our method has advantages in both the generation time of candidate tooth-mark areas and model classification performance.
We initialize the CNN model with ImageNet weights and use transfer learning to obtain better tooth-marked tongue detection results. Our approach is inspired by Zhou et al. (2016): when they trained a scene classification convolutional network, the labels were scene labels without any object annotation, yet the network neurons naturally evolved into object detectors. Therefore, we treat tooth-marked tongue recognition as a tooth-marked area detection problem rather than an instance-level classification problem. Unlike other detection methods such as that of Girshick (2015), we do not have instance-level labels. Our method is inspired by the weakly supervised deep detection method of Bilen and Vedaldi (2016), which uses image-level labels to classify and detect candidate regions. The image-level prediction is obtained from the candidate regions. By filtering the scores of our detection branch, we can better locate the tooth-marked area predicted by the model. By comparison, the method of Li et al. (2019) extracts candidate regions by filtering and labeling instances, which incurs a high labeling cost.
Finally, this study still has certain shortcomings. First, the amount of data in this study is not large enough. The data come from a single center, and a study of multi-center data has not yet been carried out; it will be conducted in the future. Second, there may be uncertainty in the gold-standard labels of clinical tooth-marked tongues assigned by TCM practitioners, because tooth marks are challenging to recognize. Providing uncertainty estimates is not only important for safe decision-making in high-risk fields, but also crucial in fields where the data sources are highly inhomogeneous and labeled data are rare (Gal, 2016). Uncertainty research (Kendall and Gal, 2017; Gawlikowski et al., 2021) will be introduced in follow-up work. In addition, the proposed method operates on the segmented tongue body, and deviations in the tongue body segmentation may bias the discrimination. Follow-up work will consider multi-task learning for segmentation and detection so that the two tasks promote each other, thereby further improving the detection accuracy. Finally, the proposed method has not yet been evaluated in prospective experiments. Since tooth-marked tongues are less common than non-tooth-marked tongues in clinical practice, the sample distribution is uneven. Follow-up work needs to account for this imbalance in prospective clinical experiments.

5 CONCLUSION
In this study, we proposed a weakly supervised learning method for tooth-marked tongue recognition: a CNN model that classifies tooth-marked tongues is pre-trained and then transferred to the WSTDN, which is fine-tuned with only image-level labels (tooth-marked tongue/non-tooth-marked tongue). Experimental results demonstrate that the proposed method, with only image-level label annotations, is effective, and its performance is comparable to that of the deep neural network method that requires a large number of instance labels. In addition, this method uses a CNN for end-to-end training, and the tooth-marked tongue is classified while the tooth-marked areas are located, which is convenient for clinical diagnosis and interpretation. This method is expected to play an important role in the clinical diagnosis of traditional Chinese medicine, especially in noninvasive ethnopharmacological evaluation.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Guangzhou University of Chinese Medicine. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
JZ: Formal analysis, Methodology, Writing - original draft, and Writing - review and editing. SL: Formal analysis, Methodology, Writing - original draft, and Writing - review and editing. XW: Data curation, Writing - review and editing. ZY: Data curation, Writing - review and editing. XH: Data curation, Writing - review and editing. WL: Data curation, Writing - review and editing. SZ: Data curation, Writing - review and editing. QD: Conceptualization, Writing - review and editing. WZ: Conceptualization, Formal analysis, Funding acquisition, and Writing - review and editing. All authors have read and approved the current version of the manuscript.

FUNDING
This work is supported by the National Key R&D Program of China (2017YFC1700106), the National Natural Science Foundation of China (82174224), and the Key Research Program of the Chinese Academy of Sciences (ZDRW-ZS-2021-1-2).
REFERENCES
Bilen, H., and Vedaldi, A. (2016). “Weakly Supervised Deep Detection Networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Oxford: IEEE), 2846–2854. doi:10.1109/CVPR.2016.311
Chen, J. M., Chiu, P. F., Wu, F. M., Hsu, P. C., Deng, L. J., Chang, C. C., et al. (2021). The Tongue Features Associated with Chronic Kidney Disease. Medicine (Baltimore) 100 (9), e25037. doi:10.1097/MD.0000000000025037
Gal, Y. (2016). Uncertainty in Deep Learning.
Gawlikowski, J., Njieutcheu Tassi, C. R., Ali, M., Lee, J., Humt, M., Feng, J., et al. (2021). A Survey of Uncertainty in Deep Neural Networks. arXiv preprint arXiv:2107.03342.
Girshick, R. (2015). “Fast R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision (Microsoft Research: IEEE), 1440–1448. doi:10.1109/iccv.2015.169
He, K. M., Zhang, X. Y., Ren, S. Q., and Sun, J. (2016). Deep Residual Learning for Image Recognition. IEEE Conf. Comput. Vis. Pattern Recognit. 2016, 770–778. doi:10.1109/cvpr.2016.90
He, K., Zhang, X., Ren, S., and Sun, J. (2014). “Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition,” in Proc. ECCV (Beijing: IEEE), 346–361. doi:10.1007/978-3-319-10578-9_23
Hsu, P. C., Wu, H. K., Huang, Y. C., Chang, H. H., Chen, Y. P., Chiang, J. Y., et al. (2019). Gender- and Age-Dependent Tongue Features in a Community-Based Population. Medicine (Baltimore) 98 (51), e18350. doi:10.1097/MD.0000000000018350
Hsu, Y., Chen, Y., Lo, L., and Chiang, J. Y. (2010). “Automatic Tongue Feature Extraction,” in Proceedings of the Int. Comput. Symp., Tainan, Taiwan, 16-18 Dec. 2010 (IEEE), 936–941. doi:10.1109/compsym.2010.5685377
Hu, Y., Wen, G., Liao, H., Wang, C., Dai, D., and Yu, Z. (2021). Automatic Construction of Chinese Herbal Prescriptions From Tongue Images Using CNNs and Auxiliary Latent Therapy Topics. IEEE Transact. Cybernet. 51 (2), 708–721. doi:10.1109/TCYB.2019.2909925
Jiang, T., Hu, X., Yao, X., Tu, L., Huang, J., Ma, X., et al. (2021). Tongue Image Quality Assessment Based on a Deep Convolutional Neural Network. BMC Med. Inform. Decision Making 21, 147. doi:10.1186/s12911-021-01508-8
Jing, F. (2002). General Situation of Modern Research on Tooth-Marked Tongue (Review). J. Beijing Univer. Tradit. Chin. Med. (1), 57–59.
Kendall, A., and Gal, Y. (2017). What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? arXiv preprint arXiv:1703.04977.
Li, F., and Dong, C. (2017). Diagnostics of Traditional Chinese Medicine. Beijing: Science Press.
Li, X., Zhang, Y., Cui, Q., Yi, X., and Zhang, Y. (2019). Tooth-Marked Tongue Recognition Using Multiple Instance Learning and CNN Features. IEEE Trans. Cybern. 49 (2), 380–387. doi:10.1109/TCYB.2017.2772289
Lo, L.-c., Chen, Y.-F., Chen, W.-J., Cheng, T.-L., and Chiang, J. Y. (2012). The Study on the Agreement between Automatic Tongue Diagnosis System and Traditional Chinese Medicine Practitioners. Evidence-Based Complement. Altern. Med. 2012, 1–9. doi:10.1155/2012/505063
Lo, L. C., Cheng, T. L., Chiang, J. Y., and Damdinsuren, N. (2013). Breast Cancer Index: A Perspective on Tongue Diagnosis in Traditional Chinese Medicine. J. Tradit. Complement. Med. 3 (3), 194–203. doi:10.4103/2225-4110.114901
Nair, V., and Hinton, G. E. (2010). “Rectified Linear Units Improve Restricted Boltzmann Machines,” in Proc. 27th Int. Conf. Mach. Learn. (ICML-10), Haifa, Isr., June 21-24, 2010.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). Imagenet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 115, 211–252. doi:10.1007/s11263-015-0816-y
Sun, Y., Dai, S., Li, J., Zhang, Y., and Li, X. (2019). Tooth-Marked Tongue Recognition Using Gradient-Weighted Class Activation Maps. Future Internet 11, 45. doi:10.3390/fi11020045
Tang, W., Gao, Y., Liu, L., Xia, T., He, L., Zhang, S., et al. (2020). An Automatic Recognition of Tooth-Marked Tongue Based on Tongue Region Detection and Tongue Landmark Detection via Deep Learning. IEEE Access 8, 153470–153478. doi:10.1109/ACCESS.2020.3017725
Tang, Y. (2013). Deep Learning Using Linear Support Vector Machines in ICML 2013 Challenges in Representation Learning Workshop (Ontario: arXiv).
Uijlings, J. R. R., van de Sande, K. E. A., Gevers, T., and Smeulders, A. W. M. (2013). Selective Search for Object Recognition. Int. J. Comput. Vis. 104, 154–171. doi:10.1007/s11263-013-0620-5
Wang, C., Huang, K., Ren, W., Zhang, J., and Maybank, S. (2015). Large-scale Weakly Supervised Object Localization via Latent Category Learning. IEEE Trans. Image Process 24 (4), 1371–1385. doi:10.1109/TIP.2015.2396361
Wang, X., Liu, J., Wu, C., Liu, J., Li, Q., Chen, Y., et al. (2020). Artificial Intelligence in Tongue Diagnosis: Using Deep Convolutional Neural Network for Recognizing Unhealthy Tongue with Tooth-Mark. Comput. Struct. Biotechnol. J. 18, 973–980. doi:10.1016/j.csbj.2020.04.002
Wang, X., Wang, X., Lou, Y., Liu, J., Huo, S., Pang, X., et al. (2022). Constructing Tongue Coating Recognition Model Using Deep Transfer Learning to Assist Syndrome Diagnosis and its Potential in Noninvasive Ethnopharmacological Evaluation. J. Ethnopharmacology 285 (1), 114905. doi:10.1016/j.jep.2021.114905
Weng, H., Li, L., Lei, H., Luo, Z., Li, C., and Li, S. (2021). A Weakly Supervised Tooth-Mark and Crack Detection Method in Tongue Image. Concurr. Computat. Pract. Exper. 33 (16), e6262. doi:10.1002/cpe.6262
Xu, Q., Zeng, Y., Tang, W., Peng, W., Xia, T., Li, Z., et al. (2020). Multi-Task Joint Learning Model for Segmenting and Classifying Tongue Images Using a Deep Neural Network. IEEE J. Biomed. Health Inform. 24 (9), 2481–2489. doi:10.1109/JBHI.2020.2986376
Yu, Z., Zhang, H., Fu, L., and Lu, X. (2017). Objective Research on Tongue Manifestation of Patients with Eczema. Technol. Health Care 25 (S1), 143–149. doi:10.3233/THC-171316
Zhang, D., Zhang, H., and Zhang, B. (2017). Tongue Image Analysis. New York, NY; Berlin Heidelberg: Springer.
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Torralba, A. (2016). “Learning Deep Features for Discriminative Localization,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (Cambridge: IEEE), 2921–2929. doi:10.1109/CVPR.2016.319
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Torralba, A. (2015). “Object Detectors Emerge in Deep Scene CNNs,” in ICLR (Cambridge: arXiv).
Zitnick, C. L., and Dollár, P. (2014). “Edge Boxes: Locating Object Proposals from Edges,” in Proc. ECCV (Cham: Springer), 391–405. doi:10.1007/978-3-319-10602-1_26

Conflict of Interest: Author WL is employed by Beijing Yikang Medical Technology Co., Ltd.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Copyright © 2022 Zhou, Li, Wang, Yang, Hou, Lai, Zhao, Deng and Zhou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Frontiers in Physiology | www.frontiersin.org 1 April 2022 | Volume 13 | Article 847267
