Abstract
Semi-supervised learning can reduce the burden of manually labelling data and improve classification performance by learning from unlabelled data. However, due to the absence of label constraints, unlabelled data are usually ambiguous, which typically means that large datasets are required to learn the correct feature-space distribution. The inherently small sample sizes of medical image datasets may make semi-supervised learning unstable, which can lead to mixed results and even degrade performance. The domain knowledge (DK) of the physician is of great value for disease diagnosis. In this paper, we propose to promote semi-supervised learning with DK and develop a DScGANS model (DScGAN (dual-path semi-supervised conditional generative adversarial networks) and S3VM (semi-supervised support vector machine)) to diagnose ultrasound thyroid nodules. DScGAN uses DK as a condition and multimodal ultrasound data for training. We concatenate the image representations learned by DScGAN and use them as the input of S3VM, with DK serving as a condition to constrain S3VM for thyroid nodule classification. The experimental results show that our proposed model can effectively avoid the mixed results that may occur in semi-supervised learning with a small medical dataset with insufficient labels. Additionally, our model provides stable and advanced diagnostic performance and can potentially be integrated into the thyroid ultrasound system.
Keywords
- Domain knowledge
- Semi-supervised
- Generative adversarial networks
- Ultrasound
- Thyroid nodules classification
1 Introduction
Ultrasonography has become the preferred method for the clinical diagnosis of thyroid nodules [1]. The Thyroid Imaging Reporting and Data System (TI-RADS) [2] provides standardised terminology for describing thyroid nodule features in ultrasound B-mode images, including boundary, calcification, echo pattern, etc. Ultrasound elastography reflects the hardness of lesion tissue through the colour distribution ratio [3]. In clinical diagnosis, radiologists generally consider these two kinds of information together before giving a diagnosis. However, manual diagnosis not only involves a heavy workload but also suffers from subjective differences between readers. Therefore, reducing the workload and misdiagnosis rate of radiologists through computer-aided diagnosis is important.
Recently, deep learning has achieved great success in the diagnosis of ultrasound thyroid nodules. Among such approaches, semi-supervised learning has been receiving increasing attention because it can greatly reduce the burden of manually annotating data by using unlabelled data for learning [4, 5]. Wang et al. [4] proposed a semi-supervised learning model based on weakly labelled data for the diagnosis of thyroid nodules. Ding et al. [5] used MIL (multiple-instance learning) to extract pre-designed features from ultrasound B-mode and elasticity data and applied an SVM for classification. Although these methods produce commendable results, the inherently small sample sizes of medical image datasets may make the semi-supervised learning process unstable, which can lead to mixed results and even degrade the performance of the model. Therefore, how to ensure stable and advanced diagnostic performance when performing semi-supervised learning on a small medical dataset with insufficient labels remains a challenging issue.
To address these challenges, we propose the DScGANS model (DScGAN (dual-path semi-supervised conditional generative adversarial networks) and S3VM (semi-supervised support vector machine)), built by combining semi-supervised generative adversarial networks (sGAN) [6], conditional generative adversarial networks (cGAN) [7] and S3VM [8]. The uniqueness of our model includes the following: (1) We quantify the features of thyroid nodules described by TI-RADS in the B-mode ultrasound image and the hardness information of lesion tissue in the elasticity image and use them as domain knowledge (DK) to promote semi-supervised learning. (2) DK acts as a condition constraining the generator to produce higher-quality images. When DK is connected with the deep features of labelled data in the discriminator network, the correlation between DK, deep features and class labels is learned. This correlation then acts as a conditional constraint on the unlabelled data when DK is connected with their deep features. Thus, the ambiguity of unlabelled data is significantly reduced, enabling DScGAN to use unlabelled data for stable and high-performing feature extraction. (3) DK is used as a condition to constrain S3VM for thyroid nodule classification. As S3VM combines powerful regularization-based SVMs with the cluster assumption [9], when DK and the deep features learned by DScGAN are concatenated as the input of S3VM, the conditional constraint of DK makes unlabelled data of different classes more inclined toward their own clusters, which greatly improves the classification results. The experimental results show that our proposed model provides a stable 90.5% accuracy and 91.4% AUC using only 35% labelled data in semi-supervised learning and can potentially be integrated into the thyroid ultrasound system.
2 Datasets
We collected 3090 fully anonymised thyroid nodule lesion images from 1030 patients of different ages (all older than 18 years) at the co-operating hospital and ensured that nodules of different sizes were included. The average nodule size is 2.4 cm. Each lesion image includes two types of images, B-mode and elasticity, which correspond to the same lesion area. The 3090 pairs of images comprise 1601 pairs of benign nodules and 1489 pairs of malignant nodules. Each pair of images is labelled as benign or malignant by two senior radiologists based on pathological examination results. Finally, a total of 6180 images are obtained, with each type of image accounting for 50%.
3 Methods
Our proposed DScGANS model is shown in Fig. 1. The model has three steps: (1) image pre-processing and augmentation; (2) acquiring DK and then using the B-mode and elasticity data as input to train the DScGAN components (DScGAN-A/B) under the constraint of DK; (3) saving the discriminator network as a feature extraction network and concatenating the outputs of the penultimate fully connected layers (which include both deep features and DK) as input to S3VM for thyroid nodule classification.
3.1 Image Preprocessing and Augmentation
From each B-mode image, we obtain a square region of interest (ROI) using the method in [10] and define it as the OB patch. We also obtain an elasticity ROI according to the B-mode ROI position and define it as the OE patch. To characterise the shape of the nodule, we use the live-wire algorithm [11] to semi-automatically segment the nodule area in the OB patch with the help of an experienced radiologist; we then set the pixel value of the nodule region to 255 and that of the non-nodule region to 0 and define the result as the OS patch. The OB, OE and OS patches are resized to 128 × 128. We apply data augmentation (random rotation and horizontal and vertical flipping) to enlarge the three patch sets, forming three datasets. Note that the OS patch is only used to automatically extract DK, while the OB and OE patches are the inputs of DScGAN-A and DScGAN-B.
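The augmentation step can be sketched as follows. This is a minimal illustration in numpy only; `augment_patch` is a hypothetical helper name, and right-angle rotations stand in for the paper's unspecified random-rotation range.

```python
import numpy as np

def augment_patch(patch, rng):
    """Randomly flip and rotate a square patch (sketch of the paper's
    horizontal/vertical flips; 0/90/180/270-degree rotations approximate
    the unspecified random-rotation step)."""
    if rng.random() < 0.5:
        patch = np.flip(patch, axis=1)              # horizontal flip
    if rng.random() < 0.5:
        patch = np.flip(patch, axis=0)              # vertical flip
    patch = np.rot90(patch, k=rng.integers(0, 4))   # right-angle rotation
    return patch
```

Applying this independently to the OB, OE and OS patches of one lesion would require sharing the same random draws so that the three patches stay aligned.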
3.2 Automatic Acquisition of Domain Knowledge
According to TI-RADS [2] and the elastography score [3] and after discussing with experienced radiologists, we calculated the following DK.
1. Dimension. First, the OS patch is fitted with an ellipse [12]. Of the two axes of the fitted ellipse, the axis closer to the horizontal is defined as the horizontal axis, with length \( L_{h} \); the other axis is defined as the vertical axis, with length \( L_{v} \). \( L_{h} \) and \( L_{v} \) are used as the dimension values.
2. Orientation. The growth direction is important for reflecting the shape of the nodule. We use the rotation angle of the fitted ellipse as the orientation value.
3. Aspect ratio. The aspect ratio of malignant nodules is likely to be greater than 1 [2]. The ratio of \( L_{v} \) to \( L_{h} \) is used as the aspect ratio.
4. Boundary feature. The boundary can be described as clear or unclear. Based on the nodule area (pixel value 255) and the surrounding tissue (pixel value 0) in the OS patch, we use acutance [13] to calculate the nodule boundary feature in the OB patch.
5. Echo pattern. The pattern can be described as hypoechoic, hyperechoic or isoechoic. The nodule area is located with the OS patch; the ratio of the average gray value of the nodule area to that of the surrounding tissue in the OB patch is used as the echo-pattern value.
6. Cystic or solid. A cystic nodule is more likely to be benign. The nodule area is located with the OS patch, and the average gray value of the nodule area in the OB patch determines whether the nodule is cystic or solid.
7. Calcification. Calcification can be described as coarse or micro-calcification. The nodule area is located with the OS patch, and the calcification information of the nodule area in the OB patch is calculated using the method in [13].
8. Elastography colour distribution. The colour distribution is quantised by computing the RGB colour histogram of the image [14]. For each OE patch, a 1 × 64-dimensional vector is obtained as the elastography colour distribution.
Finally, the DK dimensions of the B-mode and elasticity patches are 1 × 8 and 1 × 64.
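As a rough illustration of how such DK values can be computed, the sketch below estimates the dimension, orientation, aspect ratio and echo-pattern values from an OS/OB patch pair. It substitutes second-order mask moments for the ellipse fit of [12], so it approximates rather than reproduces the paper's procedure; `nodule_dk` is a hypothetical helper name.

```python
import numpy as np

def nodule_dk(os_patch, ob_patch):
    """Estimate a few DK values from Sect. 3.2. Mask moments (an assumption,
    not the paper's exact ellipse fit) give ellipse-like axis lengths and an
    orientation angle; gray-value ratios give the echo-pattern value."""
    ys, xs = np.nonzero(os_patch)                    # nodule pixels (value 255)
    pts = np.stack([xs, ys], axis=1).astype(float)
    pts -= pts.mean(axis=0)
    cov = pts.T @ pts / len(pts)
    evals, evecs = np.linalg.eigh(cov)               # eigenvalues in ascending order
    axes = 4.0 * np.sqrt(evals)                      # ellipse-like axis lengths
    major = evecs[:, 1]                              # direction of the longer axis
    angle = np.degrees(np.arctan2(major[1], major[0])) % 180.0
    if abs(angle - 90.0) < 45.0:                     # longer axis is near-vertical
        L_v, L_h = axes[1], axes[0]
    else:                                            # longer axis is near-horizontal
        L_h, L_v = axes[1], axes[0]
    aspect = L_v / L_h                               # aspect ratio (item 3)
    inside = ob_patch[os_patch > 0].mean()           # mean gray inside the nodule
    outside = ob_patch[os_patch == 0].mean()         # mean gray of surrounding tissue
    echo = inside / max(outside, 1e-6)               # echo-pattern ratio (item 5)
    return L_h, L_v, angle, aspect, echo
```

The boundary, calcification and histogram features would follow the cited methods [13, 14] and are omitted here.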
3.3 DScGAN Component
The DScGAN component is built by combining sGAN [6] and cGAN [7].
Generator:
We denote the generator by \( G \); it consists of a fully connected layer and five deconvolution layers, with parameters \( \varvec{\theta}_{g} \). Each input random noise vector \( \varvec{z} \) has a corresponding DK condition \( \varvec{Dk} \), obtained from the real image corresponding to the fake image to be generated. \( G \) uses both to generate an image patch \( G\left( {\left( {\varvec{z},\varvec{Dk}} \right);\varvec{\theta}_{g} } \right) \).
Discriminator:
We denote the discriminator by \( D \); it consists of six convolutional layers, a feature dimension reduction layer (fdr), a joint learning network (jln) and a fully connected layer, with parameters \( \varvec{\theta}_{d} \). The fdr layer is a fully connected layer with 16 neurons (128 neurons in DScGAN-B) that reduces the dimension of the deep features before they are concatenated with the DK. Our experimental results demonstrate that high-dimensional deep features will overwhelm low-dimensional DK if they are connected directly. After the DK and deep features are concatenated, they are fed into the jln, which consists of two fully connected layers that learn the correlation between DK, deep features and the class labels of labelled data. Assuming that the real image patches have \( k \) classes, we define the image patches generated by \( G \) as the \( \left( {k + 1} \right) \)-th class. Thus, the last fully connected layer has \( k + 1 \) neurons, of which the first \( k \) are softmax neurons and the \( \left( {k + 1} \right) \)-th is a sigmoid neuron. An input image patch \( I \) may come from a real image with a label \( data\left( {x,y} \right) \), a real image without a label \( data\left( x \right) \), or a generated fake image \( G\left( {\left( {\varvec{z},\varvec{Dk}} \right);\varvec{\theta}_{g} } \right) \). Denoting by \( o_{j} \) the \( j \)-th output of the last fully connected layer, we use the first \( k \) neurons to calculate the probability that the input \( I \sim data\left( {x,y} \right) \) belongs to class \( j \) as follows:

$$ D^{\left( j \right)} \left( {\left( {I,\varvec{Dk}} \right);\varvec{\theta}_{d} } \right) = p\left( {y = j\left| {I,\varvec{Dk}} \right.} \right) = \frac{{\exp \left( {o_{j} } \right)}}{{\sum\nolimits_{i = 1}^{k} {\exp \left( {o_{i} } \right)} }} $$

where \( y \) is the class label of input \( I \). We use the \( \left( {k + 1} \right) \)-th neuron to calculate the probability that the input \( I \) comes from a real image \( data\left( x \right) \) rather than a fake image \( G\left( {\left( {\varvec{z},\varvec{Dk}} \right);\varvec{\theta}_{g} } \right) \). We define the output of the \( \left( {k + 1} \right) \)-th neuron, \( D^{{\left( {k + 1} \right)}} \left( {\left( {I,\varvec{Dk}} \right);\varvec{\theta}_{d} } \right) \), as follows:

$$ D^{{\left( {k + 1} \right)}} \left( {\left( {I,\varvec{Dk}} \right);\varvec{\theta}_{d} } \right) = p\left( {l = 1\left| {I,\varvec{Dk}} \right.} \right) = \frac{1}{{1 + \exp \left( { - o_{k + 1} } \right)}} $$
If \( I \) is from a fake image, the value of variable \( l \) is 0; otherwise the value is 1.
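The \( k + 1 \)-neuron output head described above can be sketched in numpy, assuming the first \( k \) logits feed a softmax and the last logit a sigmoid; `head_outputs` is a hypothetical helper name.

```python
import numpy as np

def head_outputs(logits):
    """Split a (k+1)-logit vector into class probabilities (softmax over the
    first k logits) and a real-vs-fake probability (sigmoid on the last)."""
    k_logits, rf_logit = logits[:-1], logits[-1]
    e = np.exp(k_logits - k_logits.max())            # numerically stable softmax
    class_probs = e / e.sum()
    p_real = 1.0 / (1.0 + np.exp(-rf_logit))         # sigmoid neuron
    return class_probs, p_real
```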
3.4 Training
To train the DScGAN component, we use several loss functions. For an input random noise vector \( \varvec{z} \), the generator loss function is:

$$ L_{G} = - {\mathbb{E}}_{{\varvec{z} \sim noise}} \log D^{{\left( {k + 1} \right)}} \left( {\left( {G\left( {\left( {\varvec{z},\varvec{Dk}} \right);\varvec{\theta}_{g} } \right),\varvec{Dk}} \right);\varvec{\theta}_{d} } \right) $$

The loss function of the discriminator consists of two parts: the supervised loss \( L_{D\_su} \) and the unsupervised loss \( L_{D\_un} \). \( L_{D\_un} \) is the loss of distinguishing whether the input \( I \) comes from a real or a fake image, and \( L_{D\_su} \) is the loss of assigning the input \( I \) to class \( j \). Given an input \( I \), the loss function of the discriminator is:

$$ L_{D\_su} = - {\mathbb{E}}_{{\left( {I,y} \right) \sim data\left( {x,y} \right)}} \log D^{\left( y \right)} \left( {\left( {I,\varvec{Dk}} \right);\varvec{\theta}_{d} } \right) $$
$$ L_{D\_un} = - {\mathbb{E}}_{{I \sim data\left( x \right)}} \log D^{{\left( {k + 1} \right)}} \left( {\left( {I,\varvec{Dk}} \right);\varvec{\theta}_{d} } \right) - {\mathbb{E}}_{{\varvec{z} \sim noise}} \log \left( {1 - D^{{\left( {k + 1} \right)}} \left( {\left( {G\left( {\left( {\varvec{z},\varvec{Dk}} \right);\varvec{\theta}_{g} } \right),\varvec{Dk}} \right);\varvec{\theta}_{d} } \right)} \right) $$
$$ L_{D} = L_{D\_su} + L_{D\_un} $$
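For a single example, these loss terms can be sketched as below, under the assumption that the model follows the standard sGAN cross-entropy formulation; `discriminator_losses` and `generator_loss` are hypothetical helper names.

```python
import numpy as np

def discriminator_losses(class_probs, y, p_real_on_real, p_real_on_fake):
    """Sum of the supervised term (cross-entropy of the true class y on a
    labelled input) and the unsupervised term (penalising calling a real
    patch fake or a generated patch real)."""
    L_su = -np.log(class_probs[y])                          # supervised loss
    L_un = -np.log(p_real_on_real) - np.log(1.0 - p_real_on_fake)
    return L_su + L_un

def generator_loss(p_real_on_fake):
    """The generator tries to make D judge its patches real."""
    return -np.log(p_real_on_fake)
```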
S3VM Classifier.
S3VM [8] is used as the classifier, given its strong performance in two-class classification tasks. We denote by \( \varvec{F}_{\alpha } \) and \( \varvec{F}_{\beta } \) the outputs of the penultimate fully connected layers of DScGAN-A and DScGAN-B, which contain their respective deep features and DK. We concatenate \( \varvec{F}_{\alpha } \) with \( \varvec{F}_{\beta } \) and define the result as the feature vector \( \varvec{F}_{\alpha *\beta } \). Thus, DK is used as a conditional constraint on unlabelled data during the classification process of S3VM. Let \( \varvec{F}_{\alpha *\beta }^{\left( i \right)} \) denote the \( i \)-th vector in \( \varvec{F}_{\alpha *\beta } \), \( L = \left\{ {\left( {\varvec{F}_{\alpha *\beta }^{\left( 1 \right)} ,y_{1} } \right), \ldots ,\left( {\varvec{F}_{\alpha *\beta }^{\left( m \right)} ,y_{m} } \right)} \right\} \) the labelled dataset with benign/malignant labels \( y_{i} \), \( U = \left\{ {\varvec{F}_{\alpha *\beta }^{{\left( {m + 1} \right)}} , \ldots ,\varvec{F}_{\alpha *\beta }^{{\left( {m + n} \right)}} } \right\} \) the unlabelled dataset, and \( f\left( {\varvec{F}_{\alpha *\beta }^{\left( i \right)} } \right) = \left\langle {\omega ,\varphi \left( {\varvec{F}_{\alpha *\beta }^{\left( i \right)} } \right)} \right\rangle + b \) the discriminating hyperplane, where \( \omega \) is the hyperplane normal vector, \( b \) is the offset term, and \( \varphi \) is a transformation from the input space to a high-dimensional reproducing kernel Hilbert space. The S3VM optimisation problem is as follows:

$$ \mathop {\min }\limits_{\omega ,b} \frac{1}{2}\left\| \omega \right\|^{2} + C\sum\limits_{i = 1}^{m} {H\left( {y_{i} f\left( {\varvec{F}_{\alpha *\beta }^{\left( i \right)} } \right)} \right)} + C^{*} \sum\limits_{i = m + 1}^{m + n} {\tilde{H}\left( {f\left( {\varvec{F}_{\alpha *\beta }^{\left( i \right)} } \right)} \right)} \quad \left( 7 \right) $$
where \( H\left( t \right) = \max \left( {0,1 - t} \right) \) is the hinge loss of labelled data, \( \tilde{H}\left( t \right) = \max \left( {0,1 - \left| t \right|} \right) \) is the symmetric hinge loss of unlabelled data, \( C \) is the regularization parameter, and \( C^{*} \) is the penalization parameter of unlabelled data. When \( C^{*} = 0 \), Eq. (7) degenerates into the standard SVM.
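The objective can be evaluated numerically as below. The sketch uses a linear \( f \) rather than the kernelised one defined above, so it is illustrative only; `s3vm_objective` is a hypothetical helper name.

```python
import numpy as np

def s3vm_objective(w, b, X_lab, y_lab, X_unl, C, C_star):
    """Evaluate the S3VM objective for a linear decision function: hinge loss
    H(t) = max(0, 1 - t) on labelled points, symmetric hinge
    H~(t) = max(0, 1 - |t|) on unlabelled points."""
    f_lab = X_lab @ w + b
    f_unl = X_unl @ w + b
    H = np.maximum(0.0, 1.0 - y_lab * f_lab)          # labelled hinge loss
    H_sym = np.maximum(0.0, 1.0 - np.abs(f_unl))      # symmetric hinge loss
    return 0.5 * (w @ w) + C * H.sum() + C_star * H_sym.sum()
```

Setting `C_star` to 0 recovers the standard SVM objective, matching the degenerate case noted above.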
4 Experiments and Results
In our experiments, we randomly divide the dataset in a mutually exclusive way: 80% for training and 20% as an independent test set. To mitigate the problem that labelled data are usually too few for effective cross-validation, 5-fold cross-validation is used at each stage to increase the number of labelled samples in each fold, with evaluation on the independent test set. Since more than one ultrasound image can be collected from each patient, we ensure that image patches from the same patient are assigned to the same fold.
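The patient-level fold assignment can be sketched as a greedy grouping. The paper does not state its exact procedure, so this is one reasonable implementation; `patient_folds` is a hypothetical helper name.

```python
from collections import defaultdict

def patient_folds(patch_patient_ids, n_folds=5):
    """Assign patch indices to folds so that every patch from one patient
    lands in the same fold. Largest patients are placed first onto the
    currently smallest fold to keep fold sizes balanced."""
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patch_patient_ids):
        by_patient[pid].append(idx)
    folds = [[] for _ in range(n_folds)]
    for pid, idxs in sorted(by_patient.items(), key=lambda kv: -len(kv[1])):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(idxs)
    return folds
```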
Table 1 shows the mean and standard deviation (std) of the predictive capability of the DScGANS model on the test set. Here, 25% means that 25% of the data in the training set are labelled, obtained by random selection. FSL (fully supervised learning) indicates that the training set is composed entirely of labelled data. We ran ten independent experiments on the training set (ensuring that every sample was used as labelled and as unlabelled data at least once), each with 5-fold cross-validation, and calculated the average test result on the test set. The results indicate that, regardless of the proportion of labelled data used, the standard deviation of the obtained performance is smaller when using DK. This result clearly shows that DK substantially improves the stability of semi-supervised learning. Furthermore, with the help of DK, our proposed model achieves stable and advanced diagnostic performance that is almost on par with that of FSL using only approximately 30%–35% of the labelled data; increasing the amount of labelled data further yields no significant performance gains.
5 Discussion
Comparison of Deep Features and DK Combinations in Different Dimensions:
Figure 2 shows the ROC curves and AUC values obtained with the original deep features (without dimensionality reduction) and with reduced deep features of 3, 2 and 1 times the DK dimension, using 35% of the labelled data (the setting in which near-optimal performance is obtained with minimal labelled data). When the dimension of the deep features was twice that of DK, the proposed model obtained the optimal AUC value. Plausibly, the deep learning model can capture more valuable implicit information than human experts can, yet when the dimension of the deep features is too high, DK is overwhelmed, which degrades the performance of the DScGANS model.
S3VM Kernel Function and Parameter C and C* Setting:
In Table 2, we compare the test results of the linear kernel and the Gaussian kernel (based on libsvm in scikit-learn) on different proportions of labelled data. Here, 25% means that 25% of the data in the training set are labelled. To mitigate the possible negative effect of unlabelled data on the results, we let \( C^{*} \) begin with a small value and grow to \( C_{p} \times C \) as the model converges, where \( C_{p} \) is the proportionality factor of \( C \). Note that \( C^{*} \) is 0 in FSL. A grid search strategy was adopted to find the optimal parameters. Finally, using the Gaussian kernel with \( C \) = 10, \( C_{p} \) = 0.2 and \( \gamma \) = 2 on 35% of labelled data, we obtain stable and advanced diagnostic performance that is almost on par with that of FSL.
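The \( C^{*} \) annealing can be sketched as a simple ramp. The linear shape is an assumption: the text only specifies that \( C^{*} \) starts small and ends at \( C_{p} \times C \); `c_star_schedule` is a hypothetical helper name.

```python
def c_star_schedule(C, C_p, n_steps):
    """Return a linearly increasing schedule of C* values that reaches
    C_p * C at the final step (a sketch; the exact annealing curve is
    not specified in the text)."""
    target = C_p * C
    return [target * (step + 1) / n_steps for step in range(n_steps)]
```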
6 Conclusion
We propose the DScGANS model for the diagnosis of ultrasound thyroid nodules based on multimodal ultrasound data. The experimental results suggest that with the help of DK, our proposed model can effectively avoid mixed results that may occur when learning with a small medical dataset with insufficient labels and can provide stable and advanced diagnostic performance; this model can be potentially integrated into the thyroid ultrasound system. In future works, we aim to achieve automatic segmentation of the nodules and integrate more types of DK to train the proposed model and extend our model to real medical environments.
References
1. Acharya, U.R., Faust, O., Sree, S.V., et al.: ThyroScreen system: high resolution ultrasound thyroid image characterization into benign and malignant classes using novel combination of texture and discrete wavelet transform. Comput. Methods Programs Biomed. 107(2), 233–241 (2012)
2. Park, J.Y., Lee, H.J., Jang, H.W., et al.: A proposal for a thyroid imaging reporting and image system for ultrasound features of thyroid carcinoma. Thyroid 19(11), 1257–1264 (2009)
3. Hong, Y., et al.: Real-time ultrasound elastography in the differential diagnosis of benign and malignant thyroid nodules. J. Ultrasound Med. 28(7), 861–867 (2009)
4. Wang, J., Li, S., Song, W., et al.: Learning from weakly-labelled clinical image for automatic thyroid nodule classification in ultrasound images. In: 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 3114–3118. IEEE (2018)
5. Ding, J., Cheng, H., Huang, J., et al.: Multiple-instance learning with global and local features for thyroid ultrasound image classification. In: 2014 7th International Conference on Biomedical Engineering and Informatics, pp. 66–70. IEEE (2014)
6. Odena, A.: Semi-supervised learning with generative adversarial networks. arXiv preprint arXiv:1606.01583 (2016)
7. Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014)
8. Gu, B., Yuan, X.T., Chen, S., et al.: New incremental learning algorithm for semi-supervised support vector machine. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1475–1484. ACM (2018)
9. Chapelle, O., Zien, A.: Semi-supervised classification by low density separation. In: AISTATS 2005, pp. 57–64 (2005)
10. Koundal, D., Vishraj, R., Gupta, S., et al.: An automatic ROI extraction technique for Thyroid Ultrasound image. In: 2015 2nd International Conference on Recent Advances in Engineering & Computational Sciences (RAECS), pp. 1–5. IEEE (2015)
11. Falcão, A.X., Udupa, J.K., Samarasekera, S., et al.: User-steered image segmentation paradigms: live wire and live lane. Graph. Models Image Process. 60(4), 233–260 (1998)
12. Fitzgibbon, A., Pilu, M., Fisher, R.B.: Direct least square fitting of ellipses. IEEE Trans. Pattern Anal. Mach. Intell. 21(5), 476–480 (1999)
13. Wang, H., Yang, Y., Peng, B., et al.: A thyroid nodule classification method based on TI-RADS. In: Ninth International Conference on Digital Image Processing (ICDIP 2017), vol. 10420, p. 1042041. International Society for Optics and Photonics (2017)
14. Zhou, J.X., Liu, X., Xu, T.W., et al.: A new fusion approach for content-based image retrieval with color histogram and local directional pattern. Int. J. Mach. Learn. Cybernet. 9(4), 677–689 (2018). https://doi.org/10.1007/s13042-016-0597-9
Acknowledgements
This work is partly supported by the National Natural Science Foundation of China (Grant number 61872261), the Funding Project of State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (Grant No. 2018-VRLAB2018B07), Shanxi Scholarship Council of China (201801D121139) and the Department of Radiology, Shanxi Province Cancer Hospital.
© 2019 Springer Nature Switzerland AG
Yang, W. et al. (2019). DScGANS: Integrate Domain Knowledge in Training Dual-Path Semi-supervised Conditional Generative Adversarial Networks and S3VM for Ultrasonography Thyroid Nodules Classification. In: Shen, D., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. MICCAI 2019. Lecture Notes in Computer Science(), vol 11767. Springer, Cham. https://doi.org/10.1007/978-3-030-32251-9_61
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32250-2
Online ISBN: 978-3-030-32251-9