Abstract
Purpose
To realize computer-aided diagnosis (CAD) of computed tomography (CT) images, many pattern recognition methods have been applied to the automatic classification of normal and abnormal opacities; however, training an accurate classifier requires a large number of correctly labeled images. Assigning correct labels to a large number of CT images is very time-consuming and impractical for radiologists. To address this problem, this paper proposes a new clustering algorithm for diffuse lung diseases that uses frequent attribute patterns, providing an unsupervised class labeling mechanism that does not require correct labels.
Methods
A large number of frequently appearing opacity patterns are extracted by a data mining algorithm called genetic network programming (GNP), and the extracted patterns are automatically distributed among several clusters using a genetic algorithm (GA). Lung CT images are used to form clusters of normal opacities and diffuse lung diseases.
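To make the notion of a frequent attribute pattern concrete, the following is a minimal sketch that counts how often each combination of attributes occurs and keeps those above a support threshold. It uses brute-force enumeration as a stand-in and is not the GNP-based miner; the function name `frequent_patterns`, the attribute names, the toy records, and the `min_support` and `max_len` parameters are all assumptions made for illustration. The GA-based clustering step is not covered here.

```python
from itertools import combinations


def frequent_patterns(records, min_support, max_len=2):
    """Return {pattern: support} for attribute combinations whose support
    (fraction of records containing every attribute in the pattern) is at
    least min_support. Brute-force stand-in, not the GNP algorithm."""
    n = len(records)
    attributes = sorted(set().union(*records))
    result = {}
    for size in range(1, max_len + 1):
        for pattern in combinations(attributes, size):
            # Count the records that contain all attributes of the pattern.
            count = sum(1 for r in records if set(pattern) <= r)
            support = count / n
            if support >= min_support:
                result[pattern] = support
    return result


# Hypothetical toy data: each record is the set of binarized attributes
# that hold for one image region (names are illustrative only).
records = [
    {"high_mean", "low_variance"},
    {"high_mean", "low_variance", "edge"},
    {"high_mean", "edge"},
    {"low_variance"},
]

patterns = frequent_patterns(records, min_support=0.5)
for pattern, support in patterns.items():
    print(pattern, support)
```

In this toy setting, a pattern such as `("high_mean", "low_variance")` is kept because it holds in half of the records, while `("edge", "low_variance")` is dropped because it holds in only one of four.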
Results
Pattern extraction by GNP yielded 1,148 frequent attribute patterns, after which GA was executed to form clusters. The task was to cluster normal opacities and five kinds of abnormal opacities (i.e., a six-class problem); the proposed method, which uses no correct class labels in training, achieved a clustering accuracy of 47.7%.
Conclusion
The proposed method can form clusters without using correct labels and has the potential to be applied to CAD, reducing the time cost of labeling CT images.
Notes
Clustering accuracy = (164+829+119+257+540+108+282+301+935+546+126+54+553)/10094 = 0.477.
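The arithmetic in this note can be checked with a short script. This is only a verification sketch: the variable names are assumptions, and the note itself does not say what each of the thirteen terms or the denominator represents, so they are treated here simply as counts of correctly grouped samples and a total sample count.

```python
# Terms of the numerator, copied from the note above.
correct_counts = [164, 829, 119, 257, 540, 108, 282, 301, 935, 546, 126, 54, 553]

# Denominator from the note (assumed to be the total number of samples).
total_samples = 10094

n_correct = sum(correct_counts)
accuracy = n_correct / total_samples

print(n_correct)           # 4814
print(round(accuracy, 3))  # 0.477
```

The rounded value matches the 47.7% clustering accuracy reported in the Results.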
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
Human and animal rights
This article does not contain any studies with animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Cite this article
Mabu, S., Obayashi, M., Kuremoto, T. et al. Unsupervised class labeling of diffuse lung diseases using frequent attribute patterns. Int J CARS 12, 519–528 (2017). https://doi.org/10.1007/s11548-016-1476-2
DOI: https://doi.org/10.1007/s11548-016-1476-2