Abstract
Whole slide images (WSIs) are high-resolution and typically lack localized annotations, so their classification can be treated as a multiple instance learning (MIL) problem when slide-level labels are available. We introduce an approach for WSI classification that combines MIL with a Transformer, eliminating the requirement for localized annotations. Our method consists of three key components. First, we use ResNet50, pre-trained on ImageNet, as an instance feature extractor. Second, we present a Transformer-based MIL aggregator that captures contextual information within individual regions and correlation information among diverse regions of the WSI. Third, we introduce a global average pooling (GAP) layer to strengthen the mapping between WSI features and category features. To evaluate our model, we conducted experiments on The Cancer Imaging Archive (TCIA) Clinical Proteomic Tumor Analysis Consortium (CPTAC) dataset. Our proposed method achieves a top-1 accuracy of 94.8% and an area under the curve (AUC) exceeding 0.996, establishing state-of-the-art performance in WSI classification without reliance on localized annotations. The results demonstrate the superiority of our approach compared to previous MIL-based methods.
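The aggregation idea described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' TMG implementation: it assumes the instance features have already been extracted (random vectors stand in for ResNet50 outputs), uses a single self-attention step with identity projections in place of a full learned Transformer, and applies a plain linear layer after global average pooling. All function names (`self_attention`, `global_average_pool`, `classify_bag`) and the dimensions are hypothetical, chosen only for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(instances):
    """Single-head scaled dot-product self-attention (identity projections).

    Each instance (patch feature vector) becomes a weighted mix of all
    instances in the bag, so it can draw on context from other regions.
    A real Transformer would use learned query/key/value projections.
    """
    d = len(instances[0])
    scale = math.sqrt(d)
    out = []
    for q in instances:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in instances]
        weights = softmax(scores)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, instances)) for j in range(d)]
        )
    return out


def global_average_pool(instances):
    """Average the instance vectors into one slide-level feature vector."""
    n = len(instances)
    return [sum(col) / n for col in zip(*instances)]


def classify_bag(instances, class_weights):
    """Map a bag of instance features to slide-level class probabilities."""
    attended = self_attention(instances)
    slide_feature = global_average_pool(attended)
    logits = [
        sum(w * x for w, x in zip(row, slide_feature)) for row in class_weights
    ]
    return softmax(logits)


# Toy bag: 5 patches with 8-dimensional features, 3 hypothetical classes.
random.seed(0)
bag = [[random.gauss(0, 1) for _ in range(8)] for _ in range(5)]
weights = [[random.gauss(0, 1) for _ in range(8)] for _ in range(3)]
probs = classify_bag(bag, weights)
```

Only the slide-level label would be needed to train such a model, since the loss is computed on `probs` for the whole bag rather than on individual patches. Because this sketch has no positional encoding, its output does not depend on the order of the patches in the bag.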
H. Luan and T. Hu—Equal contribution.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (grant number 92259101) and the Strategic Priority Research Program of the Chinese Academy of Sciences (grant number XDB38040100).
Ethics declarations
Availability
The pathology slides and corresponding labels for WSIs are available from the CPTAC Pathology Portal. All source code used in our study was implemented in Python using the PyTorch deep learning library and is available at https://github.com/Luan-zb/TMG.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Luan, H. et al. (2023). Multi-class Cancer Classification of Whole Slide Images Through Transformer and Multiple Instance Learning. In: Guo, X., Mangul, S., Patterson, M., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2023. Lecture Notes in Computer Science, vol 14248. Springer, Singapore. https://doi.org/10.1007/978-981-99-7074-2_12
DOI: https://doi.org/10.1007/978-981-99-7074-2_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7073-5
Online ISBN: 978-981-99-7074-2
eBook Packages: Computer Science (R0)