
Research Article
DOI: 10.1145/3447548.3467069

RAPT: Pre-training of Time-Aware Transformer for Learning Robust Healthcare Representation

Published: 14 August 2021

Abstract

With the development of electronic health records (EHRs), prenatal care examination records have become available for building automatic prediction and diagnosis approaches with machine learning. In this paper, we study how to learn representations of EHR data that transfer effectively to various downstream tasks. Although several methods have been proposed in this direction, they usually adapt classic sequential models to solve one specific diagnosis task or to address a single idiosyncrasy of EHR data. This makes it difficult to reuse them for the early diagnosis of pregnancy complications or to provide a general solution to the range of health problems those complications cause. We propose a novel model, RAPT, which stands for RepresentAtion by Pre-training time-aware Transformer. To bring pre-training to EHR data, we design an architecture suitable for both modeling EHR sequences and pre-training, namely the time-aware Transformer. To cope with the characteristic problems of EHR data, namely data insufficiency, data incompleteness, and short sequences, we carefully devise three corresponding pre-training tasks: similarity prediction, masked prediction, and reasonability check. In this way, the learned representations capture these EHR data characteristics. Extensive experiments on four downstream tasks demonstrate the effectiveness of the proposed approach. We also introduce sensitivity analysis to interpret the model and design an interface that presents results and their interpretation to doctors. Finally, we implement a diagnosis system for pregnancy complications on top of the pre-trained model, from which both doctors and pregnant women can benefit in the early diagnosis of pregnancy complications.
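The abstract describes the pre-training tasks only at a high level. As a concrete illustration of the masked-prediction idea on visit sequences, here is a minimal PyTorch sketch; the module name TimeAwareEncoder, the tensor shapes, and the gap-embedding scheme are illustrative assumptions, not the authors' implementation.

```python
# Minimal, hypothetical sketch of BERT-style masked prediction on EHR visit
# sequences with a time-gap-aware Transformer encoder. All names and shapes
# here are illustrative assumptions, not RAPT's actual implementation.
import torch
import torch.nn as nn

VISIT_DIM, D_MODEL, MAX_LEN = 32, 64, 16   # features per visit, model width, visits per record

class TimeAwareEncoder(nn.Module):
    """Transformer encoder whose inputs also embed the gap since the last visit."""
    def __init__(self):
        super().__init__()
        self.visit_proj = nn.Linear(VISIT_DIM, D_MODEL)
        self.gap_proj = nn.Linear(1, D_MODEL)          # embeds inter-visit gap (days)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, VISIT_DIM)      # reconstructs masked visits

    def forward(self, visits, gaps):
        # visits: (B, T, VISIT_DIM); gaps: (B, T, 1), days since the previous visit
        h = self.visit_proj(visits) + self.gap_proj(gaps)
        return self.head(self.encoder(h))

def masked_prediction_loss(model, visits, gaps, mask_ratio=0.15):
    """Zero out random visits and train the model to reconstruct them."""
    mask = torch.rand(visits.shape[:2]) < mask_ratio   # (B, T) boolean mask
    corrupted = visits.masked_fill(mask.unsqueeze(-1), 0.0)
    recon = model(corrupted, gaps)
    # Score reconstruction only at the masked positions, as in BERT-style objectives.
    return ((recon - visits) ** 2)[mask].mean()

if __name__ == "__main__":
    model = TimeAwareEncoder()
    visits = torch.randn(8, MAX_LEN, VISIT_DIM)        # toy batch of 8 records
    gaps = torch.rand(8, MAX_LEN, 1) * 30.0            # toy gaps of 0-30 days
    loss = masked_prediction_loss(model, visits, gaps)
    loss.backward()
    print(f"masked-prediction loss: {loss.item():.4f}")
```

In this sketch, time-awareness is reduced to adding an embedding of the inter-visit time gap to each visit embedding before the encoder; RAPT's actual time-aware attention mechanism and its other two pre-training tasks (similarity prediction and reasonability check) are specified in the full paper.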

Supplementary Material

MP4 File (KDD21-fp227.mp4)
Presentation video




    Published In

    KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
    August 2021, 4259 pages
    ISBN: 9781450383325
    DOI: 10.1145/3447548


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. healthcare informatics
    2. pre-training
    3. representation learning


    Funding Sources

    • National Natural Science Foundation of China
    • National Key R&D Program of China

    Conference

    KDD '21

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Article Metrics

    • Downloads (last 12 months): 179
    • Downloads (last 6 weeks): 14
    Reflects downloads up to 25 Nov 2024

    Cited By
    • (2024) ProtoMix: Augmenting Health Status Representation Learning via Prototype-based Mixup. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 3633-3644. DOI: 10.1145/3637528.3671937
    • (2024) Time-Aware Attention-Based Transformer (TAAT) for Cloud Computing System Failure Prediction. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 4906-4917. DOI: 10.1145/3637528.3671547
    • (2024) MARLP: Time-series Forecasting Control for Agricultural Managed Aquifer Recharge. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 4862-4872. DOI: 10.1145/3637528.3671533
    • (2024) Robust Sequence-Based Self-Supervised Representation Learning for Anti-Money Laundering. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 4571-4578. DOI: 10.1145/3627673.3680078
    • (2024) Temporal-Spatial Correlation Attention Network for Clinical Data Analysis in Intensive Care Unit. IEEE Transactions on Biomedical Engineering, Vol. 71, 2, 583-595. DOI: 10.1109/TBME.2023.3309956
    • (2024) DeepApnea: Deep Learning Based Sleep Apnea Detection Using Smartwatches. 2024 IEEE International Conference on Pervasive Computing and Communications (PerCom), 206-216. DOI: 10.1109/PerCom59722.2024.10494473
    • (2024) BioDynGrap. Expert Systems with Applications: An International Journal, Vol. 244, C. DOI: 10.1016/j.eswa.2023.122964
    • (2024) Transformers and large language models in healthcare: A review. Artificial Intelligence in Medicine, Vol. 154, 102900. DOI: 10.1016/j.artmed.2024.102900
    • (2024) Transformers in health: a systematic review on architectures for longitudinal data analysis. Artificial Intelligence Review, Vol. 57, 2. DOI: 10.1007/s10462-023-10677-z
    • (2024) Loss Function Role in Processing Sequences with Heavy-Tailed Distributions. Intelligent Data Engineering and Automated Learning – IDEAL 2024, 361-374. DOI: 10.1007/978-3-031-77731-8_33
