PhenoFlow: A Human-LLM Driven Visual Analytics System for Exploring Large and Complex Stroke Datasets

Published: 24 September 2024

Abstract

Acute stroke demands prompt diagnosis and treatment to achieve optimal patient outcomes. However, the intricate and irregular nature of clinical data associated with acute stroke, particularly blood pressure (BP) measurements, presents substantial obstacles to effective visual analytics and decision-making. Through a year-long collaboration with experienced neurologists, we developed PhenoFlow, a visual analytics system that leverages collaboration between humans and Large Language Models (LLMs) to analyze the extensive and complex data of acute ischemic stroke patients. PhenoFlow introduces a workflow in which the LLM serves as a data wrangler while neurologists explore and supervise its output using visualizations and natural language interactions. This approach reduces cognitive load and lets neurologists focus on decision-making. To protect sensitive patient information, PhenoFlow uses only metadata to make inferences and synthesize executable code, without accessing raw patient data. This keeps the results reproducible and interpretable while maintaining patient privacy. The system incorporates a slice-and-wrap design that employs temporal folding to create an overlaid circular visualization. Combined with a linear bar graph, this design aids in exploring meaningful patterns within irregularly measured BP data. Through case studies, PhenoFlow has demonstrated its capability to support iterative analysis of extensive clinical datasets, reducing cognitive load and enabling neurologists to make well-informed decisions. Grounded in long-term collaboration with domain experts, our research demonstrates the potential of LLMs to tackle current challenges in data-driven clinical decision-making for acute ischemic stroke patients.
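
The temporal folding mentioned in the abstract can be illustrated with a minimal sketch. This is not the PhenoFlow implementation: the function name `fold_measurements`, the 24-hour period, and the sample readings are all assumptions made for illustration. The idea shown is only the general one of cutting an irregular time series into fixed-length slices and mapping each reading to an angle, so that slices can be overlaid on one circle.

```python
import math
from datetime import datetime, timedelta


def fold_measurements(measurements, start, period_hours=24.0):
    """Fold (timestamp, value) pairs onto a circle of fixed period.

    Hypothetical helper: the 24-hour default period is an assumption,
    not a value taken from the paper.
    Returns a list of (slice_index, angle_radians, value) tuples.
    """
    period_s = period_hours * 3600.0
    folded = []
    for ts, value in measurements:
        elapsed = (ts - start).total_seconds()
        # Which slice (lap around the circle) the reading falls in.
        slice_index = int(elapsed // period_s)
        # Position within the slice, as a fraction in [0, 1).
        phase = (elapsed % period_s) / period_s
        folded.append((slice_index, 2.0 * math.pi * phase, value))
    return folded


# Made-up, irregularly spaced systolic BP readings (illustrative only).
start = datetime(2024, 1, 1, 0, 0)
readings = [
    (start + timedelta(hours=6), 150),
    (start + timedelta(hours=30), 140),
    (start + timedelta(hours=36), 135),
]
folded = fold_measurements(readings, start)
```

In this sketch, readings taken at the same time of day on different days share an angle and differ only in slice index, which is what allows the slices to be drawn on top of one another and compared.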



Published In

IEEE Transactions on Visualization and Computer Graphics, Volume 31, Issue 1
Jan. 2025
1276 pages

Publisher

IEEE Educational Activities Department

United States
