Real-time Context-Aware Multimodal Network for Activity and Activity-Stage Recognition from Team Communication in Dynamic Clinical Settings

Published: 28 March 2023

Abstract

In clinical settings, most automatic recognition systems use visual or sensor data to recognize activities. These systems cannot recognize activities that rely on verbal assessment, lack visual cues, or do not use medical devices. We examined speech-based activity and activity-stage recognition in a clinical domain, making the following contributions. (1) We collected a high-quality dataset representing common activities and activity stages during actual trauma resuscitation events (the initial evaluation and treatment of critically injured patients). (2) We introduced a novel multimodal network based on the audio signal and a set of keywords that does not require a high-performing automatic speech recognition (ASR) engine. (3) We designed novel contextual modules to capture dynamic dependencies in team conversations about activities and stages during a complex workflow. (4) We introduced a data augmentation method that simulates team communication by combining selected utterances and their audio clips, and showed that this method improved performance in our data-limited scenario. In offline experiments, our proposed context-aware multimodal model achieved F1-scores of 73.2±0.8% and 78.1±1.1% for activity and activity-stage recognition, respectively. In online experiments, performance declined by about 10% for both recognition types when using utterance-level segmentation of the ASR output, and by about 15% when we omitted the utterance-level segmentation. Our experiments showed the feasibility of speech-based activity and activity-stage recognition during dynamic clinical events.
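Contribution (4), the communication-simulating data augmentation, is concrete enough to illustrate. The sketch below (Python) shows one way such augmentation could work under stated assumptions: each utterance is stored as a waveform/transcript/label triple, and utterances sharing an activity label are sampled uniformly and concatenated. The function name, data layout, and sampling policy are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of utterance-combination augmentation, assuming each
# training item stores a 1-D audio waveform, its transcript, and an
# activity label. Details are assumptions, not the paper's code.
import random
import numpy as np

def augment_conversation(utterances, label, k=3, seed=None):
    """Create one synthetic team-communication sample by concatenating
    k randomly chosen utterances that share the same activity label."""
    rng = random.Random(seed)
    pool = [u for u in utterances if u["label"] == label]
    if not pool:
        raise ValueError(f"no utterances labeled {label!r}")
    chosen = rng.sample(pool, min(k, len(pool)))
    return {
        "audio": np.concatenate([u["audio"] for u in chosen]),  # longer clip
        "text": " ".join(u["text"] for u in chosen),  # merged transcript
        "label": label,
    }

# Usage: expand a data-limited activity class with synthetic conversations.
# dataset = load_utterances(...)  # hypothetical loader
# synthetic = [augment_conversation(dataset, "airway assessment", seed=i)
#              for i in range(100)]
```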



Index Terms

  1. Real-time Context-Aware Multimodal Network for Activity and Activity-Stage Recognition from Team Communication in Dynamic Clinical Settings

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 7, Issue 1
      March 2023, 1243 pages
      EISSN: 2474-9567
      DOI: 10.1145/3589760
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 28 March 2023
      Published in IMWUT Volume 7, Issue 1


      Author Tags

      1. activity recognition
      2. activity-stage recognition
      3. context-aware recognition
      4. keyword spotting
      5. real-time application

      Qualifiers

      • Research-article
      • Research
      • Refereed


      Article Metrics

      • Downloads (Last 12 months): 243
      • Downloads (Last 6 weeks): 38
      Reflects downloads up to 22 Nov 2024

      Cited By
      • (2024) SemiCMT: Contrastive Cross-Modal Knowledge Transfer for IoT Sensing with Semi-Paired Multi-Modal Signals. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 8(4), 1-30. https://doi.org/10.1145/3699779. Online publication date: 21 November 2024.
      • (2024) Providing Context to the "Unknown": Patient and Provider Reflections on Connecting Personal Tracking, Patient-Reported Insights, and EHR Data within a Post-COVID Clinic. Proceedings of the ACM on Human-Computer Interaction, 8(CSCW2), 1-34. https://doi.org/10.1145/3686988. Online publication date: 8 November 2024.
      • (2024) G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 8(2), 1-33. https://doi.org/10.1145/3659623. Online publication date: 15 May 2024.
      • (2024) CrossHAR: Generalizing Cross-dataset Human Activity Recognition via Hierarchical Self-Supervised Pretraining. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 8(2), 1-26. https://doi.org/10.1145/3659597. Online publication date: 15 May 2024.
      • (2023) HARE: Unifying the Human Activity Recognition Engineering Workflow. Sensors, 23(23), 9571. https://doi.org/10.3390/s23239571. Online publication date: 2 December 2023.
      • (2023) Integrating Gaze and Mouse Via Joint Cross-Attention Fusion Net for Students' Activity Recognition in E-learning. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7(3), 1-35. https://doi.org/10.1145/3610876. Online publication date: 27 September 2023.
      • (2023) Automating medical simulations. Journal of Biomedical Informatics, 144. https://doi.org/10.1016/j.jbi.2023.104446. Online publication date: 1 August 2023.
