STMMI: A Self-Tuning Multi-Modal Fusion Algorithm Applied in Assist Robot Interaction

Published: 01 January 2022

Abstract

When operating in complex surroundings, robots must recognize the same intention even when it is expressed in different ways. To help assistive robots understand intentions more reliably, this paper proposes a self-tuning multimodal fusion algorithm that is not restricted by the expressions of the interacting participants or by the environment. The fusion algorithm can be transferred to different application platforms, and a robot can acquire understanding competence and adapt to new tasks simply by changing the content of its knowledge base. In contrast to other multimodal fusion algorithms, this paper transfers the basic structure of feed-forward neural networks to discrete sets, which strengthens the consistency and improves the complementary relations between the modalities and allows the self-tuning of the fusion operator and the intention search to run simultaneously. Three modalities are used: speech, gesture, and scene objects, for which the single-modal classifiers are trained separately. The method was evaluated in a human-computer interaction experiment on the bionic robot Pepper platform; the results show that it effectively improves the accuracy and robustness of robots in understanding human intentions and reduces the uncertainty of intention judgment found in single-modal interaction.
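
To make the fusion step concrete, the following is a minimal sketch, not taken from the paper, of late fusion over a discrete intention set with one self-tuned weight per modality (speech, gesture, scene objects). The intention labels, the weight-update rule, and all identifiers are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch (illustrative, not the authors' algorithm): weighted late fusion
# of three single-modal classifiers over a discrete intention set, with the
# fusion weights tuned online from interaction feedback.
import numpy as np

# Hypothetical discrete intention set.
INTENTIONS = ["bring_water", "open_door", "turn_on_light"]

class SelfTuningFusion:
    def __init__(self, n_modalities=3, lr=0.05):
        # One weight per modality, initialised uniformly and kept on the simplex.
        self.weights = np.full(n_modalities, 1.0 / n_modalities)
        self.lr = lr

    def fuse(self, modal_probs):
        # modal_probs: (n_modalities, n_intentions) posteriors from the
        # separately trained single-modal classifiers.
        scores = self.weights @ modal_probs      # weighted late fusion
        return scores / scores.sum()             # normalised intention distribution

    def intention_search(self, modal_probs):
        # Pick the most probable intention from the fused distribution.
        fused = self.fuse(modal_probs)
        return INTENTIONS[int(np.argmax(fused))], fused

    def self_tune(self, modal_probs, true_intention_idx):
        # Self-tuning step (assumed rule): raise the weights of modalities that
        # supported the confirmed intention, then re-project onto the simplex.
        support = modal_probs[:, true_intention_idx]
        self.weights += self.lr * (support - support.mean())
        self.weights = np.clip(self.weights, 1e-3, None)
        self.weights /= self.weights.sum()

# Usage: speech, gesture, and scene-object classifiers each emit a posterior.
fusion = SelfTuningFusion()
probs = np.array([[0.7, 0.2, 0.1],    # speech
                  [0.5, 0.3, 0.2],    # gesture
                  [0.6, 0.1, 0.3]])   # scene objects
intention, dist = fusion.intention_search(probs)
fusion.self_tune(probs, true_intention_idx=0)  # feedback: intention 0 was correct
print(intention, dist, fusion.weights)
```

In this reading, "self-tuning" and "intention search" operate simultaneously: every interaction both selects the most likely intention and adjusts the fusion operator, so no separate offline retraining pass is needed when the robot's knowledge base or task set changes.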

          Published In

Scientific Programming, Volume 2022
11290 pages
ISSN: 1058-9244 | EISSN: 1875-919X
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

          Publisher

          Hindawi Limited

          London, United Kingdom
