DOI: 10.1145/3536220.3558034
Improving Supervised Learning in Conversational Analysis through Reusing Preprocessing Data as Auxiliary Supervisors

Published: 07 November 2022

Abstract

Emotion recognition systems are trained on noisy human labels and often require heavy preprocessing during multimodal feature extraction. Using noisy labels in single-task learning increases the risk of over-fitting. Auxiliary tasks can improve the performance of the primary task when learned jointly in the same training run – multi-task learning (MTL). In this paper, we explore how the preprocessed data used to create the textual multimodal description of a conversation, which supports conversational analysis, can be re-used as auxiliary tasks (e.g., predicting future or previous labels, and predicting the ranked expressions of actions and prosody), thereby promoting more productive use of the data. Our main contributions are: (1) the identification of sixteen beneficial auxiliary tasks, (2) a study of how to distribute learning capacity between the primary and auxiliary tasks, and (3) a study of the relative supervision hierarchy between the primary and auxiliary tasks. Extensive experiments on the IEMOCAP and SEMAINE datasets validate the improvements over single-task approaches, and suggest that the method may generalize across multiple primary tasks.
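The core idea of the abstract – auxiliary tasks supervising the same model alongside the primary task – reduces, at the objective level, to combining one primary loss with several down-weighted auxiliary losses. The sketch below is a generic illustration of that pattern, not the authors' implementation; the function name, the single shared `aux_weight`, and the example task names are all assumptions for illustration only.

```python
# Generic sketch of a multi-task learning (MTL) objective: one primary
# task plus several auxiliary supervisors. Not the paper's actual code;
# `multitask_loss` and `aux_weight` are illustrative names/choices.

def multitask_loss(primary_loss, auxiliary_losses, aux_weight=0.1):
    """Combine a primary-task loss with weighted auxiliary-task losses.

    primary_loss     : scalar loss of the main task
                       (e.g. emotion recognition on IEMOCAP)
    auxiliary_losses : list of scalar losses, one per auxiliary task
                       (e.g. predicting previous/future labels, or
                       ranked expressions of actions and prosody)
    aux_weight       : down-weighting factor so the auxiliary tasks act
                       as regularizers rather than dominating training
    """
    return primary_loss + aux_weight * sum(auxiliary_losses)


# Hypothetical usage: one primary loss and three auxiliary losses.
total = multitask_loss(1.0, [0.4, 0.3, 0.3], aux_weight=0.1)
```

In practice, frameworks either fix such weights by hand, tune them as hyper-parameters, or learn them dynamically (as in the loss-weighting strategies the paper's related work surveys); this sketch shows only the simplest static-weight variant.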



Published In
    ICMI '22 Companion: Companion Publication of the 2022 International Conference on Multimodal Interaction
    November 2022
    225 pages
    ISBN:9781450393898
    DOI:10.1145/3536220
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. multimodal conversational analysis
    2. neural network architecture
    3. transfer learning

    Qualifiers

    • Short-paper
    • Research
    • Refereed limited

    Conference

ICMI '22

    Acceptance Rates

    Overall Acceptance Rate 453 of 1,080 submissions, 42%
