Abstract
Conversational modeling using Large Language Models (LLMs) requires a nuanced understanding of context to generate coherent and contextually relevant responses. In this paper, we present Token Trails, a novel approach that leverages Token-Type Embeddings to navigate the intricate contextual nuances within conversations. Our framework utilizes Token-Type Embeddings to distinguish between user utterances and bot responses, facilitating the generation of context-aware replies. Through comprehensive experimentation and evaluation, we demonstrate the effectiveness of Token Trails in improving conversational understanding and response generation, achieving state-of-the-art performance. Our results highlight the significance of contextual modeling in conversational AI and underscore the promising potential of Token Trails to advance the field, paving the way for more sophisticated and contextually aware chatbot interactions. Model and source code available at: https://huggingface.co/Kowsher/TokenTrails.
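The core idea described above, distinguishing user utterances from bot responses via Token-Type Embeddings, can be illustrated with a minimal sketch. This is not the paper's implementation; all names, sizes, and the use of random numpy arrays are illustrative assumptions, mirroring how BERT-style segment embeddings are summed into the input representation.

```python
import numpy as np

# Illustrative sketch: token-type (speaker-role) embeddings for a dialogue model.
# All shapes and values below are toy assumptions, not the paper's actual setup.
rng = np.random.default_rng(0)
vocab_size, d_model = 100, 16
tok_emb = rng.normal(size=(vocab_size, d_model))   # ordinary token embedding table
type_emb = rng.normal(size=(2, d_model))           # two types: 0 = user, 1 = bot

token_ids = np.array([5, 17, 42, 8, 99])           # toy token sequence
type_ids = np.array([0, 0, 0, 1, 1])               # first three tokens from the user,
                                                   # last two from the bot response

# Input representation = token embedding + token-type embedding,
# so the model can condition on who produced each token.
x = tok_emb[token_ids] + type_emb[type_ids]
print(x.shape)  # (5, 16)
```

In a full model, `x` would then be fed to the transformer layers; the type embedding gives every position an explicit signal about its speaker role, which is what lets the model generate context-aware replies.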
P. Bhat—This work does not relate to Prakash’s position at Amazon.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Kowsher, M., Panditi, R., Prottasha, N.J., Bhat, P., Bairagi, A.K., Arefin, M.S. (2024). Token Trails: Navigating Contextual Depths in Conversational AI with ChatLLM. In: Rapp, A., Di Caro, L., Meziane, F., Sugumaran, V. (eds) Natural Language Processing and Information Systems. NLDB 2024. Lecture Notes in Computer Science, vol 14763. Springer, Cham. https://doi.org/10.1007/978-3-031-70242-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70241-9
Online ISBN: 978-3-031-70242-6