DOI: 10.1145/3664647.3681651
research-article
Open access

SCREEN: A Benchmark for Situated Conversational Recommendation

Published: 28 October 2024

Abstract

Engaging in conversational recommendation within a specific scenario is a promising paradigm for real-world applications. Scenario-relevant situations often affect conversations and recommendations in two closely related ways: they vary how appealing items are to users (situated item representation), and they shift user interest in the targeted items (situated user preference). Considering these situational factors is crucial, as it aligns with how conversational recommendation actually unfolds in the physical world. However, the problem is challenging and remains under-explored. In this work, we make a pioneering effort to bridge this gap and introduce a novel setting: Situated Conversational Recommendation Systems (SCRS). We observe an emerging need for high-quality datasets, and building one from scratch requires tremendous human effort. To this end, we construct a new benchmark, named SCREEN, via a role-playing method based on multimodal large language models. Two multimodal large language models play the roles of a user and a recommender, simulating their interactions in a co-observed scene. SCREEN comprises over 20k dialogues across 1.5k diverse situations, providing a rich foundation for exploring situational influences on conversational recommendation. Based on SCREEN, we propose three subtasks worth exploring and evaluate several representative baseline models. Our evaluations suggest that the benchmark is of high quality, establishing a solid experimental basis for future research. The code and data are available at https://github.com/DongdingLin/SCREEN.

Supplemental Material

MP4 File - 5331-video
This is a 3-minute video presentation on the paper "SCREEN: A Benchmark for Situated Conversational Recommendation," which briefly introduces the Motivation, Research Question, and Methodology of this work.


Information & Contributors

Information

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. benchmark
  2. role-playing
  3. situated conversational recommendation

Qualifiers

  • Research-article

Conference

MM '24
MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
