DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding

Abstract

Social intelligence is essential for understanding and reasoning about human expressions, intents and interactions. One representative benchmark for its study is Social Intelligence Queries (Social-IQ), a dataset of multiple-choice questions on videos of complex social interactions. We define a comprehensive methodology to study the soundness of Social-IQ, as the soundness of such benchmark datasets is crucial to the investigation of the underlying research problem. Our analysis reveals that Social-IQ contains substantial biases, which can be exploited by a moderately strong language model to learn spurious correlations to achieve perfect performance without being given the context or even the question. We introduce DeSIQ, a new challenging dataset, constructed by applying simple perturbations to Social-IQ. Our empirical analysis shows DeSIQ significantly reduces the biases in the original Social-IQ dataset. Furthermore, we examine and shed light on the effect of model size, model style, learning settings, commonsense knowledge, and multi-modality on the new benchmark performance. Our new dataset, observations and findings open up important research questions for the study of social intelligence.

Anthology ID: 2023.emnlp-main.191
Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 3169–3180
URL: https://aclanthology.org/2023.emnlp-main.191/
DOI: 10.18653/v1/2023.emnlp-main.191
Cite (ACL): Xiao-Yu Guo, Yuan-Fang Li, and Reza Haf. 2023. DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 3169–3180, Singapore. Association for Computational Linguistics.
Cite (Informal): DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding (Guo et al., EMNLP 2023)
PDF: https://aclanthology.org/2023.emnlp-main.191.pdf
Video: https://aclanthology.org/2023.emnlp-main.191.mp4
