Abstract
Social relations are ubiquitous in people’s daily lives. In particular, the spread of video through social media and intelligent surveillance offers a new opportunity to discover the social relations among people. Previous research has mostly focused on recognizing social relations from texts, blogs, or images. However, these methods cover only a limited set of social relations and cannot handle video data. In this paper, we address the challenges of social relation recognition by employing a multi-stream model that exploits the abundant multimodal information in videos. First, we build a video dataset named Social Relation In Videos (SRIV), which comprises 3,124 videos annotated with 16 categories of social relations drawn from psychology and sociology studies. To our knowledge, it is the first video dataset for social relation recognition. Second, we propose a multi-stream deep learning model as a benchmark for recognizing social relations; it learns high-level semantic information from the spatial, temporal, and audio streams of people’s social interactions in videos. Finally, we fuse the streams with logistic regression to achieve accurate recognition. Experimental results show that the multi-stream deep model is effective for social relation recognition on the proposed dataset.
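The fusion step can be illustrated with a minimal sketch. The abstract does not give the fusion details, so the code below assumes a late-fusion setup: each stream (spatial, temporal, audio) outputs per-class scores, the scores are concatenated, and a multinomial logistic regression is trained on the result. The function names, the two-class toy data, and the learning-rate and epoch settings are all hypothetical and chosen only for illustration; the paper itself uses 16 relation categories.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def concat_streams(spatial: Sequence[float],
                   temporal: Sequence[float],
                   audio: Sequence[float]) -> List[float]:
    """Late fusion input: concatenate the per-class scores of each stream."""
    return list(spatial) + list(temporal) + list(audio)


def predict_proba(features: Sequence[float],
                  weights: List[List[float]],
                  biases: List[float]) -> List[float]:
    """Class probabilities from a multinomial logistic regression."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


def train_fusion(samples: List[List[float]], labels: List[int],
                 num_classes: int, lr: float = 0.5,
                 epochs: int = 200) -> Tuple[List[List[float]], List[float]]:
    """Fit the fusion weights with plain stochastic gradient descent
    on the cross-entropy loss (assumed training procedure)."""
    dim = len(samples[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            probs = predict_proba(x, weights, biases)
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c: p_c - 1[c == y]
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
                biases[c] -= lr * grad
    return weights, biases


# Made-up per-stream scores for a two-class toy problem (not SRIV data).
SAMPLES = [
    concat_streams([0.9, 0.1], [0.8, 0.2], [0.6, 0.4]),
    concat_streams([0.7, 0.3], [0.9, 0.1], [0.5, 0.5]),
    concat_streams([0.2, 0.8], [0.3, 0.7], [0.4, 0.6]),
    concat_streams([0.1, 0.9], [0.4, 0.6], [0.3, 0.7]),
]
LABELS = [0, 0, 1, 1]

if __name__ == "__main__":
    w, b = train_fusion(SAMPLES, LABELS, num_classes=2)
    for x, y in zip(SAMPLES, LABELS):
        print(y, [round(p, 3) for p in predict_proba(x, w, b)])
```

Because the fusion layer sees all stream scores at once, it can learn to weight a reliable stream more heavily than a noisy one, which is one common motivation for learned late fusion over simple score averaging.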
Acknowledgment
This research is supported in part by the National High-tech R&D Program (No. 2015AA050204), the Special Fund for Beijing Common Construction Project, the National Natural Science Foundation of China (No. 61602049), and the Fundamental Research Funds for the Central Universities (No. 2016RCGD32).
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Lv, J., Liu, W., Zhou, L., Wu, B., Ma, H. (2018). Multi-stream Fusion Model for Social Relation Recognition from Videos. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10704. Springer, Cham. https://doi.org/10.1007/978-3-319-73603-7_29
DOI: https://doi.org/10.1007/978-3-319-73603-7_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73602-0
Online ISBN: 978-3-319-73603-7
eBook Packages: Computer Science, Computer Science (R0)