Multi-stream Fusion Model for Social Relation Recognition from Videos

Jinna Lv, Wu Liu, Lili Zhou, Bin Wu & Huadong Ma

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10704)

Included in the following conference series: International Conference on Multimedia Modeling

Abstract

Social relations are ubiquitous in people’s daily lives. In particular, the spread of video in social media and intelligent surveillance offers a new opportunity to discover the social relations among people. Previous research has mostly focused on recognizing social relations from texts, blogs, or images. However, these methods cover only a limited set of social relations and cannot handle video data. In this paper, we address the challenges of social relation recognition by employing a multi-stream model to exploit the abundant multimodal information in videos. First, we build a video dataset named Social Relation In Videos (SRIV), which comprises 3,124 videos annotated with 16 categories of social relations derived from psychology and sociology studies. To our knowledge, it is the first video dataset for social relation recognition. Second, we propose a multi-stream deep learning model as a benchmark for recognizing social relations, which learns high-level semantic information from the spatial, temporal, and audio streams of people’s social interactions in videos. Finally, we fuse the streams with logistic regression to achieve accurate recognition. Experimental results show that the multi-stream deep model is effective for social relation recognition on the proposed dataset.
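
The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes the abstract means logistic regression applied as late fusion: each stream (spatial, temporal, audio) emits a vector of class scores, the three vectors are concatenated, and a multinomial logistic regression is trained on the result. The abstract does not give these details, so the feature layout, the training loop, the toy data, and all function names below are hypothetical and are not the authors' implementation.

```python
import math
import random

NUM_CLASSES = 3  # toy value for speed; the SRIV dataset has 16 relation categories


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(spatial, temporal, audio):
    """Concatenate per-stream class scores into one fusion feature vector."""
    return list(spatial) + list(temporal) + list(audio)


def predict_proba(weights, bias, x):
    """Class probabilities of the fused classifier for one feature vector."""
    logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
              for w, b in zip(weights, bias)]
    return softmax(logits)


def predict(weights, bias, x):
    """Index of the most probable class."""
    probs = predict_proba(weights, bias, x)
    return max(range(len(probs)), key=probs.__getitem__)


def train_fusion(features, labels, num_classes, lr=0.5, epochs=200):
    """Fit multinomial logistic regression with batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    weights = [[0.0] * dim for _ in range(num_classes)]
    bias = [0.0] * num_classes
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = predict_proba(weights, bias, x)
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. the logit of class c.
                err = probs[c] - (1.0 if c == y else 0.0)
                grad_b[c] += err
                for j in range(dim):
                    grad_w[c][j] += err * x[j]
        for c in range(num_classes):
            bias[c] -= lr * grad_b[c] / n
            for j in range(dim):
                weights[c][j] -= lr * grad_w[c][j] / n
    return weights, bias


def toy_stream_scores(label, num_classes, reliability, rng):
    """Synthetic stand-in for one stream's softmax output (not real data)."""
    logits = [rng.gauss(0.0, 1.0) for _ in range(num_classes)]
    logits[label] += reliability
    return softmax(logits)


rng = random.Random(0)
labels = [i % NUM_CLASSES for i in range(90)]
features = [
    fuse_features(
        toy_stream_scores(y, NUM_CLASSES, 3.0, rng),  # "spatial" stream
        toy_stream_scores(y, NUM_CLASSES, 2.0, rng),  # "temporal" stream
        toy_stream_scores(y, NUM_CLASSES, 1.0, rng),  # "audio" stream
    )
    for y in labels
]
weights, bias = train_fusion(features, labels, NUM_CLASSES)
accuracy = sum(predict(weights, bias, x) == y
               for x, y in zip(features, labels)) / len(labels)
print(f"training accuracy on toy data: {accuracy:.2f}")
```

In this sketch the learned weights let the classifier trust the more reliable streams more heavily than a plain average of the three score vectors would. In practice each toy score vector would be replaced by the output of a trained per-stream network.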

Acknowledgment

This research is supported in part by the National High-tech R&D Program (No. 2015AA050204), the Special Fund for Beijing Common Construction Project, the National Natural Science Foundation of China (No. 61602049), and the Fundamental Research Funds for the Central Universities (No. 2016RCGD32).

Author information

Authors and Affiliations

Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Jinna Lv, Wu Liu, Lili Zhou, Bin Wu & Huadong Ma

Corresponding author

Correspondence to Bin Wu.

Editor information

Editors and Affiliations

Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria
Klaus Schoeffmann
Chulalongkorn University, Bangkok, Thailand
Thanarat H. Chalidabhongse
City University of Hong Kong, Hong Kong, China
Chong Wah Ngo
Chulalongkorn University, Bangkok, Thailand
Supavadee Aramvith
Dublin City University, Dublin, Ireland
Noel E. O’Connor
Gwangju Institute of Science and Technology, Gwangju, Korea (Republic of)
Yo-Sung Ho
Tampere University of Technology, Tampere, Finland
Moncef Gabbouj
Rutgers University, Piscataway, New Jersey, USA
Ahmed Elgammal

About this paper

Cite this paper

Lv, J., Liu, W., Zhou, L., Wu, B., Ma, H. (2018). Multi-stream Fusion Model for Social Relation Recognition from Videos. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10704. Springer, Cham. https://doi.org/10.1007/978-3-319-73603-7_29

DOI: https://doi.org/10.1007/978-3-319-73603-7_29
Published: 13 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73602-0
Online ISBN: 978-3-319-73603-7
eBook Packages: Computer Science, Computer Science (R0)
