Character Prediction in TV Series via a Semantic Projection Network

Ke Sun, Zhuo Lei, Jiasong Zhu, Xianxu Hou, Bozhi Liu & Guoping Qiu

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11295)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

The goal of this paper is to automatically recognize characters in popular TV series. In contrast to conventional approaches, which rely on weak supervision afforded by transcripts, subtitles or character facial data, we formulate the problem as multi-label classification, which requires only label-level supervision. We propose a novel semantic projection network consisting of two stacked subnetworks with specially designed constraints. The first subnetwork is a contractive autoencoder that reconstructs feature activations extracted from a pre-trained single-label convolutional neural network (CNN). The second subnetwork functions as a region-based multi-label classifier that produces character labels for the input video frame and also reconstructs the input visual feature from the mapped semantic label space. Extensive experiments show that the proposed model achieves state-of-the-art performance in comparison with recent approaches on three challenging TV series datasets (The Big Bang Theory, The Defenders and Nirvana in Fire).
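The two ingredients named in the abstract, a contractive autoencoder over CNN features and a multi-label output, can be illustrated with a small standard-library sketch. This is not the authors' implementation: the sigmoid encoder, tied-weight linear decoder, layer sizes, penalty weight `lam`, decision threshold and the random stand-in feature vector are all assumptions made for illustration, and the region-based processing and the label-to-feature reconstruction of the second subnetwork are not modeled.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(x, W, b):
    """Sigmoid layer h = sigmoid(W x + b); W is a list of rows."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b_j)
            for row, b_j in zip(W, b)]

def decode(h, W, c):
    """Linear decoder with tied weights (an assumption): x_hat = W^T h + c."""
    return [sum(W[j][i] * h[j] for j in range(len(h))) + c[i]
            for i in range(len(W[0]))]

def contractive_penalty(h, W):
    """Squared Frobenius norm of the encoder Jacobian dh/dx.

    For a sigmoid encoder this has the closed form
    sum_j (h_j * (1 - h_j))^2 * sum_i W_ji^2.
    """
    return sum((h_j * (1.0 - h_j)) ** 2 * sum(w * w for w in row)
               for h_j, row in zip(h, W))

def cae_loss(x, W, b, c, lam=0.1):
    """Squared reconstruction error plus the weighted contractive penalty."""
    h = encode(x, W, b)
    x_hat = decode(h, W, c)
    recon = sum((a - r) ** 2 for a, r in zip(x, x_hat))
    return recon + lam * contractive_penalty(h, W)

def predict_labels(h, V, d, threshold=0.5):
    """One independent sigmoid score per label, so several labels can be active."""
    scores = encode(h, V, d)
    return [int(s >= threshold) for s in scores], scores

random.seed(0)
x = [random.random() for _ in range(8)]  # stand-in for a CNN feature vector
W = [[random.uniform(-0.5, 0.5) for _ in range(8)] for _ in range(4)]
b = [0.0] * 4
c = [0.0] * 8
V = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]  # 3 hypothetical characters
d = [0.0] * 3

loss = cae_loss(x, W, b, c)
labels, scores = predict_labels(encode(x, W, b), V, d)
```

The per-label sigmoid is what distinguishes the multi-label setting from a softmax classifier: each character is scored independently, so a frame showing two characters can activate two labels at once.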

Acknowledgment

This work was jointly supported in part by the National Natural Science Foundation of China under Grant 61773414, and in part by the Shenzhen Future Industry Development Funding program under Grant 201607281039561400, and the Shenzhen Scientific Research and Development Funding Program under Grant JCYJ20170818092931604.

Author information

Authors and Affiliations

Shenzhen Key Laboratory of Spatial Information Smarting Sensing and Services, Shenzhen University, Shenzhen, China
Ke Sun & Jiasong Zhu
School of Computer Science, University of Nottingham Ningbo, Ningbo, China
Zhuo Lei
Guangdong Key Laboratory of Intelligent Information Processing, College of Information Engineering, Shenzhen University, Shenzhen, China
Xianxu Hou, Bozhi Liu & Guoping Qiu
School of Computer Science, University of Nottingham, Nottingham, UK
Guoping Qiu

Corresponding author

Correspondence to Guoping Qiu.

Editor information

Editors and Affiliations

Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Ioannis Kompatsiaris
EURECOM, Sophia Antipolis, France
Benoit Huet
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Vasileios Mezaris
Dublin City University, Dublin, Ireland
Cathal Gurrin
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Stefanos Vrochidis

Cite this paper

Sun, K., Lei, Z., Zhu, J., Hou, X., Liu, B., Qiu, G. (2019). Character Prediction in TV Series via a Semantic Projection Network. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol 11295. Springer, Cham. https://doi.org/10.1007/978-3-030-05710-7_25

DOI: https://doi.org/10.1007/978-3-030-05710-7_25
Published: 08 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05709-1
Online ISBN: 978-3-030-05710-7
eBook Packages: Computer Science, Computer Science (R0)
