Abstract
Previous studies have demonstrated that deep learning algorithms can predict personality from two-dimensional image information, and video opens up further possibilities for personality prediction. Compared with a static image, a video carries more information. However, a video contains hundreds of frames, not all of which are useful, and processing all of them is computationally expensive. This paper applies video analysis algorithms to the task of personality prediction and proposes using an LSTM to fuse image feature information. Experiments show that prediction performance is best when 16 frames are fused. We also build an end-to-end video analysis network based on 3D-ConvNet and address its overfitting problem through pre-training and data augmentation. Experiments show that the accuracy of personality prediction can be improved by using 3D-ConvNet to fuse the spatio-temporal information of videos.
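The abstract reports that fusing 16 frames works best but does not say how those frames are chosen from a clip of several hundred. The following is a minimal sketch of one common option, uniform temporal sampling, and is an assumption rather than the authors' documented procedure. The function name `sample_frame_indices` and the padding rule for short clips are hypothetical.

```python
def sample_frame_indices(total_frames: int, num_samples: int = 16) -> list[int]:
    """Pick num_samples frame indices spread evenly across a clip.

    Assumption: uniform temporal sampling; the paper may select frames differently.
    """
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")

    if total_frames <= num_samples:
        # Short clip: keep every frame and repeat the last one as padding.
        indices = list(range(total_frames))
        return indices + [total_frames - 1] * (num_samples - total_frames)

    # Split the clip into num_samples equal segments and take each segment's midpoint.
    segment = total_frames / num_samples
    return [int(segment * i + segment / 2) for i in range(num_samples)]


if __name__ == "__main__":
    # A 320-frame clip reduced to 16 evenly spaced frame indices.
    print(sample_frame_indices(320))
```

The selected indices would then be used to read frames whose features are passed, in temporal order, to the LSTM or stacked into a clip tensor for the 3D-ConvNet.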
Data availability
Our research involves participants' personality data, face images, and videos. To protect personal privacy, we signed a confidentiality agreement with the subjects before the evaluation to ensure that their data are not disclosed and are used only for scientific research. Therefore, the PCCS (Chinese college students) representative personality dataset generated and/or analysed during the current study is not publicly available.
Funding
This work was funded by the National Natural Science Foundation of China (61402371), the Shaanxi Provincial Science and Technology Innovation Project Plan (2013SZS15-K02), and the Shaanxi Provincial Key Scientific Research Project (2020zdlgy04-09).
Ethics declarations
Ethical approval
Participants were asked for oral consent to participate in the study, and all data were collected after consent was obtained. Only data from consenting participants were used in this study. In addition, each subject was assigned a number, and the self-reported personality assessment data were collected anonymously under these numbers.
About this article
Cite this article
Xu, J., Tian, W., Lv, G. et al. Spatiotemporal fusion personality prediction based on visual information. Multimed Tools Appl 82, 44227–44244 (2023). https://doi.org/10.1007/s11042-023-15537-0
DOI: https://doi.org/10.1007/s11042-023-15537-0