
Spatiotemporal fusion personality prediction based on visual information

Published in: Multimedia Tools and Applications

Abstract

Previous studies have demonstrated that deep learning algorithms can predict personality from two-dimensional image information, and the emergence of video provides further possibilities for exploring personality prediction. Compared with static images, video carries more information, but a video contains hundreds of frames, not all of which are useful, and processing all of them requires substantial computation. This paper applies video analysis algorithms to the personality prediction task and proposes using an LSTM to fuse image feature information; experiments confirm that prediction is best when 16 frames are fused. The paper also builds an end-to-end video analysis network based on 3D-ConvNet and addresses network overfitting through pre-training and data augmentation. Experiments show that the accuracy of personality prediction can be improved by using 3D-ConvNet to fuse the spatio-temporal information of videos.



Data availability

Our research involves personal personality data together with face images and videos. To protect personal privacy, we signed a confidentiality agreement with the subjects before the evaluation to ensure that their data are not disclosed and are used only for scientific research. Therefore, the PCCS (Chinese college students) personality dataset generated and/or analysed during the current study is not publicly available.


Funding

This work was funded by the National Natural Science Foundation of China (61402371), the Shaanxi Provincial Science and Technology Innovation Project Plan (2013SZS15-K02), and the Shaanxi Provincial Key Scientific Research Project (2020zdlgy04-09).

Author information


Corresponding author

Correspondence to Jia Xu.

Ethics declarations

Ethical approval

Participants were asked for oral consent to participate in the study, and all data were collected only after consent was obtained; only data from consenting participants were used in this study. In addition, each subject was assigned a number, and the self-reported personality assessment data were collected anonymously under those numbers.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Xu, J., Tian, W., Lv, G. et al. Spatiotemporal fusion personality prediction based on visual information. Multimed Tools Appl 82, 44227–44244 (2023). https://doi.org/10.1007/s11042-023-15537-0

