Abstract
This paper reports on a user-experience study undertaken as part of the H2020 project MeMAD (‘Methods for Managing Audiovisual Data: Combining Automatic Efficiency with Human Accuracy’), in which multimedia content describers from the television and archive industries tested Flow, an online platform designed to assist the post-editing of automatically generated data and thereby enhance the production of archival descriptions of film content. Our study captured the participant experience through screen recordings, the User Experience Questionnaire (UEQ), a benchmarked interactive media questionnaire, and focus group discussions, and it reports a broadly positive response to the post-editing environment. Users found the platform’s collation of machine-generated content descriptions, transcripts, named entities (locations, persons, organisations) and translated text helpful, and likely to enhance creative outputs in the longer term. Suggestions for improving the platform included adding specialist vocabulary functionality, shot-type detection, film-topic labelling and automatic music recognition. The study’s most notable limitations are the current level of accuracy achieved in computer vision outputs (i.e. automated video descriptions of film material), which has been hindered by the lack of reliable and accurate training data, and the need for a more narratively oriented interface that allows describers to develop their storytelling techniques and build descriptions that fit within a platform-hosted storyboarding functionality. While this work has value in its own right, it can also be regarded as paving the way for the future (semi-)automation of audio description to assist audiences experiencing sight impairment or cognitive accessibility difficulties, or for whom ‘visionless’ multimedia consumption is the preferred option.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Braun, S. et al. (2021). When Worlds Collide: AI-Created, Human-Mediated Video Description Services and the User Experience. In: Stephanidis, C., et al. (eds.) HCI International 2021 - Late Breaking Papers: Cognition, Inclusion, Learning, and Culture. HCII 2021. Lecture Notes in Computer Science, vol. 13096. Springer, Cham. https://doi.org/10.1007/978-3-030-90328-2_10
Print ISBN: 978-3-030-90327-5
Online ISBN: 978-3-030-90328-2