Abstract
This paper reports on a user-experience study undertaken as part of the H2020 project MeMAD (‘Methods for Managing Audiovisual Data: Combining Automatic Efficiency with Human Accuracy’), in which multimedia content describers from the television and archive industries tested Flow, an online platform designed to assist the post-editing of automatically generated data and thereby enhance the production of archival descriptions of film content. Our study captured the participant experience through screen recordings, the User Experience Questionnaire (UEQ), a benchmarked interactive media questionnaire, and focus group discussions, and it reports a broadly positive response to the post-editing environment. Users found the platform’s collation of machine-generated content descriptions, transcripts, named entities (locations, persons, organisations) and translated text helpful, and likely to enhance creative outputs in the longer term. Suggestions for improving the platform included adding specialist vocabulary functionality, shot-type detection, film-topic labelling and automatic music recognition. The study’s most notable limitations are the current level of accuracy achieved in computer vision outputs (i.e. automated video descriptions of film material), which has been hindered by the lack of reliable and accurate training data, and the need for a more narratively oriented interface that allows describers to develop their storytelling techniques and build descriptions that fit within a platform-hosted storyboarding functionality. While this work has value in its own right, it can also be regarded as paving the way for the future (semi-)automation of audio description to assist audiences experiencing sight impairment or cognitive accessibility difficulties, or for whom ‘visionless’ multimedia consumption is the preferred option.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Braun, S. et al. (2021). When Worlds Collide: AI-Created, Human-Mediated Video Description Services and the User Experience. In: Stephanidis, C., et al. (eds.) HCI International 2021 - Late Breaking Papers: Cognition, Inclusion, Learning, and Culture. HCII 2021. Lecture Notes in Computer Science, vol. 13096. Springer, Cham. https://doi.org/10.1007/978-3-030-90328-2_10
Print ISBN: 978-3-030-90327-5
Online ISBN: 978-3-030-90328-2