Abstract
In this paper, we present the first results of the AMIS (Access Multilingual Information opinionS) project, funded by Chist-Era. The main goal of this project is to understand the content of a video in a foreign language. Here, we define understanding as the ability to capture the most important ideas contained in media expressed in a foreign language. In other words, understanding is approached through the global meaning of the content rather than through the meaning of each fragment of a video.
Several obstacles remain before this goal can be reached. They concern video summarization, speech recognition, machine translation and speech segmentation. We discuss each of these issues and present the methods used to develop the corresponding components. A first implementation has been completed, and each component of the system is evaluated on representative test data. We also propose a protocol for a global subjective evaluation of AMIS.
Supported by Chist-Era (AMIS project).
Acknowledgment
We would like to acknowledge the support of Chist-Era for funding this work through the AMIS (Access Multilingual Information opinionS) project. This research was funded by the National Science Center, Poland, under decision number DEC-2015/16/Z/ST7/00559.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Smaïli, K. et al. (2019). A First Summarization System of a Video in a Target Language. In: Choroś, K., Kopel, M., Kukla, E., Siemiński, A. (eds) Multimedia and Network Information Systems. MISSI 2018. Advances in Intelligent Systems and Computing, vol 833. Springer, Cham. https://doi.org/10.1007/978-3-319-98678-4_10
DOI: https://doi.org/10.1007/978-3-319-98678-4_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98677-7
Online ISBN: 978-3-319-98678-4
eBook Packages: Intelligent Technologies and Robotics (R0)