Abstract
The structural richness of music notation has motivated specific approaches to the problem of Optical Music Recognition (OMR). Among them, it is becoming common to formulate the output of the system as a graph structure, where the primitives of music notation are the vertices and their syntactic relationships are modeled as edges. As an intermediate step, many works focus on locating and categorizing the symbol primitives found in the music score image using object detection approaches. However, training these models requires precise annotations of where the symbols are located, which makes it difficult to apply them to new collections because manual annotation is very costly. In this work, we study how to extract the primitives as an image-to-multiset problem, which does not require such fine-grained location information. To do this, we implement a model based on image captioning that retrieves a sequence of the music primitives found in a given image. Our experiments with the MUSCIMA++ dataset demonstrate the feasibility of this approach, obtaining good results with several models, even in situations with limited annotated data.
Project supported by a 2021 Leonardo Grant for Researchers and Cultural Creators, BBVA Foundation. The BBVA Foundation accepts no responsibility for the opinions, statements and contents included in the project and/or the results thereof, which are entirely the responsibility of the authors. The second author is supported by grant ACIF/2021/356 from “Programa I+D+i de la Generalitat Valenciana”.
Notes
1. A multiset is a generalization of the concept of set, which allows for repeated elements. This is our case, given that the same primitive category can appear many times in a music score image.
2. During training, we use a teacher forcing methodology in both models.
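The two notes above can be illustrated with a minimal sketch. It assumes primitives are represented as plain string labels and uses Python's `collections.Counter` as the multiset. The precision/recall/F1 scoring, the function names, the `<sos>`/`<eos>` tokens, and the example labels are all hypothetical choices made for illustration; they are not taken from the paper and do not represent the authors' evaluation protocol or model code.

```python
from collections import Counter


def multiset_scores(predicted, reference):
    """Precision, recall and F1 between two multisets of primitive labels.

    Order is ignored; only how many times each label occurs matters.
    (Illustrative metric, not necessarily the one used in the paper.)
    """
    pred, ref = Counter(predicted), Counter(reference)
    # Multiset intersection: each label counts min(pred[label], ref[label]) times.
    matched = sum((pred & ref).values())
    precision = matched / sum(pred.values()) if pred else 0.0
    recall = matched / sum(ref.values()) if ref else 0.0
    if matched == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def teacher_forcing_pairs(sequence, start="<sos>", end="<eos>"):
    """Build decoder inputs and targets for teacher forcing.

    At every step the decoder receives the ground-truth previous token
    (not its own prediction) and is trained to emit the next one.
    The start/end token names are assumptions for this sketch.
    """
    decoder_input = [start] + list(sequence)
    decoder_target = list(sequence) + [end]
    return decoder_input, decoder_target


# Hypothetical labels for a small image crop.
reference = ["notehead-full", "stem", "notehead-full", "stem", "beam"]
predicted = ["stem", "notehead-full", "beam", "stem", "sharp"]

precision, recall, f1 = multiset_scores(predicted, reference)
decoder_input, decoder_target = teacher_forcing_pairs(reference)
```

Because the comparison is made on label counts, a prediction that lists the right primitives in a different order is not penalized, which is what distinguishes the multiset formulation from an ordinary sequence comparison.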
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Garrido-Munoz, C., Ríos-Vila, A., Calvo-Zaragoza, J. (2022). Retrieval of Music-Notation Primitives via Image-to-Sequence Approaches. In: Pinho, A.J., Georgieva, P., Teixeira, L.F., Sánchez, J.A. (eds) Pattern Recognition and Image Analysis. IbPRIA 2022. Lecture Notes in Computer Science, vol 13256. Springer, Cham. https://doi.org/10.1007/978-3-031-04881-4_38
DOI: https://doi.org/10.1007/978-3-031-04881-4_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04880-7
Online ISBN: 978-3-031-04881-4
eBook Packages: Computer Science (R0)