Retrieval of Music-Notation Primitives via Image-to-Sequence Approaches

Carlos Garrido-Munoz¹²,
Antonio Ríos-Vila¹² &
Jorge Calvo-Zaragoza¹²

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13256)

Included in the following conference series:

Iberian Conference on Pattern Recognition and Image Analysis

Abstract

The structural richness of music notation has led to the development of specific approaches to the problem of Optical Music Recognition (OMR). Among them, it is becoming common to formulate the output of the system as a graph structure, where the primitives of music notation are the vertices and their syntactic relationships are modeled as edges. As an intermediate step, many works focus on locating and categorizing the symbol primitives found in the music score image using object detection approaches. However, training these models requires precise annotations of where the symbols are located. This makes it difficult to apply these approaches to new collections, as manual annotation is very costly. In this work, we study how to extract the primitives as an image-to-multiset problem, where it is not necessary to provide fine-grained location information. To do this, we implement a model based on image captioning that retrieves a sequence of music primitives found in a given image. Our experiments with the MUSCIMA++ dataset demonstrate the feasibility of this approach, obtaining good results with several models, even in situations with limited annotated data.
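
As a rough illustration of what treating the output as a multiset implies, the sketch below compares a predicted sequence of primitive labels with a reference while ignoring order but keeping repetition counts. The function name `multiset_scores`, the label strings, and the choice of precision, recall and F1 are assumptions made for illustration; this page does not state which metric the chapter actually uses.

```python
from collections import Counter


def multiset_scores(predicted, reference):
    """Compare two label sequences as multisets (order ignored, counts kept)."""
    pred, ref = Counter(predicted), Counter(reference)
    # Counter intersection keeps the minimum count of each label,
    # i.e. the number of predicted items that can be matched to the reference.
    matched = sum((pred & ref).values())
    precision = matched / sum(pred.values()) if pred else 0.0
    recall = matched / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Hypothetical primitive labels, for illustration only.
reference = ["notehead-full", "stem", "notehead-full", "stem", "beam"]
predicted = ["stem", "notehead-full", "beam", "beam", "stem"]
print(multiset_scores(predicted, reference))
```

Because only label counts are compared, a prediction that lists the right primitives in a different order scores the same as one in the original order, which is the property that removes the need for positional annotations.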

Project supported by a 2021 Leonardo Grant for Researchers and Cultural Creators, BBVA Foundation. The BBVA Foundation accepts no responsibility for the opinions, statements and contents included in the project and/or the results thereof, which are entirely the responsibility of the authors. The second author is supported by grant ACIF/2021/356 from “Programa I+D+i de la Generalitat Valenciana”.

Notes

1. A multiset is a generalization of the concept of set, which allows for repeated elements. This is our case, given that the same primitive category can appear many times in a music score image.
2. During training, we use a teacher forcing methodology in both models.
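
Teacher forcing, mentioned in note 2, means that at each training step the decoder receives the ground-truth previous token instead of its own previous prediction. The following is a minimal sketch of how the input and target sequences are typically aligned; the helper name `teacher_forcing_pairs` and the `<sos>`/`<eos>` markers are hypothetical and are not taken from the chapter.

```python
def teacher_forcing_pairs(target, sos="<sos>", eos="<eos>"):
    """Return (decoder_inputs, expected_outputs) for one training sequence.

    At step t the decoder is fed decoder_inputs[t] (the ground-truth token
    from step t-1) and is trained to emit expected_outputs[t].
    """
    decoder_inputs = [sos] + list(target)
    expected_outputs = list(target) + [eos]
    return decoder_inputs, expected_outputs


# Hypothetical primitive labels, for illustration only.
inputs, outputs = teacher_forcing_pairs(["notehead-full", "stem", "beam"])
for fed, expected in zip(inputs, outputs):
    print(f"decoder sees {fed!r:18} -> should predict {expected!r}")
```

At inference time no ground truth is available, so the decoder would instead be fed its own previous prediction until it emits the end marker.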

Author information

Authors and Affiliations

U.I for Computer Research, University of Alicante, Alicante, Spain
Carlos Garrido-Munoz, Antonio Ríos-Vila & Jorge Calvo-Zaragoza

Corresponding author

Correspondence to Antonio Ríos-Vila.

Editor information

Editors and Affiliations

University of Aveiro, Aveiro, Portugal
Armando J. Pinho
University of Aveiro, Aveiro, Portugal
Petia Georgieva
University of Porto, Porto, Portugal
Luís F. Teixeira
Universitat Politècnica de València, Valencia, Spain
Joan Andreu Sánchez

About this paper

Cite this paper

Garrido-Munoz, C., Ríos-Vila, A., Calvo-Zaragoza, J. (2022). Retrieval of Music-Notation Primitives via Image-to-Sequence Approaches. In: Pinho, A.J., Georgieva, P., Teixeira, L.F., Sánchez, J.A. (eds) Pattern Recognition and Image Analysis. IbPRIA 2022. Lecture Notes in Computer Science, vol 13256. Springer, Cham. https://doi.org/10.1007/978-3-031-04881-4_38

DOI: https://doi.org/10.1007/978-3-031-04881-4_38
Published: 26 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04880-7
Online ISBN: 978-3-031-04881-4
eBook Packages: Computer Science, Computer Science (R0)
