Retrieval of Music-Notation Primitives via Image-to-Sequence Approaches

  • Conference paper
Pattern Recognition and Image Analysis (IbPRIA 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13256))


Abstract

The structural richness of music notation calls for specific approaches to the problem of Optical Music Recognition (OMR). Among these, it is increasingly common to formulate the output of the system as a graph structure, where the primitives of music notation are the vertices and their syntactic relationships are modeled as edges. As an intermediate step, many works focus on locating and categorizing the symbol primitives found in the music score image using object detection approaches. However, training these models requires precise annotations of where the symbols are located. This makes it difficult to apply these approaches to new collections, as manual annotation is very costly. In this work, we study how to formulate primitive extraction as an image-to-multiset problem, which does not require fine-grained location annotations. To do this, we implement a model based on image captioning that retrieves a sequence of the music primitives found in a given image. Our experiments with the MUSCIMA++ dataset demonstrate the feasibility of this approach, obtaining good results with several models, even in situations with limited annotated data.
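To make the image-to-multiset formulation concrete, the following is a minimal sketch of how a predicted sequence of primitives could be compared with a reference annotation when only category counts matter, not order or position. The function name `multiset_scores`, the precision/recall/F1 scoring, and the label strings are illustrative assumptions; they are not taken from the paper and are not necessarily the metric the authors report.

```python
from collections import Counter


def multiset_scores(predicted, reference):
    """Precision, recall and F1 between two label sequences compared as multisets.

    Hypothetical helper for illustration only; order of the labels is ignored.
    """
    pred, ref = Counter(predicted), Counter(reference)
    # Multiset intersection: for each label, keep the smaller of the two counts.
    overlap = sum((pred & ref).values())
    precision = overlap / sum(pred.values()) if pred else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# Illustrative labels (assumed): the reference contains two noteheads,
# the prediction misses one of them.
reference = ["notehead-full", "stem", "notehead-full", "stem", "beam"]
predicted = ["stem", "notehead-full", "beam", "stem"]

precision, recall, f1 = multiset_scores(predicted, reference)
print(precision, recall, round(f1, 3))
```

Because both sequences are reduced to counts, any permutation of the predicted sequence yields the same scores, which is the property that removes the need for location-level annotations in this kind of comparison.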

Project supported by a 2021 Leonardo Grant for Researchers and Cultural Creators, BBVA Foundation. The BBVA Foundation accepts no responsibility for the opinions, statements and contents included in the project and/or the results thereof, which are entirely the responsibility of the authors. The second author is supported by grant ACIF/2021/356 from “Programa I+D+i de la Generalitat Valenciana”.


Notes

  1. A multiset is a generalization of the concept of set, which allows for repeated elements. This is our case, given that the same primitive category can appear many times in a music score image.

  2. During training, we use a teacher forcing methodology in both models.
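Teacher forcing, in its usual sense, means that during training the decoder receives the ground-truth previous token at each step instead of its own earlier prediction. The sketch below shows only the resulting input/target alignment for one sequence; the function name `teacher_forcing_pairs` and the `<sos>`/`<eos>` marker tokens are assumptions made for illustration, not details from the paper.

```python
def teacher_forcing_pairs(target_tokens, start="<sos>", end="<eos>"):
    """Return (decoder input, expected output) pairs for one training sequence.

    Hypothetical helper: the decoder input is the ground truth shifted right
    by one position, so step t is conditioned on the true token at t - 1.
    """
    decoder_inputs = [start] + list(target_tokens)
    decoder_targets = list(target_tokens) + [end]
    return list(zip(decoder_inputs, decoder_targets))


# Illustrative primitive labels (assumed).
for given, expected in teacher_forcing_pairs(["notehead-full", "stem", "beam"]):
    print(f"{given:>14} -> {expected}")
```

At inference time no ground truth is available, so the decoder would instead be fed its own previous prediction until it emits the end marker.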


Author information

Corresponding author

Correspondence to Antonio Ríos-Vila.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Garrido-Munoz, C., Ríos-Vila, A., Calvo-Zaragoza, J. (2022). Retrieval of Music-Notation Primitives via Image-to-Sequence Approaches. In: Pinho, A.J., Georgieva, P., Teixeira, L.F., Sánchez, J.A. (eds) Pattern Recognition and Image Analysis. IbPRIA 2022. Lecture Notes in Computer Science, vol 13256. Springer, Cham. https://doi.org/10.1007/978-3-031-04881-4_38

  • DOI: https://doi.org/10.1007/978-3-031-04881-4_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-04880-7

  • Online ISBN: 978-3-031-04881-4

  • eBook Packages: Computer Science, Computer Science (R0)
