Article

Towards automatic transcription of expressive oral percussive performances

Author:

Amaury Hazan

IUI '05: Proceedings of the 10th international conference on Intelligent user interfaces

Pages 296 - 298

https://doi.org/10.1145/1040830.1040904

Published: 10 January 2005

Abstract

We describe a tool for transcribing voice-generated percussive rhythms. The system consists of (a) a segmentation component that separates the monophonic input stream into percussive events, (b) a descriptor generation component that computes a set of acoustic features from each extracted segment, and (c) a machine learning component that assigns a symbolic class to each segmented sound of the input stream. We describe each of these components and compare different machine learning strategies that can be used to obtain a symbolic representation of the oral percussive performance.
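The three components named in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the energy-threshold segmentation, the two descriptors (RMS energy and zero-crossing rate), the nearest-centroid classifier, the class labels "kick" and "hat", and all function names and parameter values are hypothetical stand-ins for whatever the actual system uses.

```python
import math
import random


def segment(signal, frame=64, threshold=0.1):
    """(a) Split a mono signal into (start, end) sample ranges.

    Assumption: a frame is active when its RMS exceeds `threshold`;
    runs of consecutive active frames are merged into one event.
    """
    events, start = [], None
    for i in range(0, len(signal), frame):
        chunk = signal[i:i + frame]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        if rms > threshold and start is None:
            start = i
        elif rms <= threshold and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(signal)))
    return events


def describe(samples):
    """(b) Compute a small descriptor vector: (RMS energy, zero-crossing rate)."""
    n = len(samples)
    rms = math.sqrt(sum(x * x for x in samples) / n)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return (rms, crossings / n)


def train_centroids(labelled):
    """(c) Average the descriptor vectors of each class (nearest-centroid model)."""
    sums = {}
    for features, label in labelled:
        total, count = sums.get(label, ([0.0] * len(features), 0))
        sums[label] = ([t + f for t, f in zip(total, features)], count + 1)
    return {label: [t / c for t in total] for label, (total, c) in sums.items()}


def classify(features, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


def transcribe(signal, centroids):
    """Run the full chain: segment, describe, classify. Returns (onset, label) pairs."""
    return [(start, classify(describe(signal[start:end]), centroids))
            for start, end in segment(signal)]


# Synthetic stand-ins for vocal sounds (hypothetical): a low tone and a noise burst.
def tone(n, period):
    return [0.8 * math.sin(2 * math.pi * i / period) for i in range(n)]


def noise(n, rng):
    return [rng.uniform(-0.8, 0.8) for _ in range(n)]


rng = random.Random(0)
centroids = train_centroids([
    (describe(tone(256, 64)), "kick"),
    (describe(noise(256, rng)), "hat"),
])

silence = [0.0] * 128
demo_signal = silence + tone(256, 48) + silence + noise(256, rng) + silence
result = transcribe(demo_signal, centroids)
print(result)
```

On this synthetic input the two bursts are separated by exact silence, so the threshold segmentation is trivial; real vocal recordings would need a more robust onset detector and a richer descriptor set. The classifier is deliberately the simplest one that fits in a few lines, and any other learner could be swapped into `classify` to compare strategies.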




Information & Contributors

Information

Published In

IUI '05: Proceedings of the 10th international conference on Intelligent user interfaces

January 2005

344 pages

ISBN:1581138946

DOI:10.1145/1040830

General Chair: Rob St. Amant (North Carolina State University, USA)

Program Chairs: John Riedl (University of Minnesota, USA), Anthony Jameson (DFKI and International University in Germany)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Article

Conference

IUI05: Tenth International Conference on Intelligent User Interfaces

January 10 - 13, 2005

San Diego, California, USA

Acceptance Rates

Overall Acceptance Rate 746 of 2,811 submissions, 27%


Bibliometrics

Total Citations: 4
Total Downloads: 326
Downloads (Last 12 months): 5
Downloads (Last 6 weeks): 1

Reflects downloads up to 23 Sep 2024

Cited By

Evain S, Lecouteux B, Schwab D, Contesse A, Pinchaud A, Henrich Bernardoni N (2021). Human beatbox sound recognition using an automatic speech recognition toolkit. Biomedical Signal Processing and Control, 67 (102468). Online publication date: May-2021.
https://doi.org/10.1016/j.bspc.2021.102468

Delgado A, McDonald S, Xu N, Sandler M (2019). A New Dataset for Amateur Vocal Percussion Analysis. Proceedings of the 14th International Audio Mostly Conference: A Journey in Sound (17-23). Online publication date: 18-Sep-2019.
https://dl.acm.org/doi/10.1145/3356590.3356844

Wu C, Dittmar C, Southall C, Vogl R, Widmer G, Hockman J, Muller M, Lerch A (2018). A Review of Automatic Drum Transcription. IEEE/ACM Transactions on Audio, Speech and Language Processing, 26:9 (1457-1483). Online publication date: 1-Sep-2018.
https://dl.acm.org/doi/10.1109/TASLP.2018.2830113

Goto M (2014). Singing information processing. 2014 12th International Conference on Signal Processing (ICSP) (2431-2438). Online publication date: Oct-2014.
https://doi.org/10.1109/ICOSP.2014.7015431
