Article

Software usando reconhecimento e síntese de voz: o estado da arte para o Português brasileiro

Authors:

Erick Sousa

CLIHC '05: Proceedings of the 2005 Latin American conference on Human-computer interaction

Pages 326 - 331

https://doi.org/10.1145/1111360.1111396

Published: 23 October 2005

Abstract

Speech is a natural interface for human-computer interaction. Internationally, speech (or voice) technology is a well-developed field, with a wide variety of academic and industrial software. Most of this software assumes that a recognizer or synthesizer is available and can be programmed through an API. In contrast, few resources are available in the public domain for Brazilian Portuguese. This work discusses some of these issues and compares SAPI and JSAPI, the speech APIs promoted by Microsoft and Sun, respectively. We also present two examples: a JSAPI-based tic-tac-toe game that uses recognition of Portuguese digits, and a computer-aided language learning (CALL) application that uses SAPI-based speech recognition in English and synthesis in Portuguese.
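As a rough illustration of the first example, the sketch below shows one way recognized Portuguese digit words could drive a tic-tac-toe board. It is not the authors' implementation and does not call JSAPI: the recognizer is assumed to deliver a plain text word, and the names `DIGIT_WORDS`, `apply_utterance`, and `winner`, as well as the digit-to-cell numbering, are hypothetical.

```python
# Hypothetical mapping from recognized Portuguese digit words to cell numbers 1..9.
# Assumption: the recognizer's grammar is restricted to these nine words.
DIGIT_WORDS = {
    "um": 1, "dois": 2, "três": 3,
    "quatro": 4, "cinco": 5, "seis": 6,
    "sete": 7, "oito": 8, "nove": 9,
}

# Index triples (0-based) that form a winning row, column, or diagonal.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def apply_utterance(board, word, player):
    """Place `player` on the cell named by a recognized digit word.

    Returns True if the move was applied, or False if the word is not a
    known digit or the cell is already taken.
    """
    digit = DIGIT_WORDS.get(word.strip().lower())
    if digit is None:
        return False  # out-of-vocabulary result: ignore it
    index = digit - 1
    if board[index] != " ":
        return False  # cell already occupied
    board[index] = player
    return True


def winner(board):
    """Return "X" or "O" if a player has completed a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None
```

Restricting the vocabulary to nine digit words keeps the recognition task small, which is presumably why a digit-driven game is a convenient demonstration when few Brazilian Portuguese resources are available.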

Cited By

Ferreira M, de Souza J, Roesler V, Valdeni de Lima J, Saibel Santos C, Willrich R (2017). Use of Automatic Speech Recognition Systems for Multimedia Applications. Proceedings of the 23rd Brazillian Symposium on Multimedia and the Web, pp. 33-36. DOI: 10.1145/3126858.3131630. Online publication date: 17-Oct-2017
https://dl.acm.org/doi/10.1145/3126858.3131630
dos Santos F, Barone D, Adami A (2010). A baseline system for continuous speech recognition of Brazilian Portuguese using the West Point Brazilian Portuguese speech corpus. Proceedings of the 9th international conference on Computational Processing of the Portuguese Language, pp. 132-141. DOI: 10.1007/978-3-642-12320-7_18. Online publication date: 27-Apr-2010
https://dl.acm.org/doi/10.1007/978-3-642-12320-7_18
Munzlinger E, Forster C, Filho J, Gomes R, Gerosa M, Guizzardi R (2008). Desenvolvimento e avaliação de um sistema multimodal e multiusuário de navegação web. Companion Proceedings of the XIV Brazilian Symposium on Multimedia and the Web, pp. 29-32. DOI: 10.1145/1809980.1809989. Online publication date: 26-Oct-2008
https://dl.acm.org/doi/10.1145/1809980.1809989

Information & Contributors

Information

Published In

CLIHC '05: Proceedings of the 2005 Latin American conference on Human-computer interaction

October 2005

361 pages

ISBN:1595932240

DOI:10.1145/1111360

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Tecnologia Virtual
SIG-CHI Mexico
SIG-CHI Brazil
Create-Net
Microsoft Research
SMCC
ITESM Cuernavaca
Pullman de Morelos

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Acceptance Rates

Overall Acceptance Rate 14 of 42 submissions, 33%

Bibliometrics

Total citations: 3
Total downloads: 940
Downloads (last 12 months): 0
Downloads (last 6 weeks): 0

Reflects downloads up to 13 Feb 2025
