Abstract
ASTER is an interactive computing system for audio formatting electronic documents (presently, documents written in (La)TeX) to produce audio documents. ASTER can speak both literary texts and highly technical documents that contain complex mathematics. In fact, the effective speaking of mathematics is a key goal of ASTER. To this end, a listener can interactively request that segments of text or mathematics be spoken using several different rendering styles. Listeners can also construct their own rendering rules and styles if they find it necessary.
In this paper, we describe the rendering component of ASTER (the system for writing rules for speaking various parts of text and mathematics) and discuss some of the principles that were used in developing rules for making spoken text, mathematics, and tables comprehensible.
Visual communication is characterized by the eye's ability to actively access parts of a two-dimensional display. The reader is active, while the display is passive. This active-passive role is reversed by the temporal nature of oral communication: information flows actively past a passive listener. This prohibits multiple views—it is impossible to first obtain a high-level view and then “look” at details. These shortcomings become severe when presenting complex mathematics orally.
Audio formatting, which renders information structure in a manner attuned to an auditory display, overcomes these problems. Audio layout, composed of fleeting and persistent cues, conveys complex structure without detracting from the content. ASTER is interactive, and the ability to browse information structure and obtain multiple views enables active listening.
Cite this article
Raman, T.V., Gries, D. Audio formatting—Making spoken text and math comprehensible. Int J Speech Technol 1, 21–31 (1995). https://doi.org/10.1007/BF02277177
DOI: https://doi.org/10.1007/BF02277177