SpeechSkimmer: interactively skimming recorded speech

Author:

Barry Arons

UIST '93: Proceedings of the 6th annual ACM symposium on User interface software and technology

Pages 187 - 196

https://doi.org/10.1145/168642.168661

Published: 01 December 1993

References

[1]

Aaronson, D., Markowitz, N., and Shapiro, H. Perception and Immediate Recall of Normal and Compressed Auditory Sequences. Perception and Psychophysics 9, 4 (1971), 338-344.

[2]

Arons, B. Hyperspeech: Navigating in Speech-Only Hypermedia. In Hypertext '91, ACM, 1991, pp. 133-146.

[3]

Arons, B. Techniques, Perception, and Applications of Time-Compressed Speech. In Proceedings of 1992 Conference, American Voice I/O Society, Sep. 1992, pp. 169-177.

[4]

Arons, B. Tools for Building Asynchronous Servers to Support Speech and Audio Applications. In UIST '92. Proceedings of the ACM Symposium on User Interface Software and Technology, Nov. 1992, pp. 71-78.

[5]

Beasley, D.S. and Maki, J.E. Time- and Frequency-Altered Speech. In Contemporary Issues in Experimental Phonetics. Academic Press, Lass, N.J., editor, Ch. 12, pp. 419-458, 1976.

[6]

Buxton, W., Gaver, B., and Bly, S. The Use of Non-Speech Audio at the Interface. ACM SIGCHI, 1991, Tutorial Notes.

[7]

Chen, F.R. and Withgott, M. The Use of Emphasis to Automatically Summarize Spoken Discourse. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, IEEE, 1992, pp. 229-233.

[8]

De Souza, P. A Statistical Approach to the Design of an Adaptive Self-Normalizing Silence Detector. IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-31, 3 (Jun. 1983), 678-684.

[9]

Degen, L., Mander, R., and Salomon, G. Working with Audio: Integrating Personal Tape Recorders and Desktop Computers. In CHI '92, ACM, Apr. 1992, pp. 413-418.

[10]

Fairbanks, G., Everitt, W.L., and Jaeger, R.P. Method for Time or Frequency Compression-Expansion of Speech. Transactions of the Institute of Radio Engineers, Professional Group on Audio AU-2 (1954), 7-12. Reprinted in G. Fairbanks, Experimental Phonetics: Selected Articles, University of Illinois Press, 1966.

[11]

Foulke, E. The Perception of Time Compressed Speech. In Perception of Language. Charles E. Merrill Publishing Company, Kjeldergaard, P.M., Horton, D.L., and Jenkins, J.J., editors, Ch. 4, pp. 79-107, 1971.

[12]

Furnas, G.W. Generalized Fisheye Views. In CHI '86, ACM, 1986, pp. 16-23.

[13]

Gaver, W.W. Auditory Icons: Using Sound in Computer Interfaces. Human-Computer Interaction 2 (1989), 167-177.

[14]

Gerber, S.E. and Wulfeck, B.H. The Limiting Effect of Discard Interval on Time-Compressed Speech. Language and Speech 20, 2 (1977), 108-115.

[15]

Glavitsch, U. and Schäuble, P. A System for Retrieving Speech Documents. In 15th Annual International SIGIR '92, ACM, 1992, pp. 168-176.

[16]

Gruber, J.G. A Comparison of Measured and Calculated Speech Temporal Parameters Relevant to Speech Activity Detection. IEEE Transactions on Communications COM-30, 4 (Apr. 1982), 728-738.

[17]

Gruber, J.G. and Le, N.H. Performance Requirements for Integrated Voice/Data Networks. IEEE Journal on Selected Areas in Communications SAC-1, 6 (Dec. 1983), 981-1005.

[18]

Grudin, J. Why CSCW Applications Fail: Problems in the Design and Evaluation of Organizational Interfaces. In CHI '88, 1988.

[19]

Heiman, G.W., Leo, R.J., Leighbody, G., and Bowler, K. Word Intelligibility Decrements and the Comprehension of Time-Compressed Speech. Perception and Psychophysics 40, 6 (1986), 407-411.

[20]

Hejna Jr., D.J. Real-Time Time-Scale Modification of Speech via the Synchronized Overlap-Add Algorithm. Master's thesis, Department of Electrical Engineering and Computer Science, MIT, Feb. 1990.

[21]

Houle, G.R., Maksymowicz, A.T., and Penafiel, H.M. Back-End Processing for Automatic Gisting Systems. In Proceedings of 1988 Conference, American Voice I/O Society, 1988.

[22]

Jeffries, R., Miller, J.R., Wharton, C., and Uyeda, K.M. User Interface Evaluation in the Real World: A Comparison of Four Techniques. In CHI '91, ACM, Apr. 1991, pp. 119-124.

[23]

Lamel, L.F., Rabiner, L.R., Rosenberg, A.E., and Wilpon, J.G. An Improved Endpoint Detector for Isolated Word Recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-29, 4 (Aug. 1981), 777-785.

[24]

Lass, N.J. and Leeper, H.A. Listening Rate Preference: Comparison of Two Time Alteration Techniques. Perceptual and Motor Skills 44 (1977), 1163-1168.

[25]

Lee, H.H. and Un, C.K. A Study of On-Off Characteristics of Conversational Speech. IEEE Transactions on Communications COM-34, 6 (Jun. 1986), 630-637.

[26]

Levelt, W.J.M. Speaking: From Intention to Articulation, MIT Press (1989).

[27]

Lynch Jr., J.F., Josenhans, J.G., and Crochiere, R.E. Speech/Silence Segmentation for Real-Time Coding via Rule Based Adaptive Endpoint Detection. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, IEEE, 1987, pp. 1348-1351.

[28]

Mackinlay, J.D., Robertson, G.G., and Card, S.K. The Perspective Wall: Detail and Context Smoothly Integrated. In CHI '91, ACM, 1991, pp. 173-179.

[29]

UnMouse User's Manual, Microtouch Systems Inc., Wilmington, MA.

[30]

Mills, M., Cohen, J., and Wong, Y.Y. A Magnifier Tool for Video Data. In CHI '92, ACM, Apr. 1992, pp. 93-98.

[31]

Minifie, F.D. Durational Aspects of Connected Speech Samples. In Time-Compressed Speech. Scarecrow, Duker, S., editor, pp. 709-715, 1974.

[32]

Neuburg, E.P. Simple Pitch-Dependent Algorithm for High Quality Speech Rate Changing. Journal of the Acoustical Society of America 63, 2 (1978), 624-625.

[33]

O'Shaughnessy, D. Speech Communication: Human and Machine, Addison-Wesley (1987).

[34]

O'Shaughnessy, D. Recognition of Hesitations in Spontaneous Speech. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, IEEE, 1992, pp. 1521-1524.

[35]

Rabiner, L.R. and Sambur, M.R. An Algorithm for Determining the Endpoints of Isolated Utterances. The Bell System Technical Journal 54, 2 (Feb. 1975), 297-315.

[36]

Reich, S.S. Significance of Pauses for Speech Perception. Journal of Psycholinguistic Research 9, 4 (1980), 379-389.

[37]

Resnick, P. and Virzi, R.A. Skip and Scan: Cleaning Up Telephone Interfaces. In CHI '92, ACM, Apr. 1992, pp. 419-426.

[38]

Rose, R.C. Techniques for Information Retrieval from Speech Messages. The Lincoln Lab Journal 4, 1 (1991), 45-60.

[39]

Roucos, S. and Wilgus, A.M. High Quality Time-Scale Modification for Speech. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, IEEE, 1985, pp. 493-496.

[40]

Savoji, M.H. A Robust Algorithm for Accurate Endpointing of Speech Signals. Speech Communication 8 (1989), 45-60.

[41]

Schmandt, C. and Arons, B. A Conversational Telephone Messaging System. IEEE Transactions on Consumer Electronics CE-30, 3 (Aug. 1984), xxi-xxiv.

[42]

Scott, R.J. Time Adjustment in Speech Synthesis. Journal of the Acoustical Society of America 41, 1 (1967), 60-65.

[43]

Stifelman, L.J., Arons, B., Schmandt, C., and Hulteen, E.A. VoiceNotes: A Speech Interface for a Hand-Held Voice Notetaker. In Proceedings of INTERCHI Conference, ACM SIGCHI, 1993.

[44]

Wightman, C.W. and Ostendorf, M. Automatic Recognition of Intonational Features. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, IEEE, 1992, pp. 1221-1224.

[45]

Wilcox, L., Smith, I., and Bush, M. Wordspotting for Voice Editing and Audio Indexing. In CHI '92, ACM SIGCHI, 1992, pp. 655-656.

Published In

UIST '93: Proceedings of the 6th annual ACM symposium on User interface software and technology

December 1993

267 pages

ISBN: 089791628X

DOI: 10.1145/168642

Chairmen:
Scott Hudson, Georgia Institute of Technology, Atlanta
Randy Pausch, Univ. of Virginia, Charlottesville
Brad Vander Zanden, Univ. of Tennessee, Knoxville
James Foley, Georgia Institute of Technology, Atlanta

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

PRS93

Sponsor:

PRS93: Parallel Rendering Symposium

Atlanta, Georgia, USA

Acceptance Rates

Overall Acceptance Rate 561 of 2,567 submissions, 22%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 41
Total Downloads: 756
Downloads (Last 12 months): 84
Downloads (Last 6 weeks): 12

Reflects downloads up to 10 Nov 2024

Citations

Cited By

Trippas J, Spina D, Sanderson M, Cavedon L (2021). Accessing Media Via an Audio-only Communication Channel: A Log Analysis. Proceedings of the 3rd Conference on Conversational User Interfaces, pp. 1-6. Online publication date: 27-Jul-2021.
https://dl.acm.org/doi/10.1145/3469595.3469623
Khan T, Yoon D, McGrenere J, Bernhaupt R, Mueller F, Verweij D, Andres J, McGrenere J, Cockburn A, Avellino I, Goguey A, Bjørn P, Zhao S, Samson B, Kocielnik R (2020). Designing an Eyes-Reduced Document Skimming App for Situational Impairments. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pp. 1-14. Online publication date: 21-Apr-2020.
https://dl.acm.org/doi/10.1145/3313831.3376641
Arawjo I, Yoon D, Guimbretière F, Lee C, Poltrock S, Barkhuus L, Borges M, Kellogg W (2017). TypeTalker. Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, pp. 1970-1981. Online publication date: 25-Feb-2017.
https://dl.acm.org/doi/10.1145/2998181.2998260
KAHNG J (2017). The effect of pause location on perceived fluency. Applied Psycholinguistics 39:3, pp. 569-591. Online publication date: 23-Nov-2017.
https://doi.org/10.1017/S0142716417000534
Sivaraman V, Yoon D, Mitros P, Kaye J, Druin A, Lampe C, Morris D, Hourcade J (2016). Simplified Audio Production in Asynchronous Voice-Based Discussions. Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pp. 1045-1054. Online publication date: 7-May-2016.
https://dl.acm.org/doi/10.1145/2858036.2858416
(2014). Bibliography. Semantic Multimedia Analysis and Processing, pp. 421-512. Online publication date: 18-Jun-2014.
https://doi.org/10.1201/b17080-21
Wigdor D (2014). Input/Output Devices and Interaction Techniques. Computing Handbook, Third Edition, pp. 1-54. Online publication date: 8-May-2014.
https://doi.org/10.1201/b16812-25
Abdulhamid F, Marshall S (2013). Treemaps to visualise and navigate speech audio. Proceedings of the 25th Australian Computer-Human Interaction Conference: Augmentation, Application, Innovation, Collaboration, pp. 555-564. Online publication date: 25-Nov-2013.
https://dl.acm.org/doi/10.1145/2541016.2541021
Hinckley K, Wigdor D, Jacko J (2012). Input Technologies and Techniques. Human–Computer Interaction Handbook, pp. 95-132. Online publication date: 14-May-2012.
https://doi.org/10.1201/b11963-9
Harrison C, Horstman J, Hsieh G, Hudson S, Konstan J, Chi E, Höök K (2012). Unlocking the expressivity of point lights. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1683-1692. Online publication date: 5-May-2012.
https://dl.acm.org/doi/10.1145/2207676.2208296
