Real-time online multimedia content processing: mobile video optical character recognition and speech synthesizer for the visual impaired

Authors:

Wendy Yen-Ni Ng,

Wilson Pang

i-CREATe '07: Proceedings of the 1st international convention on Rehabilitation engineering & assistive technology: in conjunction with 1st Tan Tock Seng Hospital Neurorehabilitation Meeting

Pages 201 - 206

https://doi.org/10.1145/1328491.1328541

Published: 23 April 2007

Abstract

One of the common difficulties faced by the visually impaired is the inability to read, which affects their way of life. Existing portable reading devices (using character recognition and speech synthesis) have many limitations and are poor in accuracy because of their restricted processing power. In this paper, we introduce our robust online multimedia content processing framework to alleviate the limitations of such portable devices. We leverage the high transfer speeds of existing wireless networks to send multimedia information captured on mobile devices to high-end processing servers and subsequently stream the desired output back to users. Because the more complex processes are carried out on the servers, the resulting framework outperforms standard portable devices in terms of accuracy and functionality. In addition, we describe a new approach that improves optical character recognition (OCR) results by using consecutive video frames for automatic character correction. Experiments using consecutive frames show a 25% improvement in accuracy over traditional OCR using a single image. The application was also trialed by several visually impaired people, and the feedback obtained is encouraging.
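
The abstract does not say how consecutive frames are combined for character correction. Purely as an illustration of the general idea, the sketch below applies a per-position majority vote to the OCR strings obtained from several frames of the same text line. The function name `vote_across_frames`, the sample strings, and the assumption that the strings are already aligned character by character are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import zip_longest


def vote_across_frames(frame_texts):
    """Combine OCR strings from consecutive frames by per-position majority vote.

    Assumes the strings are already aligned character by character,
    which is a simplification; real OCR output usually needs alignment first.
    """
    if not frame_texts:
        return ""
    result = []
    # zip_longest pads shorter strings with None so no frame is truncated.
    for column in zip_longest(*frame_texts):
        votes = Counter(ch for ch in column if ch is not None)
        # Pick the character reported by the most frames at this position.
        result.append(votes.most_common(1)[0][0])
    return "".join(result)


# Hypothetical OCR outputs for the same text line in three consecutive frames.
frames = ["EXlT 12", "EXIT 12", "EXIT l2"]
corrected = vote_across_frames(frames)
print(corrected)
```

In this toy input each frame misreads at most one character, and the misreads fall at different positions, so the vote recovers a consistent string. A practical system would also need to align the per-frame outputs and could weight votes by recognition confidence.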

Cited By

Keefer R, Liu Y, Bourbakis N (2013). The Development and Evaluation of an Eyes-Free Interaction Model for Mobile Reading Devices. IEEE Transactions on Human-Machine Systems 43:1 (76-91). Online publication date: Jan-2013.
https://doi.org/10.1109/TSMCA.2012.2210413
Alabi H, Gooch B, Hirschfeld R, Visser E (2011). The accessibility toolkit. Proceedings of the 10th SIGPLAN symposium on New ideas, new paradigms, and reflections on programming and software (145-148). Online publication date: 22-Oct-2011.
https://dl.acm.org/doi/10.1145/2089131.2089136
Nagarajan K, Holland B, George A, Slatton K, Lam H (2011). Accelerating Machine-Learning Algorithms on FPGAs using Pattern-Based Decomposition. Journal of Signal Processing Systems 62:1 (43-63). Online publication date: 1-Jan-2011.
https://dl.acm.org/doi/10.1007/s11265-008-0337-9

Index Terms

1. Information systems
  1. Information systems applications
    1. Multimedia information systems

Information & Contributors

Information

Published In

i-CREATe '07: Proceedings of the 1st international convention on Rehabilitation engineering & assistive technology: in conjunction with 1st Tan Tock Seng Hospital Neurorehabilitation Meeting

April 2007

272 pages

ISBN:9781595938527

DOI:10.1145/1328491

General Chair:
Zen T. H. KOH
START Centre (Singapore)

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

i-CREATe07

Sponsor:

SIGACCESS

i-CREATe07: International Conference for Rehabilitation Engineering & Assistive Technology

April 23 - 26, 2007

Singapore

Citations

Cited By

Simonin J, Carbonell N (2008). Proactive Versus Multimodal Online Help. Proceedings of the 5th international conference on Adaptive Hypermedia and Adaptive Web-Based Systems (183-192). Online publication date: 29-Jul-2008.
https://dl.acm.org/doi/10.1007/978-3-540-70987-9_21
