research-article

Real-time online multimedia content processing: mobile video optical character recognition and speech synthesizer for the visual impaired

Published: 23 April 2007

Abstract

One of the common difficulties faced by the visually impaired is the inability to read, which affects their way of life. Existing portable reading devices (using character recognition and speech synthesis) have many limitations and poor accuracy because of their restricted processing power. In this paper, we introduce a robust online multimedia content processing framework to alleviate the limitations of such portable devices. We use the high transfer speeds of existing wireless networks to send multimedia information captured on mobile devices to high-end processing servers, which then stream the desired output back to users. Because the more complex processing is carried out on the servers, the resulting framework outperforms standard portable devices in both accuracy and functionality. In addition, we describe a new approach that improves optical character recognition (OCR) results by using consecutive video frames for automatic character correction. Experiments using consecutive frames show a 25% improvement in accuracy over traditional OCR using a single image. The application was also trialed by several visually impaired people, and the feedback obtained is encouraging.
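
The abstract does not say how consecutive frames are combined, so the following is only a minimal sketch of one plausible reading: a per-position majority vote over the OCR strings produced for the same text in successive frames. The function name `vote_characters` and the assumption that the strings are already aligned character by character are illustrative and not taken from the paper. A real system would likely need an alignment step before voting.

```python
from collections import Counter
from itertools import zip_longest


def vote_characters(frame_texts, pad="\0"):
    """Merge OCR strings from consecutive frames by per-position majority vote.

    Hypothetical helper: assumes every string covers the same text region and
    that characters at the same index correspond to each other.
    """
    if not frame_texts:
        return ""
    merged = []
    # Walk the strings column by column, padding shorter ones with `pad`.
    for column in zip_longest(*frame_texts, fillvalue=pad):
        # Pick the most frequent character at this position; on a tie,
        # the character seen first among the frames wins.
        best, _count = Counter(column).most_common(1)[0]
        # If most frames have no character here, drop the position.
        if best != pad:
            merged.append(best)
    return "".join(merged)


if __name__ == "__main__":
    # Four noisy readings of the same sign, one per frame.
    frames = ["EXlT", "EXIT", "EX1T", "EXIT"]
    print(vote_characters(frames))
```

Under these assumptions, a character misread in a single frame is outvoted by the correct readings from neighbouring frames, which is the general intuition behind using several frames instead of one image.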


    Published In

    i-CREATe '07: Proceedings of the 1st international convention on Rehabilitation engineering & assistive technology: in conjunction with 1st Tan Tock Seng Hospital Neurorehabilitation Meeting
    April 2007
    272 pages
    ISBN:9781595938527
    DOI:10.1145/1328491
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. multimedia content processing
    2. optical character recognition
    3. speech synthesizer
