Daniel Mashao

Papers by Daniel Mashao

Error Correction Block Coding Can Combine With Adaptive Transform Coding To Allow Coherent Speech Transmission Over Channels With Bit Error Rates of 1 in 100

IEEE South African Symposium on Communications and Signal Processing, 1990

Adapting Web Content for Telephone Users by transcoding XSLT

A Tutorial on Distributed Speech Recognition for Wireless Mobile Devices

With the expansion in wireless communication technology and the introduction of powerful smart-phones, users are demanding systems which will allow for ubiquitous computing. A critical requirement is a simpler means of interacting with mobile devices. Instead of struggling with small keypads on smart-phones or a stylus on a PDA it would be much simpler if we could use a more natural and familiar medium of communication, speech. There are currently 3 architectures, Embedded Speech Recognition, Network Speech Recognition (NSR) and Distributed Speech Recognition (DSR), each with their own pros and cons, which aim to incorporate an Automatic Speech Recognition (ASR) system on mobile devices. DSR proposes to be the best solution due to its superior performance in the presence of transmission errors and noisy environments. The main aim of this paper is to give the reader a broad outline of the DSR architecture, but focuses mainly on the front-end system, which literature suggests is the m...

Evaluation of the Quality of Microphone Array Enhanced Speech

Microphone arrays offer the possibility of hands-free speech acquisition. This increases the convenience for those using speech technologies as they do not need to hold a microphone in order to interact with a speech system. In addition, a microphone array also has the advantage of potential gains in signal-to-noise ratio in noisy and reverberant environments. In this paper we evaluate the quality of a locally designed four element linear microphone array. The microphone array enhanced speech is evaluated on distortion, noise and speaker identification performance. The reported results show the noise canceling beamformer with post filter as having produced low distortion, high signal-to-noise ratio speech and the best speaker identification rate when compared to other general beamforming techniques.

A Hybrid Text-To-Speech system for Afrikaans

A combination of Speaker-and Channel-Normalization for recognition of Telephone and GSM speech

Performance of speech recognition systems on speech that has been transmitted through GSM and Telephone channels is generally very poor. The poor performance in recognition is mainly due to channel effects, which limit the use of speech recognition applications over telecommunication networks. In an effort to reduce the degradation in speech recognition performance, speaker normalization and channel normalization, two strategies for tackling variation from speaker, channel and environment, are investigated. In this paper two techniques are examined: vocal tract length normalization (VTLN) for speaker normalization and cepstral mean normalization (CMN) for channel normalization. In addition a combination of VTLN and CMN was implemented to account for both the channel effects and variation in vocal tract length effects. Experiments showed that applying speaker normalization and channel normalization in speech recognition systems leads to relative reduction in ...
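
As an illustration of the channel normalization named in this abstract, the following is a minimal sketch of per-utterance cepstral mean normalization in its textbook form. It is not taken from the paper: the function name `cepstral_mean_normalize` and the representation of an utterance as a list of per-frame coefficient lists are assumptions made for this example. The idea is that a fixed linear channel adds a roughly constant offset to each cepstral coefficient, so subtracting the per-coefficient mean over the utterance removes that offset.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over an utterance (textbook CMN).

    `frames` is a hypothetical representation: a list of frames, each a
    list of cepstral coefficients of equal length.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each coefficient across all frames of the utterance.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    # A constant channel offset in the cepstral domain cancels here.
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Usage: the same frames with a constant offset normalize to the same values.
clean = [[1.0, 2.0], [3.0, 6.0]]
offset = [0.5, -1.0]  # made-up stand-in for a channel effect
shifted = [[c + o for c, o in zip(f, offset)] for f in clean]
print(cepstral_mean_normalize(clean))
print(cepstral_mean_normalize(shifted))
```

Both calls print the same normalized frames, which is the property that makes the technique useful against stationary channel effects. Real systems typically apply this to MFCC features and may use a sliding window rather than the whole utterance.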

Comparing SVM and GMM on parametric feature-sets

State of the art speaker identification systems use the Gaussian mixture models (GMM) classifier. Support vector machines (SVM) offer a competing classification algorithm. Both classification methods have been evaluated on speaker recognition tasks and have been shown to produce uncorrelated errors with sometimes similar performance. In this paper their performance is compared on different parametric feature-sets, in particular on their response to spectral compression in the feature-sets. It was found that both classifiers respond to spectral compression, with the SVM performance levelling off at higher compressions. Even though for the limited dataset used SVM performance was better than that of GMM, the SVM required several orders of magnitude more computation time than the GMM.
