Terbeh 2016
978-1-5090-5579-1/16/$31.00 ©2016 IEEE

Abstract—The literature is rich with studies addressing the detection of pronunciation disorders. Features contained in the speech signal and natural language processing techniques are the best-known parameters used for this objective. Despite the diversity of factors causing pronunciation disorders (vocal pathologies, non-native speakers, psychological state, age, etc.), no work has been extended to identify these factors and to assist speakers with pronunciation defects in learning spoken languages. The current work presents an original approach based on the probabilistic-phonetic modeling of Arabic speech to detect vocal disorders [1]. If the analyzed speech is degraded, the forced alignment score technique is introduced to distinguish between two main factors that cause mispronunciations: pronunciation defects can come from a native speaker suffering from a vocal pathology or from a non-native speaker who learns spoken Arabic as an L2. In addition, a platform is developed to assist speakers with degraded speech in learning the spoken Arabic language. The present work comprises five steps. The first step consists in calculating the reference phonetic model of Arabic speech. This model is used to detect the vocal defects contained in Arabic speech. Second, the reference forced alignment scores for Arabic phonemes are calculated. In the third phase, for each new speaker with vocal disorders, the forced alignment scores of their non-problematic phonemes are calculated [10]. In the fourth step, the two previous sets of scores are compared to distinguish between pronunciation disorders caused by native speakers suffering from vocal pathologies and those caused by non-native speakers who do not master Arabic-phoneme pronunciation. The last phase consists in developing a platform to assist speakers with pronunciation defects in learning the spoken Arabic language. We have attained a 95% identification rate for the factors causing pronunciation disorders, and the speakers using our platform have shown good progress. Speech therapists, biologists and computer scientists can benefit from this work to develop high-performance systems for pathological speech processing: pathological speech recognition, accent evaluation, e-learning, etc.

Keywords—Arabic healthy/pathological speech; phonetic modeling; forced alignment score; vocal pathology; non-native speakers; spoken Arabic language learning

I. INTRODUCTION AND STATE OF THE ART
Human-machine communication covers various areas in practice (voice services, quality control, avionics, assistance to people with disabilities, etc.) [13]. However, the literature reports several factors causing pronunciation defects that distort the voice command to be transmitted. Consequently, the results obtained can be erroneous.

Digital accessibility is a research area whose aim is to introduce new techniques so that people with vocal pathologies are not excluded from human-machine communication. To assist this population in improving their pronunciation, speech therapists, biologists and computer scientists try to develop new platforms whose objective is to rectify the mispronunciations contained in a speech signal. However, to apply the appropriate corrective treatment, the factor causing the pronunciation disorders must first be identified.

The literature contains several studies addressing vocal pathology detection. Features extracted from the acoustic signal and natural language processing techniques are the main measures used for this task. For instance, we can mention:

• Speech classification works:
  ✓ In [1], Terbeh et al. suggested a novel approach to classify Arabic continuous speech as normal (healthy) or pathological. For this task, the authors proposed a new notion: the phonetic distance, which is the angular distance that separates two different phonetic models.
  ✓ Relying on the hidden Markov model [6] and the LBG algorithm [5], Vahid et al. put forward an original approach to detect vocal pathologies contained in speech signals. This was also the objective of the work in [8].
  ✓ Vahid et al. suggested in [7] a new methodology based on artificial neural networks [9] to detect pronunciation defects contained in speech signals and to distinguish between healthy and pathological speech.
• Spoken language learning works:
  ✓ The work in [15] described the approach proposed by Elshafei et al., whose objective was to support learning Quran recitation. This system segmented the signal into syllabic units. Each test segment was compared to the reference, and the system would accept or reject the syllabic segment.
  ✓ Based on the hidden Markov model, Elshafei put forward in [16] a system that could identify the pronunciation of the learner. In this work, he grouped all Arabic phonemes and their pronunciations that can be accepted in the language and compared them with the pronunciation of the speaker.

In spite of the richness of the literature on pathological speech processing, some advanced tasks have not been solved yet, such as:
• Identification of factors causing pronunciation disorders
• Specific learning of spoken language using multimodal information (audio, video, text, etc.)

Our contribution consists, for each speech sample classified as pathological, in introducing a new methodology whose target is to identify the factor causing the pronunciation defects. We use the forced alignment score to distinguish between two principal factors: vocal pathologies and non-native speech. Our test base consists of Arabic continuous speech classified as pathological.

Speakers with vocal pathologies will benefit from a dedicated platform to correct their mispronunciations and to master the Arabic vocabulary.

This paper is organized as follows. An overview of the proposed approach is presented in Section II. The details about the test conditions and the obtained results are described in Section III. The concluding remarks and future works are given in Section IV.

II. PROPOSED METHODOLOGY
Statistics show that the number of speakers with phonemic disabilities is increasing [11]. Each pronunciation defect case differs from the others depending on the factor that causes the disability. To help the concerned speakers practice human-machine communication, a speech diagnostic step is necessary to identify the factor causing the vocal problem.

Our objective is to identify the factor causing the pronunciation defects contained in Arabic continuous speech. The principal idea consists in comparing, for each non-problematic phoneme, the forced alignment scores of healthy native speakers with those of speakers who suffer from pronunciation disorders. Based on this comparison, we distinguish two main factors: a native speaker with a vocal pathology or a non-native speaker who learns spoken Arabic as an L2.

We extend this task to develop a platform that offers speakers suffering from vocal disorders the possibility of improving their pronunciation based on a multimodal Arabic corpus (pronunciation rules in text, audio and video representations).

Our proposed methodology can be summarized in five main steps:
• Generation of a reference phonetic model of the spoken Arabic language
• Generation of a numerical model of Arabic speech
• Speech classification into healthy or pathological
• Identification of factors causing pronunciation disorders
• Development of a platform to assist speakers with pronunciation defects in learning Arabic vocabulary

These points are detailed in the following subsections.

A. Probabilistic-phonetic model
The current subsection is dedicated to generating a reference phonetic model for the spoken Arabic language. This task requires an Arabic acoustic model and a large corpus of Arabic speech recorded by healthy native speakers. The Sphinx_align tool is then combined with the acoustic model to generate the phonetic transcription of our speech corpus. The phonetic model is defined by the vector summarizing the occurrence probability of each Arabic bi-phoneme. An extract from the reference phonetic model of Arabic speech is shown in the following figure.

Fig. 1. An extract from the reference phonetic model of the Arabic speech

The second figure presents the procedure followed to produce the phonetic model of Arabic speech.
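The bi-phoneme probability vector of subsection A, together with the angular "phonetic distance" attributed to [1] in the introduction, can be illustrated with a minimal Python sketch. It assumes that the phonetic transcriptions are already available as lists of phoneme labels (in the paper they are produced by Sphinx_align, which is not called here). The function names `biphoneme_model` and `angular_distance` are hypothetical, and taking the angle as the arccosine of the cosine similarity is an assumption, since the exact formula of [1] is not reproduced in this paper.

```python
import math
from collections import Counter
from itertools import product


def biphoneme_model(transcriptions, phonemes):
    """Return {(a, b): probability} for every ordered pair of phonemes.

    transcriptions: list of phoneme-label lists (assumed input format).
    phonemes: the phoneme inventory; pairs with labels outside it are
    counted in the total but not reported.
    """
    counts = Counter()
    for seq in transcriptions:
        # Count each pair of adjacent phonemes (bi-phoneme).
        counts.update(zip(seq, seq[1:]))
    total = sum(counts.values())
    return {
        pair: (counts[pair] / total if total else 0.0)
        for pair in product(phonemes, repeat=2)
    }


def angular_distance(model_a, model_b):
    """Angle in radians between two models viewed as vectors (assumed formula)."""
    keys = sorted(set(model_a) | set(model_b))
    a = [model_a.get(k, 0.0) for k in keys]
    b = [model_b.get(k, 0.0) for k in keys]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a model with no bi-phoneme occurrences has no direction")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)
```

Under these assumptions, a speaker's model would be built with the same function as the reference model, and a small angle between the two vectors would indicate speech close to the reference.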
Fig. 2. The procedure in providing the reference phonetic model of the Arabic speech

For each new speaker, we follow the previous procedure to generate their own phonetic model. The two models (the reference phonetic model and that of the speaker) are compared to ensure:
• first, speech classification into healthy or pathological,
• second, identification of the problematic phonemes causing pronunciation defects for the concerned speakers,
• and third, generation of the appropriate pronunciation rules for each speaker suffering from a vocal pathology to enhance their vocabulary.

B. Speech classification
This subsection is devoted to classifying Arabic speech as healthy or pathological. Pathological speech forms the test base for identifying the problematic phonemes. For this task, a speech base containing 138 speech sequences is used, as shown in the following table:

TABLE I. SUMMARY OF THE SPEECH CLASSIFICATION TASK
Speech corpus         Classification results                 Speakers
18 speech records     18 are pathological                    Non-native speakers
120 speech records    40 are healthy, 80 are pathological    Native speakers

C. Forced alignment
Forced alignment is the process that aligns the acoustic signals with the corresponding phonetic transcriptions. According to the distance that separates the speech signal from the norm (which is the acoustic model), a forced alignment score is assigned to each phoneme. The Sphinx_align tool is used to perform this process.

D. Numerical Arabic-speech model generation
This task is dedicated to calculating the numerical model of the spoken Arabic language. To generate this model, we must first calculate the forced alignment score of every Arabic phoneme for different native speakers. For this purpose, we use a speech base containing 2,450 Arabic words recorded by five healthy native speakers (for each phoneme, we use 70 words selected by an Arabic linguistic expert). The following procedure describes, step by step, the calculation of the reference Numerical Arabic-Speech Model (NASM):
For each Arabic phoneme p:
1. n words containing p are recorded by m healthy native speakers (we use n = 70 words and m = 5 native speakers).
2. Calculate the reference forced alignment score Sp (the average of the forced alignment scores calculated for the different speakers).
3. Calculate the standard deviation δp of the forced alignment scores of the different speakers.
- (1)
*********************************************Speech Classification***********************************************
End
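The three steps above can be sketched in Python as follows. This is a minimal illustration, assuming that one forced alignment score per speaker and per phoneme has already been obtained (for example from Sphinx_align, which is not called here). The names `build_nasm` and `fraction_within_reference` are hypothetical. Using the population standard deviation is an assumption, as the paper does not say which estimator is used. The comparison helper is also only an assumption: it counts how many of a new speaker's non-problematic phonemes fall within Sp ± k·δp, whereas the paper's actual decision rule is not spelled out in this passage.

```python
from statistics import mean, pstdev


def build_nasm(scores_by_phoneme):
    """Build the reference model as {phoneme: (Sp, delta_p)}.

    scores_by_phoneme: {phoneme: [score of speaker 1, ..., score of speaker m]}
    Sp is the mean of the speakers' scores; delta_p is their standard
    deviation (population form, an assumption).
    """
    return {
        phoneme: (mean(scores), pstdev(scores))
        for phoneme, scores in scores_by_phoneme.items()
    }


def fraction_within_reference(nasm, speaker_scores, k=1.0):
    """Share of phonemes whose score lies within Sp +/- k * delta_p.

    speaker_scores: {phoneme: score} for the speaker's non-problematic
    phonemes. Both k and the idea of reporting a fraction are
    illustrative assumptions, not the paper's rule.
    """
    common = [p for p in speaker_scores if p in nasm]
    if not common:
        raise ValueError("no phoneme in common with the reference model")
    inside = sum(
        1 for p in common
        if abs(speaker_scores[p] - nasm[p][0]) <= k * nasm[p][1]
    )
    return inside / len(common)
```

With such a helper, a high fraction would suggest that the speaker pronounces non-problematic phonemes like the healthy native reference, while a low fraction would suggest a more general deviation. The threshold separating the two cases would have to be set experimentally.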
III. TESTS AND EXPERIMENTAL RESULTS

A. Test conditions
The tests were carried out under the following conditions:
• During our work, we used 36 Arabic phonemes, as cited in [2].
• For each Arabic phoneme:
  ✓ The selected Arabic words were recorded by five healthy native speakers.
  ✓ We calculated the forced alignment score for every phoneme.
  ✓ The reference forced alignment score (Sp) was the average of these five scores.
  ✓ We calculated the standard deviation (δp) of these five scores.
• The reference model (NASM) was the combination of two vectors containing, respectively, the forced alignment scores (Spi, i=1..35) and the standard deviations (δpi, i=1..35) of the forced alignment scores calculated for the different speakers.
• The speech classification system developed by Terbeh et al. [1] was used to classify speech as healthy or pathological.
• 70 Arabic words were recorded by five healthy native speakers. These formed the base used in calculating the reference forced alignment scores.
• The Arabic words were selected by an Arabic linguistic expert.
• The test base contained 98 Arabic speech sequences classified as pathological using the classification system suggested by Terbeh et al. [1]:
  ✓ 80 were recorded by native speakers suffering from vocal pathologies.
  ✓ 18 were recorded by non-native speakers learning the spoken Arabic language as an L2.
• The pronunciation rules base contained multimodal representations:
  ✓ A base of 35 text messages describing the correct pronunciation of Arabic phonemes and their articulatory points was prepared (one message for each phoneme).
  ✓ 525 audio files (*.wav, isolated words) that speakers suffering from pronunciation disorders could listen to in order to improve their pronunciation (we used 15 wav files for each phoneme).
  ✓ 35 video files describing the movement of the phonetic system and the mouth for each phoneme pronunciation.

B. Experimental results
The results of identifying the factors posing pronunciation
disorders can be summarized in the following table: