
Terbeh 2016


Arabic Speech Analysis to Identify Factors Posing Pronunciation Disorders and to Assist Learners with Vocal Disabilities
Naim TERBEH, LaTICE Lab., Monastir, Tunisia, naim.terbeh@gmail.com
Ayman TRIGUI, LaTICE Lab., Monastir, Tunisia, ayman.trigui@gmail.com
Mohsen MARAOUI, Computational Mathematics Lab., Monastir, Tunisia, maraoui.mohsen@gmail.com
Mounir ZRIGUI, LaTICE Lab., Monastir, Tunisia, mounir.zrigui@fsm.rnu.tn

978-1-5090-5579-1/16/$31.00 ©2016 IEEE

Abstract—The literature seems rich with studies addressing the detection of pronunciation disorders. Features extracted from the speech signal and natural language processing techniques are the best-known parameters used for this purpose. Despite the diversity of factors posing pronunciation disorders (vocal pathologies, non-native speakers, psychological state, age, etc.), no work has yet been extended to identify these factors and to assist speakers with pronunciation defects in learning spoken languages. The current work presents an original approach based on the probabilistic-phonetic modeling of Arabic speech to detect vocal disorders [1]. If the analyzed speech presents some degradations, the forced alignment score technique is introduced to distinguish between two main factors that cause mispronunciations: pronunciation defects can come from a native speaker suffering from vocal pathology or from a non-native speaker who learns spoken Arabic as an L2. A platform is also developed to assist speakers with degraded speech in learning spoken Arabic. The present work comprises five steps. The first step consists in calculating the referenced phonetic model of Arabic speech; this model is used to detect the vocal defects contained in Arabic speech. Second, the referenced forced alignment scores for Arabic phonemes are calculated. In the third phase, for each new speaker with vocal disorders, the forced alignment scores of their non-problematic phonemes are calculated [10]. In the fourth step, the two previous scores are compared to distinguish between pronunciation disorders caused by native speakers suffering from vocal pathologies and those caused by non-native speakers who do not master Arabic-phoneme pronunciation. The last phase consists in developing a platform to assist speakers with pronunciation defects in learning spoken Arabic. We are satisfied with the obtained results: we have attained an identification rate of 95% for the factors posing pronunciation disorders, and the speakers using our platform have shown good progression. Speech therapists, biologists and computer scientists can benefit from this work to develop high-performance systems for pathological speech processing: pathological speech recognition, accent evaluation, e-learning, etc.

Keywords—Arabic healthy/pathological speech; phonetic modeling; forced alignment score; vocal pathology; non-native speakers; spoken Arabic language learning

I. INTRODUCTION AND STATE OF THE ART
Human-machine communication covers many different areas (voice services, quality control, avionics, assistance to people with disabilities, etc.) [13]. However, the literature presents several factors posing pronunciation defects that distort the voice command to be transmitted. As a result, the outcome can be erroneous.

Digital accessibility is a research area whose aim is to introduce new techniques so that people with vocal pathologies are not excluded from human-machine communication. To help this population improve their pronunciation, speech therapists, biologists and computer scientists are developing new platforms whose objective is to correct the mispronunciations contained in a speech signal. However, before the appropriate corrective treatment can be applied, the factor posing the pronunciation disorders must be identified.

The literature contains several studies addressing vocal pathology detection. Features extracted from the acoustic signal and natural language processing techniques are the main measures used for this task. For instance, we can mention:
• Speech classification works:
  ◦ In [1], Terbeh et al. proposed a novel approach to classify Arabic continuous speech as normal (healthy) or pathological. For this purpose, the authors introduced a new notion, the phonetic distance, which is the angular distance separating two phonetic models.
  ◦ Relying on the hidden Markov model [6] and the LBG algorithm [5], Majidnezhad et al. put forward an original approach for detecting vocal pathologies contained in speech signals. This was also the objective of the work in [8].
  ◦ In [7], Majidnezhad et al. proposed a new methodology based on artificial neural networks [9] to detect pronunciation defects contained in speech signals and to distinguish between healthy and pathological speech.
• Spoken language learning works:
  ◦ The work in [15] described the approach proposed by Elshafei et al., whose objective is learning Quran recitation. This system segmented the signal into syllabic units. Each test segment was compared to the reference, and the system would accept or reject the syllabic segment.
  ◦ Based on the hidden Markov model, Elshafei put forward in [16] a system that could identify the pronunciation of the learner. In this work, he grouped all Arabic phonemes together with their acceptable pronunciations in the language, and compared them with the pronunciation of the speaker.

Despite the richness of the literature in studies addressing pathological speech processing, some advanced tasks have not been solved yet, such as:
• Identification of factors posing pronunciation disorders
• Specific learning of spoken language using multimodal information (audio, video, text, etc.)

Our contribution consists in introducing, for each speech classified as pathological, a new methodology whose target is to identify the factor posing the pronunciation defects. We use the forced alignment score to distinguish between two principal factors: vocal pathologies and non-native speech. Our test base is formed by Arabic continuous speech classified as pathological.

Speakers with vocal pathologies will benefit from a dedicated platform to correct their mispronunciations and to master Arabic vocabulary.

This paper is organized as follows. An overview of the proposed approach is presented in Section II. The test conditions and the obtained results are described in Section III. Concluding remarks and future works are given in Section IV.

II. PROPOSED METHODOLOGY
Statistics show that the number of speakers with phonemic disabilities is increasing [11]. Each pronunciation defect case differs from the others depending on the factor that causes the disability. To help the concerned speakers practice human-machine communication, a speech diagnostic step is necessary to identify the factor causing their vocal problem.

Our objective is to identify the factor posing the pronunciation defects contained in Arabic continuous speech. The principal idea consists in comparing, for each non-problematic phoneme, the forced alignment scores of healthy native speakers with those of speakers who suffer from pronunciation disorders. Based on this comparison, we distinguish two main factors: a native speaker with vocal pathology or a non-native speaker who learns spoken Arabic as an L2.

We extend this task to develop a platform that offers speakers suffering from vocal disorders the possibility of improving their pronunciation based on a multimodal Arabic corpus (pronunciation rules in text, audio and video representations).

Our proposed methodology can be summarized in five main steps:
• Generation of a referenced phonetic model of the spoken Arabic language
• Generation of a numerical model of Arabic speech
• Speech classification into healthy or pathological
• Identification of factors posing pronunciation disorders
• Development of a platform to assist speakers with pronunciation defects in learning Arabic vocabulary

These points are detailed in the following subsections.

A. Probabilistic-phonetic model
This subsection is dedicated to generating a referenced phonetic model for spoken Arabic. This task requires an Arabic acoustic model and a large corpus of Arabic speech recorded by native and healthy speakers. The Sphinx_align tool is then combined with the acoustic model to generate the phonetic transcription of our speech corpus. The phonetic model is defined as the vector of occurrence probabilities of each Arabic bi-phoneme. An extract from the referenced phonetic model of Arabic speech is shown in the following figure.

Fig. 1. An extract from the referenced phonetic model of the Arabic speech

The second figure presents the procedure followed to produce the phonetic model of Arabic speech.
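To make the definition concrete, here is a minimal Python sketch of a phonetic model as a vector of bi-phoneme occurrence probabilities. It also computes the angular distance between two such models, which the state of the art attributes to the classification approach of [1]. The sketch assumes that the phonetic transcriptions are already available as lists of phoneme labels, for example read from Sphinx_align output. The label strings and the function names `phonetic_model` and `angular_distance` are hypothetical. The sketch does not reproduce the exact normalization or decision threshold used in [1].

```python
import math
from collections import Counter


def phonetic_model(transcriptions):
    """Estimate bi-phoneme occurrence probabilities from phoneme sequences.

    transcriptions: iterable of phoneme-label lists (hypothetical input format,
    e.g. labels read from a forced-alignment transcription).
    Returns a dict mapping (first, second) phoneme pairs to probabilities.
    """
    counts = Counter()
    for phones in transcriptions:
        # Each pair of consecutive phonemes is one bi-phoneme occurrence.
        counts.update(zip(phones, phones[1:]))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def angular_distance(model_a, model_b):
    """Angle in radians between two phonetic models viewed as sparse vectors."""
    dot = sum(p * model_b.get(pair, 0.0) for pair, p in model_a.items())
    norm_a = math.sqrt(sum(p * p for p in model_a.values()))
    norm_b = math.sqrt(sum(p * p for p in model_b.values()))
    cosine = dot / (norm_a * norm_b)
    # Clamp to [-1, 1] so floating-point drift cannot break math.acos.
    return math.acos(max(-1.0, min(1.0, cosine)))


# Toy transcriptions with made-up Latin labels (illustration only).
reference = phonetic_model([["b", "a", "b"], ["a", "b"]])
speaker = phonetic_model([["b", "a", "t"]])
distance = angular_distance(reference, speaker)
```

Under this formulation, an angle of 0 means that the two models have identical bi-phoneme distributions. A larger angle means that the speaker's distribution departs further from the reference. Turning the angle into a healthy/pathological decision would require a threshold, and that threshold is not given in this paper.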
Fig. 2. The procedure for providing the referenced phonetic model of the Arabic speech

For each new speaker, we follow the same procedure to generate their own phonetic model. The two models (the referenced phonetic model and that of the speaker) are then compared to ensure:
• first, speech classification into healthy or pathological,
• second, identification of the problematic phonemes posing pronunciation defects for the concerned speaker,
• and third, generation of the appropriate pronunciation rules for each speaker suffering from vocal pathology, to enhance their vocabulary.

B. Speech classification
This subsection is devoted to classifying Arabic speech as healthy or pathological. Pathological speech forms the test base for identifying the problematic phonemes. For this task, a speech base containing 138 speech sequences is used, as shown in the following table:

TABLE I. SUMMARY OF THE SPEECH CLASSIFICATION TASK
Speech corpus | Classification results | Speakers
18 speech records | 18 pathological | Non-native speakers
120 speech records | 40 healthy, 80 pathological | Native speakers

For this task, we use the speech classification system elaborated by Terbeh et al. [1].

C. Forced alignment
Forced alignment is the process of aligning acoustic signals with their corresponding phonetic transcriptions. A forced alignment score is assigned to each phoneme according to the distance that separates the speech signal from the norm (the acoustic model). The Sphinx_align tool is used to perform this task.

D. Numerical Arabic-speech model generation
This step is dedicated to calculating the numerical model of spoken Arabic. To generate this model, we must first calculate the forced alignment score of every Arabic phoneme for different native speakers. For this purpose, we use a speech base containing 2,450 Arabic words recorded by five healthy native speakers (for each phoneme, we use 70 words selected by an Arabic linguistic expert). The following procedure details, step by step, the calculation of the referenced Numerical Arabic-Speech Model (NASM):

Algorithm 1. Calculation of the NASM


For each Arabic phoneme Pi (1≤i≤35)

1. n words containing this phoneme are recorded by m healthy native speakers (we use 70 words recorded by five native speakers).
2. Calculate the forced alignment score (the average of the forced alignment scores calculated for the different speakers).
3. Calculate the standard deviation of the forced alignment scores of the different speakers.

Algorithm 1 results in two sets:

• S contains the forced alignment scores of all Arabic phonemes
• δ contains the standard deviations of the forced alignment scores
These two sets form the referenced NASM, as illustrated in the following figure:

Fig. 3. General form of the referenced NASM
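Algorithm 1 reduces to a per-phoneme mean and standard deviation over the healthy native speakers. The corpus figures are consistent with this: 35 phonemes × 70 words = 2,450 words. The following minimal Python sketch assumes that each speaker's score for a phoneme has already been aggregated over that speaker's words. The dictionary layout and the name `build_nasm` are hypothetical. The paper does not say whether the population or the sample standard deviation is used, so the choice of `statistics.pstdev` is an assumption.

```python
import statistics


def build_nasm(scores_by_phoneme):
    """Compute the reference NASM (Algorithm 1) from healthy native speakers.

    scores_by_phoneme: dict mapping each phoneme to a list with one forced
    alignment score per speaker (hypothetical layout; each per-speaker score is
    assumed to be already aggregated over that speaker's words).
    Returns two dicts: S (phoneme -> mean score) and delta (phoneme -> std dev).
    """
    S, delta = {}, {}
    for phoneme, speaker_scores in scores_by_phoneme.items():
        S[phoneme] = statistics.mean(speaker_scores)
        # Population standard deviation; the paper does not state which
        # estimator it uses, so pstdev (vs. stdev) is an assumption here.
        delta[phoneme] = statistics.pstdev(speaker_scores)
    return S, delta


# Made-up scores of five speakers for two phonemes (illustration only).
S, delta = build_nasm({
    "b": [-100.0, -110.0, -105.0, -95.0, -90.0],
    "s": [-120.0, -120.0, -120.0, -120.0, -120.0],
})
```

Keeping S and delta as two dictionaries keyed by phoneme mirrors the two-vector form of the NASM shown in Fig. 3.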


For each new speaker who suffers from pronunciation defects, we compare their forced alignment scores with those of the referenced NASM to determine the factor posing the vocal disorders.

E. Identification of factors posing pronunciation disorders
This subsection determines whether the pronunciation disorders are caused by a vocal pathology or by non-native speech. We follow the procedure below to perform this task:

For each new speaker who suffers from pronunciation disorders:

1. We generate the set B = {P1, P2, …, Pm} (m < 35) of non-problematic phonemes (phonemes that do not pose pronunciation disorders).
2. Let k be the number of non-problematic Arabic phonemes whose forced alignment scores satisfy:

$S_{d_i} \geq S_i - \delta_i \qquad (1)$

where Sdi is the speaker's forced alignment score for the phoneme Pi.

We distinguish two cases:

* If k > m/2 (most non-problematic phonemes are well mastered), then the speaker is native and suffers from vocal pathology.
* Else (k ≤ m/2), the speaker is a non-native who learns spoken Arabic as an L2.
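The decision rule above can be written in a few lines of code. This minimal Python sketch uses S (reference means), delta (reference standard deviations) and the speaker's scores Sdi over the set B. The comparison operator in condition (1) (Sdi ≥ Si − δi) and the majority threshold k > m/2 are reconstructed from the surrounding text, so both are assumptions. The sketch also assumes that a higher forced alignment score means a closer match to the acoustic model. The function name and the toy numbers are hypothetical.

```python
def identify_factor(S, delta, speaker_scores):
    """Classify the factor behind a pathological speech sample.

    S, delta: reference NASM (phoneme -> mean score, phoneme -> std deviation).
    speaker_scores: phoneme -> the speaker's forced alignment score Sdi,
    restricted to the speaker's non-problematic phonemes (the set B).
    """
    m = len(speaker_scores)
    # k = number of phonemes satisfying condition (1): Sdi >= Si - delta_i,
    # assuming a higher forced alignment score means a closer match.
    k = sum(1 for p, s_d in speaker_scores.items() if s_d >= S[p] - delta[p])
    # Majority rule: if most non-problematic phonemes are well mastered,
    # the speaker is taken to be a native speaker with a vocal pathology.
    if k > m / 2:
        return "native speaker with vocal pathology"
    return "non-native speaker (Arabic as L2)"


# Toy reference values and speaker scores (illustration only).
S = {"a": -100.0, "b": -100.0, "t": -100.0}
delta = {"a": 5.0, "b": 5.0, "t": 5.0}
verdict = identify_factor(S, delta, {"a": -102.0, "b": -98.0, "t": -130.0})
```

Here, two of the three non-problematic phonemes fall within one standard deviation below the reference, so k = 2 > 1.5 and the sample is attributed to a native speaker with a vocal pathology.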

We use the following notation:

• Si: the reference forced alignment score obtained by healthy native speakers for the phoneme Pi
• δi: the standard deviation of the forced alignment scores of different healthy native speakers for the phoneme Pi

F. Spoken Arabic language learning

Speakers with vocal disabilities can access a base of pronunciation rules to improve their Arabic vocabulary. This base contains pronunciation rules in different modalities, as explained in the following table:

TABLE II. A DESCRIPTION OF THE PRONUNCIATION RULES’ BASE
Base | Prepared by | Description
Pronunciation rules (text messages) | Linguistic expert | Describes the mouth behavior for each phoneme pronunciation
Audio records | Linguistic expert | Fifteen records for each phoneme (all positions in the word are covered)
Video records | [12] | Show the phonetic system and the mouth behavior for each phoneme pronunciation

The following table presents an extract from our pronunciation rules base (the case of the Arabic phoneme “ب”):

TABLE III. EXTRACT FROM THE PRONUNCIATION RULES’ BASE
Text base | Audio base | Video base
Sound: ب. Articulation point: the two lips. Sign: placing the index finger on the lips. Practice: place the index finger on the lips while noticing the lips meeting or closing, so that the air is felt coming out. | (بَلَدٌ، بَابٌ، بَطَّةٌ، بَلْدَةٌ، بُنْيَةٌ، بُورْصَةٌ، بُطْلَانُ، بُغْيَة، بِلَادٌ، بِسْمِ، بَرْبَرٌ، رَبُوَةٌ، لَبِنَةٌ، إِنْكَبَّ، رَبُّ، عَرَبِي).wav | One video file describing the movement of the phonetic system for the pronunciation of the phoneme “ب”

If the speech sequence is classified as pathological, the speaker uses our application to improve their Arabic vocabulary. If the speech is classified as healthy, a message informs the user that the input speech is healthy. The following figures illustrate the results of our application for healthy and pathological speech.

Fig. 4. Result of our application in the case of healthy speech

Fig. 5. Result of our application in the case of pathological speech (case of the mispronunciation of the phoneme “س”)

G. General overview of proposed approach
To recapitulate, our methodology for identifying the factor posing pronunciation disorders in Arabic continuous speech can be summarized in the following algorithm:

Algorithm 2. Spoken Arabic language learning

Begin
1. Use 36 Arabic phonemes {P1, P2, …, Pi, …, P36} [2].

*** Speech classification ***
2. Generate the referenced phonetic model of spoken Arabic (N).
3. Generate the phonetic model proper to the speaker (H).
4. Compare N and H to classify the speech as healthy or pathological.

*** Generation of the NASM model ***
5. Initialize two sets S = E = ∅.
6. For each phoneme Pi (i = 1, …, 35):
   * Let Ai = {Si1, Si2, …, Sij, …, Sin} be the set of forced alignment scores of phoneme Pi for all n speakers (Sij is the forced alignment score of phoneme Pi for speaker number j)
   * Si = Average(Ai)
   * δi = Standard-deviation(Ai)
   * S = S ∪ {Si}
   * E = E ∪ {δi}
   End for

*** Forced alignment scores of non-problematic phonemes ***
7. For each new speaker with pronunciation disorders, calculate:
   * B = {P1, P2, …, Pi, …, Pm}: the set of non-problematic phonemes (m < 35).
   * D = {Sd1, Sd2, …, Sdi, …, Sdm}: the set of forced alignment scores for the m non-problematic phonemes.

*** Number of mastered non-problematic phonemes ***
8. Initialize a counter k = 0.
9. For each non-problematic Arabic phoneme Pi, compare its reference score Si in the set S (forced alignment scores of native speakers) with Sdi in the set D (forced alignment scores of the speaker with pronunciation disorders):
   * if Sdi ≥ Si − δi then k = k + 1
   End for

*** Factor posing pronunciation disorders ***
10. Distinguish two cases:
   * If k > m/2, then the speaker is native and suffers from vocal pathology.
   * Else, the speaker is non-native and learns spoken Arabic as an L2.

*** Spoken Arabic language learning ***
11. Generate the appropriate pronunciation rules.
    The concerned speaker applies the generated pronunciation rules to improve their Arabic vocabulary.

End
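Step 11 can be viewed as a lookup in the multimodal rules base described in Table II, which stores one text message, a set of isolated-word audio files and one video per phoneme. The following minimal Python sketch assumes that the rules to generate are those of the phonemes outside the set B. The class and function names, the Latin phoneme labels and the file names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PronunciationRule:
    """One entry of the multimodal pronunciation rules base (cf. Table II)."""
    text: str                                       # articulation description
    audio: list[str] = field(default_factory=list)  # isolated-word .wav files
    video: str = ""                                 # video of mouth movement


def rules_for_speaker(rule_base, all_phonemes, non_problematic):
    """Step 11 sketch: gather rules for the phonemes outside the set B."""
    problematic = [p for p in all_phonemes if p not in non_problematic]
    return {p: rule_base[p] for p in problematic if p in rule_base}


# Hypothetical entries; the text fields are placeholders, not real rules.
rule_base = {
    "b": PronunciationRule("Articulation point: the two lips", ["b_01.wav"], "b.mp4"),
    "s": PronunciationRule("Placeholder description for s", ["s_01.wav"], "s.mp4"),
}
selected = rules_for_speaker(rule_base, ["b", "s", "t"], non_problematic={"b"})
```

Phonemes that have no entry in the base (here "t") are skipped silently. A real platform would probably report them instead.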

The following block diagram gives a more detailed view of our approach to identifying problematic phonemes and helping people with vocal pathologies improve their pronunciation of Arabic vocabulary:

Fig. 6. General form of the proposed approach for identifying the factor posing pronunciation disorders and for spoken Arabic language learning

III. TESTS AND EXPERIMENTAL RESULTS

A. Test conditions
The tests were carried out under the following conditions:
• During our work, we used 36 Arabic phonemes, as cited in [2].
• 70 Arabic words per phoneme (2,450 words in total) were recorded by five healthy native speakers. These recordings formed the base used to calculate the referenced forced alignment scores.
• The Arabic words were selected by an Arabic linguistic expert.
• For each Arabic phoneme:
  ◦ The selected Arabic words were recorded by five native and healthy speakers.
  ◦ We calculated the forced alignment score for every phoneme.
  ◦ The reference forced alignment score (Sp) was the average of these five scores.
  ◦ We calculated the standard deviation (δp) of these five scores.
• The reference model (NASM) combined two vectors containing, respectively, the forced alignment scores (Spi, i=1..35) and the standard deviations (δpi, i=1..35) of the forced alignment scores calculated for different speakers.
• The speech classification system elaborated by Terbeh et al. [1] was used to classify speech as healthy or pathological.
• The test base contained 98 Arabic speech sequences classified as pathological by the classification system of Terbeh et al. [1]:
  ◦ 80 were recorded by native speakers suffering from vocal pathologies.
  ◦ 18 were recorded by non-native speakers learning spoken Arabic as an L2.
• The pronunciation rules base contained multimodal representations:
  ◦ A base of 35 text messages describing the correct pronunciation of Arabic phonemes and their articulation points (one message for each phoneme).
  ◦ 525 audio files (*.wav isolated words) that speakers suffering from pronunciation disorders can listen to in order to improve their pronunciation (15 wav files for each phoneme).
  ◦ 35 video files describing the movement of the phonetic system and the mouth for each phoneme pronunciation.

B. Experimental results
The results of identifying the factors posing pronunciation disorders are summarized in the following table:

TABLE IV. IDENTIFICATION RESULTS OF FACTORS POSING PRONUNCIATION DISORDERS
Speech base | Results | Identification rate
80 speech sequences recorded by native speakers with vocal pathologies | 76 identified as native speakers with vocal pathology; 4 identified as non-native speakers | 95%
18 speech sequences recorded by non-native speakers | 1 identified as a native speaker with vocal pathology; 17 identified as non-native speakers | 94.44%
Overall (98 speech sequences) | | 94.89%
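The rates in Table IV follow directly from the counts of correctly identified sequences. This short Python check recomputes them. The helper name `identification_rate` is ours, not the paper's.

```python
def identification_rate(correct, total):
    """Percentage of speech sequences whose factor was correctly identified."""
    return 100.0 * correct / total


native = identification_rate(76, 80)             # native speakers with pathology
non_native = identification_rate(17, 18)         # non-native speakers
overall = identification_rate(76 + 17, 80 + 18)  # all 98 test sequences
print(f"{native:.2f}% {non_native:.2f}% {overall:.2f}%")
# prints: 95.00% 94.44% 94.90%
```

The exact overall value is 93/98 ≈ 94.898%. Conventional rounding gives 94.90%, while the 94.89% reported in Table IV corresponds to truncation.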

C. Discussion

Table IV shows some misidentifications (native speakers identified as non-native and vice versa). These misidentifications can be explained as follows:
• Non-native speakers who have lived for a long time in an Arabic country can master most Arabic phonemes.
• Non-native speakers who reach an advanced level in learning spoken Arabic can master the pronunciation of most phonemes.

Native speakers identified as non-native may have problems with phoneme duration, which distorts their forced alignment scores.

Apart from these cases, our approach performs well in identifying the factors posing pronunciation defects. The platform developed to help people with vocal disabilities improve their pronunciation provides good support for this population; several users have shown good progression.

IV. CONCLUSION AND FUTURE WORKS

We can conclude that NLP techniques are an efficient means of spoken language processing, especially for pathological speech diagnosis. In this paper, we proposed a novel approach to identify the factors posing pronunciation disorders and to help people with vocal pathologies improve their pronunciation. For this purpose, a corpus of 2,450 speech records (*.wav isolated words) has been prepared to generate the referenced forced alignment scores for Arabic phonemes, which are used to identify the factors posing pronunciation disorders. Another corpus containing pronunciation rules in different modalities (text, audio, and video) has been prepared to help speakers with vocal disabilities improve their pronunciation.

The experimental results have shown that the suggested approach attains a high identification accuracy: we obtained a 95% identification rate, and we observed considerable progression in the elocution of several speakers.

To the best of our knowledge, this work is the first attempt to address both the identification of factors posing pronunciation defects and the specific learning of spoken language according to the vocal pathology of each speaker. We are satisfied with the obtained results, and our approach can serve as an important reference for works focusing on automatic pathological speech processing.

Future works include the following:
• The Arabic speech classification methodology (healthy or pathological) could be applied to text classification [17].
• This work could be extended to develop security applications that identify the speaker's nationality.
• Our approach could also be used to analyze phone communications between terrorist groups.
• This work could help develop an automatic speech correction system for people suffering from pronunciation defects [3, 4] and help non-native speakers improve their Arabic vocabulary [10].
• Automatic dialect and speaker identification [14] could also benefit from our methodology.
• The platform developed to help speakers with vocal disabilities learn spoken Arabic could be used in integration centers for this population.

REFERENCES
[1] Terbeh, N., Maraoui, M., Zrigui, M.: Probabilistic Approach for Detection of Vocal Pathologies in the Arabic Speech. CICLing 2015, April 14-20, 2015, Cairo, Egypt, 2015.
[2] Alghamdi, M., Almuhtasib, H., Elshafei, M.: Arabic Phonological Rules. King Saud University Journal: Computer Sciences and Information, Vol. 16, pp. 1-25, 2004.
[3] Terbeh, N., Labidi, M., Zrigui, M.: Automatic speech correction: A step to speech recognition for people with disabilities. ICTA 2013, October 23-26, 2013, Hammamet, Tunisia, 2013.
[4] Terbeh, N., Zrigui, M.: Vers la Correction Automatique de la Parole Arabe [Towards the Automatic Correction of Arabic Speech]. Citala 2014, November 26-27, 2014, Oujda, Morocco, 2014.
[5] Patane, G., Russo, M.: The enhanced LBG Algorithm. Neural Networks, Vol. 14, No. 9, November 2001.
[6] Bréhélin, L., Gascuel, O.: Modèles de Markov cachés et apprentissage de séquences [Hidden Markov models and sequence learning].
[7] Majidnezhad, V., Kheidorov, I.: An ANN-based Method for Detecting Vocal Fold Pathology. International Journal of Computer Applications, Vol. 62, No. 7, January 2013.
[8] Majidnezhad, V., Kheidorov, I.: A HMM-Based Method for Vocal Fold Pathology Diagnosis. IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 6, No. 2, November 2012.
[9] Paquet, P.: L’utilisation des réseaux de neurones artificiels en finance [The use of artificial neural networks in finance]. Document de recherche n° 1997-1, 1997.
[10] Terbeh, N., Zrigui, M.: Vocal Pathologies Detection and Mispronounced Phonemes Identification: Case of Arabic Continuous Speech. LREC 2016, May 23-28, 2016, Portorož, Slovenia, 2016.
[11] Online: http://www.un.org/french/disabilities/default.asp?navid=35&pid=833 [consulted April 6th, 2016].
[12] Online: http://kenanaonline.com/users/dkkhaledelnagar/photos/1238136361 [consulted April 24th, 2016].
[13] Blanc-Brude, T.: Intégration de commandes vocales dans un environnement d'apprentissage par l'action: enjeux ergonomiques [Integration of voice commands in a learning-by-doing environment: ergonomic issues]. Doctoral dissertation, Grenoble 1, 2004.
[14] Biadsy, F., Hirschberg, J., Habash, N.: Spoken Arabic dialect identification using phonotactic modeling. In Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages, pp. 53-61. Association for Computational Linguistics, 2009.
[15] Elshafei, M., Almuhtasib, H., Alghamdi, M.: Techniques for High Quality Text-to-speech. Information Science, 2002.
[16] Elshafei, M., Almuhtasib, H., Alghamdi, M.: Statistical Methods for Automatic Diacritization of Arabic Text. In Proceedings of the 18th National Computer Conference, Riyadh, 2006.
[17] Hawashin, B., Mansour, A., Aljawarneh, S., Fahmy, A. A., Al Raddady, F., Shrivastava, A., Rajawat, A. S., Malika, C. R., Mishra, S., Yadav, R. N.: An Efficient Feature Selection Method for Arabic Text Classification. Int. J. Comput. Appl., Vol. 83, No. 17, pp. 1-6, 2013.
