A Multi-Modal Recognition System Using Face and Speech: Samir Akrouf, Yahia Belayadi, Messaoud Mostefai, Youssef Chahir
Finally we present the evaluation results and the main conclusions.

[Figure 1: block diagram. The face and the speech of the user feed a face identification system and a speaker identification system; each produces a matching score, and a fusion module combines the two scores into an accept/reject decision.]
Fig. 1. User access scenario based on speech and face information.

2. Face Recognition

This paper uses a hybrid method combining principal components analysis (PCA) [11] and the discrete cosine transform (DCT) [12] for face identification [13].

In this method, we use PCA with coefficient vectors instead of pixel vectors. We notice that this technique requires more time than PCA (because of the calculation of the coefficients), in particular with databases of average or reduced size, but it requires less memory, which makes it advantageous with large databases.

2.2 Experimental Results

The tests were performed using the image databases ORL, Yale Faces and BBAFaces. The latter was created at the University Center of Bordj Bou Arreridj in 2008. It comprises 23 people with 12 images each (for the majority of the people, the images were taken during various sessions). The images show various facial expressions with different intensity variations and different light sources. To facilitate the tests, the faces were then cropped manually to obtain images of 124 × 92 pixels, which we convert to gray levels and store in JPG format. Fig. 3 shows a typical example of the data. It should be noted that certain categories of this data are not retained for the tests.
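The hybrid approach of Section 2 (PCA applied to DCT coefficient vectors rather than pixel vectors, followed by a distance-based match) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of keeping the top-left low-frequency block of the 2-D DCT, the number of retained components and the Euclidean distance are all assumptions made for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def dct_features(image, keep=8):
    """2-D DCT of a gray-level image, reduced to a coefficient vector.

    Assumption: only the top-left keep x keep block (low frequencies) is kept.
    """
    h, w = image.shape
    coeffs = dct_matrix(h) @ image @ dct_matrix(w).T
    return coeffs[:keep, :keep].ravel()

def train(images, labels, keep=8, n_components=5):
    """Training phase: PCA on the DCT coefficient vectors of the training images."""
    feats = np.array([dct_features(img, keep) for img in images])
    mean = feats.mean(axis=0)
    # PCA via SVD of the centred feature matrix; rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(feats - mean, full_matrices=False)
    basis = vt[:n_components]
    return {
        "mean": mean,
        "basis": basis,
        "proj": (feats - mean) @ basis.T,  # saved projections P1..Pm
        "labels": list(labels),
        "keep": keep,
    }

def identify(model, image):
    """Identification phase: project the input and return the closest training label.

    Assumption: Euclidean distance is used as the metric D(Pi, Pj).
    """
    p = (dct_features(image, model["keep"]) - model["mean"]) @ model["basis"].T
    dists = np.linalg.norm(model["proj"] - p, axis=1)
    best = int(np.argmin(dists))
    return model["labels"][best], float(dists[best])
```

Because PCA operates here on `keep * keep` coefficients instead of all pixels of the image, the matrices it handles are much smaller, which is consistent with the lower memory requirement noted above; the extra DCT step accounts for the additional computation time.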
[Figure: block diagram of the face recognition system. Training phase: images from the training database, extraction of information from each image, calculus, saving. Identification phase: input image, detection and normalisation, calculation of a metric distance D(Pi, P1), D(Pi, P2), ..., D(Pi, Pm) against the saved extracted images, result: best score.]

[Figure 3: twelve sample face images, panels (a) to (l).]
Fig. 3. Example from BBAFaces. (a): normal, (b): happy, (c): glasses, (d): sad, (e): sleepy, (f): surprised, (g): wink, (h): dark, (i): top light, (j): bottom light, (k): left light, (l): right light.

In the following we present the results obtained for the tests carried out with Yale Faces and BBAFaces.

Table 1: Rates of Recognition
3. Speaker Recognition System

Nowadays the automatic treatment of speech is progressing, in particular in the fields of Automatic Speech Recognition (ASR) and Speech Synthesis. Automatic speaker recognition is treated as a particular pattern recognition task. It covers the problems of speaker identification or verification using information found in the acoustic signal:

- Free-text systems: the speaker is free to say whatever he wants. In this mode, the training and test sentences are different.
- Text-prompted systems: a text, different for each session and for each person, is imposed on the speaker and is determined by the machine. The training and test sentences can be different.
- Vocabulary-dependent systems: the speaker pronounces a sequence of words drawn from a limited vocabulary. In this mode, training and testing are carried out on texts built from the same vocabulary.
- Personalized text-dependent systems (user-specific text-dependent): each speaker has his own password. In this mode, training and testing are carried out on the same text.

Knowing the vocal message makes the task of ASR systems easier and improves performance. Recognition in text-independent mode requires more time than in text-dependent mode [17].

... or identification in an open set, in which the speaker to be identified does not necessarily belong to the set [16].

[Figure: Automatic Speaker Identification (diagram labels: Speaker 1, Reference).]

... constitutes the state of the art in ASR. The decision of an automatic speaker recognition system is based on the two processes of speaker identification and/or verification, whatever the application or task.

4. Performance of Biometric Systems

The most significant and decisive criterion for comparing one biometric system with another is its error rate. A system is considered ideal if:

False Rejection Rate = False Acceptance Rate = 0
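The two error rates named in Section 4 can be estimated from labelled trials. The sketch below is only an illustration: the function name, the score lists and the convention that a claim is accepted when its score is at or above a threshold are assumptions, not details given in the paper.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) for a system that accepts when score >= threshold.

    FAR: fraction of impostor attempts that are wrongly accepted.
    FRR: fraction of genuine attempts that are wrongly rejected.
    """
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    far = false_accepts / len(impostor_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr
```

Raising the threshold typically lowers the false acceptance rate and raises the false rejection rate, so the ideal case where both are zero is only reached when genuine and impostor scores do not overlap.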
- P(FA1) = 0.1
- P(FR1) = 0.6
In the speaker recognition system we obtained:
- P(FA2) = 0.3
- P(FR2) = 0.2
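To see how decision-level fusion with OR and AND operators changes these probabilities, the following sketch combines the error rates of two systems. It assumes that the two systems make statistically independent errors, which the paper does not state, and the function names are hypothetical; the values it yields are therefore illustrative and need not match the measured fusion results.

```python
def fuse_or(fa1, fr1, fa2, fr2):
    """OR rule: the user is accepted if at least one system accepts.

    Assumes the errors of the two systems are independent.
    """
    fa = 1 - (1 - fa1) * (1 - fa2)  # an impostor passes if either system is fooled
    fr = fr1 * fr2                  # a genuine user fails only if both reject
    return fa, fr

def fuse_and(fa1, fr1, fa2, fr2):
    """AND rule: the user is accepted only if both systems accept.

    Assumes the errors of the two systems are independent.
    """
    fa = fa1 * fa2                  # an impostor must fool both systems
    fr = 1 - (1 - fr1) * (1 - fr2)  # a genuine user fails if either rejects
    return fa, fr
```

With the values listed above (0.1 and 0.6 for the first system, 0.3 and 0.2 for the second), `fuse_or(0.1, 0.6, 0.3, 0.2)` gives approximately (0.37, 0.12) and `fuse_and(0.1, 0.6, 0.3, 0.2)` gives approximately (0.03, 0.68): the OR rule trades a higher false acceptance rate for fewer false rejections, and the AND rule does the opposite.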
[Figure: application screenshots. 1. Main Interface; 3. Acquisition Module for Speaker; 4. Verification Process; 5. Identification Process.]
7. Conclusions
This paper provides results obtained on a multi-modal
biometric system that uses face and voice features for
recognition purposes. We used fusion at the decision level
with OR and AND operators. We showed that the
resulting multi-modal system provides
better performance than the individual biometrics. For the
near future we are collecting data corresponding to three biometric indicators (fingerprint, face and voice) in order to design a better multi-modal recognition system.

Acknowledgments

Special thanks to Benterki Mebarka and Bechane Louiza for their contribution to this project.
Samir Akrouf thanks the Ministry of Higher Education for the financial support of this project (project code: B*0330090009).

References
[1] A. K. Jain, R. Bolle, and S. Pankanti, Biometrics: Personal Identification in Networked Society. Boston, MA: Kluwer, 1998.
[2] A. K. Jain, S. Prabhakar, and S. Chen, “Combining multiple matchers for a high security fingerprint verification system,” Pattern Recognition Letters, vol. 20, pp. 1371-1379, 1999.
[3] R. Brunelli and D. Falavigna, “Person identification using multiple cues,” IEEE Trans. Pattern Anal. Machine Intell., vol. 17, pp. 955–966, Oct. 1995.
[4] B. Duc, G. Maitre, S. Fischer, and J. Bigün, “Person authentication by fusing face and speech information,” in 1st Int. Conf. Audio-Video-Based Biometric Person Authentication AVBPA’97, J. Bigün, G. Chollet, and G. Borgefors, Eds. Berlin, Germany: Springer-Verlag, Mar. 12–14, 1997, vol. 1206 of Lecture Notes in Computer Science, pp. 311–318.
[5] E. Bigün, J. Bigün, B. Duc, and S. Fischer, “Expert conciliation for multi modal person authentication systems by Bayesian statistics,” in Proc. 1st Int. Conf. Audio-Video-Based Biometric Person Authentication AVBPA’97. Berlin, Germany: Springer-Verlag, Lecture Notes in Computer Science, 1997, pp. 291–300.
[6] L. Hong and A. K. Jain, “Integrating faces and fingerprint for personal identification,” IEEE Trans. Pattern Anal. Machine Intell., vol. 20, 1997.
[7] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, “On combining classifiers,” IEEE Trans. Pattern Anal. Machine Intell., vol. 20, pp. 226–239, 1998.
[8] A. K. Jain, L. Hong, and Y. Kulkarni, “A multimodal biometric system using fingerprints, face and speech,” in Proc. 2nd Int. Conf. Audio-Video Based Biometric Person Authentication, Washington, D.C., Mar. 22–23, 1999, pp. 182–187.
[9] T. Choudhury, B. Clarkson, T. Jebara, and A. Pentland, “Multimodal person recognition using unconstrained audio and video,” in Proc. 2nd Int. Conf. Audio-Video Based Person Authentication, Washington, D.C., Mar. 22–23, 1999, pp. 176–180.
[10] S. Ben-Yacoub, “Multimodal data fusion for person authentication using SVM,” in Proc. 2nd Int. Conf. Audio-Video Based Biometric Person Authentication, Washington, D.C., Mar. 22–23, 1999, pp. 25–30.
[11] M. Turk and A. Pentland, “Eigenfaces for recognition,” Journal of Cognitive Science, pp. 71–86, 1991.
[12] Ronny Tjahyadi, Wanquan Liu, and Svetha Venkatesh, “Application of the DCT Energy Histogram for Face Recognition,” 2nd International Conference on Information Technology for Application (ICITA 2004), pp. 305-310.
[13] Samir Akrouf, Sehili Med Amine, Chakhchoukh Abdesslam, Messaoud Mostefai, and Youssef Chahir, 2009 Fifth International Conference on Mems Nano and Smart Systems, 28-30 December 2009, Dubai, UAE.
[14] N. Morizet, Thomas Ea, Florence Rossant, Frédéric Amiel, and Amara Amara, “Revue des algorithmes PCA, LDA et EBGM utilisés en reconnaissance 2D du visage pour la biométrie,” Tutoriel Reconnaissance d'images, MajecStic 2006, Institut Supérieur d’Electronique de Paris (ISEP).
[15] Akrouf Samir, Mehamel Abbas, Benhamouda Nacéra, and Messaoud Mostefai, “An Automatic Speaker Recognition System,” 2009 the 2nd International Conference on Advanced Computer Theory Engineering (ICACTE 2009), Cairo, Egypt, September 25-27, 2009.
[16] “Approche Statistique pour la Reconnaissance Automatique du Locuteur : Informations Dynamiques et Normalisation Bayesienne des Vraisemblances,” October 2000.
[17] Yacine Mami, “Reconnaissance de locuteurs par localisation dans un espace de locuteurs de référence,” doctoral thesis, defended 21 October 2003.

Samir Akrouf was born in Bordj Bou Arréridj, Algeria in 1960. He received his Engineer degree from Constantine University, Algeria in 1984. He received his Master’s degree from University of Minnesota, USA in 1988. Currently, he is an assistant professor at the Computer department of Bordj Bou Arréridj University, Algeria. He is an IACSIT member and a member of LMSE laboratory (a research laboratory in Bordj Bou Arréridj University). He is also the director of the Mathematics and Computer Science Institute of Bordj Bou Arréridj University. His main research interests are focused on Biometric Identification, Computer Vision and Computer Networks.

Yahia Belayadi was born in Bordj Bou Arréridj, Algeria in 1961. He received his Engineer degree from Setif University, Algeria in 1987. He received his magister from Setif University, Algeria in 1991. Currently, he is an assistant professor at the Computer department of Bordj Bou Arréridj University, Algeria. He is also the director of the University Center of Continuous Education in Bordj Bou Arreridj.

Messaoud Mostefai was born in Bordj Bou Arréridj, Algeria in 1967. He received his Engineer degree from Algiers University, Algeria in 1990. He received a DEA degree in Automatique et Traitement Numérique du Signal (Reims, France) in 1992. He received his doctorate degree in Automatique et Traitement Numérique du Signal (Reims, France) in 1995. He obtained his HDR (Habilitation Universitaire), on the theme « Adéquation Algorithme /Architecture en traitement d’images », at UFAS, Algeria in 2006. Currently, he is a professor at the Computer department of Bordj Bou Arréridj University, Algeria. He is a member of LMSE laboratory (a research laboratory in Bordj Bou Arréridj University). His main research interests are focused on classification and Biometric Identification, Computer Vision and Computer Networks.

Youssef Chahir is an Associate Professor (since '00) at GREYC Laboratory CNRS UMR 6072, Department of Computer Science, University of Caen Lower-Normandy, France.

IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 3, No. 1, May 2011
ISSN (Online): 1694-0814
www.IJCSI.org