Open Access. Published by De Gruyter, July 26, 2013. Distributed under a Creative Commons BY-NC-ND 3.0 license.

Classification of Infant Cries Using Dynamics of Epoch Features

Avinash Kumar Singh, Jayanta Mukhopadhyay, K. Sreenivasa Rao and Kapinaiah Viswanath

Abstract

In this article, epoch-based dynamic features, namely the sequences of epoch interval and epoch strength values, are explored to classify infant cries. An epoch is the instant of significant excitation of the vocal tract system during the production of speech; for voiced speech, the most significant excitation takes place around the instant of glottal closure. The types of infant cries considered in this work are hunger, pain, and wet diaper. The epoch interval and epoch strength features are used to represent cry-specific information in the acoustic signal, and they are determined using a zero-frequency-filter-based method. Gaussian mixture models (GMMs) are developed separately for each cry using the proposed features. The infant cry database collected under a telemedicine project at the Indian Institute of Technology Kharagpur has been used for this study. In the first step, infant cry recognition accuracy is investigated separately for the epoch interval and epoch strength features. To enhance recognition performance, the GMMs developed using the individual features are combined through score level fusion. The recognition performance using the combined evidence is found to be superior to that of the individual systems.

1 Introduction

For infants, crying is the basic communication tool to express their physical, emotional, and psychological states. An infant may cry for a variety of reasons, for example, pain, tiredness, thirst, loneliness, or discomfort due to a wet diaper. Similar to adult speech, an infant's crying is a means of communication, although a more limited one. A newborn infant's cry is characterized by a very high fundamental frequency (F0) with sudden changes, and by voiced or unvoiced segments of very short duration. The highly varying vocal tract resonance frequencies need to be tracked accurately. There are a number of ways to analyze infant cries. In [2] and our work [16], spectral, prosodic, and epoch features are used to recognize infant cries. In [3], linear prediction coefficient (LPC) and intensity features are used to discriminate cries due to pain and hunger using neural networks. In [14, 24], mel-frequency cepstral coefficient (MFCC) and LPC features are used to discriminate normal vs. pathological cries. In [1], MFCC and linear prediction cepstral coefficient (LPCC) features are explored to discriminate pain vs. nonpain. Vempada et al. [18] have explored spectral and prosodic features to characterize infant cries: MFCCs are used to represent the spectral information, and short-time frame energies and pause durations are used to represent the prosodic information. They considered three cries, namely, hunger, wet diaper, and pain, and used support vector machines to capture the discriminative information in the spectral and prosodic features. In our earlier work, we explored source, system, and suprasegmental features to classify the cries [17]: MFCC features represent the vocal tract system characteristics, residual mel-frequency cepstral coefficients (RMFCCs) and linear prediction (LP) residual samples represent the excitation source information, and modulation spectral features represent the suprasegmental information.

It is observed that most researchers have treated cry discrimination as a binary classification problem, such as pain vs. no-pain, pain vs. hunger, or normal vs. pathological cries. In most of these works, only spectral features such as MFCCs and LPCCs are used to classify the cries. Few studies have explored source and prosodic features for discriminating infant cry samples. However, none of the studies have explored subsegmental features such as the characteristics of the glottal closure regions present around the instants of significant excitation.

From the existing studies [6–9, 11, 13, 22], it is already evident that instants of significant excitation (epochs), or glottal closure instants, are crucial in speech analysis. In [4, 9], epoch knowledge is used for extracting emotion-specific features. Instants of significant excitation are exploited in [6, 8, 10, 11] for developing signal processing methods that modify duration and pitch to generate high-quality speech. In [8], pitch-synchronous speaker-specific features are extracted using instants of glottal closure for a voice conversion application. In [13], a computationally efficient method was proposed to determine the instants of significant excitation. In [19–22], instants of significant excitation are used for accurate detection of vowel onset points, and they are used as anchor points in [12] to extract language-specific spectral features. Therefore, in this work, we explore two features related to instants of significant excitation for discriminating infant cries: the epoch interval and the epoch strength. We use the sequences of epoch interval and epoch strength values to characterize the cry-specific information. The sequence of epoch interval values [the epoch interval contour (EIC)] represents the dynamics of the excitation source (i.e., the rate of vocal fold vibration). The epoch strength indicates the strength of excitation at the instant of glottal closure; in this work, it is determined from the slope of the zero-frequency-filtered (ZFF) signal at the instant of glottal closure. We consider three cries, namely, hunger, pain, and wet diaper, and use Gaussian mixture models (GMMs) to capture the distribution of the features specific to each cry.

The rest of the article is organized as follows: Section 2 describes the database used for infant cry discrimination. Section 3 describes the proposed features for discriminating infant cries. Development of GMMs using proposed features is discussed in Section 4. Recognition performance of the developed GMMs with proposed features is explained in Section 5. The summary of this article and the future work needed to improve the recognition performance are described in the final section.

2 Infant Cry Database

The infant cry speech corpus used for this study was collected from 120 infants in the neonatal intensive care unit (NICU) of SSKM Hospital, Kolkata. The database consists of three different types of cries, namely, wet diaper, hunger, and pain. The numbers of cry clips collected for hunger, pain, and wet diaper are 60, 30, and 30, respectively, and the total duration of the cry data used in this work is 725 s for each cry. The infants selected for recording were in the age group of 12–40 weeks. Recording was done using a Sony digital recorder at a sampling rate of 44.1 kHz. For this study, the recorded data were down-sampled to 16 kHz, with each sample represented as a 16-bit number.
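For reference, the down-sampling step can be reproduced with a polyphase resampler. The following is a minimal sketch, assuming the clips are stored as WAV files; the file name, and the use of the soundfile and scipy libraries, are illustrative assumptions rather than details of the original recording pipeline.

```python
# Minimal sketch: down-sample a 44.1-kHz cry clip to 16 kHz.
import soundfile as sf
from scipy.signal import resample_poly

signal, fs = sf.read('cry_clip.wav')  # hypothetical file name; fs expected to be 44100
# 44100 * 160/441 = 16000, so resample with up=160, down=441.
signal_16k = resample_poly(signal, up=160, down=441)
# WAV output defaults to 16-bit PCM in soundfile.
sf.write('cry_clip_16k.wav', signal_16k, 16000)
```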

3 Features

In this work, we propose epoch-based features, namely the sequence of epoch interval values (the EIC) and the sequence of epoch strength values [the epoch strength contour (ESC)], for classifying infant cries. The following subsections discuss the details of the extraction of the proposed features.

3.1 Epoch Interval Contour

In the context of infant cry, the information contributed by the excitation source differs from the vocal tract system characteristics [23]. This excitation-specific information arises from the shape, size, and dynamics of the vocal folds and the associated muscle structure.

Instants of significant excitation (i.e., instants of glottal closure) of the speech signal are referred to as epochs. In the context of speech, most of the significant excitation takes place due to glottal vibration, and during glottal vibration, the major impulse-like excitation occurs in the closing phase of the glottal cycle. The discontinuity due to the impulse excitation is reflected across all frequencies, including zero frequency. Knowledge of epoch locations is useful for accurate estimation of F0. Epochs can also be used as pitch markers for prosody manipulation, which is useful in applications such as text-to-speech systems, voice and speech rate conversion, and speech recognition [6–9, 11, 13, 22]. In this study, we explore one of the epoch features, the EIC, for discriminating infant cries.

In this work, the EIC is extracted using the ZFF method, which determines the instants of significant excitation (epochs) from the given speech signal. From the extracted epoch sequence, the epoch interval associated with each epoch is determined by calculating the time difference between the present epoch and the immediately following epoch. The sequence of epoch interval values constitutes the EIC, which is the proposed source feature in the present study; a minimal sketch of this computation is given below.
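Concretely, the EIC computation reduces to first differences of the epoch locations. Below is a minimal sketch, assuming epoch locations (as sample indices) obtained from a ZFF epoch extractor such as the one sketched next; the function names are illustrative, and the overlapping sliding window used to form fixed-length feature vectors is one plausible choice, as the original framing is not specified.

```python
import numpy as np

def epoch_interval_contour(epochs, fs):
    # Epoch interval for each epoch: time difference (in seconds) between
    # the present epoch and the immediately following epoch.
    return np.diff(np.asarray(epochs)) / fs

def interval_feature_vectors(eic, dim=10):
    # Group consecutive interval values into dim-dimensional feature
    # vectors (5, 10, or 20 in the experiments reported later).
    return np.lib.stride_tricks.sliding_window_view(eic, dim)
```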

The ZFF approach is adopted in this work to derive the epoch sequence from the speech signal. The ZFF is achieved using a zero-frequency-resonator (ZFR)-based technique, which relies on the observation that the impulsive nature of the excitation at glottal closure instants (epochs) is reflected across all frequencies [5]. The output of the ZFRs is mainly controlled by the excitation pulses. The ZFF method [5] consists of the following sequence of steps:

  1. Find the difference between the successive samples of the input speech signal s(n) to remove any time-varying low-frequency bias in the signal:

     x(n) = s(n) − s(n − 1)

  2. Compute the output of a cascade of two ideal digital resonators at 0 Hz, i.e.,

     y(n) = a1 y(n − 1) + a2 y(n − 2) + a3 y(n − 3) + a4 y(n − 4) + x(n),

     where a1 = +4, a2 = −6, a3 = +4, and a4 = −1. It may be noted that this is equivalent to passing the signal x(n) through a digital filter given by

     H(z) = 1 / (1 − z^(−1))^4

  3. Remove the trend in y(n), i.e.,

     ŷ(n) = y(n) − ȳ(n),

     where

     ȳ(n) = [1 / (2N + 1)] · [y(n − N) + … + y(n) + … + y(n + N)],

     and 2N + 1 corresponds to the size of the window used for computing the local mean, which is typically the average pitch period computed over a long segment of speech.

  4. The trend-removed signal ŷ(n) is termed the ZFF signal. The instants of significant excitation correspond to the positive zero crossings of the ZFF signal (see the sketch after this list).
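A minimal sketch of the four steps is given below, assuming a mono signal `s` sampled at `fs` Hz. The default window parameter (about 2 ms, matching the short pitch periods of infant cries) and the single pass of trend removal are simplifying assumptions; practical implementations often iterate the mean subtraction to fully remove the polynomial growth of the resonator output.

```python
import numpy as np

def zff_epochs(s, fs, avg_pitch_period=0.002):
    # Step 1: difference the signal to remove low-frequency bias.
    x = np.diff(np.asarray(s, dtype=np.float64))
    # Step 2: cascade of two ideal 0-Hz resonators:
    # y(n) = 4y(n-1) - 6y(n-2) + 4y(n-3) - y(n-4) + x(n).
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1: y[n] += 4.0 * y[n - 1]
        if n >= 2: y[n] -= 6.0 * y[n - 2]
        if n >= 3: y[n] += 4.0 * y[n - 3]
        if n >= 4: y[n] -= y[n - 4]
    # Step 3: remove the trend using a local mean over 2N+1 samples,
    # with N set from the average pitch period.
    N = max(1, int(avg_pitch_period * fs))
    w = 2 * N + 1
    trend = np.convolve(y, np.ones(w) / w, mode='same')
    zff = y - trend
    # Step 4: epochs are the negative-to-positive zero crossings.
    epochs = np.flatnonzero((zff[:-1] < 0) & (zff[1:] >= 0)) + 1
    return zff, epochs
```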

Epoch extraction using the ZFF method for a segment of voiced speech is shown in Figure 1. Figure 1A shows the segment of voiced speech; its ZFF signal and the derived epoch locations are shown in Figure 1B and C, respectively. Among the existing epoch extraction methods, the ZFF method determines the epoch locations with the highest accuracy.

Figure 1: Epoch (GCI) Extraction Using the ZFF Method: (A) Segment of Voiced Speech Signal, (B) Local Mean Subtracted ZFF Signal, and (C) Epoch Locations from ZFF Signal.

3.2 Epoch Strength Contour

The manner in which the vocal folds vibrate influences the glottal airflow that serves as the excitation source for the vocal tract system. A sharper closure of the vocal folds corresponds to a stronger excitation of the vocal tract system. In this work, we exploit the narrow-band nature of the ZFR to measure the strength of excitation at each instant. Because the effect of an impulse is spread uniformly across the frequency range, the relative strengths of impulses can be derived from a narrow band around any frequency, including zero frequency. Hence, the information about the strength of excitation can also be derived from the output signal of the ZFR. It is observed that the slope of the ZFF signal around the zero crossings corresponding to the epoch locations gives a measure of the strength of excitation, or the epoch strength. It is determined from the cry signal using the following steps:

  1. Derive the ZFF signal corresponding to the given cry signal.

  2. Identify the negative-to-positive zero crossings of the ZFF signal as the epoch locations.

  3. At each epoch location, determine the epoch strength by computing the slope of the ZFF signal in the vicinity of the positive zero crossing.

  4. Compute the slope as the magnitude difference of the ZFF signal samples present on either side of the zero crossing.

  5. The sequence of these slope values constitutes the ESC (see the sketch after this list).
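The following minimal sketch implements these steps, assuming `zff` and `epochs` come from a ZFF extractor such as the earlier sketch; measuring the slope across one sample on either side of the crossing is an illustrative choice.

```python
import numpy as np

def epoch_strength_contour(zff, epochs):
    # Epoch strength: magnitude difference of the ZFF signal samples
    # on either side of each negative-to-positive zero crossing.
    zff = np.asarray(zff)
    strengths = [abs(zff[e] - zff[e - 1]) for e in epochs
                 if 1 <= e < len(zff)]
    return np.asarray(strengths)
```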

4 Gaussian Mixture Model

The GMM is one of the statistically mature methods for unsupervised clustering [15]. The complete GMM is parameterized by the mean vector, the (diagonal) covariance matrix, and the mixture weight of each component. These parameters are collectively represented by the following notation:

λ = {ρ_i, μ_i, Σ_i}, i = 1, …, M,

where M is the number of mixture components, ρ_i is the weight, μ_i is the mean vector, and Σ_i is the covariance matrix of the ith component. The mixture weights satisfy the constraint ρ_1 + ρ_2 + … + ρ_M = 1. These parameters are initialized using k-means clustering on the training set of feature vectors, with k = M.

The well-known expectation–maximization (EM) algorithm is used for finding the maximum likelihood (ML) estimates of the parameters. EM is an iterative method that alternates between an expectation (E) step, which computes the expectation of the log likelihood with respect to the current estimate of the distribution, and a maximization (M) step, which computes the parameters that maximize the expected log likelihood found in the E step. These parameters are then used in the E step of the next iteration. We performed 100 EM iterations in our experiment. Once the parameters are re-estimated by the EM algorithm, the training phase is complete. Given a test vector x of dimension D, the score is generated by

p(x | λ) = ρ_1 b_1(x) + ρ_2 b_2(x) + … + ρ_M b_M(x),

where b_i(x) is the component density of the ith component, given by

b_i(x) = [1 / ((2π)^(D/2) |Σ_i|^(1/2))] exp[−(1/2) (x − μ_i)^T Σ_i^(−1) (x − μ_i)]
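A minimal training-and-scoring sketch with scikit-learn is shown below, reflecting the settings described above (diagonal covariances, k-means initialization, 100 EM iterations); the library choice and variable names are assumptions, not the authors' implementation.

```python
from sklearn.mixture import GaussianMixture

def train_cry_gmm(train_vectors, n_components=10):
    # train_vectors: (num_vectors, D) array of EIC or ESC feature vectors.
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type='diag',  # diagonal covariance matrices
                          init_params='kmeans',    # k-means initialization
                          max_iter=100)            # up to 100 EM iterations
    gmm.fit(train_vectors)
    return gmm

# For a test vector x of shape (1, D), gmm.score_samples(x)
# returns log p(x | lambda).
```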

In this work, approximately 70% of the data are used for developing the GMMs, and the remaining 30% are used for validation. For each cry category, a separate GMM is developed to represent the distribution of feature vectors belonging to that particular cry. The infant cry recognition system (ICRS) thus consists of three GMMs followed by a decision logic (Figure 2). Figure 2A shows the training phase, and Figure 2B shows the testing phase for a given input cry utterance. Separate recognition systems are developed for analyzing the discriminating capability of each of the proposed features, and to enhance recognition performance, the evidence from the individual systems is combined with appropriate weights. In this work, three ICRSs are developed using the proposed features individually and in combination (a minimal sketch of the decision logic follows the list below):

  • ICRS-1: developed using EIC features.

  • ICRS-2: developed using ESC features.

  • ICRS-3: developed using score level fusion of the EIC- and ESC-based systems.
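A minimal sketch of the decision logic, under the assumption of one trained GMM per cry class (names are illustrative): the feature vectors of the test utterance are scored against each class model, and the class with the highest average log-likelihood is selected.

```python
def classify_cry(test_vectors, gmms):
    # gmms: dict mapping cry class -> trained GaussianMixture.
    scores = {cry: model.score_samples(test_vectors).mean()
              for cry, model in gmms.items()}
    return max(scores, key=scores.get)

# e.g., classify_cry(vectors, {'hunger': gmm_hunger,
#                              'pain': gmm_pain,
#                              'wet_diaper': gmm_wet})
```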

Figure 2: Infant Cry Recognition Architecture: (A) Training Phase and (B) Testing Phase.

Given the training vectors and the GMM configuration, we want to estimate the parameters of the GMMs that best match the distribution of the training feature vectors. Several techniques are available for estimating the parameters of a GMM; in this article, the most popular and well-established method, ML estimation, is used for developing the GMMs.

5 Results and Discussion

In this work, we have analyzed the recognition performance using 2-, 6-, and 12-s cry samples. Recognition performance is observed to improve with 6- and 12-s samples compared with 2-s samples, whereas between the 6- and 12-s samples, little variation in recognition performance is observed. The results reported in this article are therefore confined to 6-s cry samples; with this duration, the numbers of test samples corresponding to hunger, pain, and wet diaper are 167, 157, and 161, respectively. The performance of the ICRS using the proposed features is represented in the form of a confusion matrix, whose diagonal elements represent the correct classification performance of the cries and whose off-diagonal elements indicate the misclassification performance. The recognition accuracy is analyzed using different sizes of feature vectors: sequences of 5, 10, and 20 epoch interval and epoch strength values are examined. In each case, the recognition performance is also analyzed for different numbers of Gaussian components, varying from 4 to 14. The recognition performance of ICRS-1 and ICRS-2 for the different lengths of feature vectors and different numbers of Gaussian components is given in Table 1. The first column of the table indicates the number of Gaussian components, and columns 2–3, 4–5, and 6–7 indicate the recognition accuracy of ICRS-1 and ICRS-2 for 5-, 10-, and 20-dimensional feature vectors, respectively. From the results presented in Table 1, it is observed that the recognition accuracy of ICRS-1 is best with 10-dimensional feature vectors and 10 Gaussian components, whereas for ICRS-2, the best recognition accuracy is observed with 10-dimensional feature vectors and 12 Gaussian components.
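As an aside, the percentage confusion matrices reported below can be obtained by row-normalizing raw classification counts; a minimal sketch with placeholder labels (illustrative only, not the actual test data) follows.

```python
from sklearn.metrics import confusion_matrix

labels = ['hunger', 'pain', 'wet_diaper']
# Placeholder ground-truth and predicted labels for illustration only.
y_true = ['hunger', 'pain', 'wet_diaper', 'hunger']
y_pred = ['hunger', 'wet_diaper', 'wet_diaper', 'hunger']

counts = confusion_matrix(y_true, y_pred, labels=labels)
percent = 100.0 * counts / counts.sum(axis=1, keepdims=True)  # each row sums to 100
```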

Table 1

Recognition Performance of ICRS-1 and ICRS-2 for Different Lengths of Feature Vectors and Different Numbers of Gaussian Components. All entries are average recognition performance (%).

No. of Gaussian   5-Dimensional       10-Dimensional      20-Dimensional
components        ICRS-1   ICRS-2     ICRS-1   ICRS-2     ICRS-1   ICRS-2
 4                41.1     26.1       50.3     43.6       52.6     30.3
 8                50.2     44.2       51.3     49.1       54.7     39.6
10                63.5     43.1       70.1     52.6       67.5     47.4
12                61.3     42.6       57.7     54.0       62.3     45.6
14                58.9     41.5       61.4     50.7       59.8     39.6

In summary, ICRSs using EIC and ESC features are explored with 5-, 10-, and 20-dimensional feature vectors derived from the cry signals, and for each feature vector size, the number of Gaussian components is varied from 4 to 14. The performance of both cry recognition systems is best for 10-dimensional feature vectors. The comparative recognition performance of ICRS-1 and ICRS-2 is given in Table 1. The details of the recognition performance using the proposed features are discussed in the following subsections, in which the feature vector dimension is 10 and the numbers of Gaussian components used in ICRS-1 and ICRS-2 are 10 and 12, respectively.

5.1 Performance Evaluation of ICRS-1 Developed Using EIC Features

The recognition performance of ICRS-1, developed using EIC features, is shown in Table 2. Here, the cry recognition system is developed using 10-dimensional EIC feature vectors with a 10-component GMM. The average recognition accuracy is found to be approximately 70.1%. Among the three cries, hunger and wet diaper are recognized with 84.4% and 89.4% accuracy, respectively, whereas pain is recognized very poorly, with 35% accuracy. From this result, it appears that the EIC feature for pain may not have distinct characteristics and may be similar to those of both hunger and wet diaper.

Table 2

Performance of ICRS-1 Developed Using EIC Features Extracted from the Cry Signal.

Cry (actual)   Classified as (%)
               Hunger   Pain   Wet diaper
Hunger         84.4      7.2    8.4
Pain           36.9     35.0   28.1
Wet diaper      7.5      3.1   89.4

5.2 Performance Evaluation of ICRS-2 Developed Using ESC Features

The recognition performance of ICRS-2, developed using ESC features, is shown in Table 3. Here, the cry recognition system is developed using 10-dimensional ESC feature vectors with a 12-component GMM. The average recognition accuracy is found to be approximately 54%. Among the three cries, hunger and wet diaper are recognized with 52.1% and 82.6% accuracy, respectively, whereas pain is recognized very poorly, with 29.3% accuracy. From this result, it appears that the ESC feature for pain may not have distinct characteristics and may be similar to that of wet diaper; indeed, approximately 38.3% of hunger samples and 52.2% of pain samples are misclassified as wet diaper.

Table 3

Performance of ICRS-2 Developed Using ESC Features Extracted from the Cry Signal.

Cry (actual)   Classified as (%)
               Hunger   Pain   Wet diaper
Hunger         52.1     15.6   38.3
Pain           18.5     29.3   52.2
Wet diaper      6.6     10.8   82.6

5.3 Combination of the ICRS-1 and ICRS-2 Systems Using Score Level Fusion

The score level fusion is achieved by summing the weighted probability scores of the individual ICRSs developed using EIC and ESC features. The weighting rule used in this study is

P = W_EIC · P_EIC + W_ESC · P_ESC,

with

W_EIC + W_ESC = 1,

where P is the combined normalized probability score, P_EIC is the mean probability score of ICRS-1, P_ESC is the mean probability score of ICRS-2, and W_EIC and W_ESC are the weighting factors of the ICRS-1 and ICRS-2 systems, respectively. By varying W_EIC from 0.1 to 0.9 in steps of 0.1 (with W_ESC = 1 − W_EIC), nine combinations of weighting factors are obtained. The recognition performance is observed to be best, approximately 73%, for weighting factors of 0.2 and 0.8 for ICRS-1 and ICRS-2, respectively. The confusion matrix indicating the recognition performance of ICRS-3, developed using the score level combination of evidence from ICRS-1 and ICRS-2, is given in Table 4: wet diaper is recognized with the highest accuracy of approximately 94.7%, whereas hunger and pain are recognized with 76.6% and 48.4% accuracy, respectively. From the results, it is observed that the recognition performance of wet diaper and pain is improved by combining the evidence of ICRS-1 and ICRS-2 using score level fusion. From this observation, we may hypothesize that the EIC and ESC features contain some nonoverlapping cry-specific information. The recognition accuracies of ICRS-1, ICRS-2, and ICRS-3 are shown as a bar chart in Figure 3 for comparison, and the overall recognition accuracies of the individual systems and their score level fusion system are given in Table 5. A minimal sketch of the fusion rule is given after Table 5.

Table 4

Performance of ICRS-3 Developed Using Score Level Fusion of EIC and ESC Features.

Cry (actual)   Classified as (%)
               Hunger   Pain   Wet diaper
Hunger         76.6     12.3   11.1
Pain           29.3     48.4   22.3
Wet diaper      4.0      1.3   94.7
Table 5

Recognition Performance of the Different Cry Recognition Systems Developed Using the Proposed Features and Their Combination.

Cry          Recognition performance (%)
             EIC (ICRS-1)   ESC (ICRS-2)   EIC + ESC (ICRS-3)
Hunger       84.4           52.1           76.6
Pain         35.0           29.3           48.4
Wet diaper   89.4           82.6           94.7
Average      70.1           54.0           73.0
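As noted above, a minimal sketch of the score level fusion rule follows; the per-class score arrays and their normalization are illustrative assumptions.

```python
import numpy as np

CLASSES = ['hunger', 'pain', 'wet_diaper']

def fuse_and_decide(p_eic, p_esc, w_eic=0.2):
    # p_eic, p_esc: per-class mean probability scores from ICRS-1 and
    # ICRS-2, aligned with CLASSES; the weights satisfy W_EIC + W_ESC = 1.
    w_esc = 1.0 - w_eic
    combined = w_eic * np.asarray(p_eic) + w_esc * np.asarray(p_esc)
    return CLASSES[int(np.argmax(combined))]

# The best-performing weights reported above are W_EIC = 0.2, W_ESC = 0.8.
```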
Figure 3: Average Recognition Performance of All Proposed Features and Their Combination.

6 Summary and Conclusions

In this work, the temporal dynamics of epoch parameters, represented by the EIC and ESC features, are explored for classifying infant cries. GMMs are used as classification models for developing the different ICRSs. The recognition performance of the ICRSs developed using the EIC and ESC features is observed to be 70.1% and 54.0%, respectively. To enhance recognition accuracy, the evidence of the individual ICRSs is combined at the score level, and the overall recognition performance of the combined system is found to be 73.0%. The performance of the ICRS may be further improved by combining evidence from vocal tract system and prosodic features in addition to the proposed features.


Corresponding author: Kapinaiah Viswanath, Department of Telecommunication Engineering, Siddaganga Institute of Technology, Tumkur 572103, Karnataka, India

The work presented in this article was performed at the Indian Institute of Technology Kharagpur as a part of the project “Development of a Web Enabled e-Healthcare System for Neonatal Patient Care Services (eNPCS)”, sponsored by the Ministry of Communication and Information Technology (MCIT), Government of India. Our special thanks to the project team at the Indian Institute of Technology Kharagpur and doctors at the Department of Neonatology, SSKM Hospital, for collecting the cry samples of the infants.

Bibliography

[1] Y. Abdulaziz and S. M. S. Ahmad, Infant cry recognition system: a comparison of system performance based on mel frequency and linear prediction cepstral coefficients, in: 2010 International Conference on Information Retrieval and Knowledge Management (CAMP), pp. 260–263, Shah Alam, Selangor, Malaysia, IEEE, March 2010. DOI: 10.1109/INFRKM.2010.5466907.

[2] S. A. K. Buddi, Infant cry recognition using spectral and prosodic features, Master's thesis, School of Information Technology, IIT Kharagpur, Kharagpur, India, May 2012.

[3] O. O. García and C. A. R. García, Applying scaled conjugate gradient for the classification of infant cry with neural networks, in: European Symposium on Artificial Neural Networks, pp. 349–354, Bruges, Belgium, April 2003.

[4] S. G. Koolagudi and K. S. Rao, Emotion recognition from speech using sub-syllabic and pitch synchronous spectral features, Int. J. Speech Technol. 15 (2012), 495–511. DOI: 10.1007/s10772-012-9150-8.

[5] K. S. R. Murthy and B. Yegnanarayana, Epoch extraction from speech signals, IEEE Trans. Audio Speech Lang. Process. 16 (2008), 1602–1613. DOI: 10.1109/TASL.2008.2004526.

[6] K. S. Rao, Real time prosody modification, J. Signal Inf. Process. 1 (2010), 50–62. DOI: 10.4236/jsip.2010.11006.

[7] K. S. Rao, Voice conversion by mapping the speaker-specific features using pitch synchronous approach, Comput. Speech Lang. 24 (2010), 474–494. DOI: 10.1016/j.csl.2009.03.003.

[8] K. S. Rao, Unconstrained pitch contour modification using instants of significant excitation, Circuits Syst. Signal Process. 31 (2012), 2133–2152. DOI: 10.1007/s00034-012-9428-8.

[9] K. S. Rao and S. G. Koolagudi, Characterization and recognition of emotions from speech using excitation source information, Int. J. Speech Technol. 16 (2013), 181–201. DOI: 10.1007/s10772-012-9175-z.

[10] K. S. Rao and A. K. Vuppala, Non-uniform time scale modification using instants of significant excitation and vowel onset points, Speech Commun. 55 (2013), 745–756. DOI: 10.1016/j.specom.2013.03.002.

[11] K. S. Rao and B. Yegnanarayana, Prosody modification using instants of significant excitation, IEEE Trans. Speech Audio Process. 14 (2006), 972–980. DOI: 10.1109/TSA.2005.858051.

[12] K. S. Rao, S. Maity and R. R. Vempada, Pitch synchronous and glottal closure based speech analysis for language recognition, Int. J. Speech Technol. (2013). DOI: 10.1007/s10772-013-9193-5.

[13] K. S. Rao, S. R. M. Prasanna and B. Yegnanarayana, Determination of instants of significant excitation in speech using Hilbert envelope and group delay function, IEEE Signal Process. Lett. 14 (2007), 762–765. DOI: 10.1109/LSP.2007.896454.

[14] O. F. Reyes-Galaviz and C. A. Reyes-Garcia, A system for the processing of infant cry to recognize pathologies in recently born babies with neural networks, in: 9th Conference on Speech and Computer (SPECOM), pp. 1–6, St. Petersburg, Russia, September 2004.

[15] D. A. Reynolds and R. C. Rose, Robust text independent speaker identification using Gaussian mixture speaker models, IEEE Trans. Speech Audio Process. 3 (1995), 72–83. DOI: 10.1109/89.365379.

[16] A. K. Singh, J. Mukhopadhyay and K. S. Rao, Classification of infant cries using epoch and spectral features, in: National Conference on Communications (NCC-2013), IIT Delhi, New Delhi, India, February 2013. DOI: 10.1109/NCC.2013.6487999.

[17] A. K. Singh, J. Mukhopadhyay and K. S. Rao, Classification of infant cries using source, system and supra-segmental features, in: Indian Conference on Medical Informatics and Telemedicine (ICMIT-2013), IIT Kharagpur, Kharagpur, India, March 2013. DOI: 10.1109/IndianCMIT.2013.6529409.

[18] R. R. Vempada, B. S. A. Kumar and K. S. Rao, Characterization of infant cries using spectral and prosodic features, in: National Conference on Communications (NCC-2012), IIT Kharagpur, Kharagpur, India, February 2012. DOI: 10.1109/NCC.2012.6176851.

[19] A. K. Vuppala and K. S. Rao, Vowel onset point detection for noisy speech using spectral energy at formant frequencies, Int. J. Speech Technol. 16 (2013), 229–235. DOI: 10.1007/s10772-012-9179-8.

[20] A. K. Vuppala, K. S. Rao and S. Chakrabarti, Improved vowel onset point detection using epoch intervals, Int. J. Electron. Commun. 66 (2012), 697–700. DOI: 10.1016/j.aeue.2011.12.013.

[21] A. K. Vuppala, K. S. Rao and S. Chakrabarti, Spotting and recognition of consonant–vowel units from continuous speech using accurate vowel onset points, Circuits Syst. Signal Process. 31 (2012), 1459–1474. DOI: 10.1007/s00034-012-9391-4.

[22] A. K. Vuppala, J. Yadav, S. Chakrabarti and K. S. Rao, Vowel onset point detection for low bit rate coded speech, IEEE Trans. Audio Speech Lang. Process. 20 (2012), 1894–1903. DOI: 10.1109/TASL.2012.2191284.

[23] J. J. Wolf, Efficient acoustic parameters for speaker recognition, J. Acoust. Soc. Am. 51 (1972), 2044–2056. DOI: 10.1121/1.1913065.

[24] F.-L. Yuan and X.-N. Huang, Infant cry recognition based on feature extraction, in: International Conference on Information, Networking and Automation (ICINA), vol. 2, pp. V2-166–V2-169, October 2010.

Received: 2013-06-12
Published Online: 2013-07-26
Published in Print: 2013-09-01

© 2013 by Walter de Gruyter, Berlin/Boston

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
