Novel acoustic features for detection of hypernasality in cleft palate speech

This thesis aims to develop an objective method for assessing hypernasality in cleft palate speech based on spectral analysis of the speech signal. The proposed method is based on the detection and severity grading of nasalized vowels in hypernasal speech. Vowels become nasalized in hypernasal speech because nasal resonances mix with oral resonances during the production of voiced sounds. This mixing occurs when air leaks through the nose because the velum cannot completely and consistently close the velopharyngeal gap. Hence, the spectral characteristics of hypernasal vowels deviate from those of normal vowels. The thesis attempts hypernasality detection using features that capture this spectral deviation, which is present mainly in the low-frequency region of hypernasal vowels. In the first work, hypernasality detection is attempted using the temporal features, vocal tract constriction (VTC), and peak to side-lobe ratio (PSR). In the second work, hypernasality is detected using the normalized harmonics amplitude (NHA), harmonics amplitude ratio (HAR), and dominant harmonics frequency (DHF) features. These features are based on the harmonic intensities of the spectrum and are measured using the sinusoidal model of speech. In the third work, three cepstral features, namely the Hilbert envelope of the numerator of group delay feature (HNGDF), pitch-adaptive Mel-frequency cepstral coefficients (PAMFCC), and spectral moment features augmented with low-order cepstral coefficients (SMAC), are individually used for hypernasality detection. The performance of these features is compared with that of the baseline features. In the last work, a system is developed for the clinical assessment of hypernasality. The system is based on severity grading of hypernasal speech: it assigns a speaker's speech a nasality score between 0 and 1, on which the severity grading is based.
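To give a rough sense of what harmonic-intensity features might look like, the sketch below estimates the amplitudes of the first few harmonics of a synthetic voiced signal and derives a normalized amplitude vector and one amplitude ratio from them. It is an illustration only: the function names, the known fundamental frequency, the synthetic signal, and the specific normalization and ratio are all assumptions made here, and they are not the NHA, HAR, or DHF definitions used in the thesis.

```python
import math

def harmonic_amplitudes(signal, sample_rate, f0, n_harmonics):
    """Estimate the amplitude at each harmonic k*f0 by projecting the
    signal onto a cosine and a sine at that frequency (a single-bin DFT).
    Assumes f0 is already known; this is a hypothetical helper."""
    n = len(signal)
    amps = []
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k * f0 / sample_rate
        re = sum(x * math.cos(w * i) for i, x in enumerate(signal))
        im = sum(x * math.sin(w * i) for i, x in enumerate(signal))
        amps.append(2.0 * math.hypot(re, im) / n)
    return amps

def normalize(amps):
    """Scale the amplitudes so they sum to 1 (an assumed normalization)."""
    total = sum(amps)
    return [a / total for a in amps]

# Synthetic "vowel": three harmonics of a 200 Hz fundamental.
sample_rate = 8000
f0 = 200.0
true_amps = [1.0, 0.5, 0.25]
n_samples = 400  # exactly 10 periods of f0, so the harmonics are orthogonal

signal = [
    sum(a * math.sin(2.0 * math.pi * (k + 1) * f0 * i / sample_rate)
        for k, a in enumerate(true_amps))
    for i in range(n_samples)
]

amps = harmonic_amplitudes(signal, sample_rate, f0, n_harmonics=3)
norm_amps = normalize(amps)      # illustrative normalized amplitudes
ratio_h1_h2 = amps[0] / amps[1]  # illustrative ratio of first to second harmonic
```

Because the analysis window here spans a whole number of pitch periods, the single-bin projections recover the synthetic amplitudes almost exactly. Real speech would need pitch estimation and windowing, which this sketch leaves out.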
Supervisors: S R M Prasanna and S Dandapat