Speaker Verification Under Degraded Conditions Using Vowel-Like and Nonvowel-Like Regions
Date
2013
Abstract
This thesis proposes a speaker verification (SV) system that processes vowel-like regions (VLRs) and non-vowel-like regions (non-VLRs) independently to achieve better SV performance under clean and degraded conditions. VLRs are defined as the speech regions belonging to vowels, diphthongs and semivowels; the regions belonging to the remaining consonants are the non-VLRs. Methods are proposed for detecting VLRs and non-VLRs using excitation source information. The VLR onset points (VLROPs) and end points (VLREPs) are hypothesized and used in an iterative algorithm for detecting the VLRs. Next, to detect the non-VLRs, the linear prediction (LP) residual samples in the VLRs are attenuated significantly, which indirectly emphasizes the residual samples in the non-VLRs. The modified LP residual excites the time-varying all-pole filter to reconstruct non-VLR-enhanced speech, which is then used for detecting the non-VLRs. In any practical application of a text-independent SV system, the speech signal may be affected by background noise, sensor mismatch and channel mismatch, in addition to phonetic variability. To reduce the effect of these variabilities, three different methods are proposed for processing the VLRs and non-VLRs during training and testing of an SV system. First, an SV system is developed using only the VLRs, to demonstrate the significance of the VLRs for SV under degraded conditions. Then, the VLRs and non-VLRs are used independently during training and testing of an SV system, and the scores are combined with a higher weight on the VLRs, giving a better SV system under clean and degraded conditions. Finally, an SV system is developed by implicit modeling of VLR and non-VLR information, to reduce the computational complexity involved in the explicit segmentation of these regions. The experimental results presented in this thesis show that the VLRs are more speaker-specific and relatively less affected by degraded conditions. A better SV system can be developed for clean and degraded conditions by independent processing of VLRs and non-VLRs, with emphasis on the VLRs.
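The score combination mentioned in the abstract can be illustrated with a minimal sketch. The abstract says only that the VLR and non-VLR scores are combined with a higher weight on the VLRs. The linear fusion rule, the function names `fuse_scores` and `verify`, the weight of 0.7 and the threshold of 0.0 below are all assumptions made for illustration, not values or methods taken from the thesis.

```python
def fuse_scores(vlr_score: float, non_vlr_score: float,
                vlr_weight: float = 0.7) -> float:
    """Linearly combine two verification scores, favouring the VLR score.

    The weight 0.7 is a hypothetical default; the abstract states only
    that the VLR score receives the higher weight.
    """
    # Enforce "higher weight on VLRs": the weight must exceed 0.5.
    if not 0.5 < vlr_weight <= 1.0:
        raise ValueError("vlr_weight must be in (0.5, 1.0]")
    return vlr_weight * vlr_score + (1.0 - vlr_weight) * non_vlr_score


def verify(vlr_score: float, non_vlr_score: float,
           threshold: float = 0.0, vlr_weight: float = 0.7) -> bool:
    """Accept the claimed identity if the fused score reaches the threshold.

    The threshold is an assumed placeholder; in practice it would be
    tuned on development data.
    """
    return fuse_scores(vlr_score, non_vlr_score, vlr_weight) >= threshold


if __name__ == "__main__":
    # A strong VLR score outweighs a weak non-VLR score.
    print(verify(vlr_score=2.0, non_vlr_score=-1.0))
```

With these assumed settings, a trial whose VLR score is high can still be accepted when its non-VLR score is low, which mirrors the emphasis the abstract places on the VLRs.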
Description
Supervisor: Prasanna, S R Mahadeva
Keywords
ELECTRONICS AND ELECTRICAL ENGINEERING