Approaches for robust text-dependent speaker verification under degraded conditions
The objective of this thesis is to develop a robust text-dependent speaker verification (TDSV) system that performs well under both clean and degraded speech conditions. To achieve this, three different directions are explored for the TDSV task. The existing TDSV system employs energy-based end point detection, mel frequency cepstral coefficients (MFCCs) as features, and dynamic time warping (DTW) for template matching. This system is treated as the baseline in this work. In practice, the performance of the baseline system is affected by the operating conditions. The work attempts to improve performance by providing robustness at different levels.

In practice, the speech signal is affected by the acoustic degradation present in the recording environment. This results in poor performance at different stages of the system. One remedy is to first enhance the speech signal and then perform TDSV. The first novel contribution is a combined temporal and spectral speech enhancement method for enhancing speech regions embedded in background noise. The efficacy of the proposed framework is demonstrated by comparing its performance with that of the baseline system.

Spectral or cepstral features, mainly MFCCs, are used in the baseline system. In the next exploration, the goal is to develop new features. A new approach to feature extraction based on modified empirical mode decomposition (MEMD) is attempted. Hilbert spectrum (HS) based features are extracted from the intrinsic mode functions (IMFs) of MEMD and used as features for TDSV.
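
The template-matching step of the baseline described above can be illustrated with a minimal sketch of DTW. This is not the thesis implementation: the function name `dtw_distance`, the Euclidean local cost, the three-way step pattern, and the length normalization are all assumptions chosen for illustration, and the inputs stand in for per-frame feature vectors such as MFCCs.

```python
import numpy as np


def dtw_distance(template, test):
    """Length-normalized DTW distance between two feature sequences.

    Both inputs are 2-D arrays of shape (frames, feature_dim).
    Hypothetical helper for illustration; not taken from the thesis.
    """
    template = np.asarray(template, dtype=float)
    test = np.asarray(test, dtype=float)
    n, m = len(template), len(test)

    # Local cost: Euclidean distance between every pair of frames.
    cost = np.linalg.norm(template[:, None, :] - test[None, :, :], axis=2)

    # Accumulated cost with an extra border row and column set to infinity.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Allowed predecessors: vertical, horizontal, and diagonal steps.
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev

    # Normalize by the combined length (an assumed normalization).
    return acc[n, m] / (n + m)
```

In a verification setting, a claim would typically be accepted when the distance between the enrolled template and the test utterance falls below a threshold; the threshold value is system-specific and is not given in the abstract.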
Supervisor: S. R. Mahadeva Prasanna
ELECTRONICS AND ELECTRICAL ENGINEERING