Limited Data Speaker Recognition

Date
2009
Abstract
This work addresses the task of recognizing a speaker under the constraint of limited data. The performance of a speaker recognition system depends on the techniques employed in the analysis, feature extraction, modelling and testing stages. Existing limited data speaker recognition techniques mostly concentrate on modelling to improve performance. It is also possible to improve performance using efficient techniques for speech analysis, feature extraction, modelling and testing. We have developed techniques for each stage of the speaker recognition system to improve its performance.

In the analysis stage, the speech signal is analyzed using Single Frame Size and Rate (SFSR), Multiple Frame Size (MFS), Multiple Frame Rate (MFR) and Multiple Frame Size and Rate (MFSR) analysis techniques. For this study, Mel Frequency Cepstral Coefficients (MFCC) are used as features and Vector Quantization (VQ) as the modelling technique. To verify the effectiveness of the various techniques, we carried out an initial experiment using 3 sec of training and testing data for the first 30 speakers of the YOHO database. SFSR, MFS, MFR and MFSR provide 70%, 73%, 87% and 90% identification performance, respectively. The experiments are later extended to other data sizes and databases, and the same trend is observed. This study thus demonstrates that the performance of speaker recognition under limited data conditions can be improved using the MFSR analysis technique.

In the feature extraction stage, different feature extraction techniques such as MFCC, Delta MFCC, Delta-Delta MFCC, Linear Prediction Residual (LPR), Linear Prediction Residual Phase (LPRP) and their combinations are explored. For this study, SFSR is used as the analysis technique and VQ as the modelling technique. The combination of MFCC, Delta MFCC, Delta-Delta MFCC, LPR and LPRP features provides 83% performance against 70% for MFCC alone in the initial experiment. The same trend is observed for other data sizes and databases. This indicates that combining features is effective for improving the performance of speaker recognition under limited data conditions.

In the modelling stage, modelling techniques such as Crisp Vector Quantization (CVQ), Fuzzy Vector Quantization (FVQ), Self-Organizing Map (SOM), Learning Vector Quantization (LVQ), Gaussian Mixture Model (GMM) and Gaussian Mixture Model-Universal Background Model (GMM-UBM) are evaluated experimentally. In addition, combined classifiers are evaluated, chosen on the basis of the experimental knowledge gained from the individual modelling techniques. These are 1) LVQ and FVQ, 2) LVQ and GMM, and 3) LVQ and GMM-UBM. In this study, the SFSR analysis technique is used for extracting the MFCC features. The combined LVQ and GMM-UBM classifier provides 87% performance against 70% for CVQ in the initial experiment. The same experimental setup is extended to other data sizes and databases, and the same trend is observed. This study demonstrates that combined LVQ and GMM-UBM modelling can be used to improve the performance of speaker recognition under limited data conditions.

The above studies are made independently. That is, the proposed technique is used in the respective stage and existing techniques are used in the remaining stages of the speaker recognition system. To analyze the effectiveness of proposed techniques, integrated systems are...
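The analysis-stage idea in the abstract, framing the same short utterance at several frame sizes and frame rates so that more feature vectors are available, can be illustrated with a minimal sketch. This is not the thesis implementation: the 8 kHz sampling rate, the frame sizes and shifts in milliseconds, and the helper names `frame_signal`, `multi_resolution_frames` and `count_frames` are all assumptions chosen for illustration, and feature extraction (for example MFCC) is left out.

```python
import math


def frame_signal(signal, frame_len, frame_shift):
    """Slice a sequence of samples into overlapping frames of frame_len samples."""
    if len(signal) < frame_len:
        return []
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    return [signal[i * frame_shift: i * frame_shift + frame_len]
            for i in range(n_frames)]


def multi_resolution_frames(signal, sample_rate, sizes_ms, shifts_ms):
    """Frame the signal once per (frame size, frame shift) pair.

    One size and one shift corresponds to single frame size and rate analysis;
    several sizes and several shifts give the multiple size and rate case.
    Returns a list of (size_ms, shift_ms, frames) tuples.
    """
    analyses = []
    for size_ms in sizes_ms:
        for shift_ms in shifts_ms:
            frame_len = int(sample_rate * size_ms / 1000)
            frame_shift = int(sample_rate * shift_ms / 1000)
            frames = frame_signal(signal, frame_len, frame_shift)
            analyses.append((size_ms, shift_ms, frames))
    return analyses


def count_frames(analyses):
    """Total number of frames (one feature vector would be computed per frame)."""
    return sum(len(frames) for _, _, frames in analyses)


# Hypothetical 3 second test tone at an assumed 8 kHz sampling rate.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 200 * n / sample_rate)
          for n in range(3 * sample_rate)]

# Assumed settings: one size/shift pair versus several sizes and shifts.
sfsr = multi_resolution_frames(signal, sample_rate, [20], [10])
mfsr = multi_resolution_frames(signal, sample_rate, [15, 20, 25], [5, 10])

print("single size and rate frames:", count_frames(sfsr))
print("multiple size and rate frames:", count_frames(mfsr))
```

In a complete system each frame would be converted to a feature vector, and the pooled vectors from all size and shift pairs would be used to train and test the speaker models; the sketch only shows how the number of available frames grows.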
Description
Supervisor: Prasanna, S R Mahadeva
Keywords
ELECTRONICS AND ELECTRICAL ENGINEERING