Improving children’s speech recognition under mismatched condition using artificial band width extension

Sunil Y

Files

Abstract-TH-1705_08610211.pdf (192.78 KB)

TH-1705_08610211.pdf (12.16 MB)

Date

2017

Authors

Sunil Y

Abstract

The children’s speech production system differs from that of adults in its shorter vocal tract length and higher pitch. Because of the shorter vocal tract, formant frequencies shift into the higher band (3400-8000 Hz). The higher pitch produces relatively more fluctuations in the spectrum than in adults’ speech. The performance of narrowband (NB, 300-3400 Hz) automatic speech recognition (ASR) on children’s speech degrades significantly because of the loss of information in the higher band. This work develops artificial bandwidth extension (ABWE) methods that restore higher-band spectral information. The ASR task is connected digit recognition with models trained on adults’ speech and tested on children’s speech, termed the mismatched condition. ABWE methods using class-specific, age-specific and delta features are developed and applied to children’s speech recognition under the mismatched condition; all of them improve performance. A computationally efficient architecture for mel frequency cepstral coefficient (MFCC) based ABWE for ASR is developed that avoids the vocoder framework for bandwidth extension. In the proposed method, narrowband MFCCs are converted directly into wideband MFCCs, thus avoiding the synthesis process. A sparse representation based ABWE (SR-ABWE) algorithm using coupled dictionaries is proposed. To further enhance SR-ABWE, a least squares transformation is developed to estimate wideband codes from NB interpolated codes. The existing semi-coupled dictionary learning (SCDL) method is explored for ABWE (SC-ABWE). An improvement in the performance of SC-ABWE is observed in terms of objective quality measures. The significance of SR-ABWE is also demonstrated in children’s ASR.
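The least squares step mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the matrix shapes, the synthetic data, and the names `fit_code_transform`, `extend_frame`, `true_map` and `wb_dict` are all assumptions made for illustration. It assumes that sparse codes for paired NB and wideband training frames are already available as columns of two matrices, and fits a linear map between them with NumPy.

```python
import numpy as np


def fit_code_transform(nb_codes, wb_codes):
    """Fit W minimising ||wb_codes - W @ nb_codes||_F (ordinary least squares).

    Both inputs have shape (n_atoms, n_frames); this layout is an assumption.
    """
    # lstsq solves X @ B = Y for B; with X = nb_codes.T and Y = wb_codes.T,
    # the solution B equals W.T.
    w_t, *_ = np.linalg.lstsq(nb_codes.T, wb_codes.T, rcond=None)
    return w_t.T


def extend_frame(nb_code, transform, wb_dictionary):
    """Map one NB code to a wideband code, then reconstruct the wideband frame."""
    wb_code = transform @ nb_code
    return wb_dictionary @ wb_code


# Synthetic stand-ins for real training data (hypothetical sizes).
rng = np.random.default_rng(0)
n_atoms, n_frames, wb_dim = 8, 200, 16
true_map = rng.normal(size=(n_atoms, n_atoms))   # unknown relation to recover
nb_train = rng.normal(size=(n_atoms, n_frames))  # NB codes, one column per frame
wb_train = true_map @ nb_train                   # paired wideband codes
wb_dict = rng.normal(size=(wb_dim, n_atoms))     # stand-in wideband dictionary

W = fit_code_transform(nb_train, wb_train)
frame = extend_frame(nb_train[:, 0], W, wb_dict)
```

With noise-free synthetic pairs the fitted map matches the generating one. On real speech the relation between the two code sets is only approximately linear, so the fit would be a best approximation.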

Description

Supervisors: Sinha, R and Prasanna, S R Mahadeva

Keywords

ELECTRONICS AND ELECTRICAL ENGINEERING

URI

https://gyan.iitg.ac.in/handle/123456789/960

Collections

PhD Theses (Electronics and Electrical Engineering)
