Improving children’s speech recognition under mismatched condition using artificial band width extension

dc.contributor.author: Sunil Y
dc.date.accessioned: 2018-05-30T09:33:33Z
dc.date.accessioned: 2023-10-20T07:29:12Z
dc.date.available: 2018-05-30T09:33:33Z
dc.date.available: 2023-10-20T07:29:12Z
dc.date.issued: 2017
dc.description: Supervisors: Rohit Sinha and S. R. Mahadeva Prasanna (en_US)
dc.description.abstract: Children’s speech production system differs from that of adults in its shorter vocal tract length and higher pitch. Because the vocal tract is shorter, formant frequencies shift toward the higher band (3400-8000 Hz). The higher pitch results in relatively more fluctuations in the spectrum compared to adults. Narrowband (NB, 300-3400 Hz) automatic speech recognition (ASR) performance on children’s speech degrades significantly due to the loss of information in the higher band. This work develops artificial bandwidth extension (ABWE) methods that restore higher-band spectral information. The ASR task is connected digit recognition, with models trained on adults’ speech and tested on children’s speech, referred to as the mismatched condition. ABWE methods using class-specific, age-specific and delta features are developed and applied to children’s speech recognition under the mismatched condition. All of them improve performance. A computationally efficient architecture for mel frequency cepstral coefficient (MFCC) based ABWE for ASR is developed that avoids the vocoder framework for bandwidth extension. In the proposed method, the narrowband MFCCs are converted directly into wideband MFCCs, thus avoiding the synthesis process. A sparse representation based ABWE (SR-ABWE) algorithm is proposed using coupled dictionaries. To further enhance SR-ABWE, a least squares transformation is developed to estimate wideband codes from NB interpolated codes. The existing semi-coupled dictionary learning (SCDL) method is also explored for ABWE (SC-ABWE). An improvement in the performance of SC-ABWE is observed in terms of objective quality measures. The significance of SR-ABWE is also demonstrated in children’s ASR. (en_US)
dc.identifier.other: ROLL NO.08610211
dc.identifier.uri: https://gyan.iitg.ac.in/handle/123456789/960
dc.language.iso: en (en_US)
dc.relation.ispartofseries: TH-1705;
dc.subject: ELECTRONICS AND ELECTRICAL ENGINEERING (en_US)
dc.title: Improving children’s speech recognition under mismatched condition using artificial band width extension (en_US)
dc.type: Thesis (en_US)
Files
Original bundle
Name: Abstract-TH-1705_08610211.pdf
Size: 192.78 KB
Format: Adobe Portable Document Format
Description: Abstract

Name: TH-1705_08610211.pdf
Size: 12.16 MB
Format: Adobe Portable Document Format
Description: Thesis