Improving quality of statistical parametric speech synthesis using sonority information
This thesis aims to improve the naturalness and intelligibility of synthesized speech obtained from statistical parametric speech synthesis (SPSS). Along with the conventional source and spectral information, additional significant features can be derived from the speech signal to preserve its characteristics in parametric form. Sonority information represents the spectral prominence, higher energy, and periodicity aspects that are related to human speech perception and that change with varying vocal-tract constriction and glottal source amplitude during speech production. This information is therefore extracted from the speech signal in the form of a sonority feature, which can delineate the degree of sonority associated with a sound unit. The sonority feature is incorporated in the SPSS framework for use in the studies of this thesis.

Post-filtering mechanisms are found to be effective in alleviating the over-smoothing effect in parameter sequences generated by SPSS. Because the characteristics of the speech parameters may vary extensively across the broad categories of sound units, a class-based dynamic post-filtering method is proposed. The excitation source parameters (fundamental frequency and strength of excitation (SoE)) and the spectral parameters (sharpness of the peaks and valleys of the spectrum) of each frame are enhanced using post-filtering factors that change with the sonorant sound category. The sonorant class information is derived from a support vector machine based classifier trained using the sonority feature associated with each frame. This method improves the temporal variation and the fine spectral structure and reduces the deviation from the natural counterpart, leading to an improvement in synthesized speech quality.
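As a rough illustration of class-dependent post-filtering, the sketch below expands each frame's deviation from the utterance mean by a factor chosen from that frame's sound class. It is not the method of the thesis. The class names, the factor values, the mean-deviation scaling rule, and the function name `class_based_postfilter` are all assumptions made for illustration. The per-frame class labels are taken as given here, whereas the thesis obtains them from a support vector machine classifier trained on the sonority feature.

```python
from statistics import fmean

# Hypothetical per-class enhancement factors (illustrative values only).
# A factor above 1 expands variation around the mean; 1.0 leaves a frame unchanged.
POSTFILTER_FACTORS = {
    "vowel": 1.4,
    "semivowel": 1.2,
    "nasal": 1.1,
    "non_sonorant": 1.0,
}


def class_based_postfilter(track, classes, factors=POSTFILTER_FACTORS):
    """Scale each frame's deviation from the track mean by its class factor."""
    if len(track) != len(classes):
        raise ValueError("track and classes must have one entry per frame")
    mean = fmean(track)
    return [mean + factors[c] * (x - mean) for x, c in zip(track, classes)]


# Toy over-smoothed log-F0 trajectory with an assumed class label per frame.
log_f0 = [4.8, 4.9, 5.0, 5.1, 5.2]
labels = ["non_sonorant", "nasal", "vowel", "vowel", "semivowel"]
enhanced = class_based_postfilter(log_f0, labels)
print(enhanced)
```

In this toy setting, frames labelled as vowels move furthest from the mean, while non-sonorant frames are left as generated. A practical system would more likely apply such factors over local windows and to spectral parameters as well.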
Supervisor: S R Mahadeva Prasanna
ELECTRONICS AND ELECTRICAL ENGINEERING