Speech subspace modelling with speaker adaptation for stress normalization

This thesis investigates the normalization of stress information for the effective processing of stressed speech. Speakers modify their speech production system to convey information about adverse environmental factors and to retain the intelligibility of the speech signal. Any deviation of the environmental condition from the normal or neutral state leads to an adverse condition, referred to as the stress condition. The speech signal produced under a stress condition, through any modification of the speech production system, is called stressed speech; speech produced under the normal or neutral condition is generally referred to as neutral speech. Stress induces a large acoustic mismatch between the different speech units of neutral and stressed speech, and this mismatch severely affects various real-life applications. There is therefore an essential need for stress normalization that reduces the acoustic mismatch between neutral and stressed speech and makes practical applications more robust. The present thesis aims to develop robust and computationally efficient algorithms to normalize the stress information.

First, novel linear and non-linear subspace modelling approaches are proposed to reduce the acoustic mismatch between neutral and stressed speech signals. The linear characteristics of stressed speech are studied in a linear subspace, which is modelled using orthogonal projection and linear transformation techniques. The non-linear relationship between the speech and the stress information is investigated in a non-linear data space by exploring subspace projection through a non-linear transformation based on a polynomial function.
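As a rough illustration of the two ideas named above, the sketch below shows a generic orthogonal projection of a feature vector onto a linear subspace, and a simple polynomial lift that could precede such a projection in a non-linear setting. It is not the algorithm developed in the thesis: the function names, the toy "neutral" and "stressed" vectors, and the choice of Gram-Schmidt and elementwise powers are all assumptions made purely for illustration.

```python
# Illustrative sketch only: a generic orthogonal projection and a polynomial
# lift. All names and data here are hypothetical, not taken from the thesis.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis

def project(x, basis):
    """Orthogonal projection of x onto the span of an orthonormal basis."""
    proj = [0.0] * len(x)
    for b in basis:
        c = dot(x, b)
        proj = [p + c * bi for p, bi in zip(proj, b)]
    return proj

def polynomial_lift(x, degree=2):
    """Elementwise powers x, x^2, ..., x^degree (an assumed non-linear map)."""
    return [xi ** d for d in range(1, degree + 1) for xi in x]

# Hypothetical toy "neutral" feature vectors spanning a 2-D subspace of R^3.
neutral = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormal_basis(neutral)

# Hypothetical "stressed" feature vector and its projection onto that subspace.
stressed = [2.0, 3.0, 4.0]
normalized = project(stressed, basis)
residual = [s - n for s, n in zip(stressed, normalized)]
```

In this toy setting the residual is the component of the stressed vector that lies outside the neutral subspace, so it is orthogonal to every basis vector. In a non-linear variant one might apply `polynomial_lift` to the features before estimating the basis and projecting.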
Supervisor: Samarendra Dandapat