Foreground speech segmentation and enhancement

Speech enhancement is an active area of research, and it is a challenging task when the signal is recorded in natural environments. In a typical recording scenario using a single microphone, it is safe to assume that the desired speaker is closer to the microphone than other interfering acoustic sources. In this work, the speech signal from the close-speaking person is regarded as foreground speech and the remaining interfering sources as background noise. Because the desired speaker is closer to the microphone than the background sources, the signal characteristics of the two differ. When speech is recorded in natural environments, its production characteristics also tend to vary with the level of the interfering sources. The objective of this thesis is to exploit such unique characteristics of speech production to temporally segment foreground speech from the rest of the background and then enhance it. The high signal-to-noise ratio (SNR) regions of foreground speech are robust to interfering noise. The high-SNR regions around glottal closure instants (GCIs) in the time domain and the vocal tract information in the spectral domain are used to derive features for segmenting and enhancing foreground speech.
Supervisor: S. R. M. Prasanna