Devi, Thiyam Susma
2024-07-31
2024-07-31
2024
ROLL NO.186101010
https://gyan.iitg.ac.in/handle/123456789/2678
Supervisor: Das, Pradip K

Speech is a natural and intuitive mode of human communication, underscoring the essence of interpersonal interaction. Automatic Speech Recognition (ASR) is a pivotal innovation in digital technology, empowering devices to comprehend and process spoken language seamlessly. ASR’s applications span various domains, including dictation software, voice-activated assistants and automated call centers, thus revolutionizing how we engage with technology. Its significance extends further to the development of assistive devices for individuals with disabilities and the preservation of endangered languages, wherein ASR catalyzes documentation and linguistic conservation. Manipuri is a low-resource Tibeto-Burman tonal language primarily spoken in the northeastern state of Manipur, India. Tone identification is crucial to speech comprehension in tonal languages, where tone defines a word’s meaning. ASR for such languages can perform better when it incorporates tonal information from a powerful tone detection system. Despite extensive research on tonal languages such as Mandarin, Thai, Cantonese and Vietnamese, there is a significant gap in the exploration of Manipuri’s tonal features. This thesis presents the development of a meticulously crafted speech corpus called ManiTo, explicitly designed to analyze the tones of Manipuri. Comprising 17,837 labeled audio samples from twenty native speakers, ManiTo facilitates a nuanced examination of Manipuri’s tonal contrasts. Initial investigations reveal the presence of two distinct tones: Falling and Level. To deepen our understanding, a comprehensive acoustic feature analysis was conducted to differentiate between the two tones. Two sets of features, focusing on pitch contours, jitter and shimmer measurements, were explored to delineate Manipuri’s tonal nuances.
Various classification algorithms, including Support Vector Machine, Long Short-Term Memory, Random Forest and k-Nearest Neighbors, were employed to validate the selected feature sets. Results demonstrate that the second feature set consistently outperformed the first, especially when utilizing the Random Forest classifier. These findings provide crucial insights for advancing speech recognition technology in low-resource tonal languages like Manipuri. This thesis contributes to the broader understanding of tonal languages through the development of ManiTo and the insights gained from acoustic feature analysis. It sets the stage for future research to enhance speech recognition technologies in linguistically diverse and underrepresented languages.

en
Speech Processing
Tone Recognition
Low-resource
Tonal Language
Manipuri
(An) Acoustic Study of Tone Contrasts in Manipuri Language
Thesis