Analysis of Speech and Music Content for Movie Genre Classification

dc.contributor.author: Bhattacherjee, Mrinmoy
dc.date.accessioned: 2024-08-01T09:56:22Z
dc.date.available: 2024-08-01T09:56:22Z
dc.date.issued: 2023
dc.description: Supervisors: Guha, Prithwijit and Prasanna, S R Mahadeva [en_US]
dc.description.abstract: Movies are a popular mode of entertainment around the world. The consistent rise in the production and consumption of movies demands more efficient automatic movie content analysis applications. Movie Genre Classification (MGC) is vital for underage censorship, search, retrieval, and targeted publicity. Current trends in MGC literature indicate a focus on short trailers instead of full movies and a multimodal approach. The audio modality is generally used only as an auxiliary channel. However, due to its rich genre-specific information, the audio signal deserves a dedicated study in the current context. Hence, this thesis aims to perform only audio-specific MGC. The thesis has four principal contributions. First, spectral peak tracking-based magnitude spectrum features are proposed for isolated speech and music classification. Second, the underexplored phase component of the audio signals is utilized for discriminating speech and music. The third contribution involves using harmonic-percussive source-separated features and classifiers in the multi-task learning framework for identifying speech overlapped with music. Finally, the above proposals are employed for the MGC task. The spectral peak tracking-based method performs better than the other proposals and the baselines. Specific combinations of all the proposed and baseline features provide the overall best performance, even in the cross-dataset scenario. The thesis work can be extended in the future by analyzing the individual constituents of speech and music for a more nuanced representation of movie genres. [en_US]
dc.identifier.other: ROLL NO.156102026
dc.identifier.uri: https://gyan.iitg.ac.in/handle/123456789/2683
dc.language.iso: en [en_US]
dc.relation.ispartofseries: TH-2976;
dc.subject: Movie Genre Classification [en_US]
dc.subject: Speech Music Classification [en_US]
dc.subject: Audio Signal Processing [en_US]
dc.subject: Magnitude Spectrum Features [en_US]
dc.subject: Phase Spectrum Features [en_US]
dc.subject: Harmonic Percussive Source Separation [en_US]
dc.subject: Multi-task Learning [en_US]
dc.subject: Time and Feature Attention [en_US]
dc.title: Analysis of Speech and Music Content for Movie Genre Classification [en_US]
dc.type: Thesis [en_US]
Files
Original bundle
Name: TH-2976_156102026.pdf
Size: 5.4 MB
Format: Adobe Portable Document Format
Description: THESIS
Name: Abstract-TH-2976_156102026.pdf
Size: 66.1 KB
Format: Adobe Portable Document Format
Description: ABSTRACT
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission
Description: