Significance of Hashtags for Improved Topic Modeling on Tweets

dc.contributor.author: Kumar, Durgesh
dc.date.accessioned: 2023-01-06T06:35:23Z
dc.date.accessioned: 2023-10-20T04:37:31Z
dc.date.available: 2023-01-06T06:35:23Z
dc.date.available: 2023-10-20T04:37:31Z
dc.date.issued: 2022
dc.description: Supervisor: Singh, Sanasam Ranbir [en_US]
dc.description.abstract: With the increase in Twitter's popularity, topic modeling on Twitter has become an important problem with applications in diverse fields such as text summarization, document clustering, information retrieval, and sentiment analysis. Tweets are short, noisy, and informally written, which makes topic modeling on them more challenging because of increased data sparsity and under-specificity. Latent Dirichlet Allocation (LDA), one of the most widely used topic models, suffers from both problems. Researchers have tried to counter data sparsity and under-specificity in tweets either by adding related content from external sources such as news pages and Wikipedia or by pooling related tweets into pseudo-documents. Adding content from external resources is non-trivial because of differences in writing style and vocabulary. Moreover, topic modeling on pooled documents may lose the distribution of topics over individual tweets and may increase the corpus size, since the same tweet can appear in several pools. Earlier studies and our preliminary investigation indicate that hashtags provide the meta-information needed to link tweets to their underlying topics. Motivated by this observation, this thesis proposes two approaches to counter data sparsity and under-specificity in tweets for topic modeling tasks: i) expanding tweets with semantically related hashtags, and ii) prioritizing selected hashtags. The experimental results show that the proposed methods improve topic modeling performance, either through i) tweet expansion with semantically related hashtags or ii) incorporating prioritized hashtags in LDA. Furthermore, as a case study, this thesis investigates the effect of LDA on relation prediction by exploiting the relation between topics and entities. It is observed that event-centric relations are effectively predicted using topic modeling over news articles. [en_US]
dc.identifier.other: ROLL NO.126101002
dc.identifier.uri: https://gyan.iitg.ac.in/handle/123456789/2238
dc.language.iso: en [en_US]
dc.relation.ispartofseries: TH-2775;
dc.subject: Topic Modeling [en_US]
dc.subject: Hashtags [en_US]
dc.subject: Tweet Expansion [en_US]
dc.subject: Hashtag Prioritization [en_US]
dc.subject: Social Network Analysis [en_US]
dc.title: Significance of Hashtags for Improved Topic Modeling on Tweets [en_US]
dc.type: Thesis [en_US]
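
Illustrative note on the abstract: it observes that pooling related tweets into pseudo-documents can enlarge the corpus because one tweet may land in several pools. The sketch below shows one common way such pooling might be done, grouping tweets by the hashtags they contain. It is an assumption-laden illustration, not the method proposed in the thesis: the function name pool_by_hashtag, the regular expression, and the sample tweets are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical pattern: a '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")

def pool_by_hashtag(tweets):
    """Group tweets into one pseudo-document (a list of tweets) per hashtag."""
    pools = defaultdict(list)
    for tweet in tweets:
        # A tweet with k distinct hashtags is copied into k pools;
        # a tweet with no hashtag is left out of every pool.
        for tag in sorted({t.lower() for t in HASHTAG.findall(tweet)}):
            pools[tag].append(tweet)
    return dict(pools)

# Made-up sample data, for illustration only.
tweets = [
    "Voting opens today #election #news",
    "Turnout is high this year #election",
    "Match postponed due to rain #cricket",
    "No tags on this one",
]

pools = pool_by_hashtag(tweets)
# Total number of tweet copies across all pools.
pooled_count = sum(len(pool) for pool in pools.values())
```

In this toy input three tweets carry hashtags, yet the pools hold four tweet copies in total, because the first tweet belongs to two pools. The untagged tweet is dropped entirely, which hints at why pooling alone does not preserve per-tweet topic distributions.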
Files
Original bundle
Abstract-TH-2775_126101002.pdf (106.63 KB, Adobe Portable Document Format): ABSTRACT
TH-2775_126101002.pdf (4.59 MB, Adobe Portable Document Format): THESIS
License bundle
license.txt (1.71 KB, Plain Text)