Multimodal Attention Variants for Visual Question Answering
dc.contributor.author | Mishra, Aakansha | |
dc.date.accessioned | 2023-12-18T06:36:22Z | |
dc.date.available | 2023-12-18T06:36:22Z | |
dc.date.issued | 2023 | |
dc.description | Supervisors: Anand, Ashish and Guha, Prithwijit | en_US |
dc.description.abstract | Visual Question Answering (VQA) is an exciting field of research that involves answering natural language questions asked about an image. This multimodal task requires models to understand the syntax and semantics of the question, interact with the relevant objects in the image, and infer the answer using both image and text semantics. Owing to this complexity, VQA has gained considerable attention from both the vision and the natural language research communities. | en_US |
dc.identifier.other | ROLL NO.166101010 | |
dc.identifier.uri | https://gyan.iitg.ac.in/handle/123456789/34 | |
dc.language.iso | en | en_US |
dc.relation.ispartofseries | TH-3229 | |
dc.subject | Department of Computer Science and Engineering | en_US |
dc.title | Multimodal Attention Variants for Visual Question Answering | en_US |
dc.type | Thesis | en_US |
Files
License bundle
- Name: license.txt
- Size: 1.71 KB
- Description: Item-specific license agreed to upon submission