Multimodal Attention Variants for Visual Question Answering

dc.contributor.author: Mishra, Aakansha
dc.date.accessioned: 2023-12-18T06:36:22Z
dc.date.available: 2023-12-18T06:36:22Z
dc.date.issued: 2023
dc.description: Supervisors: Anand, Ashish and Guha, Prithwijit
dc.description.abstract: Visual Question Answering (VQA) is an exciting field of research that involves answering natural language questions about an image. This multimodal task requires models to understand the syntax and semantics of the question, interact with the relevant objects in the image, and infer the answer using both image and text semantics. Because of this complexity, VQA has gained considerable attention from both the computer vision and natural language processing research communities.
dc.identifier.other: ROLL NO.166101010
dc.identifier.uri: https://gyan.iitg.ac.in/handle/123456789/34
dc.language.iso: en
dc.relation.ispartofseries: TH-3229
dc.subject: Department of Computer Science and Engineering [en_US]
dc.title: Multimodal Attention Variants for Visual Question Answering
dc.type: Thesis
Files
Original bundle
Name: Abstract-TH-3229_166101010.pdf
Size: 116.74 KB
Format: Adobe Portable Document Format
Description: ABSTRACT
Name: TH-3229_166101010.pdf
Size: 18.34 MB
Format: Adobe Portable Document Format
Description: THESIS
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission