Multimodal Attention Variants for Visual Question Answering
dc.contributor.author | Mishra, Aakansha | |
dc.date.accessioned | 2023-12-18T06:36:22Z | |
dc.date.available | 2023-12-18T06:36:22Z | |
dc.date.issued | 2023 | |
dc.description | Supervisors: Anand, Ashish and Guha, Prithwijit | en_US |
dc.description.abstract | Visual Question Answering (VQA) is an exciting field of research that involves answering natural language questions asked about an image. This multimodal task requires models to understand the syntax and semantics of the question, interact with the relevant objects in the image, and infer the answer using both image and text semantics. Owing to this complexity, VQA has gained considerable attention from both the vision and the natural language research communities. | en_US |
dc.identifier.other | ROLL NO.166101010 | |
dc.identifier.uri | https://gyan.iitg.ac.in/handle/123456789/34 | |
dc.language.iso | en | en_US |
dc.relation.ispartofseries | TH-3229 | |
dc.subject | Department of Computer Science and Engineering | en_US |
dc.title | Multimodal Attention Variants for Visual Question Answering | en_US |
dc.type | Thesis | en_US |
Files
License bundle
- Name: license.txt
- Size: 1.71 KB
- Description: Item-specific license agreed to upon submission