Multimodal Attention Variants for Visual Question Answering

Mishra, Aakansha

Files

Abstract-TH-3229_166101010.pdf (116.74 KB)

TH-3229_166101010.pdf (18.34 MB)

Date

2023

Authors

Mishra, Aakansha

Abstract

Visual Question Answering (VQA) is an exciting field of research that involves answering natural language questions about an image. This multimodal task requires models to understand the syntax and semantics of the question, interact with the relevant objects in the image, and infer the answer using both image and text semantics. Because of this complexity, VQA has gained considerable attention from both the vision and natural language research communities.

Description

Supervisors: Anand, Ashish and Guha, Prithwijit

Keywords

Department of Computer Science and Engineering

URI

https://gyan.iitg.ac.in/handle/123456789/34

Collections

PhD Theses (Computer Science and Engineering)
