Information fusion in visual question answering: A Survey

Volume: 52, Pages: 268 - 280
Published: Dec 1, 2019
Abstract
Visual question answering automatically answers natural language questions according to the content of an image or video. The task is challenging because it requires understanding the semantic information in the textual and visual channels, as well as their interplay. A typical solver is composed of three components: feature extraction from each single modality, feature fusion between the visual and textual channels, and answer prediction based on...
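The three-component solver described in the abstract can be illustrated with a minimal pure-Python sketch. It makes several assumptions that do not come from the survey: region features are taken as precomputed input (no real image encoder), the question is embedded by averaging a hypothetical toy word-embedding table, fusion is either an element-wise product or a concatenation (two common simple choices, not necessarily the methods the survey emphasizes), and answers are predicted by a linear scorer with a softmax over a fixed answer set. All function names, embeddings, and weights are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def visual_features(region_features: Sequence[Vector]) -> Vector:
    """Stage 1 (visual): mean-pool precomputed region features.

    Hypothetical: the regions are assumed to come from some image encoder.
    """
    n = len(region_features)
    dim = len(region_features[0])
    return [sum(r[i] for r in region_features) / n for i in range(dim)]


def text_features(question: str, embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Stage 1 (textual): average the embeddings of known question tokens.

    Unknown tokens are skipped; an all-zero vector is returned if none match.
    """
    vectors = [embeddings[t] for t in question.lower().split() if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def fuse(v: Vector, q: Vector, method: str = "product") -> Vector:
    """Stage 2: combine the visual and textual vectors into one joint vector."""
    if method == "product":  # element-wise (Hadamard) product
        if len(v) != len(q):
            raise ValueError("product fusion needs equal dimensions")
        return [a * b for a, b in zip(v, q)]
    if method == "concat":  # simple concatenation
        return list(v) + list(q)
    raise ValueError(f"unknown fusion method: {method}")


def predict(joint: Vector, weights: Dict[str, Vector]) -> Dict[str, float]:
    """Stage 3: score each candidate answer linearly, then apply softmax."""
    scores = {ans: sum(w * x for w, x in zip(ws, joint)) for ans, ws in weights.items()}
    m = max(scores.values())  # subtract the max score for numerical stability
    exps = {ans: math.exp(s - m) for ans, s in scores.items()}
    total = sum(exps.values())
    return {ans: e / total for ans, e in exps.items()}


# Hypothetical toy data: two image regions, a tiny vocabulary, three answers.
regions = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
emb = {"what": [0.1, 0.1, 0.1], "color": [1.0, 0.0, 0.0], "cat": [0.0, 1.0, 0.0]}
answer_weights = {"black": [2.0, 0.0, 0.0], "two": [0.0, 2.0, 0.0], "yes": [0.0, 0.0, 2.0]}

v = visual_features(regions)                   # [0.5, 0.5, 0.5]
q = text_features("What color", emb, dim=3)    # average of "what" and "color"
joint = fuse(v, q)                             # element-wise product fusion
probs = predict(joint, answer_weights)
best = max(probs, key=probs.get)               # -> "black"
```

Element-wise product keeps the joint vector at the same size as each input and lets each modality gate the other dimension by dimension, whereas concatenation preserves both inputs unchanged and leaves all interaction modelling to the predictor; the survey's own taxonomy of fusion methods may differ from this simplified contrast.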