Image captions: global-local and joint signals attention model (GL-JSAM)

Volume: 79, Issue: 33-34, Pages: 24429 - 24448
Published: Jun 22, 2020
Abstract
For automated visual captioning, existing neural encoder-decoder methods commonly use either a simple sequence-to-sequence mechanism or an attention-based one. Attention-based models attend to specific visual areas or objects, using a single heat map that indicates which portion of the image is most important, rather than treating all objects within the image equally. These models are usually a combination of a Convolutional Neural Network (CNN) and...
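
The abstract refers to attention-based encoder-decoder captioners that compute a single heat map over image regions. The sketch below is a generic illustration of that idea, not the paper's GL-JSAM model; all class names, feature dimensions, and shapes are hypothetical assumptions for demonstration.

```python
# Illustrative sketch only: a generic soft-attention step over CNN region
# features, as used in attention-based encoder-decoder captioning models.
# Names and dimensions are assumptions; this is not the GL-JSAM architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionAttention(nn.Module):
    """Computes a single attention 'heat map' over image regions,
    conditioned on the decoder's current hidden state."""
    def __init__(self, feat_dim: int, hidden_dim: int, attn_dim: int):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)      # project region features
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)  # project decoder state
        self.score = nn.Linear(attn_dim, 1)                 # scalar score per region

    def forward(self, regions: torch.Tensor, hidden: torch.Tensor):
        # regions: (batch, num_regions, feat_dim) from a CNN feature map
        # hidden:  (batch, hidden_dim) decoder hidden state
        scores = self.score(torch.tanh(
            self.feat_proj(regions) + self.hidden_proj(hidden).unsqueeze(1)
        )).squeeze(-1)                                       # (batch, num_regions)
        alpha = F.softmax(scores, dim=1)                     # the attention heat map
        context = (alpha.unsqueeze(-1) * regions).sum(1)     # weighted region summary
        return context, alpha

# Usage: a 7x7 CNN grid flattened to 49 regions of 2048-d features.
regions = torch.randn(2, 49, 2048)
hidden = torch.randn(2, 512)
attn = RegionAttention(feat_dim=2048, hidden_dim=512, attn_dim=256)
context, alpha = attn(regions, hidden)
print(context.shape, alpha.shape)  # torch.Size([2, 2048]) torch.Size([2, 49])
```

At each decoding step, the softmax weights `alpha` act as the single heat map the abstract describes: regions with higher weights dominate the context vector fed to the caption decoder, rather than all regions contributing equally.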