Image captions: global-local and joint signals attention model (GL-JSAM)

Volume: 79, Issue: 33-34, Pages: 24429 - 24448
Published: Jun 22, 2020
Abstract
For automated visual captioning, existing neural encoder-decoder methods commonly use either a simple sequence-to-sequence mechanism or an attention-based one. Attention-based models attend to specific visual areas or objects, using a single heat map that indicates which portion of the image is most important, rather than treating all objects within the image equally. These models are usually a combination of a Convolutional Neural Network (CNN) and...
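
The abstract refers to attention-based encoder-decoder captioners that compute a single heat map over image regions. The sketch below is a generic illustration of that idea, not the paper's GL-JSAM model; all class names, feature dimensions, and shapes are hypothetical assumptions for demonstration.

```python
# Illustrative sketch only: a generic soft-attention step over CNN region
# features, as used in attention-based encoder-decoder captioning models.
# Names and dimensions are assumptions; this is not the GL-JSAM architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionAttention(nn.Module):
    """Computes a single attention 'heat map' over image regions,
    conditioned on the decoder's current hidden state."""
    def __init__(self, feat_dim: int, hidden_dim: int, attn_dim: int):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)      # project region features
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)  # project decoder state
        self.score = nn.Linear(attn_dim, 1)                 # scalar score per region

    def forward(self, regions: torch.Tensor, hidden: torch.Tensor):
        # regions: (batch, num_regions, feat_dim) from a CNN feature map
        # hidden:  (batch, hidden_dim) decoder hidden state
        scores = self.score(torch.tanh(
            self.feat_proj(regions) + self.hidden_proj(hidden).unsqueeze(1)
        )).squeeze(-1)                                       # (batch, num_regions)
        alpha = F.softmax(scores, dim=1)                     # the attention heat map
        context = (alpha.unsqueeze(-1) * regions).sum(1)     # weighted region summary
        return context, alpha

# Usage: a 7x7 CNN grid flattened to 49 regions of 2048-d features.
regions = torch.randn(2, 49, 2048)
hidden = torch.randn(2, 512)
attn = RegionAttention(feat_dim=2048, hidden_dim=512, attn_dim=256)
context, alpha = attn(regions, hidden)
print(context.shape, alpha.shape)  # torch.Size([2, 2048]) torch.Size([2, 49])
```

At each decoding step, the softmax weights `alpha` act as the single heat map the abstract describes: regions with higher weights dominate the context vector fed to the caption decoder, rather than all regions contributing equally.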