Compositional Temporal Visual Grounding of Natural Language Event Descriptions.

Abstract
Temporal grounding entails establishing a correspondence between natural language event descriptions and their visual depictions. Compositional modeling becomes central: we first ground atomic descriptions ("girl eating an apple," "batter hitting the ball") to short video segments, and then establish the temporal relationships between the segments. This compositional structure enables models to recognize a wider variety of events not seen during...
Paper Details
Published Date
Dec 4, 2019