Here we extend the task by considering pronouns as well. First, we construct a phrase grounding dataset that maps both noun phrases and pronouns to image regions.
Oct 23, 2022 · Abstract: Conventional phrase grounding aims to localize noun phrases mentioned in a given caption to their corresponding image regions, ...
We extend the task of phrase grounding by taking account of pronouns, and correspondingly establish a new dataset manually, named VD-Ref, which is the first ...
Experiments show that pronouns are easier to ground than noun phrases, where the possible reason might be that these pronouns are much less ambiguous, ...
Phrase-Grounding-with-Pronoun-baseline ... Code and data for [EMNLP 22] Extending Phrase Grounding with Pronouns in Visual Dialogues.
Sep 20, 2024 · This task has seen significant progress since the introduction of the Flickr30k Entities dataset [48] and has played a crucial role in learning ...
Given an image and a corresponding caption, the **Phrase Grounding** task aims to ground each entity mentioned by a noun phrase in the caption to a region ...
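As a rough illustration of the task described above, the sketch below represents each mention (noun phrase or pronoun) with a gold bounding box and scores predictions by intersection-over-union (IoU). The names `Mention`, `iou`, and `grounding_accuracy`, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are assumptions chosen for illustration; none of them come from the snippets or from the VD-Ref code.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels, with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Mention:
    """A text span to ground; `kind` is "noun_phrase" or "pronoun" (illustrative labels)."""
    text: str
    kind: str


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(
    gold: Dict[Mention, Box],
    pred: Dict[Mention, Box],
    threshold: float = 0.5,  # assumed cutoff, not taken from the source
) -> float:
    """Fraction of gold mentions whose predicted box overlaps the gold box enough."""
    if not gold:
        return 0.0
    hits = sum(
        1
        for mention, gold_box in gold.items()
        if mention in pred and iou(pred[mention], gold_box) >= threshold
    )
    return hits / len(gold)


# Toy example with made-up coordinates: one noun phrase and one pronoun.
lady = Mention("a lady", "noun_phrase")
her = Mention("her", "pronoun")
gold = {lady: (10.0, 10.0, 110.0, 210.0), her: (10.0, 10.0, 110.0, 210.0)}
pred = {lady: (12.0, 8.0, 108.0, 205.0), her: (300.0, 300.0, 350.0, 350.0)}
accuracy = grounding_accuracy(gold, pred)
```

Keeping `kind` on each mention makes it straightforward to report accuracy separately for noun phrases and pronouns by filtering `gold` before scoring.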
Oct 22, 2022 · VD-Ref is a dataset with ground-truth mappings from both noun phrases and pronouns to image regions. This dataset contains a set of 10k complete sets.
We present details about our two-level LLM prompts to obtain high-quality paraphrases for region-centric phrases, provide examples of such extracted ...
Consider image grounding noun phrases from a given sentence: “A lady sitting on a colorful decoration with a bouquet of flowers, that match her hair, in her hand.”