17th ECCV 2022: Tel Aviv, Israel - Volume 36
- Shai Avidan, Gabriel J. Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner (eds.): Computer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXXVI. Lecture Notes in Computer Science 13696, Springer 2022, ISBN 978-3-031-20058-8
- Benedikt Boecking, Naoto Usuyama, Shruthi Bannur, Daniel C. Castro, Anton Schwaighofer, Stephanie L. Hyland, Maria Wetscherek, Tristan Naumann, Aditya V. Nori, Javier Alvarez-Valle, Hoifung Poon, Ozan Oktay: Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing. 1-21
- Shipeng Yan, Lanqing Hong, Hang Xu, Jianhua Han, Tinne Tuytelaars, Zhenguo Li, Xuming He: Generative Negative Text Replay for Continual Vision-Language Pretraining. 22-38
- Junbin Xiao, Pan Zhou, Tat-Seng Chua, Shuicheng Yan: Video Graph Transformer for Video Question Answering. 39-58
- Kun Yan, Lei Ji, Chenfei Wu, Jianmin Bao, Ming Zhou, Nan Duan, Shuai Ma: Trace Controlled Text to Image Generation. 59-75
- A. J. Piergiovanni, Kairo Morton, Weicheng Kuo, Michael S. Ryoo, Anelia Angelova: Video Question Answering with Iterative Video-Text Co-tokenization. 76-94
- Long Chen, Yuhang Zheng, Jun Xiao: Rethinking Data Augmentation for Robust Visual Question Answering. 95-112
- Zhen Wang, Long Chen, Wenbo Ma, Guangxing Han, Yulei Niu, Jian Shao, Jun Xiao: Explicit Image Caption Editing. 113-129
- Jiachang Hao, Haifeng Sun, Pengfei Ren, Jingyu Wang, Qi Qi, Jianxin Liao: Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding. 130-147
- Spencer Whitehead, Suzanne Petryk, Vedaad Shakib, Joseph Gonzalez, Trevor Darrell, Anna Rohrbach, Marcus Rohrbach: Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly. 148-166
- Van-Quang Nguyen, Masanori Suganuma, Takayuki Okatani: GRIT: Faster and Better Image Captioning Transformer Using Dual Visual Features. 167-184
- Sunjae Yoon, Ji Woo Hong, Eunseop Yoon, Dahyun Kim, Junyeong Kim, Hee Suk Yoon, Chang D. Yoo: Selective Query-Guided Debiasing for Video Corpus Moment Retrieval. 185-200
- Cheng Shi, Sibei Yang: Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding. 201-218
- Zihang Meng, David Yang, Xuefei Cao, Ashish Shah, Ser-Nam Lim: Object-Centric Unsupervised Image Captioning. 219-235
- Quan Cui, Boyan Zhou, Yu Guo, Weidong Yin, Hao Wu, Osamu Yoshie, Yubo Chen: Contrastive Vision-Language Pre-training with Limited Resources. 236-253
- Sheng Fang, Shuhui Wang, Junbao Zhuo, Xinzhe Han, Qingming Huang: Learning Linguistic Association Towards Efficient Text-Video Retrieval. 254-270
- Zanming Huang, Zhongkai Shangguan, Jimuyang Zhang, Gilad Bar, Matthew Boyd, Eshed Ohn-Bar: ASSISTER: Assistive Navigation via Conditional Instruction Generation. 271-289
- Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Zhuowen Tu, Rahul Bhotika, Stefano Soatto: X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks. 290-308
- Wenhao Cheng, Xingping Dong, Salman H. Khan, Jianbing Shen: Learning Disentanglement with Decoupled Labels for Vision-Language Navigation. 309-329
- Qingpei Guo, Kaisheng Yao, Wei Chu: Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input. 330-346
- Bowen Li: Word-Level Fine-Grained Story Visualization. 347-362
- Qi Zhang, Yuqing Song, Qin Jin: Unifying Event Detection and Captioning as Sequence Generation via Pre-training. 363-379
- Chuang Lin, Yi Jiang, Jianfei Cai, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan: Multimodal Transformer with Variable-Length Memory for Vision-and-Language Navigation. 380-397
- Christopher Thomas, Yipeng Zhang, Shih-Fu Chang: Fine-Grained Visual Entailment. 398-416
- Ayush Jain, Nikolaos Gkanatsios, Ishita Mediratta, Katerina Fragkiadaki: Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds. 417-433
- Yifeng Zhang, Ming Jiang, Qi Zhao: New Datasets and Models for Contextual Reasoning in Visual Dialog. 434-451
- Joanna Hong, Minsu Kim, Yong Man Ro: VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection. 452-468
- Matan Levy, Rami Ben-Ari, Dani Lischinski: Classification-Regression for Chart Comprehension. 469-484
- Benita Wong, Joya Chen, You Wu, Stan Weixian Lei, Dongxing Mao, Difei Gao, Mike Zheng Shou: AssistQ: Affordance-Centric Question-Driven Task Completion for Egocentric Assistant. 485-501
- Weicheng Kuo, Fred Bertsch, Wei Li, A. J. Piergiovanni, Mohammad Saffar, Anelia Angelova: FindIt: Generalized Localization with Natural Language Queries. 502-520
- Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang: UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling. 521-539
- Golnaz Ghiasi, Xiuye Gu, Yin Cui, Tsung-Yi Lin: Scaling Open-Vocabulary Image Segmentation with Image-Level Labels. 540-557
- Jack Hessel, Jena D. Hwang, Jae Sung Park, Rowan Zellers, Chandra Bhagavatula, Anna Rohrbach, Kate Saenko, Yejin Choi: The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning. 558-575
- Minsu Kim, Hyunjun Kim, Yong Man Ro: Speaker-Adaptive Lip Reading with User-Dependent Padding. 576-593
- Tan M. Dinh, Rang Nguyen, Binh-Son Hua: TISE: Bag of Metrics for Text-to-Image Synthesis Evaluation. 594-609
- Morgan Heisler, Amin Banitalebi-Dehkordi, Yong Zhang: SemAug: Semantically Meaningful Image Augmentations for Object Detection Through Language Grounding. 610-626
- Myungsub Choi: Referring Object Manipulation of Natural Images with Conditional Classifier-Free Guidance. 627-643
- Reuben Tan, Bryan A. Plummer, Kate Saenko, J. P. Lewis, Avneesh Sud, Thomas Leung: NewsStories: Illustrating Articles with Visual Summaries. 644-661
- Amita Kamath, Christopher Clark, Tanmay Gupta, Eric Kolve, Derek Hoiem, Aniruddha Kembhavi: Webly Supervised Concept Expansion for General Purpose Vision Models. 662-681
- Kaiwen Zhou, Xin Eric Wang: FedVLN: Privacy-Preserving Federated Vision-and-Language Navigation. 682-699
- Haoran Wang, Dongliang He, Wenhao Wu, Boyang Xia, Min Yang, Fu Li, Yunlong Yu, Zhong Ji, Errui Ding, Jingdong Wang: CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval. 700-716
- Tsu-Jui Fu, Xin Eric Wang, William Yang Wang: Language-Driven Artistic Style Transfer. 717-734
- Zaid Khan, B. G. Vijay Kumar, Xiang Yu, Samuel Schulter, Manmohan Chandraker, Yun Fu: Single-Stream Multi-level Alignment for Vision-Language Pretraining. 735-751