17th ECCV 2022: Tel Aviv, Israel - Volume 35
- Shai Avidan, Gabriel J. Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner: Computer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXXV. Lecture Notes in Computer Science 13695, Springer 2022, ISBN 978-3-031-19832-8
- Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson: Efficient One-Stage Video Object Detection by Exploiting Temporal Consistency. 1-16
- Guodong Ding, Angela Yao: Leveraging Action Affinity and Continuity for Semi-supervised Temporal Action Segmentation. 17-32
- James Hong, Haotian Zhang, Michaël Gharbi, Matthew Fisher, Kayvon Fatahalian: Spotting Temporally Precise, Fine-Grained Events in Video. 33-51
- Nadine Behrmann, S. Alireza Golestaneh, Zico Kolter, Jürgen Gall, Mehdi Noroozi: Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation. 52-68
- Junke Wang, Xitong Yang, Hengduo Li, Li Liu, Zuxuan Wu, Yu-Gang Jiang: Efficient Video Transformers with Spatial-Temporal Token Selection. 69-86
- Md Mohaiminul Islam, Gedas Bertasius: Long Movie Clip Classification with State-Space Video Models. 87-104
- Chen Ju, Tengda Han, Kunhao Zheng, Ya Zhang, Weidi Xie: Prompting Visual-Language Models for Efficient Video Understanding. 105-124
- Huan Li, Ping Wei, Jiapeng Li, Zeyu Ma, Jiahui Shang, Nanning Zheng: Asymmetric Relation Consistency Reasoning for Video Relation Grounding. 125-141
- Jiacheng Li, Ruize Han, Haomin Yan, Zekun Qian, Wei Feng, Song Wang: Self-supervised Social Relation Representation for Human Group Detection. 142-159
- Seong Hyeon Park, Jihoon Tack, Byeongho Heo, Jung-Woo Ha, Jinwoo Shin: K-centered Patch Sampling for Efficient Video Recognition. 160-176
- Guy Erez, Ron Shapira Weber, Oren Freifeld: A Deep Moving-Camera Background Model. 177-194
- Eitan Kosman, Dotan Di Castro: GraphVid: It only Takes a Few Nodes to Understand a Video. 195-212
- Amirhossein Habibian, Haitam Ben Yahia, Davide Abati, Efstratios Gavves, Fatih Porikli: Delta Distillation for Efficient Video Processing. 213-229
- David Junhao Zhang, Kunchang Li, Yali Wang, Yunpeng Chen, Shashwat Chandra, Yu Qiao, Luoqi Liu, Mike Zheng Shou: MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning. 230-248
- Honglu Zhou, Asim Kadav, Aviv Shamsian, Shijie Geng, Farley Lai, Long Zhao, Ting Liu, Mubbasir Kapadia, Hans Peter Graf: COMPOSER: Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality. 249-266
- Zizhang Li, Mengmeng Wang, Huaijin Pi, Kechun Xu, Jianbiao Mei, Yong Liu: E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context. 267-284
- Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson: TDViT: Temporal Dilated Video Transformer for Dense Video Tasks. 285-301
- Woobin Im, Sebin Lee, Sung-Eui Yoon: Semi-supervised Learning of Optical Flow by Flow Supervisor. 302-318
- Nikita Dvornik, Isma Hadji, Hai X. Pham, Dhaivat Bhatt, Brais Martínez, Afsaneh Fazly, Allan D. Jepson: Flow Graph to Video Grounding for Weakly-Supervised Multi-step Localization. 319-335
- Yiheng Li, Connelly Barnes, Kun Huang, Fang-Lue Zhang: Deep 360° Optical Flow Estimation Based on Multi-projection Fusion. 336-352
- Fanyi Xiao, Joseph Tighe, Davide Modolo: MaCLR: Motion-Aware Contrastive Learning of Representations for Videos. 353-370
- Kyle Min, Sourya Roy, Subarna Tripathi, Tanaya Guha, Somdeb Majumdar: Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection. 371-387
- Ziyi Lin, Shijie Geng, Renrui Zhang, Peng Gao, Gerard de Melo, Xiaogang Wang, Jifeng Dai, Yu Qiao, Hongsheng Li: Frozen CLIP Models are Efficient Video Learners. 388-404
- Jiafei Duan, Samson Yu, Soujanya Poria, Bihan Wen, Cheston Tan: PIP: Physical Interaction Prediction via Mental Simulation with Span Selection. 405-421
- Heeseung Yun, Sehun Lee, Gunhee Kim: Panoramic Vision Transformer for Saliency Detection in 360° Videos. 422-439
- Aditi Basu Bal, Ramy Mounir, Sathyanarayanan N. Aakur, Sudeep Sarkar, Anuj Srivastava: Bayesian Tracking of Video Graphs Using Joint Kalman Smoothing and Registration. 440-456
- Jingcheng Ni, Nan Zhou, Jie Qin, Qian Wu, Junqi Liu, Boxun Li, Di Huang: Motion Sensitive Contrastive Learning for Self-supervised Video Representation. 457-474
- Fuchen Long, Zhaofan Qiu, Yingwei Pan, Ting Yao, Chong-Wah Ngo, Tao Mei: Dynamic Temporal Filtering in Video Models. 475-492
- Renrui Zhang, Wei Zhang, Rongyao Fang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li: Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification. 493-510
- Lianyu Hu, Liqing Gao, Zekang Liu, Wei Feng: Temporal Lift Pooling for Continuous Sign Language Recognition. 511-527
- Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang: MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes. 528-545
- Mengxue Qu, Yu Wu, Wu Liu, Qiqi Gong, Xiaodan Liang, Olga Russakovsky, Yao Zhao, Yunchao Wei: SiRi: A Simple Selective Retraining Mechanism for Transformer-Based Visual Grounding. 546-562
- Jun Wang, Abhir Bhalerao, Yulan He: Cross-Modal Prototype Driven Network for Radiology Report Generation. 563-579
- Chuan Guo, Xinxin Zuo, Sen Wang, Li Cheng: TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts. 580-597
- Chaoyang Zhu, Yiyi Zhou, Yunhang Shen, Gen Luo, Xingjia Pan, Mingbao Lin, Chao Chen, Liujuan Cao, Xiaoshuai Sun, Rongrong Ji: SeqTR: A Simple Yet Universal Network for Visual Grounding. 598-615
- Laura Hanu, James Thewlis, Yuki M. Asano, Christian Rupprecht: VTC: Improving Video-Text Retrieval with User Comments. 616-633
- Xiao Han, Licheng Yu, Xiatian Zhu, Li Zhang, Yi-Zhe Song, Tao Xiang: FashionViL: Fashion-Focused Vision-and-Language Representation Learning. 634-651
- Aisha Urooj Khan, Hilde Kuehne, Chuang Gan, Niels da Vitoria Lobo, Mubarak Shah: Weakly Supervised Grounding for VQA in Vision-Language Transformers. 652-670
- Liliane Momeni, Hannah Bull, K. R. Prajwal, Samuel Albanie, Gül Varol, Andrew Zisserman: Automatic Dense Annotation of Large-Vocabulary Sign Language Videos. 671-690
- Yuying Ge, Yixiao Ge, Xihui Liu, Jinpeng Wang, Jianping Wu, Ying Shan, Xiaohu Qie, Ping Luo: MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval. 691-708
- Yuxuan Wang, Difei Gao, Licheng Yu, Weixian Lei, Matt Feiszli, Mike Zheng Shou: GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval. 709-725
- Wei Suo, Mengyang Sun, Kai Niu, Yiqi Gao, Peng Wang, Yanning Zhang, Qi Wu: A Simple and Robust Correlation Filtering Method for Text-Based Person Search. 726-742