DOI: 10.1145/3474085.3475372
research-article

ASFD: Automatic and Scalable Face Detector

Published: 17 October 2021

Abstract

Alongside current multi-scale detectors, Feature Aggregation and Enhancement (FAE) modules have delivered strong performance gains in state-of-the-art object detection. However, these hand-crafted FAE modules yield inconsistent improvements on face detection, mainly because of the significant distribution difference between their training corpus and their application corpus, i.e., COCO vs. WIDER Face. To tackle this problem, we analyse the effect of data distribution and propose to search for an effective FAE architecture, termed AutoFAE, via differentiable architecture search; AutoFAE outperforms all existing FAE modules in face detection by a considerable margin. On top of the found AutoFAE and existing backbones, a supernet is built and trained, which automatically yields a family of detectors under different complexity constraints. Extensive experiments on popular benchmarks, i.e., WIDER Face and FDDB, demonstrate a state-of-the-art performance-efficiency trade-off for the proposed automatic and scalable face detector (ASFD) family. In particular, our strong ASFD-D6 outperforms the best competitor with AP 96.7/96.2/92.1 on the WIDER Face test set, and the lightweight ASFD-D0 takes about 3.1 ms per image, i.e., more than 320 FPS, on a V100 GPU with VGA-resolution images.
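
The abstract names differentiable architecture search as the tool used to find AutoFAE but does not describe the search space. As a rough illustration of the general idea only, the sketch below shows the continuous relaxation commonly used in DARTS-style search: each candidate operation is weighted by a softmax over learnable architecture parameters, and the highest-weighted operation is kept once the search ends. The candidate operations, the names `CANDIDATE_OPS`, `mixed_op`, and `discretize`, and the scalar "feature" are all hypothetical stand-ins, not the operations or code used for AutoFAE.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate operations acting on a scalar "feature".
# A real search space would hold convolutions, pooling, and so on.
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}

def mixed_op(x, alphas):
    """Continuous relaxation: softmax-weighted sum of all candidate outputs."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))

def discretize(alphas):
    """After the search, keep the operation with the largest architecture weight."""
    names = list(CANDIDATE_OPS)
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return names[best]

# With equal architecture parameters every operation contributes one third.
uniform_out = mixed_op(3.0, [0.0, 0.0, 0.0])

# Assumed architecture parameters after a search; the largest one wins.
chosen = discretize([0.1, 1.5, -0.3])
```

In an actual search the architecture parameters would be optimised by gradient descent together with the network weights; here they are fixed by hand to keep the example self-contained.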

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. compound scaling
  2. face detection
  3. multi-task loss
  4. neural architecture search

Qualifiers

  • Research-article

Conference

MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

  • Downloads (last 12 months): 33
  • Downloads (last 6 weeks): 2

Reflects downloads up to 18 Feb 2025

Cited By

  • (2025) FeatherFace: Robust and Lightweight Face Detection via Optimal Feature Integration. Electronics 14:3 (517). DOI: 10.3390/electronics14030517. Online publication date: 27-Jan-2025
  • (2025) FAQIVS: Face Query-based Interactive Video Synopsis*. Automatika 66:2 (217-236). DOI: 10.1080/00051144.2025.2459987. Online publication date: 17-Feb-2025
  • (2024) PDGrad: Guiding Diffusion Model for Reference-Based Blind Face Restoration with Pivot Direction Gradient Guidance. Sensors 24:22 (7112). DOI: 10.3390/s24227112. Online publication date: 5-Nov-2024
  • (2024) Online Learning State Evaluation Method Based on Face Detection and Head Pose Estimation. Sensors 24:5 (1365). DOI: 10.3390/s24051365. Online publication date: 20-Feb-2024
  • (2024) Task-aware image quality estimators for face detection. Journal on Image and Video Processing 2024:1. DOI: 10.1186/s13640-024-00660-1. Online publication date: 20-Dec-2024
  • (2024) Prompting Industrial Anomaly Segment with Large Vision-Language Models. Proceedings of the 6th ACM International Conference on Multimedia in Asia (1-1). DOI: 10.1145/3696409.3700192. Online publication date: 3-Dec-2024
  • (2023) PVG: Progressive Vision Graph for Vision Recognition. Proceedings of the 31st ACM International Conference on Multimedia (2477-2486). DOI: 10.1145/3581783.3612122. Online publication date: 26-Oct-2023
  • (2023) YuNet: A Tiny Millisecond-level Face Detector. Machine Intelligence Research 20:5 (656-665). DOI: 10.1007/s11633-023-1423-y. Online publication date: 19-Apr-2023
  • (2023) EfficientSRFace: An Efficient Network with Super-Resolution Enhancement for Accurate Face Detection. Pattern Recognition (74-87). DOI: 10.1007/978-3-031-47637-2_6. Online publication date: 5-Nov-2023
