research-article

ASFD: Automatic and Scalable Face Detector

Authors:

Xiaoming Huang,

Yili Xia

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 2139 - 2147

https://doi.org/10.1145/3474085.3475372

Published: 17 October 2021

Abstract

Alongside current multi-scale detectors, Feature Aggregation and Enhancement (FAE) modules have shown superior performance gains for cutting-edge object detection. However, these hand-crafted FAE modules show inconsistent improvements on face detection, mainly because of the significant distribution difference between their training corpus and the corpus they are applied to, i.e. COCO vs. WIDER Face. To tackle this problem, we analyse the effect of data distribution and propose to search for an effective FAE architecture, termed AutoFAE, via differentiable architecture search; AutoFAE outperforms all existing FAE modules in face detection by a considerable margin. Upon the found AutoFAE and existing backbones, a supernet is further built and trained, which automatically obtains a family of detectors under different complexity constraints. Extensive experiments conducted on popular benchmarks, i.e. WIDER Face and FDDB, demonstrate the state-of-the-art performance-efficiency trade-off for the proposed automatic and scalable face detector (ASFD) family. In particular, our strong ASFD-D6 outperforms the best competitor with AP 96.7/96.2/92.1 on WIDER Face test, and the lightweight ASFD-D0 costs about 3.1 ms, i.e. more than 320 FPS, on the V100 GPU with VGA-resolution images.
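The abstract does not spell out the AutoFAE search space, so the following is only a minimal sketch of the general idea behind differentiable architecture search: each candidate operation receives an architecture weight, the outputs are blended through a softmax during search, and the strongest candidate is kept afterwards. The candidate operations (`identity`, `double`, `zero`) and the function names `mixed_op` and `discretize` are hypothetical placeholders chosen for illustration, not the operations or API used by ASFD.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidate operations on a 1-D feature vector (illustration only).
CANDIDATE_OPS = {
    "identity": lambda x: list(x),
    "double": lambda x: [2.0 * v for v in x],
    "zero": lambda x: [0.0 for _ in x],
}


def mixed_op(x, alphas):
    """Continuous relaxation: softmax-weighted sum of every candidate's output."""
    weights = softmax(alphas)
    outputs = [op(x) for op in CANDIDATE_OPS.values()]
    return [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(x))
    ]


def discretize(alphas):
    """After search, keep the candidate with the largest architecture weight."""
    names = list(CANDIDATE_OPS)
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return names[best]


if __name__ == "__main__":
    features = [1.0, 2.0]
    print(mixed_op(features, [0.0, 0.0, 0.0]))  # equal blend of all candidates
    print(discretize([0.1, 2.0, -1.0]))         # name of the strongest candidate
```

In an actual search the architecture weights would be learned by gradient descent together with the network weights; here they are fixed by hand to keep the sketch self-contained.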

Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision problems → Object detection

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN: 9781450386517

DOI: 10.1145/3474085

General Chairs:
Heng Tao Shen, University of Electronic Science and Technology of China, China
Yueting Zhuang, Zhejiang University, China
John R. Smith, IBM, USA

Program Chairs:
Yang Yang, University of Electronic Science and Technology of China, China
Pablo Cesar, CWI & TU Delft, The Netherlands
Florian Metze, FACEBOOK, Inc., USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM '21

Sponsor:

SIGMM

MM '21: ACM Multimedia Conference

October 20 - 24, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

Total Citations: 9
Total Downloads: 148
Downloads (Last 12 months): 33
Downloads (Last 6 weeks): 2

Reflects downloads up to 18 Feb 2025

Cited By

Kim D, Jung J, Kim J (2025). FeatherFace: Robust and Lightweight Face Detection via Optimal Feature Integration. Electronics 14:3 (517). Online publication date: 27-Jan-2025.
https://doi.org/10.3390/electronics14030517

Chandrasekaran N, Balasubramanian Y (2025). FAQIVS: Face Query-based Interactive Video Synopsis*. Automatika 66:2 (217-236). Online publication date: 17-Feb-2025.
https://doi.org/10.1080/00051144.2025.2459987

Min G, Lee T, Heo Y (2024). PDGrad: Guiding Diffusion Model for Reference-Based Blind Face Restoration with Pivot Direction Gradient Guidance. Sensors 24:22 (7112). Online publication date: 5-Nov-2024.
https://doi.org/10.3390/s24227112

Li B, Liu P (2024). Online Learning State Evaluation Method Based on Face Detection and Head Pose Estimation. Sensors 24:5 (1365). Online publication date: 20-Feb-2024.
https://doi.org/10.3390/s24051365

Singh P, Reibman A (2024). Task-aware image quality estimators for face detection. Journal on Image and Video Processing 2024:1. Online publication date: 20-Dec-2024.
https://dl.acm.org/doi/10.1186/s13640-024-00660-1

Zhou J, Liu W, Yang G, Zhao H, Yuan F (2024). Prompting Industrial Anomaly Segment with Large Vision-Language Models. Proceedings of the 6th ACM International Conference on Multimedia in Asia (1-1). Online publication date: 3-Dec-2024.
https://dl.acm.org/doi/10.1145/3696409.3700192

Wu J, Li J, Zhang J, Zhang B, Chi M, Wang Y, Wang C, El Saddik A, Mei T, Cucchiara R, Bertini M, Tobon Vallejo D, Atrey P, Hossain M (2023). PVG: Progressive Vision Graph for Vision Recognition. Proceedings of the 31st ACM International Conference on Multimedia (2477-2486). Online publication date: 26-Oct-2023.
https://dl.acm.org/doi/10.1145/3581783.3612122

Wu W, Peng H, Yu S (2023). YuNet: A Tiny Millisecond-level Face Detector. Machine Intelligence Research 20:5 (656-665). Online publication date: 19-Apr-2023.
https://doi.org/10.1007/s11633-023-1423-y

Wang G, Li J, Xie J, Xu J, Yang B (2023). EfficientSRFace: An Efficient Network with Super-Resolution Enhancement for Accurate Face Detection. Pattern Recognition (74-87). Online publication date: 5-Nov-2023.
https://doi.org/10.1007/978-3-031-47637-2_6
