UAV (unmanned aerial vehicle) captured images have small pedestrian targets and loss of key information after multiple down sampling, which are difficult to overcome by existing methods. We propose an improved YOLOv4 model for pedestrian detection and counting in UAV images, named YOLO-CC. We used the lightweight YOLOv4 for pedestrian detection, which replaces the backbone with CSPDarknet-34, and two feature layers are fused by FPN (Feature Pyramid Networks). We expanded the perception field using multiscale convolution based on the high-level feature map and generated the population density map by feature dimension reduction. By embedding the density map generation method into the network for end-to-end training, our model can effectively improve the accuracy of detection and counting and make feature extraction more focused on small targets. Our experiments demonstrate that YOLO-CC achieves 21.76 points AP50 higher than that of the original YOLOv4 on the VisDrone2021-counting data set while running faster than the original YOLOv4.
Copyright © 2022 Hao Kong et al.