Abstract
Vehicle detection remains challenging in complex traffic scenes, especially for tiny vehicles. Although RCNN-based two-stage detectors have demonstrated strong performance, little attention has been paid to the quality of the first stage, where tiny vehicles are very likely to be missed. In this paper, we propose a deep network for accurate vehicle detection. The main idea is to use a relatively large feature map for proposal generation and to preserve the spatial layout of ROI features in order to represent and detect tiny vehicles. However, large feature maps at the lower levels of a deep network generally contain limited discriminative information. To address this, we introduce a backward feature enhancement operation, which absorbs higher-level information step by step to enhance the base feature map. As a result, even with only 100 proposals, the resulting proposal network achieves an encouraging recall of over 99%. Furthermore, unlike the common practice of flattening features after ROI pooling, we argue that, for better detection of tiny vehicles, the spatial layout of the ROI features should be preserved and fully integrated. Accordingly, we use a multi-path lightweight processing chain to effectively integrate ROI features while preserving their spatial layout. Experiments on the challenging DETRAC vehicle detection benchmark show that the proposed method substantially improves a competitive baseline (ResNet50-based Faster RCNN) by 16.5% mAP and outperforms all previously published and unpublished results.
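To make the backward feature enhancement idea concrete, the following is a minimal NumPy sketch of a generic top-down fusion, not the exact operator used in this work. It assumes that each level has half the spatial resolution of the one below it, that all levels share the same channel count, and that fusion is nearest-neighbour upsampling followed by element-wise addition. The names `upsample_nearest` and `backward_enhance` are hypothetical and chosen only for illustration.

```python
import numpy as np


def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def backward_enhance(features):
    """Fuse higher-level maps into the base map, one level at a time.

    `features` is ordered from the lowest level (largest map) to the
    highest level (smallest map). Assumptions made for this sketch only:
    equal channel counts, a stride of 2 between levels, and additive fusion.
    """
    enhanced = features[-1]
    # Walk backward from the top level toward the base feature map.
    for lower in reversed(features[:-1]):
        enhanced = lower + upsample_nearest(enhanced)
    return enhanced


# Toy three-level pyramid with constant values, so the result is easy to check.
base = np.full((4, 16, 16), 1.0)
mid = np.full((4, 8, 8), 2.0)
top = np.full((4, 4, 4), 3.0)
out = backward_enhance([base, mid, top])
print(out.shape)
```

The output keeps the resolution of the base map while every location has absorbed information from the coarser levels above it. In a real network the levels would typically differ in channel count, so some learned projection would be needed before fusion; that part is omitted here.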