Abstract
Convolutional neural networks are a popular choice for current object detection and classification systems. Their performance improves constantly but for effective training, large, hand-labeled datasets are required. We address the problem of obtaining customized, yet large enough datasets for CNN training by synthesizing them in a virtual world, thus eliminating the need for tedious human interaction for ground truth creation. We developed a CNN-based multi-class detection system that was trained solely on virtual world data and achieves competitive results compared to state-of-the-art detection systems.