Abstract
Over the last decade, a large number of methods have been proposed for human fall detection. Most existing methods were evaluated on trimmed datasets. More importantly, these datasets lack variety in falls, subjects, views, and modalities. This paper makes two contributions to the topic of automatic human fall detection. Firstly, to address the above issues, we introduce a large continuous multimodal multi-view dataset of human falls, namely CMDFALL. Our CMDFALL dataset was built by capturing activities from 50 subjects with seven overlapping Kinect sensors and two wearable accelerometers. Each subject performs 20 activities, including 8 falls of different styles and 12 daily activities. All multimodal multi-view data (RGB, depth, skeleton, acceleration) are time-synchronized and annotated to support the evaluation of human activity and fall recognition algorithms in indoor environments. Secondly, based on the multimodal property of the dataset, we investigate the role of each modality in achieving the best results in the context of human activity recognition. To this end, we adopt existing baseline techniques that have been shown to be efficient for each data modality: a C3D convnet on RGB, DMM-KDES on depth, Res-TCN on skeleton, and a 2D convnet on acceleration data. We analyze which modalities, individually and in combination, give the best performance.