Abstract
We propose a joint learning method for object classification and localization that combines 3D color texture features with geometry-based segmentation, trained on weakly-labeled 3D color datasets. Recently, consumer cameras such as Microsoft's Kinect have become available that produce not only color images but also depth images. Depth dramatically reduces the difficulty of object detection for two reasons: (a) reasonable candidate object segments can be obtained by detecting spatial discontinuities, and (b) 3D features that are robust to viewpoint variation can be extracted. The proposed method enumerates candidate segments by evaluating the difference in angle between the surface normals of 3D points, extracts global 3D features from each segment, and learns object classifiers via Multiple Instance Learning from object labels attached to whole 3D color scenes. Experimental results show that rotation invariance and scale invariance of the features are crucial for solving this problem.
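To make the segmentation step concrete, the following is a minimal sketch of how candidate segments might be formed by thresholding the angle between surface normals of neighboring 3D points. The threshold value, the neighbor graph, and the flood-fill grouping are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def normal_angle(n1, n2):
    # Angle (radians) between two surface normals.
    cos = np.dot(n1, n2) / (np.linalg.norm(n1) * np.linalg.norm(n2))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def segment_by_normals(normals, neighbors, angle_thresh=np.deg2rad(30)):
    # Flood-fill over a precomputed neighbor graph, merging adjacent
    # points whose normals differ by less than angle_thresh.
    # angle_thresh = 30 degrees is a hypothetical choice.
    labels = -np.ones(len(normals), dtype=int)
    label = 0
    for seed in range(len(normals)):
        if labels[seed] != -1:
            continue
        labels[seed] = label
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in neighbors[i]:
                if labels[j] == -1 and normal_angle(normals[i], normals[j]) < angle_thresh:
                    labels[j] = label
                    stack.append(j)
        label += 1
    return labels
```

Points on the same smooth surface receive one label, while a sharp change in normal direction (a spatial discontinuity such as an object boundary) starts a new segment; each resulting segment is then a candidate region for feature extraction.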