Abstract
Piecewise planar models for stereo have recently become popular for modeling indoor and urban outdoor scenes. The strong planarity assumption overcomes the challenges presented by poorly textured surfaces, and results in low-complexity 3D models for rendering, storage, and transmission. However, such a model performs poorly in the presence of non-planar objects such as bushes, trees, and other clutter present in many scenes. We present a stereo method capable of handling more general scenes containing both planar and non-planar regions. Our proposed technique segments an image into piecewise planar regions as well as regions labeled as non-planar. The non-planar regions are modeled by the results of a standard multi-view stereo algorithm. The segmentation is driven by multi-view photoconsistency as well as the result of a color- and texture-based classifier, learned from hand-labeled planar and non-planar image regions. Additionally, our method links and fuses plane hypotheses across multiple overlapping views, ensuring a consistent 3D reconstruction over an arbitrary number of images. Using our system, we have reconstructed thousands of frames of street-level video. Results show that our method successfully recovers piecewise planar surfaces alongside general 3D surfaces in challenging scenes containing large buildings as well as residential houses.