Abstract
We propose an efficient algorithm to jointly estimate geometry and semantics for a geographical region observed by multiple satellite images. Our joint estimation leverages an efficient PatchMatch inference framework defined over a lattice discretization of the environment. Our cost function relies on a local planarity assumption to model scene geometry and on neural network classification to assign semantic (e.g., land use) labels to geometric structures. By using the commonly available direct (i.e., space-to-image) rational polynomial coefficient (RPC) satellite camera models, our approach circumvents the need to estimate or refine inverse RPC models. Experiments illustrate both the computational efficiency and the high-quality scene geometry estimates attained by our approach on satellite imagery. To further illustrate the generality of our representation and inference framework, we also include experiments on standard benchmarks for ground-level imagery.