Abstract
Many applications have naturally incomplete data on customers, where data is gathered on only a fixed set of local variables. However, a more complete picture can help build better models. The naive solution to this problem, acquiring complete data for all customers, is often impractical because of the cost. A possible alternative is to acquire complete data for some customers and to use it to improve the models built. The data acquisition problem is to determine how many, and which, customers to acquire additional data from. In this paper we suggest using active-learning-based approaches for the data acquisition problem. In particular, we present initial methods for data acquisition and evaluate them experimentally on web usage data and UCI datasets. Results show that the methods perform well and indicate that active-learning-based methods for data acquisition can be a promising area for data mining research.