2015 IEEE 31st International Conference on Data Engineering (ICDE)

Abstract

Big data will enable the development of novel services that enhance a company's market advantage, competition, or productivity. At the same time, the utilization of such a service could disclose sensitive data in the process, which raises significant privacy concerns. To protect individuals, various policies, such as the Code of Fair Information Practices, as well as recent laws require organizations to capture only the minimal amount of data necessary to support a service. While this is a notable goal, choosing the minimal data is a non-trivial process, especially while considering privacy and utility constraints. In this paper, we introduce a technique to minimize sensitive data disclosure by focusing on privacy-aware feature selection. During model deployment, the service provider requests only a subset of the available features from the client, such that it can produce results with maximal confidence, while minimizing its ability to violate a client's privacy. We propose an iterative approach, where the server requests information one feature at a time until the client-specified privacy budget is exhausted. The overall process is dynamic, such that the feature selected at each step depends on the previously selected features and their corresponding values. We demonstrate our technique with three popular classification algorithms and perform an empirical analysis over three real-world datasets to illustrate that, in almost all cases, classifiers that select features using our strategy have the same error rate as state-of-the-art static feature selection methods that fail to preserve privacy.
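The iterative, budget-limited request loop described in the abstract can be illustrated with a small sketch. The abstract does not state the selection criterion, the privacy measure, or the classifier details, so everything specific here is an assumption made for illustration: a hand-specified naive Bayes model over binary features, a hypothetical additive per-feature privacy cost, and a greedy rule that requests the affordable feature with the lowest expected class entropy. The names `dynamic_select`, `cost`, and the example features are hypothetical and are not taken from the paper.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def posterior(prior, likelihood, revealed):
    """Naive Bayes class posterior given the revealed binary feature values."""
    scores = []
    for c, pc in enumerate(prior):
        s = pc
        for f, v in revealed.items():
            p1 = likelihood[f][c]  # P(feature f = 1 | class c)
            s *= p1 if v == 1 else 1 - p1
        scores.append(s)
    total = sum(scores)
    return [s / total for s in scores]

def expected_entropy(prior, likelihood, revealed, f):
    """Expected class entropy if feature f were revealed next."""
    post = posterior(prior, likelihood, revealed)
    exp = 0.0
    for v in (0, 1):
        # P(f = v | revealed) under the naive Bayes assumption
        pv = sum(pc * (likelihood[f][c] if v == 1 else 1 - likelihood[f][c])
                 for c, pc in enumerate(post))
        if pv > 0:
            exp += pv * entropy(posterior(prior, likelihood, {**revealed, f: v}))
    return exp

def dynamic_select(prior, likelihood, cost, budget, client_record):
    """Request one feature at a time until no unrevealed feature fits the budget.

    Assumption: privacy cost is additive per feature (hypothetical measure).
    """
    revealed = {}
    while True:
        affordable = [f for f in likelihood
                      if f not in revealed and cost[f] <= budget]
        if not affordable:
            break
        # The choice depends on the values revealed so far (dynamic selection).
        best = min(affordable,
                   key=lambda f: expected_entropy(prior, likelihood, revealed, f))
        revealed[best] = client_record[best]  # client discloses a single value
        budget -= cost[best]
    return revealed, posterior(prior, likelihood, revealed)

# Hypothetical two-class model and client record, for illustration only.
prior = [0.5, 0.5]
likelihood = {
    "age_over_50": [0.2, 0.8],
    "smoker": [0.4, 0.6],
    "zip_is_urban": [0.5, 0.5],
}
cost = {"age_over_50": 2, "smoker": 1, "zip_is_urban": 1}
record = {"age_over_50": 1, "smoker": 0, "zip_is_urban": 1}

revealed, post = dynamic_select(prior, likelihood, cost, budget=3,
                                client_record=record)
print(revealed, post)
```

In this toy setting the client's budget of 3 covers only two of the three features, so the third value is never sent to the server. A static method would instead fix the feature subset in advance, regardless of the values the client reveals.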