Abstract
Database-driven IP geolocation is a convenient and common way to determine the geographic location of an IP address. However, users often find it difficult to determine which provider is reliable enough for their own scenarios. In this paper, we tackle this challenge from a data fusion perspective. We first evaluate the degree of consistency of data entries among five free geolocation databases and employ it as an indicator of data quality. We find that, for a given provider, this indicator varies with geographic scope and granularity, which allows us to evaluate data quality for different parts and dimensions within a database. We then propose and analyze a data fusion method that utilizes the degree of data consistency and quota-based votes. IP geolocation ground truth for over 40 million addresses in China, i.e., more than 10% of the total address space allocated to China, is used to verify the effectiveness and advantages of the proposed method. This work provides insights into the comprehensive use of the characteristics of multiple databases for data entry fusion in the absence of sufficient prior knowledge.