Size estimation of non-cooperative data collections

by Mohammadreza Khelghati, Djoerd Hiemstra, and Maurice van Keulen

In this paper, approaches for estimating the size of non-cooperative databases and search engines are categorized and reviewed. The most recent approaches are implemented and compared in a real environment. Finally, four methods based on modifications of the available techniques are introduced and evaluated. One of these modifications improves the estimates of other approaches by 35 to 65 percent.
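The abstract does not list the individual approaches. As a hedged illustration of one classic family of size estimators, capture-recapture, the sketch below applies the Lincoln-Petersen estimator to two samples of document IDs drawn from a simulated collection. The function name `lincoln_petersen` and the simulated collection are hypothetical, and the sketch assumes both samples are independent and uniformly random. Samples obtained by querying a real search engine are not uniform, which is one reason practical estimators need corrections.

```python
import random


def lincoln_petersen(sample_a, sample_b):
    """Capture-recapture (Lincoln-Petersen) estimate of collection size.

    Assumes both samples of document IDs are drawn independently and
    uniformly at random from the same collection.
    """
    a, b = set(sample_a), set(sample_b)
    overlap = len(a & b)  # documents "recaptured" in both samples
    if overlap == 0:
        raise ValueError("samples share no documents; estimate is undefined")
    # Fraction of the second sample already seen approximates |A| / N.
    return len(a) * len(b) / overlap


# Hypothetical hidden collection of 10,000 document IDs.
rng = random.Random(42)
collection = range(10_000)
first = rng.sample(collection, 1_000)
second = rng.sample(collection, 1_000)
estimate = lincoln_petersen(first, second)
print(f"estimated size: {estimate:.0f}")
```

With uniform samples the estimate lands near the true size of 10,000; when some documents are much easier to retrieve than others, estimators of this kind typically underestimate.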

To be presented at the 14th International Conference on Information Integration and Web-based Applications and Services (iiWAS 2012), 3-5 December 2012, Bali, Indonesia.
