Deep Web Entity Monitoring

by Mohammad Khelghati

Search engines do not cover all the data available on the Web. In addition to the fact that none of these search engines cover all the webpages existing on the Web, they miss the data behind web search forms. This data is defined as hidden web or deep web which is not accessible through search engines. It is estimated that deep web contains data in a scale several times bigger than the data accessible through search engines which is referred to as surface web. Although this information on deep web could be accessed through their own interfaces, finding and querying all the interesting sources of information that might be useful could be a difficult, time-consuming and tiring task. Considering the huge amount of information that might be related to one’s information needs, it might be even impossible for a person to cover all the deep web sources of his interest. Therefore, there is a great demand for applications which can facilitate accessing this big amount of data being locked behind web search forms. Realizing approaches to meet this demand is one of the main issues targeted in this PhD project. Having provided the access to deep web data, different techniques can be applied to provide users with additional values out of this data. Analyzing data, finding patterns and relationships among different data items and also data sources are considered as some of these techniques. However, in this research, monitoring entities existing in deep web sources is targeted.

To be presented at the World Wide Web Conference Doctorial Consortium on 13 May in Rio de Janeiro, Brasil.