Crawling for Domain-Specific Hidden Web Resources

André Bergholz; Boris Chidlovskii

doi:10.1109/WISE.2003.1254476

Web Information Systems Engineering, International Conference on

Year: 2003, Pages: 125

Authors

André Bergholz, Xerox Research Centre Europe
Boris Chidlovskii, Xerox Research Centre Europe

Abstract

The Hidden Web, the part of the Web that remains unavailable to standard crawlers, has become an important research topic in recent years. Its size is estimated to be 400 to 500 times larger than that of the Publicly Indexable Web (PIW). Furthermore, the information on the Hidden Web is assumed to be more structured, because it is usually stored in databases. In this paper we describe a crawler that, starting from the PIW, finds entry points into the Hidden Web. The crawler is domain-specific and is initialized with pre-classified documents and relevant keywords. We describe our approach to the automatic identification of Hidden Web resources among the HTML forms encountered. We conduct a series of experiments using the top-level categories in the Google Directory and report our analysis of the discovered Hidden Web resources.
