9th International Database Engineering & Application Symposium (IDEAS'05)   pp. 105-114
Automatically Maintaining Wrappers for Web Sources

DOI: http://doi.ieeecomputersociety.org/10.1109/IDEAS.2005.13

Abstract
A substantial subset of web data follows some kind of underlying structure. However, HTML carries no schema or semantic information about the data it represents. A program that provides software applications with a structured view of such semi-structured web sources is usually called a wrapper. A wrapper accepts a query against the source and returns a set of structured results, enabling applications to access web data much as they access information stored in databases. A significant problem with this approach is that web sources may undergo changes that invalidate the current wrappers. In this paper, we present novel heuristics and algorithms to address this problem. Our approach is based on collecting some query results during wrapper operation. When the source changes, these stored results are used to generate a set of labeled examples, which are then provided as input to a wrapper induction algorithm that regenerates the wrapper. We have tested our methods in several real-world web data extraction domains, obtaining high accuracy in all steps of the process.
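The maintenance step summarized above, turning query results stored during normal operation into labeled examples once the source has changed, can be illustrated with a minimal Python sketch. It assumes the simplest possible heuristic: exact string matching of each stored field value against the text nodes of the changed page. The names `TextNodeCollector`, `LabeledExample`, and `label_new_page` are hypothetical; this is not the paper's algorithm, and the wrapper induction step that would consume the examples is not shown.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


class TextNodeCollector(HTMLParser):
    """Collects the stripped text of every non-empty text node, in document order."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(text)


@dataclass
class LabeledExample:
    field: str       # name of the field in the stored query result (hypothetical schema)
    value: str       # value that was stored while the old wrapper still worked
    node_index: int  # position of the matching text node in the changed page


def label_new_page(stored_results, new_html):
    """Hypothetical helper: label text nodes of a changed page using stored results.

    Assumption: a stored value is located only if some text node equals it exactly.
    """
    parser = TextNodeCollector()
    parser.feed(new_html)
    parser.close()  # flush any buffered text
    examples = []
    for record in stored_results:           # each record maps field name -> value
        for field, value in record.items():
            for i, text in enumerate(parser.texts):
                if text == value:
                    examples.append(LabeledExample(field, value, i))
    return examples


if __name__ == "__main__":
    stored = [{"title": "Database Systems", "price": "$42.00"}]
    # Illustrative "changed" page layout; not taken from the paper.
    html = "<table><tr><td><b>Database Systems</b></td><td>$42.00</td></tr></table>"
    for example in label_new_page(stored, html):
        print(example)
```

Exact matching keeps the sketch short, but it would miss any value the changed source renders differently (for example, a reformatted price), which is the kind of case more robust heuristics would presumably need to handle.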
Additional Information

Citation: Juan Raposo, Alberto Pan, Manuel Álvarez, Justo Hidalgo, "Automatically Maintaining Wrappers for Web Sources," 9th International Database Engineering & Application Symposium (IDEAS'05), pp. 105-114, 2005.
