Abstract
The Semantic Web will bring meaning to the Internet, making it possible for web agents to understand the information it contains. However, current trends suggest that it is unlikely to be adopted in the coming years. Until then, extracting meaningful information from the web remains an obstacle for web agents. In this article, we present a framework for the automatic extraction of semantically meaningful information from the current web. Separating the extraction process from the business logic of an agent enhances modularity, adaptability, and maintainability. Our approach is novel in that it combines different technologies to extract information, navigate the web, and adapt automatically to some changes.
The work reported in this article was supported by the Spanish Inter-ministerial Commission on Science and Technology under grant TIC2000-1106-C02-01.
Keywords
- Information Channel
- Inductive Logic Programming
- Business Logic
- Extraction Rule
- Defense Advanced Research Projects Agency
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Arjona, J.L., Corchuelo, R., Ruiz, A., Toro, M. (2002). A Practical Agent-Based Method to Extract Semantic Information from the Web. In: Pidduck, A.B., Ozsu, M.T., Mylopoulos, J., Woo, C.C. (eds) Advanced Information Systems Engineering. CAiSE 2002. Lecture Notes in Computer Science, vol 2348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47961-9_48
DOI: https://doi.org/10.1007/3-540-47961-9_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43738-3
Online ISBN: 978-3-540-47961-1
eBook Packages: Springer Book Archive