
DOI: 10.1145/1013367.1013545
Article

OntoMiner: bootstrapping ontologies from overlapping domain specific web sites

Published: 19 May 2004

Abstract

In this paper, we present automated techniques for bootstrapping and populating specialized domain ontologies by organizing and mining a set of relevant overlapping Web sites provided by the user. We develop algorithms that detect and utilize HTML regularities in the Web documents to turn them into hierarchical semantic structures encoded as XML. Next, we present tree-mining algorithms that identify key domain concepts and their taxonomical relationships. We also extract semi-structured concept instances annotated with their labels whenever they are available. Experimental evaluation for the News, Travel, and Shopping domains indicates that our algorithms can bootstrap and populate domain specific ontologies with high precision and recall.
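The abstract does not spell out how the tree-mining step works. As a rough illustration only, and not OntoMiner's published algorithm, the sketch below assumes each overlapping site has already been reduced to parent/child label pairs (for example, read off its navigation menu). It keeps the labels and parent-child edges that recur in at least `min_support` of the sites, so that concepts and taxonomic links several sites agree on are treated as candidate ontology entries. The input format, the helper names `normalize` and `mine_concepts`, the `min_support` threshold, and the sample data are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical input: each site reduced to (parent_label, child_label) edges,
# e.g. read off its navigation menu or a heading hierarchy.
Edge = Tuple[str, str]


def normalize(label: str) -> str:
    """Lower-case a label and collapse whitespace so that 'World ' and 'world' match."""
    return " ".join(label.lower().split())


def mine_concepts(
    sites: Dict[str, List[Edge]], min_support: int
) -> Tuple[Set[str], Set[Edge]]:
    """Return the labels and parent->child edges found in at least
    `min_support` distinct sites. Each site contributes at most one
    count per label or edge, so one large site cannot dominate."""
    concept_support = Counter()
    edge_support = Counter()
    for edges in sites.values():
        site_edges = {(normalize(p), normalize(c)) for p, c in edges}
        site_labels = {label for edge in site_edges for label in edge}
        concept_support.update(site_labels)
        edge_support.update(site_edges)
    concepts = {lbl for lbl, n in concept_support.items() if n >= min_support}
    # A frequent edge implies both of its endpoints are frequent concepts too.
    taxonomy = {e for e, n in edge_support.items() if n >= min_support}
    return concepts, taxonomy


# Toy example: three overlapping news sites (made-up data).
sites = {
    "news-a": [("News", "World"), ("News", "Sports"), ("Sports", "Tennis")],
    "news-b": [("news", "world"), ("News", "Sports"), ("News", "Weather")],
    "news-c": [("News", "World"), ("News", "Business")],
}
concepts, taxonomy = mine_concepts(sites, min_support=2)
print(sorted(concepts))  # ['news', 'sports', 'world']
print(sorted(taxonomy))  # [('news', 'sports'), ('news', 'world')]
```

Counting support once per site rather than once per occurrence favours labels that many of the overlapping sites share; in practice the threshold would presumably need tuning per domain.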



      Information

      Published In

      WWW Alt. '04: Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters
      May 2004
      532 pages
      ISBN: 1581139128
      DOI: 10.1145/1013367
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. data mining
      2. ontology
      3. semantic web
      4. web mining


      Conference

      WWW04

      Acceptance Rates

      Overall acceptance rate: 1,899 of 8,196 submissions (23%)


      Article Metrics

      • Downloads (last 12 months): 4
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 14 Nov 2024

      Cited By

      • (2012) Ranking Algorithm for Semantic Document Annotations. International Journal of Information Retrieval Research 2:1, pp. 1-10. DOI: 10.4018/ijirr.2012010101. Online publication date: 1-Jan-2012.
      • (2011) Navigating within news collections using tag-flakes. Journal of Visual Languages and Computing 22:2, pp. 120-139. DOI: 10.1016/j.jvlc.2010.11.001. Online publication date: 1-Apr-2011.
      • (2011) User-Centered Evaluation for IR: Ranking Annotated Document Algorithms. Software Engineering and Computer Systems, pp. 306-312. DOI: 10.1007/978-3-642-22203-0_27. Online publication date: 2011.
      • (2009) Ontology learning from domain specific web documents. International Journal of Metadata, Semantics and Ontologies 4:1/2, pp. 24-33. DOI: 10.1504/IJMSO.2009.026251. Online publication date: 1-May-2009.
      • (2008) Creating tag hierarchies for effective navigation in social media. Proceedings of the 2008 ACM workshop on Search in social media, pp. 75-82. DOI: 10.1145/1458583.1458597. Online publication date: 30-Oct-2008.
      • (2008) Using tagflake for condensing navigable tag hierarchies from tag clouds. Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1069-1072. DOI: 10.1145/1401890.1402021. Online publication date: 24-Aug-2008.
      • (2008) Analysis of Network Lifetime in Hybrid Sensor Networks with Wired Shortcut. 2008 4th International Conference on Wireless Communications, Networking and Mobile Computing, pp. 1-4. DOI: 10.1109/WiCom.2008.862. Online publication date: Oct-2008.
      • (2008) Extracting Structure of Web Site Based on Hyperlink Analysis. 2008 4th International Conference on Wireless Communications, Networking and Mobile Computing, pp. 1-4. DOI: 10.1109/WiCom.2008.2538. Online publication date: Oct-2008.
      • (2008) A Novel Agent-Based Model for Search in Distributed Networks. 2008 4th International Conference on Wireless Communications, Networking and Mobile Computing, pp. 1-4. DOI: 10.1109/WiCom.2008.1326. Online publication date: Oct-2008.
      • (2006) AggregateRank. Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 75-82. DOI: 10.1145/1148170.1148187. Online publication date: 6-Aug-2006.
