A graph-based approach for extracting terminological properties from information sources with heterogeneous formats

Published: 01 November 2005

Abstract

Integrating a large number of information sources with heterogeneous representation formats, and making them cooperate, is a challenging problem. Knowledge of the semantic relationships that hold between concepts belonging to different information sources (intersource properties) can play a central role here. In this paper, we propose a semiautomatic approach for extracting two kinds of intersource properties, namely synonymies and homonymies, from heterogeneous information sources. To carry out the extraction, we introduce both a conceptual model for representing the sources involved and a metric for measuring the strength of the semantic relationships holding among concepts represented within the same source.


    Information

    Published In

    Knowledge and Information Systems, Volume 8, Issue 4
    November 2005
    127 pages

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Author Tags

    1. Automatic and semantic approaches for intersource property detection
    2. Intersource properties
    3. Structured and semistructured information sources
    4. Synonymies and homonymies

    Qualifiers

    • Article

    Cited By

    • (2015) Using knowledge-based relatedness for information retrieval. Knowledge and Information Systems 44:3 (689-718). DOI: 10.1007/s10115-014-0785-4. Online publication date: 1-Sep-2015
    • (2009) Consensus-based evaluation framework for distributed information retrieval systems. Knowledge and Information Systems 18:2 (199-211). DOI: 10.5555/3225660.3225964. Online publication date: 1-Feb-2009
    • (2007) Understanding the schema matching problem. Proceedings of the 7th Conference on 7th WSEAS International Conference on Applied Computer Science - Volume 7 (59-68). DOI: 10.5555/1348171.1348181. Online publication date: 21-Nov-2007
    • (2006) HISENE2. Proceedings of the 2006 Confederated international conference on On the Move to Meaningful Internet Systems: CoopIS, DOA, GADA, and ODBASE - Volume Part I (949-966). DOI: 10.1007/11914853_60. Online publication date: 29-Oct-2006
