A framework for abstracting data sources having heterogeneous representation formats

Authors: D. Rosaci, G. Terracina, D. Ursino

Data & Knowledge Engineering, Volume 48, Issue 1

Pages 1 - 38

https://doi.org/10.1016/S0169-023X(03)00092-2

Published: 01 January 2004

Abstract

This paper addresses the problem of abstracting a data source that may be represented in any one of several formats. First, we show that data source abstraction plays a central role in several important application problems in information system design. We then propose a new approach that semi-automatically abstracts a data source encoded in one of a variety of formats, such as structured databases, OEM graphs and XML documents. Heterogeneous formats are handled through a conceptual model, called SDR-Network, which uniformly represents and handles data sources with different formats. As a significant application of the data source abstraction algorithm, we also illustrate the construction of an Intensional Repository.

Index Terms

A framework for abstracting data sources having heterogeneous representation formats
1. Information systems
  1. Data management systems
    1. Database design and models

Reviews

Reviewer: Jonathan P. E. Hodgson

A major problem in information management is combining data that comes from heterogeneous sources. The issue here is not reconciling information from two sources, important though that may be. Rather, the goal is to embed disparate data representations into a common framework. This paper describes a conceptual model, called a semantic distance and relevance network (SDR-Network), that can be used to incorporate data from heterogeneous but structured sources. The idea is as follows: given data in the form of some structured source, such as a relational database or an Extensible Markup Language (XML) document, one can construct an SDR-Network. Links between concepts in these networks carry weights that measure semantic distance and relevance.

The authors propose an abstraction scheme that can be applied to these networks, in which some nodes and arcs are absorbed into others while the fact of absorption is recorded. This record is needed for any subsequent expansion of the SDR-Network. The authors suggest that SDR-Networks be integrated into an "intensional repository" by applying a suitable clustering algorithm based on measuring the similarities between SDR-Networks. In fact, they claim that one can construct a hierarchy of clusters that would be suitable for browsing the repository.

The authors illustrate the construction of SDR-Networks from XML documents and from databases, and give an example of the abstraction process. There are indications of the time and space complexity of the algorithms, but the algorithms do not appear to have been implemented. The ideas in the paper are interesting and worth pursuing. Perhaps the best reader for this paper would be someone looking to implement a system based on heterogeneous resources.

Online Computing Reviews Service
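The structure the review describes (concepts linked by arcs weighted with a semantic distance and a relevance, plus an abstraction step that absorbs one node into another while remembering the absorption) can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the class name `SDRNetworkSketch`, its fields, the example concepts, and the rule used to redirect arcs are all hypothetical and are not taken from the paper, whose actual definitions and algorithm are not reproduced on this page.

```python
from dataclasses import dataclass, field


@dataclass
class SDRNetworkSketch:
    """Toy weighted digraph; hypothetical, not the paper's formal model."""

    # arcs[source][target] = (semantic_distance, semantic_relevance)
    arcs: dict = field(default_factory=dict)
    # absorbed[kept] = nodes merged into `kept`, recorded for later expansion
    absorbed: dict = field(default_factory=dict)

    def add_arc(self, source, target, distance, relevance):
        self.arcs.setdefault(source, {})[target] = (distance, relevance)
        self.arcs.setdefault(target, {})

    def absorb(self, kept, removed):
        """Merge `removed` into `kept` (assumed rule: redirect its arcs)."""
        kept_out = self.arcs.setdefault(kept, {})
        # Arcs that left `removed` now leave `kept`; existing arcs win.
        for target, weights in self.arcs.pop(removed, {}).items():
            if target != kept:
                kept_out.setdefault(target, weights)
        # Arcs that entered `removed` now enter `kept`; self-loops are dropped.
        for source, out in self.arcs.items():
            if removed in out:
                weights = out.pop(removed)
                if source != kept:
                    out.setdefault(kept, weights)
        # Remember the absorption so the network could be expanded again.
        self.absorbed.setdefault(kept, []).append(removed)


# Hypothetical example concepts and weights.
net = SDRNetworkSketch()
net.add_arc("Professor", "Course", 0, 1.0)
net.add_arc("Course", "Title", 0, 1.0)
net.add_arc("Professor", "Name", 0, 1.0)
net.absorb("Course", "Title")
```

After the call to `absorb`, the node `Title` no longer appears in `net.arcs`, and `net.absorbed` records that it was merged into `Course`, which is the kind of bookkeeping the review says is needed for later expansion.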

Information

Published In

Data & Knowledge Engineering, Volume 48, Issue 1
January 2004
150 pages
ISSN: 0169-023X

Publisher

Elsevier Science Publishers B. V.
Netherlands

Publication History

Published: 01 January 2004

Qualifiers

Article

Bibliometrics

Total Citations: 4
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0
Reflects downloads up to 11 Feb 2025

Cited By

S. Sellami, T. Dkaki, N. Zarour, P. Charrel (2019). KGMap. Proceedings of the 3rd International Conference on Advances in Artificial Intelligence, pp. 90-96. Online publication date: 26-Oct-2019.
https://dl.acm.org/doi/10.1145/3369114.3369146
M. Magnani, D. Montesi (2019). A unified approach to structured and XML data modeling and manipulation. Data & Knowledge Engineering 59:1, pp. 25-62. Online publication date: 1-Jan-2019.
https://dl.acm.org/doi/10.1016/j.datak.2005.06.004
P. De Meo, G. Quattrone, G. Terracina, D. Ursino (2018). Integration of XML schemas at various "severity" levels. Information Systems 31:6, pp. 397-434. Online publication date: 29-Dec-2018.
https://dl.acm.org/doi/10.1016/j.is.2004.11.010
S. Tseng, S. Lin (2009). VODKA. Expert Systems with Applications: An International Journal 36:2, pp. 2433-2450. Online publication date: 1-Mar-2009.
https://dl.acm.org/doi/10.1016/j.eswa.2007.12.055
