SchemaSQL: An extension to SQL for multidatabase interoperability

Published: 01 December 2001

Abstract

We provide a principled extension of SQL, called SchemaSQL, that offers the capability of uniform manipulation of data and schema in relational multidatabase systems. We develop a precise syntax and semantics of SchemaSQL in a manner that extends traditional SQL syntax and semantics, and demonstrate the following. (1) SchemaSQL retains the flavor of SQL while supporting querying of both data and schema. (2) It can be used to transform data in a database into a structure substantially different from the original database, in which data and schema may be interchanged. (3) It also permits the creation of views whose schema is dynamically dependent on the contents of the input instance. (4) While aggregation in SQL is restricted to values occurring in one column at a time, SchemaSQL permits "horizontal" aggregation and even aggregation over more general "blocks" of information. (5) SchemaSQL provides a useful facility for interoperability and data/schema manipulation in relational multidatabase systems. We provide many examples to illustrate our claims. We clearly spell out the formal semantics of SchemaSQL that accounts for all these features. We describe an architecture for the implementation of SchemaSQL and develop implementation algorithms based on available database technology that allows for powerful integration of SQL-based relational DBMSs. We also discuss the applicability of SchemaSQL for handling semantic heterogeneity arising in a multidatabase system.
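
Claims (2) and (3) can be made concrete with a minimal Python sketch that emulates the kind of restructuring described, outside of SchemaSQL itself. This is not SchemaSQL syntax and not the authors' implementation. The relation `salinfo`, its column names, and the function `pivot_values_to_columns` are hypothetical names chosen for illustration. Rows are plain dictionaries, and the values found in one column become the column names of the output, so the output schema depends on the input instance.

```python
from collections import defaultdict

def pivot_values_to_columns(rows, key, name_col, value_col):
    """Turn the values of `name_col` into output columns.

    Each distinct value found in `name_col` becomes a column of the result,
    so the output schema depends on the data in `rows`.
    """
    # Group the cell values by the key column, then by the future column name.
    grouped = defaultdict(dict)
    for row in rows:
        grouped[row[key]][row[name_col]] = row[value_col]
    # Collect the dynamically discovered column names in first-seen order.
    columns = []
    for row in rows:
        if row[name_col] not in columns:
            columns.append(row[name_col])
    # Missing combinations are filled with None, much like a SQL NULL.
    return [
        {key: k, **{c: cells.get(c) for c in columns}}
        for k, cells in grouped.items()
    ]

# Hypothetical input: department names are stored as data values.
salinfo = [
    {"category": "Prof", "dept": "CS", "salary": 70000},
    {"category": "Prof", "dept": "Math", "salary": 65000},
    {"category": "Tech", "dept": "CS", "salary": 40000},
]

# In the output, department names have become column names (schema).
by_dept = pivot_values_to_columns(salinfo, "category", "dept", "salary")
```

Adding a row for a new department to `salinfo` would add a column to `by_dept` without any change to the function, which is the sense in which the output schema is dynamic in this sketch.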

Reviews

Vasant B. Kaujalgi

Modern information systems are very complex, due to wide area networks with Internet links, distributed databases, and the general characteristics of online systems. Therefore, the present version of structured query language (SQL) may not meet user expectations in the distributed database environment. This effort by Lakshmanan, Sadri, and Subramanian attempts to enhance SQL in order to meet such additional processing requirements.

Distributed databases may run on different operating systems (OS), hardware, and data models, using different database management systems (DBMS). Traditionally, information system applications were designed independently, around the functional units of an organization. Supporting cross-queries and interoperability among units was therefore a difficult task once the information systems were networked. The authors propose a multidatabase system (MDBS) for such a computing environment. The main problems are classified as semantic, syntactic, and system problems. This paper focuses on syntactic and query language facilities for specific interoperability needs in distributed databases.

The authors identify the main features of an enhanced SQL as having independence from the target databases, having the ability to restructure a database, being built on standard SQL, and being successful in implementation. Their enhanced SQL, SchemaSQL, is designed to allow uniform manipulation of data and metadata. The syntax of SchemaSQL allows processing of different databases. Additional variables can be declared. Many features of aggregation have been included, and the aggregation facility has been enhanced for horizontal and block-level aggregation. SchemaSQL semantics cover both fixed and dynamic output schemas, with the ability to restructure views.

The most interesting features of the paper concern the details of SchemaSQL's implementation. An architecture for SchemaSQL is proposed. Algorithms are included for fixed and dynamic output schema. Certain optimization ideas are indicated for the SchemaSQL implementation. The authors claim that SchemaSQL can tackle semantic heterogeneity, which is required for accessing different databases. The paper ends by comparing SchemaSQL with recent extensions to standard SQL, such as MSQL, XSQL, HOSQL, OSQL, and UniSQL/M. The authors claim that SchemaSQL may be useful in online analytical processing (OLAP). SchemaSQL may also help in data mining applications. The paper has a list of ideas for future research on this topic.

The paper is relevant to researchers working on query languages for distributed databases. It has sufficient technical depth to allow the concepts to be implemented. The authors have contributed to present knowledge about querying distributed databases in a complex information system.

Online Computing Reviews Service
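
The review's points about uniform handling of data and metadata and about horizontal and block-level aggregation can be illustrated with a small Python sketch. It emulates the ideas on in-memory rows and is not SchemaSQL syntax or the algorithms from the paper. The relation `salaries` and the helper names `column_names`, `horizontal_average`, and `block_sum` are assumptions made for this example.

```python
def column_names(rows):
    """Metadata query: list the attribute names used by a relation."""
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    return names

def horizontal_average(rows, key):
    """Average across the non-key columns of each row (one result per row)."""
    result = []
    for row in rows:
        # Skip missing cells, as SQL aggregates skip NULLs.
        values = [v for name, v in row.items() if name != key and v is not None]
        avg = sum(values) / len(values) if values else None
        result.append({key: row[key], "avg": avg})
    return result

def block_sum(rows, key):
    """Sum over a whole block: every non-key cell of every row."""
    return sum(
        v
        for row in rows
        for name, v in row.items()
        if name != key and v is not None
    )

# Hypothetical relation in which department names appear as column names.
salaries = [
    {"category": "Prof", "CS": 70000, "Math": 60000},
    {"category": "Tech", "CS": 40000, "Math": None},
]

names = column_names(salaries)
per_row = horizontal_average(salaries, "category")
total = block_sum(salaries, "category")
```

Standard SQL aggregates run down a single column. In this sketch `horizontal_average` runs across the columns of one row, and `block_sum` runs over both dimensions at once.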

Published In

ACM Transactions on Database Systems, Volume 26, Issue 4
December 2001
135 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/503099
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in TODS Volume 26, Issue 4

Author Tags

  1. Information integration
  2. SchemaSQL
  3. multidatabase systems
  4. restructuring views
  5. schematic heterogeneity

Qualifiers

  • Article

Cited By

  • (2023) Interactive Table Synthesis With Natural Language. IEEE Transactions on Visualization and Computer Graphics 30:9 (6130-6145). DOI: 10.1109/TVCG.2023.3329120. Online publication date: 1-Nov-2023
  • (2022) Rigel: Transforming Tabular Data by Declarative Mapping. IEEE Transactions on Visualization and Computer Graphics (1-11). DOI: 10.1109/TVCG.2022.3209385. Online publication date: 2022
  • (2019) Data Wrangling. Encyclopedia of Big Data Technologies (584-591). DOI: 10.1007/978-3-319-77525-8_9. Online publication date: 20-Feb-2019
  • (2019) Data Cleaning. Online publication date: 9-Jul-2019
  • (2018) Data Wrangling. Encyclopedia of Big Data Technologies (1-8). DOI: 10.1007/978-3-319-63962-8_9-1. Online publication date: 5-Feb-2018
  • (2018) Transforming Social Networks Data. Encyclopedia of Social Network Analysis and Mining (3170-3182). DOI: 10.1007/978-1-4939-7131-2_389. Online publication date: 12-Jun-2018
  • (2017) Distributed First Order Logic. Artificial Intelligence 253 (1-39). DOI: 10.1016/j.artint.2017.08.008. Online publication date: Dec-2017
  • (2017) Transforming Social Networks Data. Encyclopedia of Social Network Analysis and Mining (1-13). DOI: 10.1007/978-1-4614-7163-9_389-1. Online publication date: 17-Mar-2017
  • (2014) Schema-free SQL. Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (1051-1062). DOI: 10.1145/2588555.2588571. Online publication date: 18-Jun-2014
  • (2014) XSPath. IEEE Transactions on Knowledge and Data Engineering 26:2 (485-499). DOI: 10.1109/TKDE.2012.247. Online publication date: 1-Feb-2014