DOI: 10.5555/1182635.1164157

Multi-column substring matching for database schema translation

Published: 01 September 2006

Abstract

We describe a method for discovering complex schema translations involving substrings from multiple database columns. The method does not require a training set of instances linked across databases, and it handles both fixed- and variable-length field columns. We propose an iterative algorithm that deduces the correct sequence of concatenations of column substrings needed to translate from one database to another. We introduce the algorithm with examples on common database data values and examine its performance on real-world and synthetic datasets.
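
To make the idea of a multi-column substring translation concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a hypothetical formula representation (a list of column, start, end pieces), invented column names (`first`, `last`), and invented sample data. It only shows how one candidate formula could be scored against an unlinked target column by checking which translated values appear there.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical formula representation (an assumption, not from the paper):
# each piece takes characters [start:end] of the named source column,
# where end=None means "through the end of the value".
Piece = Tuple[str, int, Optional[int]]


def apply_formula(row: Dict[str, str], formula: List[Piece]) -> str:
    """Concatenate the requested substrings of one source row."""
    return "".join(row[col][start:end] for col, start, end in formula)


def coverage(rows: List[Dict[str, str]], formula: List[Piece],
             target_values: List[str]) -> float:
    """Fraction of source rows whose translated value occurs in the target column.

    No row-to-row linkage is used: the target column is treated as a set.
    """
    targets = set(target_values)
    if not rows:
        return 0.0
    hits = sum(apply_formula(row, formula) in targets for row in rows)
    return hits / len(rows)


# Invented sample data: the target values are in a different order from the
# source rows, so nothing links a source row to its target row.
source_rows = [
    {"first": "robert", "last": "kerry"},
    {"first": "alice", "last": "nguyen"},
    {"first": "maria", "last": "lopez"},
]
target_logins = ["mlopez", "rkerry", "anguyen"]

# Candidate: first character of `first`, followed by all of `last`.
login_formula = [("first", 0, 1), ("last", 0, None)]
# A deliberately wrong candidate for comparison.
wrong_formula = [("last", 0, 1), ("first", 0, None)]

print(coverage(source_rows, login_formula, target_logins))
print(coverage(source_rows, wrong_formula, target_logins))
```

A search procedure would have to generate and refine such candidates; this sketch covers only the scoring step, and treating the target column as a set is a simplification chosen for brevity.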

Published In

VLDB '06: Proceedings of the 32nd international conference on Very large data bases
September 2006
1269 pages

Sponsors

  • SIGMOD: ACM Special Interest Group on Management of Data
  • K.I.S.S. SIG on Databases
  • AJU Information Technology Co., Ltd
  • US Army ITC-PAC Asian Research Office
  • Google Inc.
  • The Database Society of Japan
  • Samsung SOS
  • Advanced Information Technology Research Center
  • Naver
  • Microsoft: Microsoft
  • Korea Info Sci Society: Korea Information Science Society
  • SK telecom
  • Systems Applications Products
  • ORACLE: ORACLE
  • International Business Management
  • Air Force Office of Scientific Research/Asian Office of Aerospace R&D
  • Kosef
  • Kaist
  • LG Electronics
  • CCF-DBS

Publisher

VLDB Endowment

Qualifiers

  • Article

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0

Reflects downloads up to 30 Nov 2024

Cited By

  • (2023) Auto-BI: Automatically Build BI-Models Leveraging Local Join Prediction and Global Schema Graph. Proceedings of the VLDB Endowment 16:10 (2578-2590). DOI: 10.14778/3603581.3603596. Online publication date: 1-Jun-2023
  • (2021) Auto-FuzzyJoin. Proceedings of the 2021 International Conference on Management of Data (1064-1076). DOI: 10.1145/3448016.3452824. Online publication date: 9-Jun-2021
  • (2018) Scalable Entity Resolution Using Probabilistic Signatures on Parallel Databases. Proceedings of the 27th ACM International Conference on Information and Knowledge Management (2213-2221). DOI: 10.1145/3269206.3272016. Online publication date: 17-Oct-2018
  • (2017) Auto-join. Proceedings of the VLDB Endowment 10:10 (1034-1045). DOI: 10.14778/3115404.3115409. Online publication date: 1-Jun-2017
  • (2016) String similarity search and join. Frontiers of Computer Science: Selected Publications from Chinese Universities 10:3 (399-417). DOI: 10.1007/s11704-015-5900-5. Online publication date: 1-Jun-2016
  • (2012) Learning semantic string transformations from examples. Proceedings of the VLDB Endowment 5:8 (740-751). DOI: 10.14778/2212351.2212356. Online publication date: 1-Apr-2012
  • (2012) Appearance-Order-Based schema matching. Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part I (79-94). DOI: 10.1007/978-3-642-29038-1_8. Online publication date: 15-Apr-2012
  • (2011) Discovering implicit categorical semantics for schema matching. Proceedings of the 16th international conference on Database systems for advanced applications: Part II (179-194). DOI: 10.5555/1997251.1997269. Online publication date: 22-Apr-2011
  • (2010) Top-k generation of mediated schemas over multiple data sources. Proceedings of the 15th international conference on Database systems for advanced applications (143-155). DOI: 10.5555/1880853.1880871. Online publication date: 1-Apr-2010
