DOI: 10.1145/3555041.3589732
short-paper

DIALITE: Discover, Align and Integrate Open Data Tables

Published: 05 June 2023

Abstract

We demonstrate DIALITE, a novel table discovery pipeline that allows users to discover, integrate, and analyze open data tables. DIALITE has three main stages. First, it lets users discover tables from open data platforms using state-of-the-art table discovery techniques. Second, it integrates the discovered tables into a single integrated table. Finally, it lets users analyze the integration result by applying different downstream tasks to it. The pipeline is flexible: users can easily add and compare additional discovery and integration algorithms.
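
The three stages and the pluggable design described in the abstract can be illustrated with a minimal sketch. This is not DIALITE's code or API: every name below (`run_pipeline`, `discover_by_header_overlap`, `integrate_by_outer_union`) is hypothetical, the discovery step is a simple column-name overlap ranking, and the integration step is a plain outer union. Both are stand-ins chosen for brevity, not the discovery or integration techniques the paper uses, and the tables are toy data.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]
Table = List[Row]


def columns(table: Table) -> set:
    """Return the set of column names used anywhere in the table."""
    return {name for row in table for name in row}


def discover_by_header_overlap(query: Table, lake: Dict[str, Table], k: int = 2) -> List[str]:
    """Stand-in discovery step: rank lake tables by Jaccard overlap of column names."""
    q = columns(query)

    def score(name: str) -> float:
        c = columns(lake[name])
        return len(q & c) / len(q | c) if (q | c) else 0.0

    # Highest score first; ties broken by table name for determinism.
    ranked = sorted(lake, key=lambda n: (-score(n), n))
    return [n for n in ranked[:k] if score(n) > 0]


def integrate_by_outer_union(tables: List[Table]) -> Table:
    """Stand-in integration step: outer union, padding missing columns with None."""
    all_cols = sorted(set().union(*(columns(t) for t in tables)))
    return [{c: row.get(c) for c in all_cols} for t in tables for row in t]


def run_pipeline(
    query: Table,
    lake: Dict[str, Table],
    discover: Callable[[Table, Dict[str, Table]], List[str]],
    integrate: Callable[[List[Table]], Table],
    analyze: Callable[[Table], object],
):
    """Discover related tables, integrate them with the query table, then analyze."""
    found = discover(query, lake)
    integrated = integrate([query] + [lake[n] for n in found])
    return found, integrated, analyze(integrated)


# Toy data (hypothetical), only to exercise the sketch.
query_table = [{"city": "A", "park": "P1"}]
data_lake = {
    "parks_b": [{"city": "B", "park": "P2"}],
    "budgets": [{"dept": "D1", "amount": "10"}],
    "city_pop": [{"city": "A", "population": "100"}],
}

found, integrated, n_rows = run_pipeline(
    query_table,
    data_lake,
    discover=discover_by_header_overlap,
    integrate=integrate_by_outer_union,
    analyze=len,  # trivial downstream task: count integrated rows
)
```

Because each stage is passed in as a function, swapping in a different discovery or integration algorithm and comparing the results only requires calling `run_pipeline` again with a different argument.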

Supplemental Material

MP4 File
Brief description and demonstration video



    Published In

    SIGMOD '23: Companion of the 2023 International Conference on Management of Data
    June 2023
    330 pages
    ISBN:9781450395076
    DOI:10.1145/3555041

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data discovery
    2. data integration
    3. data lakes
    4. full disjunction

    Qualifiers

    • Short-paper

    Funding Sources

    • National Science Foundation

    Conference

    SIGMOD/PODS '23

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%
