DOI: 10.1145/3658271.3658322
research-article
Open access

HKPoly: A Polystore Architecture to Support Data Linkage and Queries on Distributed and Heterogeneous Data

Published: 23 May 2024

Abstract

Context: Modern information systems commonly manipulate heterogeneous data and schemas, fragmented across the data stores that best fit their storage and access requirements. In addition, the business processes of different organizations consume these fragments independently, without explicit links between the data they use.
Problem: Supporting heterogeneous data that reside in distinct repositories and are not explicitly connected is a major challenge.
Solution: This work proposes HKPoly: a federated architecture that encapsulates data heterogeneity, location, and linkage.
IS Theory: We employed Representation theory to create the models of the architecture and its components.
Method: We implemented the architecture, applied it in an Oil & Gas scenario, and compared it to a multi-database system.
Results: Queries written for the proposal are about half as complex as the equivalent queries written for the relational multi-database system, at the cost of about 30% additional query processing time.
Contributions: An architecture to query heterogeneous data, the requirements and components for its implementation, and an implementation example using state-of-the-art concepts.

Supplemental Material

PDF File
Details about the paper that were not included in the submission to SBSI 2024 due to lack of space.

    Published In

    SBSI '24: Proceedings of the 20th Brazilian Symposium on Information Systems
    May 2024
    708 pages
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Business process
    2. Database integration
    3. Distributed databases
    4. Microservices
    5. Provenance
    6. Query processing

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Data Availability

    Details about the paper that were not included in the submission to SBSI 2024 due to lack of space. https://dl.acm.org/doi/10.1145/3658271.3658322#sbsi24-50-appendix.pdf

    Conference

    SBSI '24
    SBSI '24: XX Brazilian Symposium on Information Systems
    May 20 - 23, 2024
    Juiz de Fora, Brazil

    Acceptance Rates

    Overall Acceptance Rate 181 of 557 submissions, 32%

    Bibliometrics

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 186
    • Downloads (Last 12 months): 186
    • Downloads (Last 6 weeks): 47
    Reflects downloads up to 28 Nov 2024
