DOI: 10.1145/3366030.3366130
short-paper

ToT for CSV: accessing open data CSV files through SQL

Published: 22 February 2020

Abstract

Recently, the push for open data has been very strong, and more and more sources, such as governments, are sharing data such as weather records or demographic statistics. The Remote Table Access (RTA) system allows data from a relational database to be published easily and used by remote users through SQL-like queries. Still, the data currently shared as open data comes in many formats that are not always directly integrable with relational databases, and many sources publish data as raw CSV, XML or even PDF files. These files then need to be downloaded, parsed, and integrated with the end user's data, often in a relational database. In this work, we present Table on Top (ToT) for CSV, an extension of the RTA system that allows data contained in CSV files to be easily published and accessed through RTA.
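
The general idea of querying CSV content through SQL can be illustrated with a minimal Python sketch. It is not the ToT or RTA implementation, whose interfaces are not described in this abstract. It only shows the parse-and-integrate step the abstract mentions, using an in-memory SQLite database as a stand-in relational store. The sample data, the table name `open_data`, and the helper names `load_csv_into_table` and `query_csv` are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Made-up sample standing in for a downloaded open-data CSV file (not real data).
SAMPLE_CSV = """station,year,rainfall_mm
A,2018,100
A,2019,120
B,2019,300
"""


def quote_ident(name):
    """Quote an SQL identifier so headers containing spaces or quotes still work."""
    return '"{}"'.format(name.replace('"', '""'))


def load_csv_into_table(conn, table_name, csv_text):
    """Parse CSV text and load it into a new table with one TEXT column per header field."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join("{} TEXT".format(quote_ident(col)) for col in header)
    conn.execute("CREATE TABLE {} ({})".format(quote_ident(table_name), columns))
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quote_ident(table_name), placeholders),
        (row for row in reader if row),  # skip blank lines
    )
    return header


def query_csv(csv_text, sql, table_name="open_data"):
    """Load the CSV into an in-memory database and run one SQL query against it."""
    conn = sqlite3.connect(":memory:")
    try:
        load_csv_into_table(conn, table_name, csv_text)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


rows = query_csv(
    SAMPLE_CSV,
    "SELECT station, CAST(rainfall_mm AS INTEGER) "
    "FROM open_data WHERE year = '2019' ORDER BY station",
)
print(rows)
```

Every column is loaded as TEXT, so numeric comparisons need an explicit `CAST`. Type inference is left out to keep the sketch short.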


Cited By

  • (2020) Model-Driven Development of Web APIs to Access Integrated Tabular Open Data. IEEE Access 8, 202669-202686. DOI: 10.1109/ACCESS.2020.3036462. Online publication date: 2020



Published In

iiWAS2019: Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services
December 2019
709 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

In-Cooperation

  • JKU: Johannes Kepler Universität Linz
  • @WAS: International Organization of Information Integration and Web-based Applications and Services

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. CSV
  2. Open data
  3. SQL

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

iiWAS2019
