DOI: 10.1145/3340531.3417435
Short paper

INforE: Interactive Cross-platform Analytics for Everyone

Published: 19 October 2020

Abstract

We present INforE, a prototype that helps non-expert programmers perform optimized, cross-platform streaming analytics at scale. INforE offers: (a) a new extension to RapidMiner Studio for the graphical design of Big Data streaming workflows, (b) a novel optimizer that directs the execution of workflows across Big Data platforms and clusters, (c) a synopses data engine that provides interactivity at scale through data summaries, and (d) a distributed, online data mining and machine learning module. To our knowledge, INforE is the first holistic approach in streaming settings. We demonstrate INforE in the fields of life science and financial data analysis.
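
To make the idea of a data summary in item (c) concrete, the following is a minimal sketch of one well-known stream synopsis, the Count-Min sketch. It is an illustration only: the abstract does not say which synopses INforE implements or how, and the class name, parameters, and sample stock symbols below are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency summary of a stream (hypothetical example class).

    Estimates can overcount because of hash collisions, but never undercount.
    """

    def __init__(self, width=272, depth=5):
        self.width = width    # counters per row
        self.depth = depth    # number of independent hash rows
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Derive one hash function per row by salting BLAKE2b with the row id.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # The smallest counter across rows is the least-contaminated estimate.
        return min(
            self.table[row][self._index(row, item)]
            for row in range(self.depth)
        )


# Illustrative stream of stock symbols (made-up data).
stream = ["AAPL", "MSFT", "AAPL", "GOOG", "AAPL"]
sketch = CountMinSketch()
for symbol in stream:
    sketch.add(symbol)

print(sketch.estimate("AAPL"))  # at least 3; exact unless collisions occur
```

The memory used depends only on `width` and `depth`, not on the length of the stream, which is the property that makes summaries of this kind attractive for interactive queries over large streams.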

Supplementary Material

MP4 File (3340531.3417435.mp4)
Presentation video.

Cited By

  • (2024) HYPPO: Using Equivalences to Optimize Pipelines in Exploratory Machine Learning. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 221-234. DOI: 10.1109/ICDE60146.2024.00024. Online publication date: 13 May 2024.
  • (2021) EasyFlinkCEP. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 3029-3033. DOI: 10.1145/3459637.3482094. Online publication date: 26 October 2021.
  • (2021) Processing Big Data in Motion: Core Components and System Architectures with Applications to the Maritime Domain. Technologies and Applications for Big Data Value, pp. 497-518. DOI: 10.1007/978-3-030-78307-5_22. Online publication date: 1 July 2021.

Published In

CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
October 2020
3619 pages
ISBN: 9781450368599
DOI: 10.1145/3340531
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cross-platform analytics
  2. data streams
  3. interactive big data analytics

Qualifiers

  • Short-paper

Conference

CIKM '20
Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
