DOI: 10.1145/3209900.3209907

ViDeTTe Interactive Notebooks

Published: 10 June 2018

Abstract

Interactive notebooks allow the use of popular languages, such as Python, for composing data analytics projects. The interface they provide enables data scientists to import data, analyze it, and compose the results into easily readable, report-like web pages that can contain re-runnable code, visualizations, and a textual description of the entire process, all in one place. Scientists can then share such pages with other users in order to present their findings, collaborate, and further explore the underlying data.
However, as we show in this work, interactive notebooks lack interactivity for the reader of the resulting notebook. Users can rerun or extend the code included in a notebook but cannot directly interact with the generated visualizations in order to trigger additional computation and further explore the underlying data. This means that only code-literate readers can further interact with and extend such notebooks, while the rest can only passively read the provided report. This stands in stark contrast to OLAP data cube interfaces, which use user interaction to trigger additional data exploration capabilities. Adding OLAP-like reactive functionality to notebooks further increases the required technical expertise, as event-driven logic has to be added by the data analyst.
To address these issues, we propose ViDeTTe, an engine that enhances notebooks with capabilities that benefit both data scientists and non-technical notebook readers. ViDeTTe uses a declarative language that simplifies data retrieval and data visualization for analysts. The generated visualizations are capable of collecting the reader's input and reacting to it. As the user interacts with the visualizations, ViDeTTe identifies subsequent parts of the notebook that depend on the user's input, causes reevaluation of the affected computations, and propagates changes to the visualization units. By doing this, ViDeTTe offers enhanced data exploration capabilities to readers, without requiring any coding skills, while at the same time lowering the technical expertise needed for the development of reactive notebooks.
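
The propagation step described above (find the parts of the notebook that depend on the reader's input, re-evaluate them, and pass the new values on) can be illustrated with a minimal Python sketch. This is not ViDeTTe's implementation or its declarative language, neither of which appears in the abstract. The class `ReactiveNotebook`, its methods, the helper `totals`, and the `SALES` data are all hypothetical names invented for illustration, and the sketch assumes cells are defined in a valid evaluation order.

```python
from typing import Callable, Dict, List, Tuple


class ReactiveNotebook:
    """Toy model (hypothetical, not ViDeTTe): cells re-run when an upstream value changes."""

    def __init__(self) -> None:
        self.values: Dict[str, object] = {}
        # cell name -> (names the cell reads, function computing its value)
        self.cells: Dict[str, Tuple[List[str], Callable[..., object]]] = {}
        self.order: List[str] = []  # definition order, assumed to be a valid evaluation order
        self.log: List[str] = []    # names of cells in the order they were evaluated

    def define(self, name: str, deps: List[str], fn: Callable[..., object]) -> None:
        """Register a cell and evaluate it once."""
        self.cells[name] = (list(deps), fn)
        self.order.append(name)
        self._run(name)

    def _run(self, name: str) -> None:
        deps, fn = self.cells[name]
        self.values[name] = fn(*(self.values[d] for d in deps))
        self.log.append(name)

    def set_input(self, name: str, value: object) -> None:
        """Simulate a reader's interaction, e.g. selecting a bar in a chart."""
        self.values[name] = value
        dirty = {name}
        for cell in self.order:
            deps, _ = self.cells[cell]
            if dirty.intersection(deps):  # cell reads something that changed
                self._run(cell)
                dirty.add(cell)           # its own dependents must re-run too


def totals(rows: List[dict], key: str) -> Dict[str, int]:
    """Sum the 'amount' field of rows, grouped by the given key."""
    out: Dict[str, int] = {}
    for row in rows:
        out[row[key]] = out.get(row[key], 0) + row["amount"]
    return out


# Made-up sample data for the illustration.
SALES = [
    {"region": "EU", "product": "A", "amount": 10},
    {"region": "EU", "product": "B", "amount": 5},
    {"region": "US", "product": "A", "amount": 7},
]

nb = ReactiveNotebook()
nb.set_input("selected_region", "EU")
nb.define("total_by_region", [], lambda: totals(SALES, "region"))
nb.define(
    "region_detail",
    ["selected_region"],
    lambda region: totals([r for r in SALES if r["region"] == region], "product"),
)
nb.define(
    "detail_caption",
    ["selected_region", "region_detail"],
    lambda region, detail: f"{region}: {sum(detail.values())}",
)

nb.log.clear()
nb.set_input("selected_region", "US")  # the reader "clicks" the US bar
```

After the simulated click, only `region_detail` and `detail_caption` are re-evaluated. `total_by_region` does not read the selection, so it is left untouched. In this toy model, that is the difference between reacting to the reader's input and rerunning the whole notebook.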


Cited By

  • (2020) Amplifying Domain Expertise in Clinical Data Pipelines. JMIR Medical Informatics 8:11 (e19612). DOI: 10.2196/19612. Online publication date: 5-Nov-2020
  • (2019) Evaluating interactive data systems. The VLDB Journal — The International Journal on Very Large Data Bases 29:1 (119-146). DOI: 10.1007/s00778-019-00589-2. Online publication date: 13-Nov-2019

Published In

HILDA '18: Proceedings of the Workshop on Human-In-the-Loop Data Analytics
June 2018
87 pages
ISBN:9781450358279
DOI:10.1145/3209900
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Data Exploration
  2. Jupyter Notebooks
  3. Reactive Visualizations

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGMOD/PODS '18
Acceptance Rates

Overall Acceptance Rate 28 of 56 submissions, 50%
