DOI: 10.1145/3209900.3209913
research-article

Human-in-the-Loop Data Analysis: A Personal Perspective

Published: 10 June 2018

Abstract

In the past few years, human-in-the-loop data analysis (HILDA) has received significant and growing attention. Most HILDA work has focused on concrete problems. In this paper I take a step back and discuss several "big picture" questions regarding HILDA. First, I discuss problems that I believe should fall within the scope of the field, including some that have received little attention, such as fostering user communities that develop data repositories and tools. Next, I discuss aspects of developing HILDA solutions that I believe deserve more attention. These include solving problems that real users care about, developing how-to guides for users, building end-to-end systems (such as extending the "Pandas system"), developing challenges and benchmarks, and developing a theory of human data interaction. Finally, I speculate about the future of the field and discuss the dangers it may face, given that many other communities are also working on related problems. I argue that a focus on end-to-end problems and system building is important if we are to thrive and have significant impact.

Cited By

  • (2024) Computationally Aware Surrogate Models for the Hydrodynamic Response Characterization of Floating Spar-Type Offshore Wind Turbine. IEEE Access 12, 6494-6517. DOI: 10.1109/ACCESS.2023.3343874. Online publication date: 2024.
  • (2023) A Human-in-the-Loop Segmented Mixed-Effects Modeling Method for Analyzing Wearables Data. ACM Transactions on Management Information Systems 14:2, 1-17. DOI: 10.1145/3564276. Online publication date: 25-Jan-2023.
  • (2023) AI Assistants: A Framework for Semi-Automated Data Wrangling. IEEE Transactions on Knowledge and Data Engineering 35:9, 9295-9306. DOI: 10.1109/TKDE.2022.3222538. Online publication date: 1-Sep-2023.

Information & Contributors

Information

Published In

HILDA '18: Proceedings of the Workshop on Human-In-the-Loop Data Analytics
June 2018
87 pages
ISBN:9781450358279
DOI:10.1145/3209900
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGMOD/PODS '18

Acceptance Rates

Overall Acceptance Rate 28 of 56 submissions, 50%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 75
  • Downloads (last 6 weeks): 11
Reflects downloads up to 19 Sep 2024

Citations

  • (2023) A big data exploration approach to exploit in-vehicle data for smart road maintenance. Future Generation Computer Systems 149, 701-716. DOI: 10.1016/j.future.2023.08.004. Online publication date: Dec-2023.
  • (2023) Supporting Provenance and Data Awareness in Exploratory Process Mining. Advanced Information Systems Engineering, 454-470. DOI: 10.1007/978-3-031-34560-9_27. Online publication date: 12-Jun-2023.
  • (2022) HumanAL. Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 1-8. DOI: 10.1145/3546930.3547496. Online publication date: 12-Jun-2022.
  • (2022) Forgetting Practices in the Data Sciences. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 1-19. DOI: 10.1145/3491102.3517644. Online publication date: 29-Apr-2022.
  • (2022) Researches advanced in human-computer collaboration and human-machine cooperation: from variances to common prospect. 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2022), 142. DOI: 10.1117/12.2641865. Online publication date: 10-Nov-2022.
  • (2022) A survey of human-in-the-loop for machine learning. Future Generation Computer Systems 135, 364-381. DOI: 10.1016/j.future.2022.05.014. Online publication date: Oct-2022.
  • (2022) Intelligent buildings with IoT systems using ML and HITL for indoor environmental control: an investigation of occupants’ adoption intent. SN Business & Economics 2:3. DOI: 10.1007/s43546-021-00191-1. Online publication date: 11-Feb-2022.