- research-article, September 2024
FastCat Catalogues: Interactive Entity-Based Exploratory Analysis of Archival Documents
JCDL '23: Proceedings of the 2023 ACM/IEEE Joint Conference on Digital Libraries, Pages 190–194, https://doi.org/10.1109/JCDL57899.2023.00035
We describe FastCat Catalogues, a web application that supports researchers studying archival material, such as historians, in exploring and quantitatively analysing the data (transcripts) of archival documents. The application was designed based on real ...
- short-paper, June 2024
ShiftScope: Adapting Visualization Recommendations to Users' Dynamic Data Focus
SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data, Pages 536–539, https://doi.org/10.1145/3626246.3654753
Visualization Recommendation Systems help users discover important insights during data exploration. These systems should understand users' exploration behaviors and goals to suggest relevant visualizations. However, users' mental models constantly ...
- short-paper, June 2024
ASQP-RL Demo: Learning Approximation Sets for Exploratory Queries
SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data, Pages 452–455, https://doi.org/10.1145/3626246.3654741
We demonstrate the Approximate Selection Query Processing (ASQP-RL) system, which uses Reinforcement Learning to select a subset of a large external dataset to process locally in a notebook during data exploration. Given a query workload over an external ...
- short-paper, August 2024
Constrained Approximate Query Processing with Error and Response Time-Bound Guarantees for Efficient Big Data Analytics
HPDC '24: Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, Pages 373–376, https://doi.org/10.1145/3625549.3658824
Approximate query processing (AQP) is a technique for obtaining approximate answers to queries over large datasets. AQP techniques trade off accuracy for speed, making them ideal for scenarios where exact answers are not required or the cost of obtaining ...
- research-article, December 2023
Univariate exploratory data analysis of satellite telemetry
- Mv Ramachandra Praveen,
- Sushabhan Choudhury,
- Piyush Kuchhal,
- Rajesh Singh,
- Purnendu Shekhar Pandey,
- Antonino Galletta
International Journal of Satellite Communications and Networking (WSAT), Volume 42, Issue 1, Pages 57–85, https://doi.org/10.1002/sat.1498
Large low Earth orbit satellite constellations require machine learning methods for enabling autonomy in health keeping of the satellites. Autonomy in health keeping entails fault detection, isolation and reconfiguration. However, prior to model ...
- Article, September 2023
Construct Hunting in GovTech Research: An Exploratory Data Analysis
The concept of “GovTech” has emerged as a business-oriented model and practice for enabling the public sector to take advantage of digital solutions as service towards the citizen, while the private for-profit sector is responsible for innovation, ...
- research-article, August 2023
Thicket: Seeing the Performance Experiment Forest for the Individual Run Trees
- Stephanie Brink,
- Michael McKinsey,
- David Boehme,
- Connor Scully-Allison,
- Ian Lumsden,
- Daryl Hawkins,
- Treece Burgess,
- Vanessa Lama,
- Jakob Lüttgau,
- Katherine E. Isaacs,
- Michela Taufer,
- Olga Pearce
HPDC '23: Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, Pages 281–293, https://doi.org/10.1145/3588195.3592989
Thicket is an open-source Python toolkit for Exploratory Data Analysis (EDA) of multi-run performance experiments. It enables an understanding of optimal performance configuration for large-scale application codes. Most performance tools focus on a ...
- research-article, July 2023
Finding the forest in the trees: Enabling performance optimization on heterogeneous architectures through data science analysis of ensemble performance data
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 37, Issue 3-4, Pages 434–441, https://doi.org/10.1177/10943420231175687
In this work, we develop novel data science methodologies for ensemble performance data that have the potential to uncover orders of magnitude of performance that is unknowingly being left on the table. Building on years of successful performance tool ...
- short-paper, June 2023
Fast Natural Language Based Data Exploration with Samples
SIGMOD '23: Companion of the 2023 International Conference on Management of Data, Pages 155–158, https://doi.org/10.1145/3555041.3589724
The ability to extract insights from large amounts of data in a timely manner is a crucial problem. Exploratory Data Analysis (EDA) is commonly used by analysts to uncover insights using a sequence of SQL commands and associated visualizations. However, ...
- demonstration, April 2023
Weedle: Composable Dashboard for Data-Centric NLP in Computational Notebooks
WWW '23 Companion: Companion Proceedings of the ACM Web Conference 2023, Pages 132–135, https://doi.org/10.1145/3543873.3587330
Data-centric NLP is a highly iterative process requiring careful exploration of text data throughout entire model development lifecycle. Unfortunately, existing data exploration tools are not suitable to support data-centric NLP because of workflow ...
- extended-abstract, April 2023
A Case Study on Scaffolding Exploratory Data Analysis for AI Pair Programmers
CHI EA '23: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, Article No.: 561, Pages 1–7, https://doi.org/10.1145/3544549.3583943
Recent advances in automatic code generation have made tools like GitHub Copilot attractive for programmers, as they allow for the creation of code blocks by simply providing descriptive prompts to the AI. While researchers have studied the performance ...
- short-paper, November 2022
Spatially weighted structural similarity index: a multiscale comparison tool for diverse sources of mobility data
HANIMOB '22: Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Animal Movement Ecology and Human Mobility, Pages 19–22, https://doi.org/10.1145/3557921.3565542
Data collected about routine human activity and mobility is used in diverse applications to improve our society. Robust models are needed to address the challenges of our increasingly interconnected world. Methods capable of portraying the dynamic ...
- research-article, October 2022
Investigating whether people identify how suitable a data visualization is for answering specific analysis questions
- Ariane Moraes Bueno Rodrigues,
- Gabriel Diniz Junqueira Barbosa,
- Hélio Côrtes Vieira Lopes,
- Simone Diniz Junqueira Barbosa
IHC '22: Proceedings of the 21st Brazilian Symposium on Human Factors in Computing Systems, Article No.: 22, Pages 1–11, https://doi.org/10.1145/3554364.3560904
Choosing the type of chart to represent data may be challenging. Various books and websites provide catalogs of data visualizations to help choose the correct chart, usually taking into account data types and general tasks. However, they often do not ...
- research-article, October 2022
Guided Text-based Item Exploration
CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Pages 3410–3420, https://doi.org/10.1145/3511808.3557141
Exploratory Data Analysis (EDA) provides guidance to users to help them refine their needs and find items of interest in large volumes of structured data. In this paper, we develop GUIDES, a framework for guided Text-based Item Exploration (TIE). TIE ...
- abstract, August 2022
Advances in Exploratory Data Analysis, Visualisation and Quality for Data Centric AI Systems
KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 4814–4815, https://doi.org/10.1145/3534678.3542604
It is widely accepted that data preparation is one of the most time-consuming steps of the machine learning (ML) lifecycle. It is also one of the most important steps, as the quality of data directly influences the quality of a model. In this tutorial, ...
- research-article, June 2022
Reptile: Aggregation-level Explanations for Hierarchical Data
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, Pages 399–413, https://doi.org/10.1145/3514221.3517854
Users often can see from overview-level statistics that some results look "off", but are rarely able to characterize even the type of error. Reptile is an iterative human-in-the-loop explanation and cleaning system for errors in hierarchical data. Users ...
- research-article, April 2022
Diff in the Loop: Supporting Data Comparison in Exploratory Data Analysis
CHI '22: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, Article No.: 97, Pages 1–10, https://doi.org/10.1145/3491102.3502123
Data science is characterized by evolution: since data science is exploratory, results evolve from moment to moment; since it can be collaborative, results evolve as the work changes hands. While existing tools help data scientists track changes in code,...
- research-article, April 2022
Significance and Coverage in Group Testing on the Social Web
WWW '22: Proceedings of the ACM Web Conference 2022, Pages 3052–3060, https://doi.org/10.1145/3485447.3512025
We tackle the longstanding question of checking hypotheses on the social Web. In particular, we address the challenges that arise in the context of testing an input hypothesis on many data samples, in our case, user groups. This is referred to as ...
- research-article, January 2022
It's Good to Talk: A Comparison of Using Voice Versus Screen-Based Interactions for Agent-Assisted Tasks
ACM Transactions on Computer-Human Interaction (TOCHI), Volume 29, Issue 3, Article No.: 25, Pages 1–41, https://doi.org/10.1145/3484221
Voice assistants have become hugely popular in the home as domestic and entertainment devices. Recently, there has been a move towards developing them for work settings. For example, Alexa for Business and IBM Watson for Business were designed to improve ...
- research-article, January 2022
Big data analytics and classification of cardiovascular disease using machine learning
- Sanam Narejo,
- Anoud Shaikh,
- Mehak Maqbool Memon,
- Kainat Mahar,
- Zonera Aleem,
- Bisharat Zardari,
- Valentina Emilia Balas
Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology (JIFS), Volume 43, Issue 2, Pages 2025–2033, https://doi.org/10.3233/JIFS-219302
Hundreds of people dying from heart disease almost every day that is how terrific a delayed diagnosis can be. Living in an advanced era full of intelligent systems, the increasing number of deaths can be reduced. This research paper focuses on the ...