
Adaptive Contextualization Methods for Combating Selection Bias during High-Dimensional Visualization

Published: 21 November 2017

Abstract

Large and high-dimensional real-world datasets are being gathered across a wide range of application disciplines to enable data-driven decision making. Interactive data visualization can play a critical role in allowing domain experts to select and analyze data from these large collections. However, there is a critical mismatch between the very large number of dimensions in complex real-world datasets and the much smaller number of dimensions that can be concurrently visualized using modern techniques. This gap in dimensionality can result in high levels of selection bias that go unnoticed by users. The bias can in turn threaten the very validity of any subsequent insights. This article describes Adaptive Contextualization (AC), a novel approach to interactive visual data selection that is specifically designed to combat the invisible introduction of selection bias. The AC approach (1) monitors and models a user’s visual data selection activity, (2) computes metrics over that model to quantify the amount of selection bias after each step, (3) visualizes the metric results, and (4) provides interactive tools that help users assess and avoid bias-related problems. This article expands on an earlier article presented at ACM IUI 2016 [16] by providing a more detailed review of the AC methodology and additional evaluation results.
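The abstract names the four AC steps but not the metric behind step (2). The Python below is a minimal sketch under stated assumptions. It models step (1) as an ordered list of filter predicates. For step (2), it scores each categorical dimension by the Hellinger distance between that dimension's distribution in the current selection and its distribution in the full dataset. Hellinger distance is an assumed choice here; the reference list does include a Hellinger-distance paper [40]. The names `SelectionHistory`, `distribution`, and `hellinger` are hypothetical. Numeric dimensions are not handled, and the sketch should not be read as the authors' implementation.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Record = Dict[str, str]
Predicate = Callable[[Record], bool]


def distribution(records: List[Record], dim: str) -> Dict[str, float]:
    """Empirical distribution of one categorical dimension over `records`."""
    counts = Counter(r[dim] for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def hellinger(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    keys = set(p) | set(q)
    s = sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return math.sqrt(s / 2.0)


class SelectionHistory:
    """Hypothetical model of a user's selection activity as an ordered list of filters."""

    def __init__(self, data: List[Record], dims: List[str]) -> None:
        self.data = list(data)
        self.dims = list(dims)
        self.filters: List[Predicate] = []
        self.scores_by_step: List[Dict[str, float]] = []
        # Reference point: each dimension's distribution over the full dataset.
        self.baseline = {d: distribution(self.data, d) for d in self.dims}

    def apply(self, predicate: Predicate) -> Dict[str, float]:
        """Add one selection step and return a per-dimension bias score for it."""
        candidate = self.filters + [predicate]
        subset = [r for r in self.data if all(f(r) for f in candidate)]
        if not subset:
            # Leave the history unchanged: bias is undefined for an empty selection.
            raise ValueError("selection is empty")
        self.filters = candidate
        scores = {d: hellinger(distribution(subset, d), self.baseline[d]) for d in self.dims}
        self.scores_by_step.append(scores)
        return scores


# Usage: filtering on `sex` also shifts `age`, a dimension the user never touched.
data = [
    {"sex": "F", "age": "old"},
    {"sex": "F", "age": "old"},
    {"sex": "M", "age": "young"},
    {"sex": "M", "age": "old"},
]
history = SelectionHistory(data, ["sex", "age"])
scores = history.apply(lambda r: r["sex"] == "F")
print(scores)  # sex ≈ 0.541, age ≈ 0.366
```

The nonzero score on `age` is the kind of unseen side effect the abstract describes. The `sex` score is large simply because the user filtered on it. A fuller design would likely treat filtered and unfiltered dimensions differently; this sketch scores them all alike.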

References

[1]
Amy P. Abernethy, Lynn M. Etheredge, Patricia A. Ganz, Paul Wallace, Robert R. German, Chalapathy Neti, Peter B. Bach, and Sharon B. Murphy. 2010. Rapid-learning system for cancer care. J. Clin. Oncol. 28, 27 (Sep. 2010), 4268--4274.
[2]
D. G. Altman, K. F. Schulz, D. Moher, M. Egger, F. Davidoff, D. Elbourne, P. C. Gøtzsche, T. Lang, and CONSORT Group (Consolidated Standards of Reporting Trials). 2001. The revised CONSORT statement for reporting randomized trials: Explanation and elaboration. Ann. Intern. Med. 134, 8 (Apr. 2001), 663--694.
[3]
M. Ankerst, S. Berchtold, and D. A. Keim. 1998. Similarity clustering of dimensions for an enhanced visualization of multidimensional data. 52--60.
[4]
L. Bavoil, S. P. Callahan, P. J. Crossno, J. Freire, C. E. Scheidegger, C. T. Silva, and H. T. Vo. 2005. VisTrails: Enabling interactive multiple-view visualizations. In IEEE Visualization. 135--142.
[5]
C. Begg, M. Cho, S. Eastwood, R. Horton, D. Moher, I. Olkin, R. Pitkin, D. Rennie, K. F. Schulz, D. Simel, and D. F. Stroup. 1996. Improving the quality of reporting of randomized controlled trials. The CONSORT statement. J. Am. Med. Assoc. 276, 8 (Aug. 1996), 637--639.
[6]
E. Bertini, A. Tatu, and D. Keim. 2011. Quality metrics in high-dimensional data visualization: An overview and systematization. IEEE Trans. Vis. Comput. Graph. 17, 12 (Dec. 2011), 2203--2212.
[7]
Andreas Buja, Dianne Cook, and Deborah F. Swayne. 1996. Interactive high-dimensional data visualization. J. Comput. Graph. Stat. 5, 1 (Mar. 1996), 78--99.
[8]
Nan Cao, David Gotz, Jimeng Sun, and Huamin Qu. 2011. DICON: Interactive visual analysis of multidimensional clusters. IEEE Trans. Vis. Comput. Graph. 17, 12 (2011), 2581--2590.
[9]
D. B. Carr, R. J. Littlefield, W. L. Nicholson, and J. S. Littlefield. 1987. Scatterplot matrix techniques for large N. J. Am. Stat. Assoc. 82, 398 (1987), 424--436.
[10]
Keke Chen and Ling Liu. 2004. VISTA: Validating and refining clusters via visualization. Inf. Vis. 3, 4 (Dec. 2004), 257--270.
[11]
N. Elmqvist and J. Fekete. 2010. Hierarchical aggregation for information visualization: Overview, techniques, and design guidelines. IEEE Trans. Vis. Comput. Graph. 16, 3 (2010), 439--454.
[12]
Lynn M. Etheredge. 2007. A rapid-learning health system. Health Affairs 26, 2 (Mar. 2007), 107--118.
[13]
Charles P. Friedman, Adam K. Wong, and David Blumenthal. 2010. Achieving a nationwide learning health system. Sci. Transl. Med. 2, 57 (Nov. 2010).
[14]
D. Gotz and D. Borland. 2016. Data-driven healthcare: Challenges and opportunities for interactive visualization. IEEE Comput. Graph. Appl. 36, 3 (May 2016), 90--96.
[15]
D. Gotz and H. Stavropoulos. 2014. DecisionFlow: Visual analytics for high-dimensional temporal event sequence data. IEEE Trans. Vis. Comput. Graph. 20, 12 (2014), 1783--1792.
[16]
David Gotz, Shun Sun, and Nan Cao. 2016. Adaptive contextualization: Combating bias during high-dimensional visualization and data selection. In Proceedings of the 21st International Conference on Intelligent User Interfaces (IUI’16). ACM, New York, NY, 85--95.
[17]
David Gotz and Zhen Wen. 2009. Behavior-driven visualization recommendation. In Proceedings of the 14th International Conference on Intelligent User Interfaces. ACM, New York, NY, 315--324.
[18]
D. Gotz and M. X. Zhou. 2008. Characterizing users’ visual analytic activity for insight provenance. In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology 2008 (VAST’08). 123--130.
[19]
Julian Heinrich and Daniel Weiskopf. 2012. State of the art of parallel coordinates. In Eurographics 2013—State of the Art Reports, M. Sbert and L. Szirmay-Kalos (Eds.). The Eurographics Association.
[20]
Stacie Hibino and Elke A. Rundensteiner. 1997. User interface evaluation of a direct manipulation temporal visual query language. In Proceedings of the 5th ACM International Conference on Multimedia. ACM, New York, NY, 99--107.
[21]
Sally Hopewell, Allison Hirst, Gary S. Collins, Sue Mallett, Ly-Mee Yu, and Douglas G. Altman. 2011. Reporting of participant flow diagrams in published reports of randomized trials. Trials 12 (Dec. 2011), 253.
[22]
Alfred Inselberg and Bernard Dimsdale. 1990. Parallel Coordinates: A Tool for Visualizing Multi-Dimensional Geometry. IEEE Computer Society Press, Los Alamitos, CA, 361--378.
[23]
Institute of Medicine. 2012. Best Care at Lower Cost: The Path to Continuously Learning Health Care in America. Technical Report. Retrieved from http://iom.nationalacademies.org/Reports/2012/Best-Care-at-Lower-Cost-The-Path-to-Continuously-Learning-Health-Care-in-America.aspx.
[24]
T. J. Jankun-Kelly, Kwan Liu Ma, and Michael Gertz. 2002. A model for the visualization exploration process. In Proceedings of the IEEE Conference on Visualization. Washington, DC, 323--330. http://dl.acm.org/citation.cfm?id=602099.602149
[25]
M. Kreuseler, T. Nocke, and H. Schumann. 2004. A history mechanism for visual data mining. In Proceedings of the IEEE Symposium on Information Visualization. 49--56.
[26]
Lineberger. 2014. UNC Lineberger Comprehensive Cancer Center. Integrated Cancer Information and Surveillance System. Retrieved from http://iciss.unc.edu/.
[27]
Jie Lu, Zhen Wen, Shimei Pan, and Jennifer Lai. 2011. Analytic trails: Supporting provenance, collaboration, and reuse for visual data analysis by business users. In Proceedings of the Human-Computer Interaction (INTERACT’11), Pedro Campos, Nicholas Graham, Joaquim Jorge, Nuno Nunes, Philippe Palanque, and Marco Winckler (Eds.). Volume 6949 in Lecture Notes in Computer Science. Springer, Berlin, 256--273. http://link.springer.com/chapter/10.1007/978-3-642-23768-3_22
[28]
Travis B. Murdoch and Allan S. Detsky. 2013. The inevitable application of big data to health care. J. Am. Med. Assoc. 309, 13 (Apr. 2013), 1351--1352.
[29]
Chris North, Remco Chang, Alex Endert, Wenwen Dou, Richard May, Bill Pike, and Glenn Fink. 2011. Analytic provenance: Process+interaction+insight. In Proceedings of the Extended Abstracts on Human Factors in Computing Systems (CHI’11). ACM, New York, NY, 33--36.
[30]
PhysioNet. 2016. MIMIC II: Clinical Database Overview. Retrieved from http://physionet.org/mimic2/mimic2_clinical_overview.shtml.
[31]
David Pollard. 2002. A User’s Guide to Measure Theoretic Probability. Cambridge University Press.
[32]
Alexander Rind. 2013. Interactive information visualization to explore and query electronic health records. Found. Trends Hum.-Comput. Interact. 5, 3 (2013), 207--298.
[33]
Mohammed Saeed, Mauricio Villarroel, Andrew T. Reisner, Gari Clifford, Li-Wei Lehman, George Moody, Thomas Heldt, Tin H. Kyaw, Benjamin Moody, and Roger G. Mark. 2011. Multiparameter intelligent monitoring in intensive care II (MIMIC-II): A public-access intensive care unit database. Crit. Care Med. 39, 5 (May 2011), 952--960.
[34]
Jinwook Seo and Ben Shneiderman. 2002. Interactively exploring hierarchical clustering results. IEEE Comput. 35 (2002), 80--86.
[35]
Ben Shneiderman. 1994. Dynamic queries for visual information seeking. IEEE Softw. 11, 6 (Nov. 1994), 70--77.
[36]
Ben Shneiderman and Catherine Plaisant. 2006. Strategies for evaluating information visualization tools: Multi-dimensional in-depth long-term case studies. In Proceedings of the 2006 AVI Workshop on BEyond Time and Errors: Novel Evaluation Methods for Information Visualization. ACM, New York, NY, 1--7.
[37]
Y. B. Shrinivasan, D. Gotz, and Jie Lu. 2009. Connecting the dots in visual analysis. In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology 2009 (VAST’09). 123--130.
[38]
Yedendra Babu Shrinivasan and David Gotz. 2009. Connecting the dots with related notes. In CHI ’09 Proceedings of the Extended Abstracts on Human Factors in Computing Systems. ACM, New York, NY, 3649--3654.
[39]
Yedendra Babu Shrinivasan and Jarke J. van Wijk. 2008. Supporting the analytical reasoning process in information visualization. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, New York, NY, 1237--1246.
[40]
Douglas G. Simpson. 1987. Minimum Hellinger distance estimation for the analysis of count data. J. Am. Stat. Assoc. 82, 399 (Sep. 1987), 802--807.
[41]
James Thomas and Kristin Cook. 2005. Illuminating the Path: The Research and Development Agenda for Visual Analytics. National Visualization and Analytics Ctr.
[42]
Michelle Q. Wang Baldonado, Allison Woodruff, and Allan Kuchinsky. 2000. Guidelines for using multiple views in information visualization. In Proceedings of the Working Conference on Advanced Visual Interfaces. ACM, New York, NY, 110--119.
[43]
Matthew Ward, Georges Grinstein, and Daniel Keim. 2010. Interactive Data Visualization: Foundations, Techniques, and Applications (1 ed.). A K Peters/CRC Press, Natick, MA.
[44]
Zhiyuan Zhang, David Gotz, and Adam Perer. 2015. Iterative cohort analysis and exploration. Information Visualization 14, 4 (2015).



Information

Published In

ACM Transactions on Interactive Intelligent Systems, Volume 7, Issue 4
Special Issue on IUI 2016 Highlights
December 2017
134 pages
ISSN:2160-6455
EISSN:2160-6463
DOI:10.1145/3166060
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 November 2017
Accepted: 01 March 2017
Revised: 01 February 2017
Received: 01 June 2016
Published in TIIS Volume 7, Issue 4


Author Tags

  1. Visualization
  2. exploratory analysis
  3. intelligent visual interfaces
  4. selection bias
  5. visual analytics

Qualifiers

  • Research-article
  • Research
  • Refereed


Article Metrics

  • Downloads (last 12 months): 94
  • Downloads (last 6 weeks): 18
Reflects downloads up to 19 Nov 2024

Cited By

  • (2023) PORDE: Explaining Data Poisoning Attacks Through Visual Analytics with Food Delivery App Reviews. Companion Proceedings of the 28th International Conference on Intelligent User Interfaces, 46-50. DOI: 10.1145/3581754.3584128. Online publication date: 27-Mar-2023.
  • (2021) Modeling and Leveraging Analytic Focus During Exploratory Visual Analysis. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 1-15. DOI: 10.1145/3411764.3445674. Online publication date: 6-May-2021.
  • (2021) Selection-Bias-Corrected Visualization via Dynamic Reweighting. IEEE Transactions on Visualization and Computer Graphics 27, 2, 1481-1491. DOI: 10.1109/TVCG.2020.3030455. Online publication date: Feb-2021.
  • (2020) Survey on the Analysis of User Interactions and Visualization Provenance. Computer Graphics Forum 39, 3, 757-783. DOI: 10.1111/cgf.14035. Online publication date: 18-Jul-2020.
  • (2019) Visual Analysis of High-Dimensional Event Sequence Data via Dynamic Hierarchical Aggregation. IEEE Transactions on Visualization and Computer Graphics, 1-1. DOI: 10.1109/TVCG.2019.2934661. Online publication date: 2019.
  • (2019) Selection Bias Tracking and Detailed Subset Comparison for High-Dimensional Data. IEEE Transactions on Visualization and Computer Graphics, 1-1. DOI: 10.1109/TVCG.2019.2934209. Online publication date: 2019.
