Coordinating computational and visual approaches for interactive feature selection and multivariate clustering

Published: 01 December 2003

Abstract

Unknown (and unexpected) multivariate patterns hidden in high-dimensional datasets are often very hard to find. This paper describes a human-centered exploration environment that coordinates a suite of computational and visualization methods for uncovering patterns in multivariate spaces. Specifically, it includes: (1) an interactive feature selection method for identifying potentially interesting multidimensional subspaces of a high-dimensional data space, (2) an interactive, hierarchical clustering method for finding multivariate clusters of arbitrary shape, and (3) a suite of coordinated visualization and computational components built around these two methods to facilitate human-led exploration. The implemented system is applied to a cancer dataset, and the analysis shows that it is efficient and effective at discovering unknown and unexpected multivariate patterns in high-dimensional data.
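
The abstract does not spell out how candidate subspaces are scored, but the author tags list entropy and mutual information. As a rough illustration only, and not the paper's actual algorithm, the sketch below ranks pairs of dimensions by an estimate of their mutual information. The equal-width binning, the bin count, the function names, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations


def discretize(values, bins=4):
    """Map numeric values to equal-width bin indices in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; put everything in bin 0.
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(x, y, bins=4):
    """Estimate I(X;Y) = H(X) + H(Y) - H(X,Y) from binned values."""
    bx, by = discretize(x, bins), discretize(y, bins)
    return entropy(bx) + entropy(by) - entropy(list(zip(bx, by)))


def rank_feature_pairs(columns, bins=4):
    """Return (name_a, name_b, MI) tuples, most dependent pair first."""
    scored = [
        (a, b, mutual_information(columns[a], columns[b], bins))
        for a, b in combinations(sorted(columns), 2)
    ]
    return sorted(scored, key=lambda t: t[2], reverse=True)


# Toy data, made up for illustration: "b" is a function of "a",
# while "c" alternates independently of both.
columns = {
    "a": [0, 1, 2, 3, 4, 5, 6, 7],
    "b": [0, 2, 4, 6, 8, 10, 12, 14],
    "c": [3, 0, 3, 0, 3, 0, 3, 0],
}
ranking = rank_feature_pairs(columns, bins=4)
for a, b, mi in ranking:
    print(f"{a}-{b}: {mi:.3f} bits")
```

On this toy data the pair a-b ranks first because the two columns fall into identical bins. In an interactive setting, a ranking like this could suggest which subspaces to inspect visually first, with the analyst making the final choice.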

Published In

Information Visualization, Volume 2, Issue 4
Special issue on coordinated and multiple views in exploratory visualization
December 2003
87 pages

Publisher

Palgrave Macmillan

Author Tags

  1. data mining and knowledge discovery
  2. entropy
  3. feature selection
  4. hierarchical clustering
  5. interactive visualization
  6. mutual information

Qualifiers

  • Article
