research-article

Scaled radial axes for interactive visual feature selection

Published: 15 June 2018

Abstract

We propose a new radial axes method to support visual backward feature selection. Experts can incorporate domain knowledge to analyze classes through LMNN, NCA, etc. The method reduces clutter in the visualization compared to other radial axes plots. We conducted experiments with several public data sets. We present a case study using high-dimensional data of chronic medical conditions.

In statistics, machine learning, and related fields, feature selection is the process of choosing a smaller subset of features to work with. This is an important topic, since selecting a subset of features can help analysts interpret models and data and reduce computational runtimes. While many techniques are purely automatic, the data visualization community has produced a number of interactive approaches in which users make decisions that take their domain knowledge into account. In this paper we propose a new visualization technique based on radial axes that, in contrast to previous radial axes methods, allows analysts to perform feature selection effectively. This is achieved by employing alternative scaled axes that provide insight into which features contribute less to the visualizations. Analysts can therefore use the technique to carry out interactive backward feature elimination, discarding the least relevant features according to the information on the plots and their own expertise. Our approach can be coupled with any linear dimensionality reduction method, and can be used when analyzing cluster structure, correlations, class separability, etc. Specifically, in this paper we focus on combining the proposed technique with methods designed for classification. Lastly, we illustrate the effectiveness of our proposal through a case study analyzing high-dimensional medical chronic conditions data. In particular, clinicians have used the technique to determine the most important features that discriminate between patients with diabetes and patients with high blood pressure.
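The abstract does not spell out how the scaled axes are constructed, so the following is only a rough sketch of the general workflow it describes: obtain one 2-D axis vector per feature from a linear dimensionality reduction method, then repeatedly discard the feature whose axis contributes least. Everything here is an assumption made for illustration: plain PCA (via power iteration) stands in for the linear method, axis-vector length stands in for the paper's contribution measure, an automatic rule replaces the analyst's decision, and `make_demo_data`, `axis_vectors`, and `backward_eliminate` are hypothetical names.

```python
import math
import random

def standardize(rows):
    """Center each column and scale it to unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]

def top_eigenvectors(mat, k=2, iters=500):
    """Leading eigenvectors of a small symmetric matrix (power iteration + deflation)."""
    d = len(mat)
    mat = [row[:] for row in mat]
    vecs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(d)]  # arbitrary non-degenerate start
        for _ in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d)) for i in range(d))
        vecs.append(v)
        # Deflate: remove the component just found before extracting the next one.
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return vecs

def axis_vectors(rows):
    """One 2-D axis vector per feature (rows of the PCA loading matrix)."""
    comps = top_eigenvectors(covariance(standardize(rows)), k=2)
    return [[comps[0][i], comps[1][i]] for i in range(len(rows[0]))]

def backward_eliminate(rows, names, keep):
    """Drop the feature with the shortest axis vector until `keep` remain."""
    names, rows, dropped = list(names), [list(r) for r in rows], []
    while len(names) > keep:
        lengths = [math.hypot(*a) for a in axis_vectors(rows)]
        worst = min(range(len(names)), key=lengths.__getitem__)
        dropped.append(names.pop(worst))
        rows = [r[:worst] + r[worst + 1:] for r in rows]
    return names, dropped

def make_demo_data(n=200, seed=0):
    """Hypothetical toy data: two correlated feature groups plus one noise column."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append([a, a + rng.gauss(0, 0.1), a + rng.gauss(0, 0.1),
                     b, b + rng.gauss(0, 0.1), rng.gauss(0, 1)])
    return rows, ["a1", "a2", "a3", "b1", "b2", "noise"]

rows, names = make_demo_data()
kept, dropped = backward_eliminate(rows, names, keep=5)
print("kept:", kept, "dropped:", dropped)
```

In an interactive tool the `min(...)` step would be replaced by the analyst's choice, informed by the plot and by domain knowledge, and the PCA step could be swapped for a supervised linear method such as those the abstract mentions for classification.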

    Published In

    Expert Systems with Applications: An International Journal  Volume 100, Issue C
    June 2018
    212 pages

    Publisher

    Pergamon Press, Inc.

    United States

    Author Tags

    1. Exploratory data analysis
    2. High-dimensional data visualization
    3. Interactive feature selection
    4. Medical chronic conditions
    5. Visual analytics

    Cited By

    • (2024) Online sequential extreme learning machine approach for breast cancer diagnosis. Neural Computing and Applications, 36:18 (10413-10429). DOI: 10.1007/s00521-024-09617-x. Online publication date: 1-Jun-2024
    • (2023) BC-Net: Early Diagnostics of Breast Cancer Using Nested Ensemble Technique of Machine Learning. Automatic Control and Computer Sciences, 57:6 (646-659). DOI: 10.3103/S0146411623060093. Online publication date: 1-Dec-2023
    • (2021) Feature selection based on star coordinates plots associated with eigenvalue problems. The Visual Computer: International Journal of Computer Graphics, 37:2 (203-216). DOI: 10.1007/s00371-020-01793-w. Online publication date: 1-Feb-2021
    • (2019) Noisy multi-label semi-supervised dimensionality reduction. Pattern Recognition, 90:C (257-270). DOI: 10.1016/j.patcog.2019.01.033. Online publication date: 1-Jun-2019
    • (2019) A novel visual approach for enhanced attribute analysis and selection. Computers and Graphics, 84:C (160-172). DOI: 10.1016/j.cag.2019.08.015. Online publication date: 1-Nov-2019
    • (2019) Leveraging implicit expert knowledge for non-circular machine learning in sepsis prediction. Artificial Intelligence in Medicine, 100:C. DOI: 10.1016/j.artmed.2019.101725. Online publication date: 1-Sep-2019
    • (2018) A visual interface for feature subset selection using machine learning methods. Proceedings of the XXVIII Spanish Computer Graphics Conference (119-128). DOI: 10.2312/ceig.20181165. Online publication date: 27-Jun-2018
