Identification of Noisy Variables for Nonmetric and Symbolic Data in Cluster Analysis

Marek Walesiak &
Andrzej Dudek

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

This paper proposes an extended version of the HINoV method for identifying noisy variables (Carmone et al. (1999)) that covers nonmetric, mixed, and symbolic interval data. The proposed modifications are evaluated on simulated data from a variety of models. Each model contains a known cluster structure, together with a different number of noisy (irrelevant) variables added to obscure the underlying structure to be recovered.
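The screening idea can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes that each nominal variable's categories are treated as a partition of the objects, that partitions are compared pairwise with the adjusted Rand index (Hubert and Arabie (1985)), and that each variable is scored by the sum of its indices against all other variables, so that low sums point to candidate noisy variables. The function names `adjusted_rand_index` and `relevance_scores` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same objects."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("labelings must have equal length of at least 2")
    total_pairs = comb(len(a), 2)
    # Number of object pairs placed together in both partitions / in each one.
    together_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    together_a = sum(comb(c, 2) for c in Counter(a).values())
    together_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case (both partitions trivial): treat as full agreement.
        return 1.0
    return (together_both - expected) / (maximum - expected)


def relevance_scores(variables):
    """Sum, for each variable, its adjusted Rand indices with all others."""
    scores = {name: 0.0 for name in variables}
    for u, v in combinations(variables, 2):
        ari = adjusted_rand_index(variables[u], variables[v])
        scores[u] += ari
        scores[v] += ari
    return scores


# Hypothetical toy data: 12 objects, two variables that follow the same
# three-cluster structure and one variable that cycles independently of it.
toy = {
    "v1": list("aaaabbbbcccc"),
    "v2": list("xxxyyyyyzzzz"),
    "noise": list("pqrpqrpqrpqr"),
}

scores = relevance_scores(toy)
# Variables listed from most to least consistent with the others.
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.3f}")
```

In this toy example the independently cycling variable receives the lowest score. Metric, ordinal, mixed, or symbolic interval variables would first need a partition per variable (for instance from a clustering step), which this sketch does not cover.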

References

BILLARD, L., DIDAY, E. (2006): Symbolic data analysis. Conceptual statistics and data mining, Wiley, Chichester.
CARMONE, F.J., KARA, A. and MAXWELL, S. (1999): HINoV: a new method to improve market segment definition by identifying noisy variables, Journal of Marketing Research, vol. 36, November, 501-509.
GNANADESIKAN, R., KETTENRING, J.R., and TSAO, S.L. (1995): Weighting and selection of variables for cluster analysis, Journal of Classification, vol. 12, no. 1, 113-136.
HUBERT, L.J., ARABIE, P. (1985): Comparing partitions, Journal of Classification, vol. 2, no. 1, 193-218.
JAJUGA, K., WALESIAK, M., BAK, A. (2003): On the General Distance Measure, In: M. Schwaiger and O. Opitz (Eds.), Exploratory data analysis in empirical research, Springer-Verlag, Berlin, Heidelberg, 104-109.
MILLIGAN, G.W. (1996): Clustering validation: results and implications for applied analyses, In: P. Arabie, L.J. Hubert, G. de Soete (Eds.), Clustering and classification, World Scientific, Singapore, 341-375.
TIBSHIRANI, R., WALTHER, G., HASTIE, T. (2001): Estimating the number of clusters in a data set via the gap statistic, Journal of the Royal Statistical Society, ser. B, vol. 63, part 2, 411-423.
WALESIAK, M. (2005): Variable selection for cluster analysis - approaches, problems, methods, Plenary Session of the Committee on Statistics and Econometrics of the Polish Academy of Sciences, 15, March, Wroclaw.


Author information

Authors and Affiliations

Department of Econometrics and Computer Science, Wroclaw University of Economics, Nowowiejska 3, 58-500, Jelenia Gora, Poland
Marek Walesiak & Andrzej Dudek


Editor information

Editors and Affiliations

Institute of Computer Science and Institute of Business Economics and Information Systems, University of Hildesheim, Marienburgerplatz 22, 31141, Hildesheim, Germany
Christine Preisach
Lehrstuhl für Mustererkennung und Bildverarbeitung, Universität Freiburg, Gebäude 052, 79110, Freiburg i. Br, Germany
Hans Burkhardt
Institute of Computer Science and Institute of Business Economics and Information Systems, Marienburgerplatz 22, 31141, Hildesheim, Germany
Lars Schmidt-Thieme
Fakultät für Wirtschaftswissenschaften, Lehrstuhl für Betriebswirtschaftslehre, insbes. Marketing, Universitätsstraße 25, 33615, Bielefeld, Germany
Reinhold Decker

Cite this paper

Walesiak, M., Dudek, A. (2008). Identification of Noisy Variables for Nonmetric and Symbolic Data in Cluster Analysis. In: Preisach, C., Burkhardt, H., Schmidt-Thieme, L., Decker, R. (eds) Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78246-9_11

DOI: https://doi.org/10.1007/978-3-540-78246-9_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78239-1
Online ISBN: 978-3-540-78246-9
eBook Packages: Mathematics and Statistics (R0)
