Abstract
A fundamental procedure appearing within such clustering methods as k-Means, Expectation Maximization, Fuzzy-C-Means and Minimum Message Length is that of computing estimators of location. Most estimators of location exhibiting useful robustness properties require at least quadratic time to compute, far too slow for large data mining applications. In this paper, we propose O(Dn√n)-time randomized algorithms for computing robust estimators of location, where n is the size of the data set, and D is the dimension.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
P. S. Bradley, O. L. Mangasarian, and W. N. Street. Clustering via concave minimization. Advances in Neural Information Processing Systems, 9:368–374, 1997.
V. Cherkassky and F. Muller. Learning from Data — Concept, Theory and Methods. John Wiley, NY, 1998.
A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statistical Society B, 39:1–38, 1977.
V. Estivill-Castro. Hybrid genetic algorithms are better for spatial clustering. In Proc. 6th Pacific Rim Intern. Conf. Artificial Intelligence PRICAI 2000, Melbourne, 2000. Springer-Verlag Lecture Notes in AI, to appear.
V. Estivill-Castro and J. Yang. A fast and robust general purpose clustering algorithm. In Proc. 6th Pacific Rim Intern. Conf. Artificial Intelligence PRICAI 2000, Melbourne, 2000. Springer-Verlag Lecture Notes in AI, to appear.
U. Fayyad and R. Uthurusamy. Data mining and knowledge discovery in databases. Comm. ACM, 39(11):24–26, Nov. 1996. Special issue on Data Mining.
F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel. Robust Statistics: The Approach Based on Influence Functions. John Wiley, NY, 1986.
P. J. Huber. Robust Regression. John Wiley, NY, 1981.
M. I. Jordan and R. I. Jacobs. Supervised learning and divide-and-conquer: a statistical approach. In Proc. 10th Intern. Conf. Machine Learning, San Mateo, CA, Morgan Kaufmann, pp. 159–166, 1993.
K. Koperski, J. Han, and J. Adhikary. Mining knowledge in geographical data. Communications of the ACM, to appear.
R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, UK, 1995.
N. J. Nilsson. Artificial Intelligence — A New Synthesis. Morgan Kaufmann, 1998.
E. M. Reingold, J. Nievergelt, and N. Deo. Combinatorial Algorithms: Theory and Practice. Prentice-Hall, Englewood Cliffs, NJ, 1977.
P. J. Rousseeuw and A. M. Leroy. Robust Regression and Outlier Detection. John Wiley, NY, 1987.
M. A. Tanner. Tools for Statistical Inference. Springer-Verlag, NY, USA, 1993.
D. M. Titterington, A. F. M. Smith, and U. E. Makov. Statistical Analysis of Finite Mixture Distributions. John Wiley, UK, 1985.
D. E. Tyler. Some issues in the robust estimation of multivariate location and scatter. In W. Stahel and S. Weisberg, eds., Directions in Robust Statistics and Diagnostics — Part II, Springer-Verlag, Berlin, pp. 327–336, 1991.
C. S. Wallace. Intrinsic classfication of spatially correlated data. The Computer Journal, 41(8):602–611, 1998.
R. R. Wilcox. Introduction to Robust Estimation and Hypothesis Testing. Academic Press, San Diego, CA, 1997.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Estivill-Castro, V., Houle, M.E. (2001). Fast Randomized Algorithms for Robust Estimation of Location. In: Roddick, J.F., Hornsby, K. (eds) Temporal, Spatial, and Spatio-Temporal Data Mining. TSDM 2000. Lecture Notes in Computer Science(), vol 2007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45244-3_7
Download citation
DOI: https://doi.org/10.1007/3-540-45244-3_7
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41773-6
Online ISBN: 978-3-540-45244-7
eBook Packages: Springer Book Archive