Fast Randomized Algorithms for Robust Estimation of Location

Vladimir Estivill-Castro³ &
Michael E. Houle⁴

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 2007))

Included in the following conference series:

International Workshop on Temporal, Spatial, and Spatio-Temporal Data Mining

1016 Accesses

Abstract

A fundamental procedure appearing within such clustering methods as k-Means, Expectation Maximization, Fuzzy-C-Means and Minimum Message Length is that of computing estimators of location. Most estimators of location exhibiting useful robustness properties require at least quadratic time to compute, far too slow for large data mining applications. In this paper, we propose O(Dn√n)-time randomized algorithms for computing robust estimators of location, where n is the size of the data set, and D is the dimension.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Robust Minimax Variance Estimation of Location under Bounded Distribution Interquantile Ranges

Article 22 May 2020

A Comparative Study of Robust and Stable Estimates of Multivariate Location

Article 21 February 2019

Recent Developments in Approximation Algorithms for Facility Location and Clustering Problems

References

P. S. Bradley, O. L. Mangasarian, and W. N. Street. Clustering via concave minimization. Advances in Neural Information Processing Systems, 9:368–374, 1997.
Google Scholar
V. Cherkassky and F. Muller. Learning from Data — Concept, Theory and Methods. John Wiley, NY, 1998.
Google Scholar
A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statistical Society B, 39:1–38, 1977.
MATH MathSciNet Google Scholar
V. Estivill-Castro. Hybrid genetic algorithms are better for spatial clustering. In Proc. 6th Pacific Rim Intern. Conf. Artificial Intelligence PRICAI 2000, Melbourne, 2000. Springer-Verlag Lecture Notes in AI, to appear.
Google Scholar
V. Estivill-Castro and J. Yang. A fast and robust general purpose clustering algorithm. In Proc. 6th Pacific Rim Intern. Conf. Artificial Intelligence PRICAI 2000, Melbourne, 2000. Springer-Verlag Lecture Notes in AI, to appear.
Google Scholar
U. Fayyad and R. Uthurusamy. Data mining and knowledge discovery in databases. Comm. ACM, 39(11):24–26, Nov. 1996. Special issue on Data Mining.
Article Google Scholar
F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel. Robust Statistics: The Approach Based on Influence Functions. John Wiley, NY, 1986.
MATH Google Scholar
P. J. Huber. Robust Regression. John Wiley, NY, 1981.
Google Scholar
M. I. Jordan and R. I. Jacobs. Supervised learning and divide-and-conquer: a statistical approach. In Proc. 10th Intern. Conf. Machine Learning, San Mateo, CA, Morgan Kaufmann, pp. 159–166, 1993.
Google Scholar
K. Koperski, J. Han, and J. Adhikary. Mining knowledge in geographical data. Communications of the ACM, to appear.
Google Scholar
R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, UK, 1995.
MATH Google Scholar
N. J. Nilsson. Artificial Intelligence — A New Synthesis. Morgan Kaufmann, 1998.
Google Scholar
E. M. Reingold, J. Nievergelt, and N. Deo. Combinatorial Algorithms: Theory and Practice. Prentice-Hall, Englewood Cliffs, NJ, 1977.
Google Scholar
P. J. Rousseeuw and A. M. Leroy. Robust Regression and Outlier Detection. John Wiley, NY, 1987.
MATH Google Scholar
M. A. Tanner. Tools for Statistical Inference. Springer-Verlag, NY, USA, 1993.
MATH Google Scholar
D. M. Titterington, A. F. M. Smith, and U. E. Makov. Statistical Analysis of Finite Mixture Distributions. John Wiley, UK, 1985.
MATH Google Scholar
D. E. Tyler. Some issues in the robust estimation of multivariate location and scatter. In W. Stahel and S. Weisberg, eds., Directions in Robust Statistics and Diagnostics — Part II, Springer-Verlag, Berlin, pp. 327–336, 1991.
Google Scholar
C. S. Wallace. Intrinsic classfication of spatially correlated data. The Computer Journal, 41(8):602–611, 1998.
Article MATH Google Scholar
R. R. Wilcox. Introduction to Robust Estimation and Hypothesis Testing. Academic Press, San Diego, CA, 1997.
MATH Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science & Software Engineering, The University of Newcastle, NSW 2308, Callaghan, Australia
Vladimir Estivill-Castro
Basser Department of Computer Science, The University of Sydney, NSW 2006, Sydney, Australia
Michael E. Houle

Authors

Vladimir Estivill-Castro
View author publications
You can also search for this author in PubMed Google Scholar
Michael E. Houle
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

School of Informatics and Engineering, Flinders University of South Australia, P.O. Box 2100, 5001, Adelaide, South Australia
John F. Roddick
National Center for Geographic Information and Analysis, University of Maine, ME 04469-5711, Maine, Orono, USA
Kathleen Hornsby

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Estivill-Castro, V., Houle, M.E. (2001). Fast Randomized Algorithms for Robust Estimation of Location. In: Roddick, J.F., Hornsby, K. (eds) Temporal, Spatial, and Spatio-Temporal Data Mining. TSDM 2000. Lecture Notes in Computer Science(), vol 2007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45244-3_7

Download citation

DOI: https://doi.org/10.1007/3-540-45244-3_7
Published: 16 March 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41773-6
Online ISBN: 978-3-540-45244-7
eBook Packages: Springer Book Archive

Publish with us

Policies and ethics