Abstract
Outlier detection in large datasets is an important problem. Several recent approaches employ reasonable definitions of an outlier. A fundamental issue, however, is that the notion of which objects are outliers typically varies between users or even between datasets. In this paper, we present a novel solution to this problem that brings users into the loop. Our OBE (Outlier By Example) system is, to the best of our knowledge, the first that allows users to give examples of what they consider outliers. It then incorporates a small number of such examples directly to discover the hidden concept and spot further objects that exhibit the same “outlier-ness” as the examples. We describe the key design decisions and algorithms in building such a system and demonstrate on both real and synthetic datasets that OBE can indeed discover outliers that match the users’ intentions.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhu, C., Kitagawa, H., Papadimitriou, S., Faloutsos, C. (2004). OBE: Outlier by Example. In: Dai, H., Srikant, R., Zhang, C. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2004. Lecture Notes in Computer Science(), vol 3056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24775-3_29
DOI: https://doi.org/10.1007/978-3-540-24775-3_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22064-0
Online ISBN: 978-3-540-24775-3
eBook Packages: Springer Book Archive