Abstract
In recent years, data uncertainty has attracted wide attention from researchers because the amount of imprecise data is growing rapidly. Although exact data values are unknown, probability distributions or expected errors are sometimes available. Most research on uncertain data mining looks for methods to extract mining results from uncertain data, which is usually given in the form of probability distributions or expected errors. It is also important, however, to reduce data uncertainty by making part of the data more certain, which helps produce better mining results. For example, in a sensor network, the readings of some sensors are usually designed to be recorded more frequently than others because they are more important or more likely to change. In this paper, we explore the problem of selecting a subset of uncertain data and acquiring their exact values to improve clustering results. Under a general uncertainty model, we propose both global and localized data selection methods, which can be used together with any existing uncertain clustering algorithm. Experimental results show that clustering quality improves after selective exact value acquisition is applied.
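To make the general setting concrete, the sketch below shows the shape of a "select, then acquire exact values" step placed in front of a clustering algorithm. It is not the paper's global or localized selection method, which the abstract does not specify. It assumes each uncertain object is summarized by a mean location and an expected error, uses a hypothetical greedy rule that probes the `budget` objects with the largest expected error, and uses a hypothetical `probe(i)` callback standing in for whatever mechanism (for example, querying a sensor) returns the exact value of object `i`. All names here (`UncertainObject`, `select_for_acquisition`, `acquire_exact`, `probe`) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class UncertainObject:
    # Assumed summary of an uncertain object: its expected location ...
    mean: Tuple[float, ...]
    # ... and its expected error (0.0 means the exact value is known).
    expected_error: float


def select_for_acquisition(objects: List[UncertainObject], budget: int) -> List[int]:
    """Hypothetical global rule: choose the `budget` most uncertain objects.

    Returns their indices in ascending order. This greedy rule is an
    illustrative assumption, not the selection method proposed in the paper.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Rank object indices by expected error, largest first.
    ranked = sorted(range(len(objects)),
                    key=lambda i: objects[i].expected_error,
                    reverse=True)
    return sorted(ranked[:budget])


def acquire_exact(objects: List[UncertainObject],
                  selected: List[int],
                  probe: Callable[[int], Tuple[float, ...]]) -> List[UncertainObject]:
    """Return a copy of `objects` in which each selected object is replaced
    by its exact value, obtained from the hypothetical `probe` callback."""
    refined = list(objects)  # shallow copy; unselected objects are shared
    for i in selected:
        refined[i] = UncertainObject(mean=probe(i), expected_error=0.0)
    return refined
```

The refined list can then be passed to any uncertain clustering algorithm, where the probed objects behave as exact points. Keeping selection separate from clustering mirrors the abstract's point that the selection step can be combined with existing clustering methods.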
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lin, YC., Yang, DN., Chen, MS. (2010). Data Selection for Exact Value Acquisition to Improve Uncertain Clustering. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_45
DOI: https://doi.org/10.1007/978-3-642-14246-8_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14245-1
Online ISBN: 978-3-642-14246-8
eBook Packages: Computer Science, Computer Science (R0)