Abstract
In real-world databases, dirty data such as inconsistent and duplicate records reduce the effectiveness of database applications and pose new challenges for processing OLAP efficiently. CUBE is an important OLAP operator. This paper proposes a CUBE operation based on overlapping clustering, together with an effective and efficient method for storing and computing CUBE on databases with dirty data. Based on CUBE, the paper also proposes efficient algorithms for answering aggregation queries, as well as processing methods for the other major OLAP operators on databases with dirty data. Experimental results demonstrate the efficiency of the proposed algorithms.
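For readers unfamiliar with the operator, the following is a minimal sketch of the classic CUBE operator on clean data: it computes a SUM aggregate for every subset of the grouping dimensions. It is not the paper's method and does not handle dirty data or overlapping clustering. The function name `cube`, the column names, and the sample rows are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Return SUM(measure) grouped by every subset of dims.

    Illustrative only: this is the plain CUBE operator on clean data,
    not the dirty-data variant proposed in the paper.
    """
    result = {}
    # Enumerate all 2^n subsets of the dimensions, from () up to all dims.
    for k in range(len(dims) + 1):
        for group_dims in combinations(dims, k):
            totals = defaultdict(float)
            for row in rows:
                # The group key is the tuple of values of the chosen dimensions.
                key = tuple(row[d] for d in group_dims)
                totals[key] += row[measure]
            result[group_dims] = dict(totals)
    return result


# Hypothetical sample data.
rows = [
    {"city": "Harbin", "year": 2009, "sales": 10.0},
    {"city": "Harbin", "year": 2010, "sales": 5.0},
    {"city": "Beijing", "year": 2009, "sales": 7.0},
]
c = cube(rows, ["city", "year"], "sales")
```

Here `c[()]` holds the grand total, `c[("city",)]` the per-city totals, and `c[("city", "year")]` the finest-grained groups. With two dimensions the cube contains four group-bys.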
Supported by the National Science Foundation of China (No. 60703012, 60773063), the NSFC-RGC of China (No. 60831160525), National Grant of Fundamental Research 973 Program of China (No. 2006CB303000), National Grant of High Technology 863 Program of China (No. 2009AA01Z149), Key Program of the National Natural Science Foundation of China (No. 60933001), National Postdoctor Foundation of China (No. 20090450126), Development Program for Outstanding Young Teachers in Harbin Institute of Technology (No. HITQNJS. 2009. 052).
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jiang, G., Wang, H., Jiang, S., Li, J., Gao, H. (2010). DCUBE: CUBE on Dirty Databases. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_49
DOI: https://doi.org/10.1007/978-3-642-14246-8_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14245-1
Online ISBN: 978-3-642-14246-8
eBook Packages: Computer Science (R0)