DCUBE: CUBE on Dirty Databases

  • Conference paper
Web-Age Information Management (WAIM 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6184)

  • 1723 Accesses

Abstract

In real-world databases, dirty data such as inconsistent and duplicate data reduces the effectiveness of database applications. This brings new challenges for efficiently processing OLAP on databases with dirty data. CUBE is an important OLAP operator. This paper proposes a CUBE operation based on overlapping clustering, together with an effective and efficient method for storing and computing CUBE on databases with dirty data. Based on CUBE, this paper also proposes efficient algorithms for answering aggregation queries, as well as processing methods for the other major OLAP operators on databases with dirty data. Experimental results show the efficiency of the proposed algorithms.
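
For readers unfamiliar with the operator, the following is a minimal sketch of the standard CUBE operation on clean data: it aggregates a measure over every subset of the grouping dimensions. It is not the paper's overlapping-clustering method for dirty data, which the abstract does not specify. The function name `cube_sum`, the sample rows, and the column names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cube_sum(rows, dims, measure):
    """Sum `measure` for every subset of `dims` (the classic CUBE semantics).

    Returns a dict mapping each grouping (a tuple of dimension names) to a
    dict that maps each key tuple to the summed measure.
    """
    result = {}
    # A cube over n dimensions contains 2**n group-bys, including the
    # empty grouping, which holds the grand total.
    for size in range(len(dims) + 1):
        for group in combinations(dims, size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            result[group] = dict(totals)
    return result


# Hypothetical sample data, assumed clean (no duplicates or inconsistencies).
rows = [
    {"region": "north", "product": "pen", "sales": 10},
    {"region": "north", "product": "ink", "sales": 5},
    {"region": "south", "product": "pen", "sales": 7},
]

cube = cube_sum(rows, ("region", "product"), "sales")
for group, totals in cube.items():
    print(group, totals)
```

With two dimensions the result holds four group-bys: the grand total, one per single dimension, and the full two-dimensional grouping. On dirty data, duplicate or inconsistent rows would distort these sums, which is the problem the paper addresses.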

Supported by the National Science Foundation of China (No. 60703012, 60773063), the NSFC-RGC of China (No. 60831160525), National Grant of Fundamental Research 973 Program of China (No. 2006CB303000), National Grant of High Technology 863 Program of China (No. 2009AA01Z149), Key Program of the National Natural Science Foundation of China (No. 60933001), National Postdoctor Foundation of China (No. 20090450126), Development Program for Outstanding Young Teachers in Harbin Institute of Technology (No. HITQNJS. 2009. 052).


Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jiang, G., Wang, H., Jiang, S., Li, J., Gao, H. (2010). DCUBE: CUBE on Dirty Databases. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_49

  • DOI: https://doi.org/10.1007/978-3-642-14246-8_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14245-1

  • Online ISBN: 978-3-642-14246-8

  • eBook Packages: Computer Science, Computer Science (R0)
