Prior Data Quality Management in Data Mining Process

  • Conference paper
New Trends in Networking, Computing, E-learning, Systems Sciences, and Engineering

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 312)

Abstract

Data Mining (DM) projects are implemented by following the knowledge discovery process. The DM and Data Warehousing (DW) literature offers several techniques for detecting and handling data quality problems such as missing data, outliers, inconsistent data, and time-variant data. Data quality tasks fall mostly in the Data Understanding and Data Preparation phases of the DM process. The main limitation in applying data quality management techniques is the complexity caused by a lack of anticipation in detecting and resolving the problems. This work proposes a DM process model designed for the prior management of data quality. In this model, the DM process is defined in relation to the Software Engineering (SE) process, and the two processes run in parallel. The main contribution of this DM process is the anticipation and automation of all activities necessary to remove data quality problems.
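To make two of the data quality problems named in the abstract concrete, the following is a minimal sketch of automated checks for missing data and outliers. It is not the process model proposed in the paper. The function names (`find_missing`, `find_outliers_iqr`), the record layout, and the choice of Tukey's interquartile-range rule for outliers are all assumptions made for illustration.

```python
from statistics import quantiles


def find_missing(records, fields):
    """Return (row_index, field) pairs whose value is absent or None.

    `records` is assumed to be a list of dicts (hypothetical layout).
    """
    return [
        (i, field)
        for i, row in enumerate(records)
        for field in fields
        if row.get(field) is None
    ]


def find_outliers_iqr(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule).

    This is one common outlier heuristic, chosen here as an assumption;
    the paper's references cover several other detection techniques.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def quality_report(records, fields, numeric_field):
    """Run both checks up front, before any modelling step."""
    present = [
        row[numeric_field]
        for row in records
        if row.get(numeric_field) is not None
    ]
    return {
        "missing": find_missing(records, fields),
        "outliers": find_outliers_iqr(present),
    }
```

Calling `quality_report(records, ["id", "age"], "age")` on a list of dicts returns the positions of missing values and the flagged outliers in one pass, which is the kind of check that could run early and automatically rather than being discovered late in data preparation.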

Author information

Corresponding author

Correspondence to Mamadou S. Camara.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Camara, M.S., Naguingar, D., Bah, A. (2015). Prior Data Quality Management in Data Mining Process. In: Elleithy, K., Sobh, T. (eds) New Trends in Networking, Computing, E-learning, Systems Sciences, and Engineering. Lecture Notes in Electrical Engineering, vol 312. Springer, Cham. https://doi.org/10.1007/978-3-319-06764-3_37

  • DOI: https://doi.org/10.1007/978-3-319-06764-3_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06763-6

  • Online ISBN: 978-3-319-06764-3

  • eBook Packages: Engineering, Engineering (R0)
