Abstract
Data Mining (DM) projects are implemented by following the knowledge discovery process. The DM and Data Warehousing (DW) literature offers several techniques for detecting and handling data quality problems such as missing data, outliers, inconsistent data, and time-variant data. Tasks related to data quality belong mostly to the Data Understanding and Data Preparation phases of the DM process. The main limitation in applying data quality management techniques is the complexity caused by a lack of anticipation in detecting and resolving these problems. This work proposes a DM process model designed for the prior management of data quality. In this model, the DM process is defined in relation to the Software Engineering (SE) process, and the two processes are combined in parallel. The main contribution of this DM process is the anticipation and automation of all activities necessary to remove data quality problems.
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Camara, M.S., Naguingar, D., Bah, A. (2015). Prior Data Quality Management in Data Mining Process. In: Elleithy, K., Sobh, T. (eds) New Trends in Networking, Computing, E-learning, Systems Sciences, and Engineering. Lecture Notes in Electrical Engineering, vol 312. Springer, Cham. https://doi.org/10.1007/978-3-319-06764-3_37
DOI: https://doi.org/10.1007/978-3-319-06764-3_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06763-6
Online ISBN: 978-3-319-06764-3
eBook Packages: Engineering, Engineering (R0)