Prior Data Quality Management in Data Mining Process

Mamadou S. Camara, Djasrabe Naguingar & Alassane Bah

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 312)


Abstract

Data Mining (DM) projects are implemented by following the knowledge discovery process. Several techniques for detecting and handling data quality problems, such as missing data, outliers, inconsistent data, or time-variant data, can be found in the DM and Data Warehousing (DW) literature. Tasks related to data quality fall mostly within the Data Understanding and Data Preparation phases of the DM process. The main limitation in applying these data quality management techniques is the complexity caused by a lack of anticipation in detecting and resolving the problems. This work proposes a DM process model designed for the prior management of data quality. In this model, the DM process is defined in relation to the Software Engineering (SE) process, and the two processes are combined in parallel. The main contribution of this DM process is the anticipation and automation of all activities necessary to remove data quality problems.
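As an illustration only, and not the authors' process model, the sketch below shows what automated detection of three of the problem classes named above could look like on in-memory records: missing values, outliers, and referential inconsistencies (a fact row pointing to a key absent from its dimension table). Outliers are flagged with Tukey's interquartile-range fences, a common rule swapped in here for simplicity rather than one the chapter prescribes. All function, field, and table names (`check_quality`, `amount`, `store`, the sample rows) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import quantiles


@dataclass
class QualityReport:
    # Row indices flagged per problem type (hypothetical report structure).
    missing: dict[str, list[int]] = field(default_factory=dict)
    outliers: dict[str, list[int]] = field(default_factory=dict)
    dangling_refs: list[int] = field(default_factory=list)


def find_missing(rows: list[dict], columns: list[str]) -> dict[str, list[int]]:
    """Indices of rows whose value is absent or None, per column."""
    return {c: [i for i, r in enumerate(rows) if r.get(c) is None] for c in columns}


def find_outliers_iqr(rows: list[dict], column: str, k: float = 1.5) -> list[int]:
    """Indices of rows outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Missing values are skipped here; they are reported by find_missing.
    """
    values = [r[column] for r in rows if r.get(column) is not None]
    if len(values) < 2:
        return []  # quartiles are undefined for fewer than two points
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [
        i for i, r in enumerate(rows)
        if r.get(column) is not None and not low <= r[column] <= high
    ]


def find_dangling_refs(rows: list[dict], fk: str, dimension_keys: set) -> list[int]:
    """Indices of rows whose foreign key has no match in the dimension table.

    A missing (None) key is also reported, since it cannot match any dimension row.
    """
    return [i for i, r in enumerate(rows) if r.get(fk) not in dimension_keys]


def check_quality(rows, columns, numeric, fk, dimension_keys) -> QualityReport:
    """Run all checks; the schema arguments can be fixed before any data exists."""
    return QualityReport(
        missing=find_missing(rows, columns),
        outliers={c: find_outliers_iqr(rows, c) for c in numeric},
        dangling_refs=find_dangling_refs(rows, fk, dimension_keys),
    )


# Usage on a hypothetical sales fact table referencing a store dimension {"A", "B"}.
sales = [
    {"id": 1, "amount": 10.0, "store": "A"},
    {"id": 2, "amount": 12.0, "store": "B"},
    {"id": 3, "amount": None, "store": "A"},   # missing value
    {"id": 4, "amount": 11.0, "store": "Z"},   # unknown store key
    {"id": 5, "amount": 500.0, "store": "B"},  # extreme value
    {"id": 6, "amount": 9.0, "store": "A"},
]
report = check_quality(sales, ["amount", "store"], ["amount"], "store", {"A", "B"})
print(report)
# QualityReport(missing={'amount': [2], 'store': []}, outliers={'amount': [4]}, dangling_refs=[3])
```

Because the checks take the schema (column names, numeric columns, foreign key, valid dimension keys) as parameters, they can be written before any data is collected. This is one possible reading of the anticipation the abstract describes, not a description of the chapter's model. The `inclusive` quartile method is used because the default `exclusive` method lets a single extreme value inflate Q3 on very small samples.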





Author information

Authors and Affiliations

Laboratoire d’Informatique, Réseaux et Télécoms (LIRT), Ecole Supérieure Polytechnique, Université Cheikh Anta Diop de Dakar, BP 5085, Dakar-Fann, Dakar, Senegal
Mamadou S. Camara
Laboratoire d’Imagerie Médicale et de BioInformatique (LIMBI), Ecole Supérieure Polytechnique, Université Cheikh Anta Diop de Dakar, BP 5085, Dakar-Fann, Dakar, Senegal
Djasrabe Naguingar
UMI 209, UMMISCO - UCAD, Ecole Supérieure Polytechnique, Université Cheikh Anta Diop de Dakar, BP 15915, Dakar-Fann, Senegal
Alassane Bah


Corresponding author

Correspondence to Mamadou S. Camara.

Editor information

Editors and Affiliations

Computer Science and Engineering, University of Bridgeport, Associate Dean for Graduate Programs, Bridgeport, Connecticut, USA
Khaled Elleithy
Engineering and Computer Science, University of Bridgeport, Dean of the School of Engineering, Bridgeport, Connecticut, USA
Tarek Sobh


About this paper

Cite this paper

Camara, M.S., Naguingar, D., Bah, A. (2015). Prior Data Quality Management in Data Mining Process. In: Elleithy, K., Sobh, T. (eds) New Trends in Networking, Computing, E-learning, Systems Sciences, and Engineering. Lecture Notes in Electrical Engineering, vol 312. Springer, Cham. https://doi.org/10.1007/978-3-319-06764-3_37


DOI: https://doi.org/10.1007/978-3-319-06764-3_37
Published: 08 November 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06763-6
Online ISBN: 978-3-319-06764-3
eBook Packages: Engineering, Engineering (R0)
