Synonyms
Data cleansing
Definition
Data scrubbing refers to the task of first identifying data that are corrupted, incomplete, invalid, missing, inconsistent, outdated, duplicated, or irrelevant and then either correcting or removing such “dirty” data. The aim of data scrubbing is to make data more accurate, more complete, and consistent both within and across different tables in a database or data warehouse.
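The two steps in this definition, first identifying dirty records and then correcting or removing them, can be sketched in a few lines of Python. This is only an illustrative sketch: the field names `name` and `email`, the `scrub` function, and the rules it applies (trim whitespace, lower-case e-mail addresses, reject records with missing values or exact duplicates) are assumptions made for the example and are not taken from the entry.

```python
def scrub(records):
    """Split records into (clean, rejected) lists.

    Hypothetical rules, chosen only for illustration:
    - correct: strip whitespace, lower-case the e-mail address
    - remove:  records with a missing name or e-mail, and duplicates
    """
    seen = set()                 # keys of records already accepted
    clean, rejected = [], []
    for rec in records:
        # Normalise first, so that "missing" also covers blank strings.
        name = (rec.get("name") or "").strip()
        email = (rec.get("email") or "").strip().lower()
        if not name or not email:
            rejected.append((rec, "missing value"))
            continue
        key = (name.lower(), email)
        if key in seen:
            rejected.append((rec, "duplicate"))
            continue
        seen.add(key)
        clean.append({"name": name, "email": email})
    return clean, rejected


records = [
    {"name": " Ann ", "email": "ANN@x.org"},
    {"name": "ann", "email": "ann@x.org"},   # duplicate after normalisation
    {"name": "Bob", "email": None},          # missing value
]
clean, rejected = scrub(records)
```

Keeping the rejected records together with a reason, rather than silently dropping them, is a design choice of this sketch: it lets the removed data be reviewed later.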
An important challenge of data scrubbing is that “dirty” values do not necessarily contradict any database requirements; that is, such values are consistent with the design of the database and its schema. Rather, the errors occur at a higher conceptual level. Examples include credit card numbers that follow the correct format of four groups of four digits but are invalid with regard to a checksum algorithm, or addresses whose zip code is valid on its own but inconsistent with the town and state names in the same record. Such errors can occur because of a lack of checks and validation...
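Both examples can be made concrete with a short Python sketch. The entry does not name the checksum algorithm; the sketch assumes the Luhn algorithm, which is the check commonly used for payment card numbers. The function names and the tiny `ZIP_TO_PLACE` lookup table are hypothetical; a real system would draw on a complete postal reference data set.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9       # same as adding the two digits of d
        total += d
    return total % 10 == 0


# Hypothetical, deliberately tiny reference table: zip code -> (town, state).
ZIP_TO_PLACE = {
    "10001": ("New York", "NY"),
    "94105": ("San Francisco", "CA"),
}


def zip_consistent(record: dict) -> bool:
    """Return True if the record's zip code agrees with its town and state."""
    expected = ZIP_TO_PLACE.get(record.get("zip", ""))
    if expected is None:
        return False         # unknown zip code: cannot be confirmed
    town, state = expected
    return (record.get("town", "").strip().lower() == town.lower()
            and record.get("state", "").strip().upper() == state)


# Both card numbers have the right four-by-four format,
# but only the first passes the checksum.
well_formed_valid = luhn_valid("4111 1111 1111 1111")
well_formed_invalid = luhn_valid("4111 1111 1111 1112")

# The zip code is valid on its own, yet contradicts the town and state.
mismatch = zip_consistent({"zip": "10001", "town": "San Francisco", "state": "CA"})
```

Neither check can be expressed as a simple column type or format constraint, which is why values like these can pass schema validation and still be dirty.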
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
Cite this entry
Christen, P. (2018). Data Scrubbing. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80621
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering