Abstract
Historical data sources report on numerous events with overlapping time intervals, locations, and names. As a result, an integrated historical database may contain severe data conflicts caused by redundancy, and these conflicts prevent researchers from obtaining correct answers to queries. In this paper, we propose a novel conflict-aware data fusion strategy for historical data sources. We evaluated our approach on a large-scale data warehouse that integrates historical data from approximately 50,000 reports on US epidemiological data spanning more than 100 years. We demonstrate that our approach significantly reduces data aggregation error in the integrated historical database.
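To make the aggregation problem concrete, the following is a minimal sketch of how overlapping reports can inflate a total. It is not the fusion strategy proposed in the paper: the `Report` type, the week numbers, the case counts, and the "keep the widest report, skip anything that overlaps it" rule are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """Hypothetical case report for one disease in one location."""
    start: int  # first week covered (inclusive)
    end: int    # last week covered (inclusive)
    cases: int


def overlaps(a: Report, b: Report) -> bool:
    """Two closed intervals overlap if each starts before the other ends."""
    return a.start <= b.end and b.start <= a.end


def naive_total(reports: list[Report]) -> int:
    """Sum every report, ignoring the fact that intervals may overlap."""
    return sum(r.cases for r in reports)


def conflict_aware_total(reports: list[Report]) -> int:
    """Illustrative rule only: prefer wider reports and skip any report
    whose interval overlaps one that has already been kept."""
    kept: list[Report] = []
    for r in sorted(reports, key=lambda r: (-(r.end - r.start), r.start)):
        if not any(overlaps(r, k) for k in kept):
            kept.append(r)
    return sum(r.cases for r in kept)


# Hypothetical data: a four-week summary plus two biweekly reports that
# cover the same weeks, and one report for a later, non-overlapping period.
reports = [
    Report(1, 4, 100),
    Report(1, 2, 40),
    Report(3, 4, 60),
    Report(5, 6, 30),
]

print(naive_total(reports))           # counts weeks 1-4 twice
print(conflict_aware_total(reports))  # counts each week once
```

With this toy data the naive sum counts weeks 1 to 4 twice, while the overlap-aware sum counts each week once. A real fusion strategy would also have to deal with partially overlapping intervals and with reports that disagree with each other, which this sketch does not attempt.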
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zadorozhny, V., Hsu, YF. (2011). Conflict-Aware Historical Data Fusion. In: Benferhat, S., Grant, J. (eds) Scalable Uncertainty Management. SUM 2011. Lecture Notes in Computer Science, vol 6929. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23963-2_26
DOI: https://doi.org/10.1007/978-3-642-23963-2_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23962-5
Online ISBN: 978-3-642-23963-2
eBook Packages: Computer Science, Computer Science (R0)