Abstract
The management of uncertainty in large databases has recently attracted tremendous research interest. Data uncertainty is inherent in many emerging and important applications, including location-based services, wireless sensor networks, biometric and biological databases, and data stream applications. In these systems, data uncertainty must be managed carefully in order to make correct decisions and provide high-quality services to users. To support the development of these applications, uncertain database systems have been proposed. These systems treat data uncertainty as a “first-class citizen”: they use generic data models to capture uncertainty and provide query operators that return answers with statistical confidence.
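To make the idea of query answers with statistical confidence concrete, here is a minimal Python sketch of a probabilistic range query. It assumes a simple attribute-uncertainty model in which each value is known only to lie in an interval [lo, hi] with a uniform pdf. The names `UncertainValue`, `prob_in_range`, and `probabilistic_range_query` and the sensor readings are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainValue:
    """Hypothetical 1-D uncertain attribute: the true value lies in [lo, hi]
    with a uniform pdf (an illustrative assumption; other pdfs are possible)."""
    name: str
    lo: float
    hi: float

    def prob_in_range(self, a: float, b: float) -> float:
        """Probability that the true value lies in the query range [a, b]."""
        if self.hi == self.lo:  # degenerate interval: the value is exact
            return 1.0 if a <= self.lo <= b else 0.0
        overlap = min(self.hi, b) - max(self.lo, a)
        return max(0.0, overlap) / (self.hi - self.lo)


def probabilistic_range_query(objects, a, b, threshold=0.0):
    """Return (name, probability) pairs for objects whose probability of
    lying in [a, b] is positive and at least `threshold`."""
    answers = []
    for obj in objects:
        p = obj.prob_in_range(a, b)
        if p > 0.0 and p >= threshold:
            answers.append((obj.name, p))
    return answers


sensors = [
    UncertainValue("s1", 10.0, 20.0),
    UncertainValue("s2", 18.0, 22.0),
    UncertainValue("s3", 30.0, 40.0),
]
print(probabilistic_range_query(sensors, 15.0, 25.0))
print(probabilistic_range_query(sensors, 15.0, 25.0, threshold=0.6))
```

For the range [15, 25], `s1` is returned with probability 0.5 and `s2` with probability 1.0, while `s3` is excluded because its interval does not overlap the query range; raising `threshold` to 0.6 leaves only `s2`.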
We summarize our recent work on uncertain databases. We explain how data uncertainty can be modeled and present a classification of probabilistic queries (e.g., range queries and nearest-neighbor queries). We further study how probabilistic queries can be efficiently evaluated and indexed. We also highlight the problem of removing uncertainty under a stringent cleaning budget, with the goal of producing high-quality probabilistic answers.
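A probabilistic nearest-neighbor query can be sketched in the same hedged way. The Python code below assumes each object's uncertain location is a discrete pdf (a list of (point, probability) pairs), that objects are independent, and that no two samples are equidistant from the query point; `prob_farther`, `pnn_probabilities`, and `threshold_nn` are hypothetical names. For each sample s of an object, it multiplies the probabilities that every other object lies strictly farther from the query than s, which gives the exact nearest-neighbor probability under these assumptions.

```python
import math

# Hypothetical discrete uncertainty model: each object maps to a list of
# (point, probability) pairs whose probabilities sum to 1.


def prob_farther(samples, q, d):
    """Probability that an object's true location is strictly farther than d from q."""
    return sum(p for pt, p in samples if math.dist(pt, q) > d)


def pnn_probabilities(objects, q):
    """Probability that each object is the nearest neighbor of q, assuming
    independent objects and no distance ties between samples."""
    result = {}
    for name, samples in objects.items():
        total = 0.0
        for pt, p in samples:
            d = math.dist(pt, q)
            # Every other object must lie strictly farther from q than this sample.
            others = 1.0
            for other_name, other_samples in objects.items():
                if other_name != name:
                    others *= prob_farther(other_samples, q, d)
            total += p * others
        result[name] = total
    return result


def threshold_nn(objects, q, t):
    """Probability-threshold NN query: names whose NN probability is at least t."""
    probs = pnn_probabilities(objects, q)
    return [name for name, p in probs.items() if p >= t]


objects = {
    "A": [((1.0, 0.0), 0.5), ((4.0, 0.0), 0.5)],
    "B": [((2.0, 0.0), 1.0)],
    "C": [((3.0, 0.0), 0.5), ((5.0, 0.0), 0.5)],
}
print(pnn_probabilities(objects, (0.0, 0.0)))
print(threshold_nn(objects, (0.0, 0.0), 0.3))
```

With the query at the origin, A and B each have nearest-neighbor probability 0.5 and C has 0, so a threshold of 0.3 returns A and B. This brute-force evaluation examines every sample of every other object for each candidate sample, so its cost grows quadratically with the data size; avoiding such exhaustive computation is the aim of the efficient evaluation and indexing techniques mentioned above.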
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cheng, R. (2009). Querying and Cleaning Uncertain Data. In: Rothermel, K., Fritsch, D., Blochinger, W., Dürr, F. (eds) Quality of Context. QuaCon 2009. Lecture Notes in Computer Science, vol 5786. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04559-2_4
DOI: https://doi.org/10.1007/978-3-642-04559-2_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04558-5
Online ISBN: 978-3-642-04559-2
eBook Packages: Computer Science, Computer Science (R0)