Abstract
We propose a scalable distributed data structure (SDDS) called SD-Rtree. We intend our structure for point, window and kNN queries over large spatial datasets distributed on clusters of interconnected servers. The structure balances the storage and processing load over the available resources, and aims at minimizing the size of the cluster. SD-Rtree generalizes the well-known Rtree structure. It uses a distributed balanced binary tree that scales with insertions to potentially any number of storage servers through splits of the overloaded ones. A user/application manipulates the structure from a client node. The client addresses the tree through its image that can be possibly outdated due to later split. This may generate addressing errors, solved by the forwarding among the servers. Specific messages towards the clients incrementally correct the outdated images. We present the building of an SD-Rtree through insertions, focusing on the split and rotation algorithms. We follow with the query algorithms. We describe then a flexible allocation protocol which allows to cope with a temporary shortage of storage resources through data storage balancing. Experiments show additional aspects of SD-Rtree and compare its behavior with a distributed quadtree. The results justify our various design choices and the overall utility of the structure.
Similar content being viewed by others
References
Adelson-Velskii G.M., Landis E.M.: An algorithm for the organization of information. Doklady Akademii Nauk SSSR 146, 263–266 (1962)
Arge, L., Eppstein, D., Goodrich, M.T.: Skip-Webs: efficient distributed data structures for multi-dimensional data sets. In: Proc. Intl. Symp. on Principles of Distributed Computing (PODC), pp. 69–76 (2005)
Beckmann, N., Kriegel, H.P., Schneider, R., Seeger B.: The R*tree: an efficient and robust access method for points and rectangles. In: Proc. ACM Symp. on the Management of Data (SIGMOD), pp. 322–331 (1990)
Crainiceanu, A., Linga, P., Gehrke, J., Shanmugasundaram, J.: Querying peer-to-peer networks using P-trees. In: Proc. Intl. Workshop on the Web and Databases (WebDB), pp. 25–30 (2004)
Devine, R.: Design and implementation of DDH: a distributed dynamic hashing algorithm. In: Proc. Intl. Conf. on Foundations of Data Organization and Algorithms (FODO), pp. 101–114 (1993)
du Mouza, C., Litwin, W., Rigaux, P.: Dynamic storage balancing in a distributed spatial index. In: Proc. Intl. Symp. on Geographic Information Systems (ACM-GIS) (2007)
du Mouza, C., Litwin, W., Rigaux, P.: SD-Rtre: a scalable distributed Rtree. In: Proc. Intl. Conf. on Data Engineering (ICDE), pp. 96–305 (2007)
Gaede V., Günther O.: Multidimensional access methods. ACM Comput. Surv. 30(2), 170–231 (1998)
US Census Bureau Geography Division. Tiger/Line files (2007). http://www.census.gov/geo/www/tiger/
Guttman, A.: R-trees : A dynamic index structure for spatial searching. In: Proc. ACM Symp. on the Management of Data (SIGMOD), pp. 45–57 (1984)
Hambrusch, S.E., Khokhar, A.A.: Maintaining spatial data sets in distributed-memory machines. In: Proc. Intl. Parallel Processing Symp. (IPPS), pp. 702–707 (1997)
Jagadish, H.V., Ooi, B.C, Vu, Q.H.: BATON: a balanced tree structure for peer-to-peer networks. In: Proc. Intl. Conf. on Very Large Data Bases (VLDB), pp. 661–672 (2005)
Jagadish, H.V., Ooi, B.C., Vu, Q.H., Zhang, R., Zhou, A.: VBI-Tree: a peer-to-peer framework for supporting multi-dimensional indexing schemes. In: Proc. Intl. Conf. on Data Engineering (ICDE) (2006)
Karlsson, J.S.: hQT*: a scalable distributed data structure for high-performance spatial accesses. In: Proc. Intl. Conf. on Foundations of Data Organization and Algorithms (FODO), pp. 37–46 (1998)
Kriakov, V, Delis, A., Kollios, G.: Management of highly dynamic multidimensional data in a cluster of workstations. In: Proc. Intl. Conf. on Extending Data Base Technology (EDBT), pp. 748–764 (2004)
Litwin, W., Neimat, M.-A., Schneider, D.A.: RP*: a family of order preserving scalable distributed data structures. In: Proc. Intl. Conf. on Very Large Data Bases (VLDB), pp. 342–353 (1994)
Litwin W., Neimat M.-A., Schneider D.A.: LH*: a scalable, distributed data structure. ACM Trans. Database Syst. (TODS) 21(4), 480–525 (1996)
Liu, B., Lee, W.-C., Lee, D.L.: Supporting complex multi-dimensional queries in P2P systems. In: Proc. Intl. Conf. on Distributed Computing Systems (ICDCS), pp. 155–164 (2005)
Mondal, A., Lifu, Y., Kitsuregawa, M.: P2PR-Tree: an R-tree-based spatial index for peer-to-peer environments. In: EDBT Workshops, pp. 516–525 (2004)
Roussopoulos, N., Kelley, S., Vincent, F.: Nearest neighbor queries. In: Proc. ACM Symp. on the Management of Data (SIGMOD), pp. 71–79 (1995)
Samet H.: Foundations of Multi-dimensional Data Structures. Kaufmann, Morgan (2006)
Tanin E., Harwood A., Samet H.: Using a distributed quadtree index in peer-to-peer networks. VLDB J. 16(2), 165–178 (2007)
Theodoridis, Y., Silva, J.R.O., Nascimento, M.A.: On the generation of spatiotemporal datasets. In: Proc. Intl. Conf. on Large Spatial Databases (SSD), pp. 147–164 (1999)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
du Mouza, C., Litwin, W. & Rigaux, P. Large-scale indexing of spatial data in distributed repositories: the SD-Rtree. The VLDB Journal 18, 933–958 (2009). https://doi.org/10.1007/s00778-009-0135-4
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-009-0135-4