Locating data sources in large distributed systems

Published: 09 September 2003


Querying large numbers of data sources is gaining importance due to increasing numbers of independent data providers. One of the key challenges is executing queries on all relevant information sources in a scalable fashion and retrieving fresh results. The key to scalability is to send queries only to the relevant servers and avoid wasting resources on data sources which will not provide any results. Thus, a catalog service, which would determine the relevant data sources given a query, is an essential component in efficiently processing queries in a distributed environment. This paper proposes a catalog framework which is distributed across the data sources themselves and does not require any central infrastructure. As new data sources become available, they automatically become part of the catalog service infrastructure, which allows scalability to large numbers of nodes. Furthermore, we propose techniques for workload adaptability. Using simulation and real-world data we show that our approach is valid and can scale to thousands of data sources.
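
To make the role of such a catalog concrete, the following is a minimal sketch in Python. It is not the paper's framework, which is distributed across the data sources and adapts to workload; it is a single in-memory stand-in. The names `Catalog`, `register`, and `relevant_sources`, and the idea of describing each source by a set of terms, are assumptions made purely for illustration.

```python
from collections import defaultdict


class Catalog:
    """Toy in-memory catalog: maps a term to the sources that advertise it.

    Hypothetical illustration only; the paper's catalog is distributed
    across the data sources rather than held in one place.
    """

    def __init__(self):
        self._sources_by_term = defaultdict(set)

    def register(self, source, terms):
        """Record that `source` holds data for each of `terms`."""
        for term in terms:
            self._sources_by_term[term].add(source)

    def relevant_sources(self, query_terms):
        """Return the sources that advertise every term in the query."""
        terms = list(query_terms)
        if not terms:
            return set()
        # Use .get so that looking up an unknown term does not create an entry.
        candidate_sets = [self._sources_by_term.get(t, set()) for t in terms]
        return set.intersection(*candidate_sets)


catalog = Catalog()
catalog.register("source-a", {"book", "author"})
catalog.register("source-b", {"book", "price"})
catalog.register("source-c", {"movie"})

# Only source-a advertises both terms, so only it would receive the query.
targets = catalog.relevant_sources(["book", "author"])
print(sorted(targets))
```

In this sketch the query is sent to the members of `targets` instead of to all three sources, which is the kind of saving the abstract attributes to a catalog service. How a real catalog summarizes each source's contents, and how it is partitioned across nodes, is left open here.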



Published In

VLDB '03: Proceedings of the 29th international conference on Very large data bases - Volume 29
September 2003
1134 pages


  • VLDB Endowment: Very Large Database Endowment



VLDB '03: Very large data bases
September 9 - 12, 2003
Berlin, Germany


