A High Performance Implementation of the Data Space Transfer Protocol (DSTP)

  • Conference paper
Large-Scale Parallel Data Mining

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1759)

Abstract

With the emergence of high performance networks, clusters of workstations can now be connected by commodity networks, forming meta-clusters, or by high speed networks such as the very high speed Backbone Network Service (vBNS) or Internet2’s Abilene, forming super-clusters. Distributed clusters are enabling a new class of data mining applications in which large amounts of data can be transferred using high performance networks, and statistically and numerically intensive computations can be done using clusters of workstations.

In this paper, we briefly describe a protocol called the Data Space Transfer Protocol (DSTP) for distributed data mining. With high performance networks, it becomes possible to move large amounts of data for certain queries when necessary. This paper describes the design of a high performance DSTP data server called Osiris which is designed to efficiently satisfy data requests for distributed data mining queries. In particular, we describe 1) Osiris’s ability to lay out data by row or by column, 2) a scheduler intended to handle requests using standard network links and requests using network links enjoying some type of premium service, and 3) a mechanism designed to hide latency.
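The difference between laying out data by row and by column can be illustrated with a minimal sketch. It is not Osiris’s storage format, which the abstract does not specify; the function names, the in-memory lists, and the simple cost count are all assumptions made for illustration. The sketch shows why a query that requests only a few attributes reads fewer values from a column layout than from a row layout.

```python
from typing import Dict, List, Sequence

Record = Dict[str, float]


def to_row_layout(records: Sequence[Record], attrs: Sequence[str]) -> List[float]:
    """Store each record's attributes contiguously: r0.a0, r0.a1, ..., r1.a0, ..."""
    return [rec[a] for rec in records for a in attrs]


def to_column_layout(records: Sequence[Record], attrs: Sequence[str]) -> Dict[str, List[float]]:
    """Store each attribute's values contiguously, one list per attribute."""
    return {a: [rec[a] for rec in records] for a in attrs}


def project_from_rows(flat: List[float], attrs: Sequence[str],
                      wanted: Sequence[str]) -> Dict[str, List[float]]:
    """Extract the wanted attributes by striding through the row-major buffer."""
    width = len(attrs)
    return {a: flat[attrs.index(a)::width] for a in wanted}


def project_from_columns(cols: Dict[str, List[float]],
                         wanted: Sequence[str]) -> Dict[str, List[float]]:
    """Extract the wanted attributes by selecting whole columns."""
    return {a: cols[a] for a in wanted}


def values_touched(n_records: int, n_attrs: int, n_wanted: int, layout: str) -> int:
    """Hypothetical cost model: values read by one sequential scan of the store."""
    if layout == "row":
        # A sequential scan of row-major data passes over every attribute.
        return n_records * n_attrs
    if layout == "column":
        # Only the requested columns need to be read.
        return n_records * n_wanted
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    records = [{"x": 1.0, "y": 2.0, "z": 3.0}, {"x": 4.0, "y": 5.0, "z": 6.0}]
    attrs = ["x", "y", "z"]
    flat = to_row_layout(records, attrs)
    cols = to_column_layout(records, attrs)
    print(project_from_rows(flat, attrs, ["y"]))
    print(project_from_columns(cols, ["y"]))
    print(values_touched(1000, 20, 2, "row"), values_touched(1000, 20, 2, "column"))
```

Under this simplified cost model, a request for 2 of 20 attributes over 1000 records touches 20000 values in the row layout and 2000 in the column layout, whereas a request for whole records costs the same in both.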



Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bailey, S., Creel, E., Grossman, R., Gutti, S., Sivakumar, H. (2000). A High Performance Implementation of the Data Space Transfer Protocol (DSTP). In: Zaki, M.J., Ho, CT. (eds) Large-Scale Parallel Data Mining. Lecture Notes in Computer Science, vol 1759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46502-2_3

  • DOI: https://doi.org/10.1007/3-540-46502-2_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67194-7

  • Online ISBN: 978-3-540-46502-7

  • eBook Packages: Springer Book Archive
