A High Performance Implementation of the Data Space Transfer Protocol (DSTP)

Stuart Bailey³,
Emory Creel³,
Robert Grossman³,⁴,
Srinath Gutti³ &
…
Harinath Sivakumar³

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1759)

Abstract

With the emergence of high performance networks, clusters of workstations can now be connected by commodity networks (meta-clusters) or by high speed networks (super-clusters) such as the very high speed Backbone Network Service (vBNS) or Internet2’s Abilene. Distributed clusters are enabling a new class of data mining applications in which large amounts of data can be transferred over high performance networks, and statistically and numerically intensive computations can be done on clusters of workstations.

In this paper, we briefly describe a protocol called the Data Space Transfer Protocol (DSTP) for distributed data mining. With high performance networks, it becomes possible to move large amounts of data for certain queries when necessary. This paper describes the design of a high performance DSTP data server called Osiris which is designed to efficiently satisfy data requests for distributed data mining queries. In particular, we describe 1) Osiris’s ability to lay out data by row or by column, 2) a scheduler intended to handle requests using standard network links and requests using network links enjoying some type of premium service, and 3) a mechanism designed to hide latency.
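The difference between laying out data by row and by column (point 1 above) can be illustrated with a minimal Python sketch. This is not Osiris's implementation, which the abstract does not detail; the record shape, the attribute names `key`, `x`, `y`, and all function names are hypothetical. The sketch only shows why a request for a few attributes touches fewer stored values under a column layout.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical attribute names for an illustrative three-attribute record.
NAMES = ("key", "x", "y")


def to_row_layout(rows: Sequence[tuple]) -> List[tuple]:
    """Row layout: all attributes of one record are stored together."""
    return [tuple(r) for r in rows]


def to_column_layout(rows: Sequence[tuple],
                     names: Sequence[str]) -> Dict[str, list]:
    """Column layout: all values of one attribute are stored together."""
    return {name: [r[i] for r in rows] for i, name in enumerate(names)}


def fetch_from_rows(row_store: Sequence[tuple], names: Sequence[str],
                    wanted: Sequence[str]) -> Tuple[Dict[str, list], int]:
    """Serve a request from the row layout.

    Every stored record is read in full, so the count of values touched
    is (number of records) x (number of attributes per record).
    """
    idx = [list(names).index(w) for w in wanted]
    touched = sum(len(r) for r in row_store)
    result = {w: [r[i] for r in row_store] for w, i in zip(wanted, idx)}
    return result, touched


def fetch_from_columns(col_store: Dict[str, list],
                       wanted: Sequence[str]) -> Tuple[Dict[str, list], int]:
    """Serve a request from the column layout.

    Only the requested columns are read.
    """
    result = {w: list(col_store[w]) for w in wanted}
    touched = sum(len(col_store[w]) for w in wanted)
    return result, touched
```

For three records and a request for `x` alone, both functions return the same values, but the row layout touches all nine stored values while the column layout touches three. The reverse trade-off holds when whole records are requested, which is one reason a server might support both layouts.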

Author information

Authors and Affiliations

National Center for Data Mining, University of Illinois at Chicago, Chicago, IL, 60607, USA
Stuart Bailey, Emory Creel, Robert Grossman, Srinath Gutti & Harinath Sivakumar
Magnify, Inc., 100 South Wacker Drive, Suite 1130, Chicago, IL, 60606, USA
Robert Grossman

Editor information

Editors and Affiliations

Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
Mohammed J. Zaki
K55/B1, IBM Almaden Research Center, 650 Harry Road, San Jose, CA, 95120, USA
Ching-Tien Ho

Cite this paper

Bailey, S., Creel, E., Grossman, R., Gutti, S., Sivakumar, H. (2000). A High Performance Implementation of the Data Space Transfer Protocol (DSTP). In: Zaki, M.J., Ho, CT. (eds) Large-Scale Parallel Data Mining. Lecture Notes in Computer Science, vol 1759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46502-2_3

DOI: https://doi.org/10.1007/3-540-46502-2_3
Published: 17 May 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67194-7
Online ISBN: 978-3-540-46502-7
eBook Packages: Springer Book Archive
