Abstract
With the emergence of high performance networks, clusters of workstations can now be connected by commodity networks (meta-clusters) or by high speed networks (super-clusters) such as the very high speed Backbone Network Service (vBNS) or Internet2’s Abilene. Distributed clusters enable a new class of data mining applications in which large amounts of data are transferred over high performance networks and statistically and numerically intensive computations are performed on clusters of workstations.
In this paper, we briefly describe the Data Space Transfer Protocol (DSTP), a protocol for distributed data mining. With high performance networks, it becomes possible to move large amounts of data for certain queries when necessary. We then describe the design of Osiris, a high performance DSTP data server designed to satisfy data requests for distributed data mining queries efficiently. In particular, we describe 1) Osiris’s ability to lay out data by row or by column, 2) a scheduler intended to handle both requests using standard network links and requests using network links that receive some type of premium service, and 3) a mechanism designed to hide latency.
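The difference between laying out data by row and by column can be illustrated with a minimal Python sketch. This is not Osiris's actual storage format, which the abstract does not specify: the three-field record schema, the binary encoding, and every function name below are assumptions made purely for illustration.

```python
import struct

# Hypothetical records (id, x, y); the schema and types are assumptions.
RECORDS = [(1, 0.5, 10.0), (2, 1.5, 20.0), (3, 2.5, 30.0)]
ROW_FMT = "<idd"  # little-endian, unpadded: int32 id, float64 x, float64 y


def pack_by_row(records):
    """Store each record's fields contiguously (row layout)."""
    return b"".join(struct.pack(ROW_FMT, *r) for r in records)


def pack_by_column(records):
    """Store each attribute's values contiguously (column layout)."""
    ids, xs, ys = zip(*records)
    n = len(records)
    return (struct.pack(f"<{n}i", *ids)
            + struct.pack(f"<{n}d", *xs)
            + struct.pack(f"<{n}d", *ys))


def read_x_from_rows(buf, n):
    """Fetch attribute x from the row layout by stepping through every record."""
    size = struct.calcsize(ROW_FMT)
    return [struct.unpack_from(ROW_FMT, buf, i * size)[1] for i in range(n)]


def read_x_from_columns(buf, n):
    """Fetch attribute x from the column layout as one contiguous slice."""
    offset = n * struct.calcsize("<i")  # skip past the id column
    return list(struct.unpack_from(f"<{n}d", buf, offset))
```

Both layouts hold the same bytes in a different order. In this sketch, a query that needs only the attribute `x` reads a single contiguous range from the column layout, whereas the row layout requires visiting every record; a query that needs whole records would favor the row layout instead.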
© 2000 Springer-Verlag Berlin Heidelberg
Cite this paper
Bailey, S., Creel, E., Grossman, R., Gutti, S., Sivakumar, H. (2000). A High Performance Implementation of the Data Space Transfer Protocol (DSTP). In: Zaki, M.J., Ho, CT. (eds) Large-Scale Parallel Data Mining. Lecture Notes in Computer Science, vol 1759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46502-2_3
DOI: https://doi.org/10.1007/3-540-46502-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67194-7
Online ISBN: 978-3-540-46502-7