Data management and migration are important research challenges in novel Cloud environments. When moving data among different geographical domains, it is important to lower the transmission cost for performance purposes. Efficient scheduling methods allow data transmissions to be completed in fewer steps and in a shorter transmission time. Previous research efforts have proposed several methods to manage data and minimize transmission cost in Single Cluster environments. Unfortunately, these methods are not suitable for large-scale and complicated environments such as Clouds, particularly with regard to scheduling policies. Motivated by this, in this paper we propose an efficient data transmission method for data-intensive scientific applications over Clouds, called Cloud Adaptive Dispatching (CAD). This method adapts to the specific characteristics of Cloud systems and reduces the transmission cost, while also avoiding node contention when moving data from site to site. We conduct an extensive campaign of experiments to test the performance of CAD. The results demonstrate the improvements offered by CAD in supporting data transmissions across Clouds for data-intensive scientific applications.