Abstract
Irregular, iterative, I/O-intensive jobs call for a different approach from parallel job schedulers. For such jobs, processing requirements are no longer the only concern: memory, network, and storage capacity must all be considered when making a scheduling decision. Their executions are irregular and data dependent, alternating between CPU-bound and I/O-bound phases. In this paper, we propose and implement AnthillSched, a parallel job scheduling strategy for such jobs based on a simple heuristic: we map the behavior of a parallel application running with minimal resources as we vary its input parameters. From that mapping, we infer the best schedule for a given set of input parameters and the available resources. To test and verify AnthillSched, we used logs obtained from a real system executing data mining jobs. Our main contributions are the implementation of a parallel job scheduling strategy in a real system and a performance analysis of AnthillSched, which allowed us to discard some scheduling alternatives considered previously.
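The heuristic described above can be illustrated with a small sketch. It is only one possible reading of the abstract, not the algorithm from the paper: it assumes that the "mapping" is a table of per-stage costs measured with minimal resources for each profiled input parameter, and that the "best schedule" is a split of the available nodes in proportion to those costs. The stage names, the cost values, and the functions `proportional_allocation` and `schedule` are all hypothetical.

```python
from typing import Dict


def proportional_allocation(profile: Dict[str, float], total_nodes: int) -> Dict[str, int]:
    """Split total_nodes across stages in proportion to their profiled cost.

    Assumption (not from the paper): every stage gets at least one node and
    the remaining nodes are shared out with the largest-remainder method.
    """
    if total_nodes < len(profile):
        raise ValueError("need at least one node per stage")
    total_cost = sum(profile.values())
    if total_cost <= 0:
        raise ValueError("profiled costs must sum to a positive value")

    alloc = {stage: 1 for stage in profile}  # one node per stage to start
    remaining = total_nodes - len(profile)

    # Ideal (fractional) share of the remaining nodes for each stage.
    shares = {s: remaining * c / total_cost for s, c in profile.items()}
    for stage, share in shares.items():
        alloc[stage] += int(share)

    # Hand out nodes lost to rounding, largest fractional part first.
    leftover = remaining - sum(int(v) for v in shares.values())
    by_fraction = sorted(shares, key=lambda s: shares[s] - int(shares[s]), reverse=True)
    for stage in by_fraction[:leftover]:
        alloc[stage] += 1
    return alloc


def schedule(profiles: Dict[float, Dict[str, float]], param: float, total_nodes: int) -> Dict[str, int]:
    """Pick the profile measured for the closest input parameter, then allocate."""
    nearest = min(profiles, key=lambda p: abs(p - param))
    return proportional_allocation(profiles[nearest], total_nodes)


# Hypothetical profiling table: input parameter -> cost per stage,
# measured with minimal resources (for example, one node per stage).
PROFILES = {
    0.10: {"read": 2.0, "count": 6.0, "merge": 2.0},
    0.01: {"read": 1.0, "count": 8.0, "merge": 1.0},
}

if __name__ == "__main__":
    print(schedule(PROFILES, param=0.09, total_nodes=8))
```

Looking up the nearest profiled parameter is the simplest form of inference; interpolating between profiled points would be an equally plausible choice under the same assumptions.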
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Góes, L.F. et al. (2005). AnthillSched: A Scheduling Strategy for Irregular and Iterative I/O-Intensive Parallel Jobs. In: Feitelson, D., Frachtenberg, E., Rudolph, L., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2005. Lecture Notes in Computer Science, vol 3834. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11605300_5
DOI: https://doi.org/10.1007/11605300_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31024-2
Online ISBN: 978-3-540-31617-6
eBook Packages: Computer Science (R0)