Abstract
Video decoding and image processing in embedded systems are subject to strong resource constraints, particularly in terms of memory. List-scheduling heuristics with static priorities (HEFT, SDC, etc.) being the oft-cited solutions due to both their good performance and their low complexity, we propose a method aimed at introducing the notion of memory into them. Moreover, we show that through adequate adjustment of task priorities and judicious resort to insertion-based policy, speedups up to 20 % can be achieved. We also show that our technique allows to prevent deadlock and to substantially reduce the required memory footprint compared to classic list-scheduling heuristics. Lastly, we propose a methodology to assess the appropriateness of dynamic scheduling in this context.
Similar content being viewed by others
Notes
The bottom level is also sometimes referred to as the upward rank.
The orders of magnitude of the latency for local and external memories are respectively 1 cycle and 100 cyles.
A token is the smallest unit of data that can be processed by a task. It is application specific; e.g. for an image-processing algorithm, it can be a line of pixels.
Thus, from a processing viewpoint, pixel lines are independent.
If a node can be pebbled more than once, then the problem is PSPACE-complete (and hence NP-Hard), but probably not in NP [10].
As a reminder, the bigger its priority value, the earlier a task is scheduled.
The sole purpose of this reference schedule is to serve as a basis to construct the self-timed schedule. Thus, the makespans resulting from it are not meaningful for our study and, as such, are not represented.
To keep the model simple, interprediction is not considered.
References
Adam, T.L., Chandy, K., Dickson, J.: Comparison of list schedules for parallel processing systems. Commun. ACM 17(12), 685–690 (1974)
Baker, T.P.: Stack-based scheduling for realtime processes. Real-Time Syst. 3(1), 84–88 (1991)
Batat, A., Feitelson, D.: Gang scheduling with memory considerations. In: Parallel and Distributed Processing Symposium, 2000. IPDPS 2000. Proceedings. 14th International, pp. 109–114. IEEE (2000)
Benini, L., Flamand, E., Fuin, D., Melpignano, D.: P2012: building an ecosystem for a scalable, modular and high-efficiency embedded computing accelerator. In: Design, Automation Test in Europe Conference Exhibition (DATE), 2012, pp. 983–987 (2012)
Buck, J., Lee, E.: Scheduling dynamic dataflow graphs with bounded memory using the token flow model. In: 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-93), vol. 1, pp. 429–432. IEEE (1993)
Canon, L.C., Jeannot, E., Sakellariou, R., Zheng, W.: Comparative evaluation of the robustness of dag scheduling heuristics. In: Gorlatch, S., Fragopoulou, P., Priol, T. (eds.) Grid Computing, pp. 73–84. Springer, US (2008). doi:10.1007/978-0-387-09457-1_7
Chaitin, G.J., Auslander, M.A., Chandra, A.K., Cocke, J., Hopkins, M.E., Markstein, P.W.: Register allocation via coloring. Comput. Lang. 6, 47–57 (1981)
Fradet, P., Girault, A., Poplavkoy, P.: SPDF: a schedulable parametric data-flow MoC. In: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012, p. 769774 (2012)
Geng, T., et al.: Parallelization of computing-intensive tasks of the h.264 high profile decoding algorithm on a reconfigurable multimedia system. IEICE Trans. Inf. Syst. E93–D(12), 3223–3231 (2010)
Gilbert, J., Lengauer, T., Tarjan, R.: The pebbling problem is complete in polynomial space. SIAM J. Comput. 9(3), 513–524 (1980)
Guermouche, A., L’Excellent, J.Y.: Memory-based scheduling for a parallel multifrontal solver. In: Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International, p. 71. IEEE (2004)
Herrmann, J., Marchal, L., Robert, Y.: Model and complexity results for tree traversals on hybrid platforms. In: Euro-Par 2013 Parallel Processing, pp. 647–658. Springer (2013)
Herrmann, J., Marchal, L., Robert, Y.: Memory-aware list scheduling for hybrid platforms. Rapport de recherche RR-8461, INRIA (2014). http://hal.inria.fr/hal-00944336
Jian, G.A., Chu, J.C., Huang, T.Y., Chang, T.C., Guo, J.I.: A system architecture exploration on the configurable HW/SW co-design for h.264 video decoder. In: Circuits and Systems, 2009. ISCAS 2009. IEEE International Symposium on, pp. 2237–2240 (2009). doi:10.1109/ISCAS.2009.5118243
Kwok, Y.K., Ahmad, I.: Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Comput. Surv. 31(4), 406–471 (1999). doi:10.1145/344588.344618
Lawler, E.L., Lenstra, J.K., Kan, A.R., Shmoys, D.B.: Sequencing and scheduling: algorithms and complexity. Handb. Oper. Res. Manag. Sci. 4, 445–522 (1993)
Lee, E., Messerschmitt, D.: Synchronous data flow. Proc. IEEE 75(9), 1235–1245 (1987)
Lee, E.A., Ha, S.: Scheduling strategies for multiprocessor real-time DSP. IEEE Global Telecommunications inproceedings and Exhibition 2 (1989)
Liu, J.W.: On the storage requirement in the out-of-core multifrontal method for sparse factorization. ACM Trans. Math. Softw. TOMS 12(3), 249–264 (1986)
Marchal, L., Sinnen, O., Vivien, F.: Scheduling tree-shaped task graphs to minimize memory and makespan. In: Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on, pp. 839–850. IEEE (2013)
Melpignano, D., Benini, L., Flamand, E., Jego, B., Lepley, T., Haugou, G., Clermidy, F., Dutoit, D.: Platform 2012, a many-core computing accelerator for embedded socs: performance evaluation of visual analytics applications. In: Design Automation Conference (DAC), 2012 49th ACM/EDAC/IEEE, pp. 1137–1142 (2012)
Saponara, S., et al.: Performance and complexity co-evaluation of the advanced video coding standard for cost-effective multimedia communications. EURASIP J. Appl. Signal Process. 2004(2), 220–235 (2004)
Sethi, R.: Complete register allocation problems. SIAM J. Comput. 4(3), 226–248 (1975)
Shi, Z., Dongarra, J.J.: Scheduling workflow applications on processors with different capabilities. Futur. Gener. Comput. Syst. 22(6), 665–675 (2006). doi:10.1016/j.future.2005.11.002
Sih, G.C., Lee, E.A.: Compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architectures. IEEE Trans. Parallel Distrib. Syst. 4(2), 175–187 (1993)
Sullivan, G., Ohm, J.R.: Recent developments in standardization of high efficiency video coding (HEVC). In: Proceedings of SPIE: The International Society for Optical Engineering, vol. 7798 (2010)
Topcuoglu, H., Hariri, S., Wu, M.Y.: Task scheduling algorithms for heterogeneous processors. In: 8th IEEE Heterogeneous Computing Workshop (HCW’99), pp. 3–14. San Juan, Puerto Rico (1999)
Wang, S.H., et al.: A software-hardware co-implementation of mpeg-4 advanced video coding (AVC) decoder with block level pipelining. J. VLSI Signal Process. Syst. Signal Image Video Technol. 41(1), 93–110 (2005)
Wiegand, T., et al.: Overview of the h.264/AVC video coding standard. IEEE Trans. Circuits Syst. Video Technol. 13(7), 560–576 (2003)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Arras, PA., Fuin, D., Jeannot, E. et al. List Scheduling in Embedded Systems Under Memory Constraints. Int J Parallel Prog 43, 1103–1128 (2015). https://doi.org/10.1007/s10766-014-0338-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10766-014-0338-1