Abstract
A Cellular Automaton is a bio-inspired discrete model of computation with multiple applications, consisting of a regular grid of cells that have different states along time. Our research group at the University of Seville (Spain) collaborated in the past with Prof. Mohammad S. Obaidat in the fusion of Cellular Automata (CA) with another bio-inspired approach, the Address-Event-Representation (AER) neuromorphic communication protocol, for implementing a vision filter based on 3\(\,\times \,\)3 convolutions [22]. In the last years, our group has continued working on the optimization of CA implementations in current high-performance computational systems. In this chapter, several of these optimizations will be described, focusing on those especially aimed at current processors, including hardware alternatives (e.g. GPUs, BTBs, etc.), different forms of parallelism such as instruction-level parallelism, thread-level parallelism, data-level parallelism, as well as software approaches (such as ‘if-else’ statement elimination, loop unrolling, data pipelining and blocking, etc.). The effect of these optimizations will be qualitatively illustrated by means of the Roofline model, considering simple CAs such as the well-known Game-of-Life (GOL), which has been extensively used to explore CA characteristics. This CA was invented by John Horton Conway, an English mathematician that recently died of complications from COVID-19, so we want this case-study to serve also as a tribute to him.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Hoekstra, A.G., Kroc, J., Sloot, P. (eds.): Simulating Complex Systems by Cellular Automata. Springer, Berlin, Heidelberg (2010)
Bajzát, T., Hajnal, E.: Cell automaton modelling algorithms: implementation and testing in GPU systems. In: INES 2011, 15th International Conference on Intelligent Engineering Systems (2011)
Balasalle, J., Lopez, M., Rutherford, M.: Optimizing Memory Access Patterns for Cellular Automata on GPUs, pp. 67–75. Elsevier–Morgan Kaufmann–NVIDIA (2011)
Bandman, O.: Using multi core computers for implementing cellular automata systems. Lect.ure Notes Comput. Sci. 6873(1), 140–151 (2011)
Cagigas-Muñiz, D., Diaz-del Rio, F., López-Torres, M., Jiménez-Morales, F., Guisado, J.L.: Developing efficient discrete simulations on multicore and GPU architectures. Electronics 9, 189 (2020). https://doi.org/10.3390/electronics9010189
Cattaneo, R., Natale, G., Sicignano, C., Sciuto, D., Santambrogio, M.D.: On how to accelerate iterative stencil loops: a scalable streaming-based approach. ACM Trans. Archit. Code Optim. 12(4), 1–26 (2015)
Chopard, B., Droz, M.: Cellular Automata Modeling of Physical Systems. Cambridge University Press, Cambridge, MA, USA (1998)
Duesterwald, E., Gupta, R., Soffa, M.L.: Register pipelining: an integrated approach to register allocation for scalar and subscripted variables. In: Kastens, U., Pfahler, P. (eds.) Compiler Construction, pp. 192–206. Springer, Berlin, Heidelberg (1992)
Berlekamp, E.R., Conway, J.H., Guy, R.K.: Winning Ways for your Mathematical Plays, 2nd edn. A K Peters/CRC Press, New York, USA (2001)
Gardner, M.: Mathematical games: the fantastic combinations of John Conway’s new solitaire game & “Life’’. Sci. Am. 223(4), 120–123 (1970)
Gibson, M.J., Keedwell, E.C., Savić, D.A.: An investigation of the efficient implementation of cellular automata on multi-core CPU and GPU hardware. J. Parallel Distrib. Comput. 77, 11–25 (2015)
Guisado, J., Jiménez-Morales, F., Fernández-de Vega, F.: Cellular automata and cluster computing: an application to the simulation of laser dynamics. Adv. Complex Syst. 10(Suppl. 1), 167–190 (2007)
Hennessy, J.L., Patterson, D.A.: Computer Architecture, Sixth Edition: A Quantitative Approach, 6th edn. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2017)
Hwu, W.m.: GPU Computing Gems Jade Edition, 1st edn. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2011)
Ilachinski, A.: Cellular Automata: A Discrete Universe. World Scientific, Singapore (2001)
Ilic, A., Pratas, F., Sousa, L.: Cache-aware roofline model: upgrading the loft. IEEE Comput. Archit. Lett. 13(1), 21–24 (2014). https://doi.org/10.1109/L-CA.2013.6
Intel: Intel intrinsics guide. https://software.intel.com/sites/landingpage/IntrinsicsGuide/
Kari, J.: Theory of cellular automata: a survey. Theor. Comput. Sci. 334(1–3), 3–33 (2005)
Kirk, D.B., Hwu, W.m.W.: Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann Publishers, Burlington, MA (2010)
Koskela, T., Matveev, Z., Yang, C., Adedoyin, A., Belenov, R., Thierry, P., Zhao, Z., Gayatri, R., Shan, H., Oliker, L., Deslippe, J., Green, R., Williams, S.: A novel multi-level integrated roofline model approach for performance characterization. In: Yokota, R., Weiland, M., Keyes, D., Trinitis, C. (eds.) High Performance Computing, pp. 226–245. Springer, Cham (2018)
Li, Z., Song, Y.: Automatic tiling of iterative stencil loops. ACM Trans. Progr. Lang. Syst. 26(6), 975–1028 (2004)
Linares-Barranco, A., Sevillano, J., Obaidat, M.S.: AER filtering using glider: VHDL cellular automata description. In: 15th IEEE International Conference on Electronics, Circuits and Systems, pp. 614–617 (2008)
Lopez-Torres, M., Guisado, J., Jimenez-Morales, F., Diaz-del Rio, F.: GPU-based cellular automata simulations of laser dynamics. In: Proceedings of the XXIII Jornadas de Paralelismo: Jornadas SARTECO 2012, pp. 261–266. SARTECO, Elche (2012). http://www.jornadassarteco.org/js2012/papers/paper_151.pdf
Matsumura, K., Zohouri, H., Wahib, M., Endo, T., Matsuoka, S.: AN5D: automated stencil framework for high-degree temporal blocking on GPUS. In: International Symposium on Code Generation and Optimization, pp. 199–211 (2020). https://doi.org/10.1145/3368826.3377904
Millñin, E., Martínez, P., Gil Costa, G., Piccoli, M., Printista, A., Bederian, C., García Garino, C., Bringa, E.: Parallel implementation of a cellular automata in a hybrid CPU/GPU environment. In: XVIII Congreso Argentino de Ciencias de la Computación, pp. 184–193 (2013)
von Neumann, J.: Theory of Self-reproducing Automata. University of Illinois Press, Urbana (1966)
Nguyen, A.D., Satish, N., Chhugani, J., Kim, C., Dubey, P.: 3.5-D blocking optimization for stencil computations on modern CPUS and GPUS . In: SC, pp. 1–13. IEEE (2010)
Oxman, G., Weiss, S., Be’ery, Y.: Computational methods for Conway’s Game of Life cellular automaton. J. Comput. Sci. 5(1), 24–31 (2014)
Bryant, R.E., O’Hallaron, D.R.: Computer Systems: A Programmer’s Perspective, 3rd edn. Pearson, London, UK (2016)
Rybacki, S., Himmelspach, J., Uhrmacher, A.: CPU and GPU based simulation of cellular automata—a performance comparison. In: Proceedings of the 1st SIMUL, pp. 62–67 (2009)
Song, Y., Li, Z.: New tiling techniques to improve cache temporal locality. In: Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation, PLDI ’99, pp. 215–228. Association for Computing Machinery, New York, NY, USA (1999). https://doi.org/10.1145/301618.301668
Stengel, H., Treibig, J., Hager, G., Wellein, G.: Quantifying performance bottlenecks of stencil computations using the execution-cache-memory model. In: Proceedings of the 29th ACM on International Conference on Supercomputing, ICS ’15, pp. 207–216. Association for Computing Machinery, New York, NY, USA (2015). https://doi.org/10.1145/2751205.2751240
Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009)
Yang, C., Kurth, T., Williams, S.: Hierarchical Roofline analysis for GPUS: accelerating performance optimization for the NERSC-9 Perlmutter system. Concurr. Comput. 32(20), 1–12 (2020)
Acknowledgements
This research was funded by the following research project of Ministerio de Economía, Industria y Competitividad, Gobierno de España (MINECO) and the Agencia Estatal de Investigación (AEI) of Spain, cofinanced by FEDER funds (EU): MABICAP (Bio-inspired machines on High Performance Computing platforms: a multidisciplinary approach, TIN2017-89842P), Par-HoT (Parallel Data Processing based on Homotopy Connectivity: Applications to Stereoscopic Vision and Biomedical Data, PID2019-110455GB-I00).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Diaz-del-Rio, F., Cagigas-Muñiz, D., Guisado-Lizar, J.L., Sevillano-Ramos, J.L. (2022). Efficient Parallel Implementation of Cellular Automata and Stencil Computations in Current Processors. In: Nicopolitidis, P., Misra, S., Yang, L.T., Zeigler, B., Ning, Z. (eds) Advances in Computing, Informatics, Networking and Cybersecurity. Lecture Notes in Networks and Systems, vol 289. Springer, Cham. https://doi.org/10.1007/978-3-030-87049-2_4
Download citation
DOI: https://doi.org/10.1007/978-3-030-87049-2_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87048-5
Online ISBN: 978-3-030-87049-2
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)