Efficient Parallel Implementation of Cellular Automata and Stencil Computations in Current Processors

Fernando Diaz-del-Rio¹⁴,
Daniel Cagigas-Muñiz¹⁴,
Jose Luis Guisado-Lizar¹⁴ &
…
Jose Luis Sevillano-Ramos¹⁴

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 289))

Abstract

A Cellular Automaton is a bio-inspired discrete model of computation with multiple applications, consisting of a regular grid of cells that have different states along time. Our research group at the University of Seville (Spain) collaborated in the past with Prof. Mohammad S. Obaidat in the fusion of Cellular Automata (CA) with another bio-inspired approach, the Address-Event-Representation (AER) neuromorphic communication protocol, for implementing a vision filter based on 3$\,\times \,$3 convolutions [22]. In the last years, our group has continued working on the optimization of CA implementations in current high-performance computational systems. In this chapter, several of these optimizations will be described, focusing on those especially aimed at current processors, including hardware alternatives (e.g. GPUs, BTBs, etc.), different forms of parallelism such as instruction-level parallelism, thread-level parallelism, data-level parallelism, as well as software approaches (such as ‘if-else’ statement elimination, loop unrolling, data pipelining and blocking, etc.). The effect of these optimizations will be qualitatively illustrated by means of the Roofline model, considering simple CAs such as the well-known Game-of-Life (GOL), which has been extensively used to explore CA characteristics. This CA was invented by John Horton Conway, an English mathematician that recently died of complications from COVID-19, so we want this case-study to serve also as a tribute to him.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 109.00; Price excludes VAT (USA)

Softcover Book: USD 139.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Bio-inspired heterogeneous architecture for real-time pedestrian detection applications

Article 29 March 2016

Differences Between 2D Neighborhoods According to Real Time Computation

Vectorization of Cellular Automaton-Based Labeling of 3-D Binary Lattices

References

Hoekstra, A.G., Kroc, J., Sloot, P. (eds.): Simulating Complex Systems by Cellular Automata. Springer, Berlin, Heidelberg (2010)
MATH Google Scholar
Bajzát, T., Hajnal, E.: Cell automaton modelling algorithms: implementation and testing in GPU systems. In: INES 2011, 15th International Conference on Intelligent Engineering Systems (2011)
Google Scholar
Balasalle, J., Lopez, M., Rutherford, M.: Optimizing Memory Access Patterns for Cellular Automata on GPUs, pp. 67–75. Elsevier–Morgan Kaufmann–NVIDIA (2011)
Google Scholar
Bandman, O.: Using multi core computers for implementing cellular automata systems. Lect.ure Notes Comput. Sci. 6873(1), 140–151 (2011)
Google Scholar
Cagigas-Muñiz, D., Diaz-del Rio, F., López-Torres, M., Jiménez-Morales, F., Guisado, J.L.: Developing efficient discrete simulations on multicore and GPU architectures. Electronics 9, 189 (2020). https://doi.org/10.3390/electronics9010189
Article Google Scholar
Cattaneo, R., Natale, G., Sicignano, C., Sciuto, D., Santambrogio, M.D.: On how to accelerate iterative stencil loops: a scalable streaming-based approach. ACM Trans. Archit. Code Optim. 12(4), 1–26 (2015)
Article Google Scholar
Chopard, B., Droz, M.: Cellular Automata Modeling of Physical Systems. Cambridge University Press, Cambridge, MA, USA (1998)
Book Google Scholar
Duesterwald, E., Gupta, R., Soffa, M.L.: Register pipelining: an integrated approach to register allocation for scalar and subscripted variables. In: Kastens, U., Pfahler, P. (eds.) Compiler Construction, pp. 192–206. Springer, Berlin, Heidelberg (1992)
Chapter Google Scholar
Berlekamp, E.R., Conway, J.H., Guy, R.K.: Winning Ways for your Mathematical Plays, 2nd edn. A K Peters/CRC Press, New York, USA (2001)
MATH Google Scholar
Gardner, M.: Mathematical games: the fantastic combinations of John Conway’s new solitaire game & “Life’’. Sci. Am. 223(4), 120–123 (1970)
Article Google Scholar
Gibson, M.J., Keedwell, E.C., Savić, D.A.: An investigation of the efficient implementation of cellular automata on multi-core CPU and GPU hardware. J. Parallel Distrib. Comput. 77, 11–25 (2015)
Article Google Scholar
Guisado, J., Jiménez-Morales, F., Fernández-de Vega, F.: Cellular automata and cluster computing: an application to the simulation of laser dynamics. Adv. Complex Syst. 10(Suppl. 1), 167–190 (2007)
Article Google Scholar
Hennessy, J.L., Patterson, D.A.: Computer Architecture, Sixth Edition: A Quantitative Approach, 6th edn. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2017)
Google Scholar
Hwu, W.m.: GPU Computing Gems Jade Edition, 1st edn. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2011)
Google Scholar
Ilachinski, A.: Cellular Automata: A Discrete Universe. World Scientific, Singapore (2001)
Book Google Scholar
Ilic, A., Pratas, F., Sousa, L.: Cache-aware roofline model: upgrading the loft. IEEE Comput. Archit. Lett. 13(1), 21–24 (2014). https://doi.org/10.1109/L-CA.2013.6
Article Google Scholar
Intel: Intel intrinsics guide. https://software.intel.com/sites/landingpage/IntrinsicsGuide/
Kari, J.: Theory of cellular automata: a survey. Theor. Comput. Sci. 334(1–3), 3–33 (2005)
Article MathSciNet Google Scholar
Kirk, D.B., Hwu, W.m.W.: Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann Publishers, Burlington, MA (2010)
Google Scholar
Koskela, T., Matveev, Z., Yang, C., Adedoyin, A., Belenov, R., Thierry, P., Zhao, Z., Gayatri, R., Shan, H., Oliker, L., Deslippe, J., Green, R., Williams, S.: A novel multi-level integrated roofline model approach for performance characterization. In: Yokota, R., Weiland, M., Keyes, D., Trinitis, C. (eds.) High Performance Computing, pp. 226–245. Springer, Cham (2018)
Chapter Google Scholar
Li, Z., Song, Y.: Automatic tiling of iterative stencil loops. ACM Trans. Progr. Lang. Syst. 26(6), 975–1028 (2004)
Article Google Scholar
Linares-Barranco, A., Sevillano, J., Obaidat, M.S.: AER filtering using glider: VHDL cellular automata description. In: 15th IEEE International Conference on Electronics, Circuits and Systems, pp. 614–617 (2008)
Google Scholar
Lopez-Torres, M., Guisado, J., Jimenez-Morales, F., Diaz-del Rio, F.: GPU-based cellular automata simulations of laser dynamics. In: Proceedings of the XXIII Jornadas de Paralelismo: Jornadas SARTECO 2012, pp. 261–266. SARTECO, Elche (2012). http://www.jornadassarteco.org/js2012/papers/paper_151.pdf
Matsumura, K., Zohouri, H., Wahib, M., Endo, T., Matsuoka, S.: AN5D: automated stencil framework for high-degree temporal blocking on GPUS. In: International Symposium on Code Generation and Optimization, pp. 199–211 (2020). https://doi.org/10.1145/3368826.3377904
Millñin, E., Martínez, P., Gil Costa, G., Piccoli, M., Printista, A., Bederian, C., García Garino, C., Bringa, E.: Parallel implementation of a cellular automata in a hybrid CPU/GPU environment. In: XVIII Congreso Argentino de Ciencias de la Computación, pp. 184–193 (2013)
Google Scholar
von Neumann, J.: Theory of Self-reproducing Automata. University of Illinois Press, Urbana (1966)
Google Scholar
Nguyen, A.D., Satish, N., Chhugani, J., Kim, C., Dubey, P.: 3.5-D blocking optimization for stencil computations on modern CPUS and GPUS . In: SC, pp. 1–13. IEEE (2010)
Google Scholar
Oxman, G., Weiss, S., Be’ery, Y.: Computational methods for Conway’s Game of Life cellular automaton. J. Comput. Sci. 5(1), 24–31 (2014)
Article MathSciNet Google Scholar
Bryant, R.E., O’Hallaron, D.R.: Computer Systems: A Programmer’s Perspective, 3rd edn. Pearson, London, UK (2016)
Google Scholar
Rybacki, S., Himmelspach, J., Uhrmacher, A.: CPU and GPU based simulation of cellular automata—a performance comparison. In: Proceedings of the 1st SIMUL, pp. 62–67 (2009)
Google Scholar
Song, Y., Li, Z.: New tiling techniques to improve cache temporal locality. In: Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation, PLDI ’99, pp. 215–228. Association for Computing Machinery, New York, NY, USA (1999). https://doi.org/10.1145/301618.301668
Stengel, H., Treibig, J., Hager, G., Wellein, G.: Quantifying performance bottlenecks of stencil computations using the execution-cache-memory model. In: Proceedings of the 29th ACM on International Conference on Supercomputing, ICS ’15, pp. 207–216. Association for Computing Machinery, New York, NY, USA (2015). https://doi.org/10.1145/2751205.2751240
Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009)
Article Google Scholar
Yang, C., Kurth, T., Williams, S.: Hierarchical Roofline analysis for GPUS: accelerating performance optimization for the NERSC-9 Perlmutter system. Concurr. Comput. 32(20), 1–12 (2020)
Article Google Scholar

Download references

Acknowledgements

This research was funded by the following research project of Ministerio de Economía, Industria y Competitividad, Gobierno de España (MINECO) and the Agencia Estatal de Investigación (AEI) of Spain, cofinanced by FEDER funds (EU): MABICAP (Bio-inspired machines on High Performance Computing platforms: a multidisciplinary approach, TIN2017-89842P), Par-HoT (Parallel Data Processing based on Homotopy Connectivity: Applications to Stereoscopic Vision and Biomedical Data, PID2019-110455GB-I00).

Author information

Authors and Affiliations

Department of Computer Architecture and Technology, Universidad de Sevilla, Seville, Spain
Fernando Diaz-del-Rio, Daniel Cagigas-Muñiz, Jose Luis Guisado-Lizar & Jose Luis Sevillano-Ramos

Authors

Fernando Diaz-del-Rio
View author publications
You can also search for this author in PubMed Google Scholar
Daniel Cagigas-Muñiz
View author publications
You can also search for this author in PubMed Google Scholar
Jose Luis Guisado-Lizar
View author publications
You can also search for this author in PubMed Google Scholar
Jose Luis Sevillano-Ramos
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jose Luis Sevillano-Ramos .

Editor information

Editors and Affiliations

Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
Petros Nicopolitidis
Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, West Bengal, India
Sudip Misra
Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada
Laurence T. Yang
Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ, USA
Bernard Zeigler
Dalian University of Technology, Dalian, China
Zhaolng Ning

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Diaz-del-Rio, F., Cagigas-Muñiz, D., Guisado-Lizar, J.L., Sevillano-Ramos, J.L. (2022). Efficient Parallel Implementation of Cellular Automata and Stencil Computations in Current Processors. In: Nicopolitidis, P., Misra, S., Yang, L.T., Zeigler, B., Ning, Z. (eds) Advances in Computing, Informatics, Networking and Cybersecurity. Lecture Notes in Networks and Systems, vol 289. Springer, Cham. https://doi.org/10.1007/978-3-030-87049-2_4

Download citation

DOI: https://doi.org/10.1007/978-3-030-87049-2_4
Published: 03 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87048-5
Online ISBN: 978-3-030-87049-2
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)

Publish with us

Policies and ethics

Efficient Parallel Implementation of Cellular Automata and Stencil Computations in Current Processors

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Bio-inspired heterogeneous architecture for real-time pedestrian detection applications

Differences Between 2D Neighborhoods According to Real Time Computation

Vectorization of Cellular Automaton-Based Labeling of 3-D Binary Lattices

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Efficient Parallel Implementation of Cellular Automata and Stencil Computations in Current Processors

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Bio-inspired heterogeneous architecture for real-time pedestrian detection applications

Differences Between 2D Neighborhoods According to Real Time Computation

Vectorization of Cellular Automaton-Based Labeling of 3-D Binary Lattices

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Share this chapter

Publish with us

Search

Navigation