Michael M. Resch · Wolfgang Bez · Erich Focht · Nisarg Patel · Hiroaki Kobayashi (Editors)

Sustained Simulation Performance 2016
Proceedings of the Joint Workshop on Sustained Simulation Performance,
University of Stuttgart (HLRS) and Tohoku University, 2016

Editors

Michael M. Resch
High Performance Computing Center (HLRS)
University of Stuttgart
Stuttgart, Germany

Nisarg Patel
High Performance Computing Center (HLRS)
University of Stuttgart
Stuttgart, Germany

Erich Focht
NEC High Performance Computing Europe GmbH
Stuttgart, Germany
Figure on Front Cover: Domain decomposition of a hierarchical Cartesian mesh. A Hilbert curve is used
to partition the grid at a relatively coarse refinement level. Due to the depth-first ordering of the cells, this
leads to complete subtrees being distributed among the available MPI ranks, improving the parallel
performance of coupled multiphysics simulations
Figure on Back Cover: Hierarchical Cartesian mesh with local refinement towards the lower boundary.
Among neighbouring cells, the level difference is at most one, leading to a size ratio of 2:1 (2D) or 4:1
(3D) between the cells
Mathematics Subject Classification (2010): 68Wxx, 68W10, 68Mxx, 68U20, 76-XX, 86A10, 70FXX,
92Cxx, 65-XX
Preface
We would like to thank all the contributors and organizers of this book and the Sustained Simulation Performance project. We especially thank Prof. Hiroaki Kobayashi for the close collaboration over the past years and look forward to intensifying our cooperation in the future.
Vl. V. Voevodin
Abstract Each new computing platform required software developers to analyze the
algorithms over and over, each time having to answer the same two questions. Does
the algorithm possess the necessary properties to meet the architectural requirements?
How can the algorithm be converted so that the necessary properties can be easily
reflected in parallel programs? Changes in computer architecture do not change
algorithms, but this analysis had to be performed again and again when a program was
ported from one generation of computers to another, largely repeating the work that
had been done previously. Is it possible to do the analysis “once and for all,” describing
all of the key properties of an algorithm so that all of the necessary information can
be gleaned from this description any time a new architecture appears? As simple
as the question sounds, answering it raises a series of other non-trivial questions.
Moreover, creating a complete description of an algorithm is not a single challenge but a large series of challenges, some of which are discussed in this paper.
1 Introduction
Parallel computing system architectures have gone through at least six generations
over the past 40 years, each requiring its own algorithm properties and a special
program writing style. In each case, it was important not only to find suitable features
for the algorithms, but also to express them properly in the code, using special
programming technologies. In fact, each new generation of computing architecture
required a review of the entire software pool.
The generation of vector pipeline computers got off to a rapid start in the mid-seventies with the launch of the Cray-1 supercomputer. Machines of this class were
based on pipeline processing of data vectors, supported by vector functional units and
vector instructions in machine code. Full vectorization was the most efficient program
implementation, which implied complete replacement of any innermost loops in the
program body with vector instructions. Hence the requirements for algorithms and programs were dictated by this need for full vectorization.
Each new computing platform required software developers to analyze the algorithms
over and over, each time having to answer the same two questions. Does the algorithm
possess the necessary properties to meet the architectural requirements? How can
the algorithm be converted so that the necessary properties can be easily reflected in
parallel programs? Changes in computer architecture do not change algorithms, but
this analysis had to be performed again and again when a program was ported from
one generation of computers to another, largely repeating the work that had been
done previously.
This begs a natural question: is it possible to do the analysis “once and for all,”
describing all of the key properties of an algorithm so that all of the necessary infor-
mation can be gleaned from this description any time a new architecture appears? As
simple as the question sounds, answering it raises a series of other questions. What
does it mean “to perform analysis” and what exactly needs to be studied? What kind
of “key” properties need to be found in algorithms to ensure their efficient imple-
mentation in the future? What form can (or should) the analysis results take? What
makes a description of algorithm properties “complete?” How does one guarantee
that a description is complete and that all of the relevant information for any computer
architecture is included?
The questions are indeed numerous and non-trivial. Obviously, a complete
description needs to reflect many ideas: computational kernels, determinacy, infor-
mation graphs, communication profiles, a mathematical description of the algorithm,
performance, efficiency, computational intensity, the parallelism resource, serial
complexity, parallel complexity… [3]. All of these concepts, and many others, are used to describe an algorithm's properties from different perspectives, and all of them are needed in practice in various situations.
To immediately introduce some order to these diverse concepts, one can begin
by breaking up an algorithm’s description into two parts. The first part is dedicated
to the algorithm’s theoretical properties, and the second part describes its particular
implementation features. This division allows the machine-independent properties of
algorithms to be separated from the numerous issues arising in practice. Both parts of
the description are equally important: the first one describes the algorithm’s theoreti-
cal potential, and the second one demonstrates the practical use of that potential. The
first part of the description explains the mathematical formulation of the algorithm,
its computational kernel, input and output data, information structure, parallelism
resources and properties, determinacy and computational balance of the algorithm,
etc. The second part contains information on an algorithm’s implementation: locality,
performance, efficiency, scalability, communication profile, implementation features
on various architectures, and so on.
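As a rough illustration (not part of the original text or of AlgoWiki itself; all field names below are hypothetical), such a two-part description could be captured in a simple data structure separating machine-independent properties from implementation features:

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class TheoreticalPart:
    # Machine-independent properties of the algorithm.
    mathematical_description: str
    computational_kernel: str
    input_data: str
    output_data: str
    serial_complexity: str
    parallel_complexity: str
    parallelism_resource: str
    determinacy: bool

@dataclass
class ImplementationPart:
    # Properties of one particular implementation.
    locality: str
    scalability: str
    communication_profile: str
    performance_by_architecture: Dict[str, str] = field(default_factory=dict)

@dataclass
class AlgorithmDescription:
    name: str
    theory: TheoreticalPart
    implementations: List[ImplementationPart] = field(default_factory=list)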
Many of the ideas described above are very well known. However, as you start
describing the properties of real algorithms, you realize that creating a complete
description of an algorithm is not a challenge, it is a large series of challenges!
Unexpected problems arise at each step, and a seemingly simple action becomes a
stumbling block. Let’s look at the information structure of an algorithm mentioned
above. It is an exceptionally useful term that contains a lot of information about the
algorithm. An information graph is a convenient representation of an algorithm’s
information structure. In many cases, looking at the information graph is enough to understand the algorithm's parallel implementation strategy. Figure 1a, b show the information
structure for typical computational kernels in many algorithms, Fig. 1c shows the
information structure of a Cholesky decomposition algorithm.
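To make the notion concrete, the following small sketch (an illustrative construction, not taken from the paper) builds the information graph of a simple prefix-sum loop; every operation depends on the result of the previous one, so the graph is a serial chain:

def prefix_sum_information_graph(n):
    # Information graph of the loop
    #     for i in 1 .. n-1:  s[i] = s[i-1] + a[i]
    # Vertices are the individual operations, arcs are data dependencies.
    vertices = ["op_%d" % i for i in range(1, n)]
    arcs = [("op_%d" % (i - 1), "op_%d" % i) for i in range(2, n)]
    return vertices, arcs

vertices, arcs = prefix_sum_information_graph(6)
print(vertices)  # ['op_1', 'op_2', 'op_3', 'op_4', 'op_5']
print(arcs)      # a serial chain: this loop carries no parallelism resource
# The graph's size depends on the external parameter n, which is why a
# description normally shows one small, "typical" graph only.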
An information graph can be simple for many examples. However, in general,
the task of presenting an information graph is not a trivial exercise. To begin with,
a graph can potentially be infinite, as the number of vertices and arcs is determined
by the values of external input variables which can be very large. In this situation it
helps to look at likenesses: graphs for different values of external variables look very
“similar” to one another, so it is almost always enough to present one small graph,
stating that the graphs for other values will look “exactly the same.” Not everything
is so simple in practice, however, and one should be very careful here.
Next, an information graph is potentially a multi-dimensional object. The most
natural coordinate system for placing vertices and arcs in an information graph relies on the indices of the nested loops in which the algorithm's operations are performed.
Fig. 3 Sequence of steps in the parallel execution of an algorithm based on a canonical parallel layer form
Can one judge the locality of a future program using just the information about its algorithm? On the one hand, there are no data
structures in algorithms—they only appear in programs; so talking about locality for
algorithms is not exactly right. On the other hand, it is the algorithm that determines
the structure and properties of a program to be coded, including its locality. Many
have probably heard the expression “the algorithm’s locality” or “this algorithm
has better locality than the other.” How appropriate are these statements, given that
algorithms do not contain data structures?
Determinacy is an important practical aspect of algorithms and programs, but how
can one describe all of the potential sources which violate this property? A serious
cause of indeterminacy in parallel programs is related to changes in the order of
executing associative operations. A typical example is the use of global operations
in Message Passing Interface (MPI) by a number of parallel processes, e.g., when
summing the elements of a distributed array. The MPI runtime system chooses the
order of execution on its own, assuming compliance with the associative law, which
results in various round-off errors and ultimately in different results when executing
the same application. This is a serious issue often encountered in massively parallel computing systems, as it causes the results of parallel program execution to be non-reproducible. If the analysis of an algorithm's structure shows that the resulting parallel
application cannot work without global operations, this property must be included in
the algorithm description. To analyze this problem properly, a communication profile
should be built for the parallel program, pointing out the structure and interaction
method between parallel processes. A clear definition of the communication profile
has not been produced to date, so it is premature to consider in-depth analysis in this
area.
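The round-off effect described above can be illustrated even without MPI: summing the same floating-point values in a different order, as a runtime system performing a global reduction is free to do, can produce slightly different results. A minimal, self-contained sketch:

import random

random.seed(0)
# Values of widely varying magnitude make round-off differences visible.
values = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-8, 8)
          for _ in range(100_000)]

ordered_sum = sum(values)        # one fixed summation order
shuffled = list(values)
random.shuffle(shuffled)         # another order, as a reduction tree might use
shuffled_sum = sum(shuffled)

print(ordered_sum, shuffled_sum)
print("difference:", ordered_sum - shuffled_sum)   # typically non-zero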
Indeed, there are many open questions, and the list can go on. The main question
that still remains unanswered is “What does it mean to create a complete description
of an algorithm?” What must be included in this description, so that we can glean all
of the necessary information from it every time a new computing platform appears?
The task seems simple at first sight: an algorithm is just a sequence of mathematical
formulas, often short and simple, which should easily be analyzed. But at the same
time, no one can guarantee the completeness of such a description.
The properties of the algorithms and programs discussed in this work became
the foundation for the AlgoWiki project [1]. The project’s main goal is to provide a
description for fundamental algorithm properties which will enable a more compre-
hensive understanding of their theoretical potential and their implementation features
in various classes of parallel computing systems. The project is expected to result in
the development of an open online encyclopedia based on wiki technologies which
will be open to contributions by the entire academic and educational community.
The first version of the encyclopedia is available at http://AlgoWiki-Project.org/en,
where users can describe both their own pedagogical experience and their knowledge
of specific parallel algorithms.
4 Conclusion
All of the issues discussed in this work are highly important for training future spe-
cialists [4–6]. Right from the beginning of the education process, focus should be
placed on algorithm structure since it determines both the implementation quality
and the potential for efficiently executing programs in a parallel environment. The
algorithm structure and its close relationship to parallel computing system architec-
ture are central ideas in parallel computing, which are included in many courses for
Bachelor’s and Master’s degree programs at the Faculty of Computational Mathe-
matics and Cybernetics at Lomonosov Moscow State University, as well as in the
lectures and practical courses offered by the annual MSU Summer Supercomputing
Academy [7]. We are also trying to expand this concept to the Supercomputing Con-
sortium of Russian Universities [8] in order to develop a comprehensive supercom-
puter education system, rather than offering occasional training aimed at rectifying
the situation.
Acknowledgements This project is being conducted at Moscow State University with financial
support from the Russian Science Foundation, Agreement No 14-11-00190.
References
1. Antonov, A., Voevodin, V., Dongarra, J.: AlgoWiki: an open encyclopedia of parallel algorithmic
features. Supercomput. Front. Innov. 2(1), 4–18 (2015)
2. Dongarra, J., Beckman, P., Moore, T., Aerts, P., Aloisio, G., Andre, J.C., Barkai, D., Berthou,
J.Y., Boku, T., Braunschweig, B., et al.: The international exascale software project roadmap.
Int. J. High Perform. Comput. Appl. 25(1), 3–60 (2011)
3. Voevodin, V.V., Voevodin, Vl.V.: Parallel Computing. BHV-Petersburg, St. Petersburg (2002). (in Russian)
4. Computing Curricula Computer Science. http://ai.stanford.edu/users/sahami/CS2013 (2013)
5. Future Directions in CSE Education and Research, Workshop Sponsored by the Society for Industrial and Applied Mathematics (SIAM) and the European Exascale Software Initiative (EESI-2). http://wiki.siam.org/siag-cse/images/siag-cse/f/ff/CSE-report-draft-Mar2015.pdf (2015)
6. NSF/IEEE-TCPP Curriculum Initiative on Parallel and Distributed Computing. http://www.cs.gsu.edu/~tcpp/curriculum/
7. Summer Supercomputing Academy. http://academy.hpc-russia.ru/
8. Supercomputing Education in Russia, Supercomputing Consortium of the Russian Universities.
http://hpc.msu.ru/files/HPC-Education-in-Russia.pdf (2012)
High Performance Computing and High Performance Data Analytics—What is the Missing Link?
Abstract Within this book chapter, technologies for data mining, data processing and data interpretation are introduced, evaluated and compared. In particular, traditional High Performance Computing and the newly emerging fields of High Performance Data Analytics and Cognitive Computing are put into context in order to understand their strengths and weaknesses. The technologies are not evaluated in isolation; the missing links between them are also identified and described.
1 Introduction
At this point in time, there are various technologies on the market that target data analysis, data processing, data interpretation and data mining. So far, it has not been clear whether all of those technologies are direct competitors or can be seen as complementary. This book chapter therefore analyses the technologies carefully, introduces them and compares their respective approaches. More concretely, traditional High Performance Computing, the newly emerging field of High Performance Data Analytics as well as Cognitive Computing are evaluated. In addition, the interactions between these technological fields are visualized.
The book chapter is organized as follows: Sect. 2 provides the High Performance Computing context, Sect. 3 introduces High Performance Data Analytics, and Sect. 4 compares the approaches and describes the missing links. Finally, Sect. 5 concludes this book chapter.
2 High Performance Computing
Within this section of the book chapter, a generic view on High Performance Computing (HPC) and its evolution over time is given. Although the purpose of such HPC systems has in principle stayed the same, the available performance, the customer base as well as the computational and application models have changed in the last decade. In summary, various application areas such as computational fluid dynamics, climate or physics simulations are currently considered HPC-relevant; they are executed on innovative systems that may be equipped with vector central processing units, with commonly used x86 processors or even with accelerators.
Within the last decade, there has been a huge evolution with regard to HPC systems. Reaching from vector machines to the widely adopted x86 architecture and modern accelerators, the hardware in particular evolved quickly. In the meantime, HPC systems with more than 1,000,000 cores are no longer a utopia (see the Top500 list, http://www.top500.org), so that besides the efficiency of the systems, the models and applications can also benefit from the huge amount of computational performance provided.
But not just the hardware evolved; the customer base changed as well: industrial applications from the automotive world, academic applications dealing with, for instance, climate simulation, as well as applications from small and medium-sized
enterprises from various kinds of areas are targeting High Performance Computing systems. However, with the evolved systems and the immense performance, the execution models also become more complicated. On the one hand, there are still traditional applications that require a huge amount of resources for a single run; on the other hand, parametric studies with less constant performance requirements, but generating a huge amount of results, are common on state-of-the-art HPC systems.
Nevertheless, classical HPC-driving applications are still usual in the High Performance Computing area, but due to the changing application and execution models, general-purpose systems are becoming more prominent, as large compute-intensive applications typically produce a huge amount of results. So there is currently a trade-off between providing generic systems that are flexible enough to cope with different kinds of workloads, and systems that are built solely to provide one single key performance type.
3 High Performance Data Analytics
In contrast to Sect. 2, this section focuses on High Performance Data Analytics (HPDA), a newly emerging field in the High Performance Computing sector. High Performance Data Analytics targets the efficient analysis of various kinds of data, reaching from structured up to unstructured as well as streaming data, which cannot be analysed any more on standard workstations or Clouds due to their volume, their variety or their velocity.
As already highlighted in the previous sections, HPC and HPDA approaches require different technologies in terms of hardware and software. Therefore, these requirements are discussed in this sub-section in order to bridge the gap between both technologies.
In terms of hardware, data-intensive workloads require different key performance indicators than standard HPC applications. The differences between both approaches are highlighted below:
• Processors
In traditional HPC systems, the focus is on fast processors with fast memory pipelines. For HPDA systems, the amount of Floating Point Operations Per Second is still important; however, the performance of the system is determined by the storage system.
• Memory
The more memory is available for data analytics, the better for the overall application execution, since most of the data and results can be kept in memory instead of checkpointing them to the storage backend. For HPC systems, the same statement holds, although much smaller memory systems are targeted than in the HPDA area.
• Networks
Whenever data needs to be transferred, fast interconnects come into play. Both HPC and HPDA systems therefore require fast, latency-oriented networks in order to transfer the data efficiently.
• Storage
Typical HPC systems provide a central storage system from which all the required data is read and written. An approach like this is not possible for HPDA, since data accessibility is the key performance indicator for the whole application. Therefore, data analytics systems provide fast local disks that can be used to hold and cache the data in order to optimize the application execution.
As can be seen, the main differences between HPC and HPDA systems lie in the areas of processors and storage, since fast number-crunching processors are required for HPC only. In contrast, very fast input/output systems with large capacity are mandatory for efficient data processing.
The software requirements go along with the hardware requirements. In contrast to traditional HPC applications, which require programming models and paradigms such as message passing or shared-memory parallelism, data analytics applications rely on in-memory processing and programming languages such as Java, Python or Scala. The most important frameworks for data analytics are currently the Apache tools Spark, Hadoop, Storm and Flink, as well as some smaller projects such as the Disco Project, DataTorrent or BashReduce.
Most of those frameworks build on the MapReduce algorithm, which was introduced by Google. The MapReduce algorithm consists of three phases (map, shuffle and reduce), whereas the map and the reduce parts are directly specified by the user in order to allow parallel processing of data on many machines. Using this concept enables processing different kinds of data, reaching from structured data including files and databases up to unstructured and real-time data such as online data composed of several data structures.
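As an illustration of the map/shuffle/reduce structure described above, the following minimal, framework-free word-count sketch mimics the three phases on a single machine; frameworks such as Hadoop or Spark distribute the same phases across many machines:

from collections import defaultdict

documents = ["to be or not to be", "to thine own self be true"]

# Map phase: a user-defined function emits (key, value) pairs.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle phase: group all values that belong to the same key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: a user-defined function aggregates each group.
counts = {key: sum(values) for key, values in groups.items()}
print(counts)   # e.g. {'to': 3, 'be': 3, 'or': 1, ...}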
In order to substantiate the statements of the previous sections and sub-sections, the information shall be complemented with a practical example from the Global Systems Science community, which represents an emerging field in the HPC sector. Within the EC-funded CoeGSS project, a set of applications is targeted that requires particular workflows to retrieve the results. In particular, the workflow foresees HPDA, huge-scale HPC, small-scale HPC and visualization to generate synthetic populations, execute the resulting agent-based models and, finally, visualize the results [1]. For clarification, the workflow and its targeted technologies are depicted in Fig. 1.
Thus, those kinds of applications demonstrate that there is a need to support other methods and techniques than the ones classical HPC applications demand. As a consequence, staying competitive in terms of hardware and software reaches a new level of complexity.
The resulting outputs, especially in terms of data variety and data size, become hard to handle for a human in the loop.
We see a tendency in so-called "business-ready solutions" to stress the support of the human in the loop by applying technological fields such as machine learning, artificial intelligence or cognitive computing. For the remainder of this paper we will stick to the term cognitive computing as a placeholder for the above-mentioned disciplines, which can be described as the variety of scientific disciplines of Artificial Intelligence and Signal Processing. A similar view has been presented by James Kobielus, Big Data Evangelist, in a 2013 blog entry on Cognitive Computing: Relevant at all Speeds, Scales and Scopes of Thought, where he defines cognitive computing as
the ability of automated systems to handle the conscious, critical, logical, attentive, reasoning
mode of thought that humans engage in when they, say, play Jeopardy or try to master some
academic discipline.
The principles of cognitive computing are not new, and nearly everyone who is in the Information Technology business has heard of this topic at some point in time. Thus it is also not surprising that its base assumptions and ideas were reported as early as the middle of the 19th century, when Boole published his book on "The Laws of Thought" [2]. This was just a conceptual approach, however, and the first programmable computer by Zuse appeared only almost a century later. As already mentioned before, during the evolution of these principles, the domain of cognitive methodologies and that of artificial intelligence went either side by side or showed clear overlaps. A variety of theories and implementation approaches were taken, the most prominent ones so far probably being IBM's Watson [3] and the recently presented AlphaGo [4].
4.2 Benefits
Figure 2 shows how High Performance Computing, High Performance Data Analytics and Cognitive Techniques can complement each other. High Performance Computing (HPC) delivers the processing power needed for those kinds of applications that require massively parallel execution. At the same time, these kinds of applications partly produce enormous amounts of data, which may be too big to be analysed manually, even with current support tools at hand. Thus the discipline of High Performance Data Analytics can be used to analyse and handle these data sets (and those from other sources) in a sufficient way. Cognitive techniques can provide support to both disciplines, helping to interpret and present the results in the best possible way.
In general, the expected benefits from applying these concepts are manifold. Support is improved for those fields where big amounts of data are collected, handled and interpreted; examples are:
• Enhanced analysis of the business potential of new offerings or new activities. This can reach from the virtual testing of new opportunities, e.g. in drug design, to combined virtual and real-world simulations such as finding new geographic locations for drilling
• Support of staff (e.g. engineers) in decision processes by providing them with a selection of potential paths to follow
• Improved operations, by understanding performed operations and their parameters, so that processes can be optimized either in real time or after longer-duration analysis
Taking this complementarity into account, the workflow as described in Fig. 1 can be extended to the one presented in Fig. 3.
Fig. 2 Cognitive Techniques complementing the global picture of HPC and HPDA
Within this document, we also want to have a short look at those technologies which may act as a baseline to realize an integration of cognitive concepts into a traditional HPC/HPDA-based workflow (e.g. the one presented in Fig. 3).
In the case of Watson, a variety of APIs is available for selected developers and business users, as well as the Watson Analytics solution (http://www.predictiveanalyticstoday.com/ibm-watson-analytics-beta-open-business/). Furthermore, there is a variety of Open Source alternatives available, which shall be discussed at a high level in the following overview:
DARPA DeepDive
DeepDive [5, 6] is a free version of a Watson-like system. It was developed within the frame of the US Defense Advanced Research Projects Agency (DARPA) and, in contrast to Watson, has the aim to extract structured data from unstructured data sources. DeepDive uses machine learning technologies to train itself and especially targets those users with moderate to no machine learning expertise.
UIMA
Apache Unstructured Information Management Architecture (UIMA, http://uima.apache.org/) supports the analysis of large sets of unstructured information. It is an implementation of the OASIS Unstructured Information Management standard (https://www.oasis-open.org/committees/download.php/28492/uima-spec-wd-05.pdf).
OpenCog
OpenCog [7] is a project targeting artificial intelligence and delivering an open source framework. One output of OpenCog is the cognitive architecture OpenCog Prime [8] for robot and virtual embodied cognition.
5 Conclusions
The previous sections have pointed out that High Performance Computing and High Performance Data Analytics can be seen as rather complementary approaches than as direct competitors. Even though there are activities to provide a common software stack which may run on both HPC- and HPDA-specific hardware, there is only a subset of concrete problems in the problem space that can be addressed efficiently in such a manner. Mainly, this is a result of the partially quite different hardware setup of the respective technological environments.
Now, assuming that HPC and HPDA deliver high performance, we also have to face the fact that the size and amount of the data sets processed and, in turn, resulting from this processing enter a dimension which makes satisfactory manual processing by a human in the loop (e.g. an engineer) nearly impossible. Thus we see that even if one issue (e.g. data analytics) is solved with those appliances, another issue pops up, which is the understanding and handling of information.
For that purpose we have introduced cognitive technologies, which can act as a sort of "helper" technology to simplify the life of the end user and enable improved use of simulation results. This technology, even if it still appears to be in its infancy, can support the (human) end user and provide decision baselines allowing improved processing of information. We have shown that a variety of implementations already exist; the next steps need to assess how far they can cover the requirements of selected use cases.
References
1. Wolf, S., Paolotti, D., Tizzoni, M., Edwards, M., Fuerst, S., Geiges, A., Ireland, A., Schuetze,
F., Steudle, G.: D4.1 - First report on pilot requirements. http://coegss.eu/wp-content/uploads/2016/03/CoeGSS_D4_1.pdf
2. Boole, G.: Investigation of the Laws of Thought on Which are Founded the Mathematical
Theories of Logic and Probabilities (1853)
3. Ferrucci, D.A., Brown, E.W., Chu-Carroll, J., Fan, J., Gondek, D., Kalyanpur, A., Lally, A., Mur-
dock, J.W., Nyberg, E., Prager, J.M., Schlaefer, N., Welty, C.A.: Building Watson: an overview
of the DeepQA project. AI Mag. 31(3), 59–79 (2010)
4. Silver, D., Hassabis, D.: AlphaGo: mastering the ancient game of Go with Machine Learning, Blogpost. https://research.googleblog.com/2016/01/alphago-mastering-ancient-game-of-go.html (2016)
5. Niu, F., Zhang, C., Re, C., Shavlik, J.W.: DeepDive: web-scale knowledge-base construction using statistical learning and inference. In: VLDS 2012, CEUR Workshop Proceedings, vol. 884, pp. 25–28 (2012)
6. Zhang, C.: DeepDive: a data management system for automatic knowledge base construction,
Ph.D. Dissertation, University of Wisconsin-Madison (2015)
7. Hart, D., Goertzel, B.: OpenCog: a software framework for integrative artificial general intelli-
gence. In: Wang, P., Goertzel, B., Franklin, S. (eds.) ’AGI’, pp. 468–472. IOS Press (2008)
8. Goertzel, B.: OpenCog Prime: a cognitive synergy based architecture for artificial general intelligence
9. Hurwitz, J.S., Kaufman, M., Bowles, A.: Cognitive Computing and Big Data Analytics. Wiley,
Indianapolis (2015)
A Use Case of a Code Transformation Rule Generator for Data Layout Optimization
1 Introduction
When data are stored in a memory space, the layout of the data often needs to be optimized so as to make better use of the memory hierarchy and architectural features. Today, such data layout optimization is critically important to achieve high performance on a modern high-performance computing (HPC) system, because the system performance is very sensitive to memory access patterns. Memory access can easily become a performance bottleneck of an HPC application.
The data layout of an application can be optimized by changing data structures
used in the code. One problem is that a human-friendly, easily-understandable data
representation is often different from a computer-friendly data layout. This means
that, if the data layout of a code is completely optimized for computers, the code may no longer be human-friendly.
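A common instance of this tension is the choice between an array of structures (often the human-friendly representation) and a structure of arrays (often the computer-friendly layout with contiguous, unit-stride accesses). The following sketch (an illustrative example using NumPy, not taken from the article) contrasts the two layouts:

import numpy as np

n = 1_000_000

# Human-friendly "array of structures": one record per particle.
aos = np.zeros(n, dtype=[("x", "f8"), ("y", "f8"), ("z", "f8"), ("mass", "f8")])

# Computer-friendly "structure of arrays": one contiguous array per field.
x = np.zeros(n)
y = np.zeros(n)
z = np.zeros(n)
mass = np.zeros(n)

# Summing only the x coordinates strides over every 32-byte record in the
# AoS layout, but reads a single contiguous array in the SoA layout.
aos_x_sum = aos["x"].sum()
soa_x_sum = x.sum()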
We have been developing a code transformation framework, Xevolver, so that
users can define their own rules to transform an application code [1, 2]. In this
article, such a user-defined code transformation rule is adopted to separate the data
representation in an application code from the actual data layout in a memory space.
Instead of simply modifying a code for data layout optimization, the original code
is usually maintained in a human-friendly way and then mechanically transformed
just before the compilation so as to make the transformed code computer-friendly.
One important question is how to describe code transformation rules. A con-
ventional way of developing such a code translator is to use compiler tools, such as
ROSE [3]. Actually, at the lowest abstraction level, Xevolver allows users to describe
a code transformation rule as an AST transformation rule. Since AST transforma-
tion is exactly what compilers internally do, compiler experts can implement various
code transformation rules by using the framework. However, standard programmers
who optimize HPC application codes are not necessarily familiar with such compiler
technologies. Therefore, we are also developing several high-level tools to describe
the rules more easily.
Xevtgen [4] is one of the high-level tools that help users define custom code transformation rules. This article shows a use case of Xevtgen for data layout optimization, and discusses how it can help users define their own transformations.
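Xevtgen and Xevolver operate on Fortran/C codes; purely as a language-neutral illustration of the general idea of a user-defined, rule-based AST transformation, the following sketch uses Python's ast module (Python 3.9+) to rewrite accesses of the hypothetical form p[i].x (array-of-structures style) into x[i] (structure-of-arrays style), so that the maintained source can stay human-friendly while the transformed code becomes computer-friendly:

import ast

class AosToSoa(ast.NodeTransformer):
    # Illustrative rule: rewrite p[i].x into x[i] for the listed fields.
    FIELDS = {"x", "y", "z"}

    def visit_Attribute(self, node):
        self.generic_visit(node)
        if (isinstance(node.value, ast.Subscript)
                and isinstance(node.value.value, ast.Name)
                and node.value.value.id == "p"
                and node.attr in self.FIELDS):
            # Reuse the original index expression, e.g. p[i].x -> x[i].
            return ast.copy_location(
                ast.Subscript(value=ast.Name(id=node.attr, ctx=ast.Load()),
                              slice=node.value.slice,
                              ctx=node.ctx),
                node)
        return node

source = "total = p[i].x + p[j].y"
tree = ast.fix_missing_locations(AosToSoa().visit(ast.parse(source)))
print(ast.unparse(tree))   # total = x[i] + y[j]

The intent mirrors what the article describes: the programmer specifies the rule once, and the original code is transformed mechanically just before compilation.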