1 Introduction
Macroprogramming refers to the theory and practice of conveniently expressing the macro(scopic) behaviour of a system using a single program, often leveraging macro-level abstractions (e.g., collective state, group, or spatiotemporal abstractions). This is not to be confused with the use of macros (short for macroinstructions), that is, the well-known mechanism for compile-time substitution of program pieces (e.g., characters, tokens, or abstract syntax trees), available in programming languages ranging from C and Common Lisp to Scala and Rust. Macros may be a mechanism for implementing macroprogramming, but not all uses of macros are macroprogramming, which concerns programming the overall behaviour of a system of multiple computational entities. Macroprogramming is a paradigm driven by the need of designers and application developers to capture system-level behaviour while abstracting, in part, the behaviour and interaction of the individual components involved. It can be framed as a paradigm since it embodies a (systemic) view or perspective of programming and accordingly provides lenses through which the programmer can understand and work on particular aspects of systems, especially those related to collective behaviour, interaction, and global, distributed properties.
In the past, this style of programming has been primarily adopted to describe the behaviour of Wireless Sensor Networks (WSNs) [Mottola and Picco
2011], where data gathered from sensors are to be processed, aggregated, and possibly moved across different parts or regions of the network to be consolidated into useful, actionable information. More recently, certain research trends and niches have provided renewed interest in macro approaches. Research in the contexts of Internet of Things (IoT) and Cyber-Physical Systems (CPSs) has proposed macroprogramming approaches (cf. [Mizzi et al.
2018; Azzara et al.
2014]) to simplify the development of systems involving a multitude of connected sensors, actuators, and smart devices. In the spatial computing thread [Beal et al.
2012], space can represent both a means and a goal for macroprogramming. Indeed, declaring what has to be done in a spatiotemporal region allows systems to self-organise to effectively carry out the task at hand, dynamically adapting to the specifics of the current deployment and spatial positions of the components involved. Similarly, one can program a system, such as a drone fleet, in a high-level fashion to make it seek and maintain certain shapes and connectivity topologies. Indeed, swarm-level programming models have been proposed in robotics research [Pinciroli and Beltrame
2016]. In distributed artificial intelligence and multi-agent systems research [Adams
2001], an important distinction is made between the
micro level of individual agents and the
macro level of an “agent society”, sometimes explicitly addressed by organisation-oriented programming approaches [Boissier et al.
2013]. In the field of Collective Adaptive System (CAS) engineering [Ferscha
2015; De Nicola et al.
2020], macroprogramming abstractions can promote collective behaviour exhibiting self-* properties (e.g., self-organising, self-healing, self-configuring) [Kephart and Chess
2003; de Lemos et al.
2010]. In Software-Defined Networking (SDN), the centralised view of the control plane has promoted programming networks as “one big switch” [Kang et al.
2013].
This work draws motivation from a profusion of macroprogramming approaches and languages that have been proposed in the past two decades, aiming to capture the aggregate behaviour of certain classes of distributed systems. However, contributions are sparse, isolated in research niches, and tend to be domain-specific as well as technological in nature. This survey aims to consolidate the state of the art, provide a map of the field, and foster research on macroprogramming.
This article is organised as follows. Section 2 covers the method adopted for carrying out the survey. Section 3 provides an overview of the research fields where macroprogramming techniques have been proposed, also tracing the history of the field. Section 4 defines a conceptual framework and taxonomy for macroprogramming. Section 5 is the core of the survey: it classifies and presents the selected primary studies. Section 6 provides an analysis of the surveyed approaches and discusses opportunities and challenges of macroprogramming. Section 7 covers related work, discussing the contributions of other secondary studies. Finally, Section 8 provides a wrap-up.
2 Survey Method
This section briefly describes how the survey has been carried out. It focusses on motivation, research questions, data sources, presentation of results, and terminology.
2.1 Survey Method
Although this is not a systematic literature review, the survey has been developed following guidelines from systematic literature review methodologies such as that of Kitchenham and Charters [2007]. More details follow.
2.1.1 Review Motivation.
As anticipated in Section 1, the survey draws motivation from the emergence of a number of works that more or less explicitly identify themselves as macroprogramming approaches. Related secondary studies have been carried out in the past; they are reviewed in Section 7. However, they focus on particular perspectives or domains (e.g., spatial computing or WSN programming), are somewhat dated, and consider macroprogramming only as a particular class of approaches within their own scope. Critically, macroprogramming has never yet been investigated as a field per se. Another major motivation lies in the fragmentation of macroprogramming-related works across disparate research fields and domains. Thus, a goal of this survey is to provide a map of the macroprogramming-related literature, promoting interaction between research communities and development of the field. Further motivation comes from the research questions presented next.
2.1.2 Research Goals and Questions.
The goal of this article is to explore the literature on macroprogramming in breadth, synthesise the major contributions, and provide a basis for further research. The focus is on the programming perspective rather than, for example, modelling formalisms for analysis and prediction; namely, the contribution can be framed in language-based software engineering [Gupta 2015]. To better structure the investigation, we focus on the following research questions, inspired by the “six honest serving men” [Kipling 1902] as, for example, in the work of Flood [1994]:
RQ0. Why, where, and for whom is macroprogramming most needed?
RQ1. What is macroprogramming, and, especially, what is it not?
RQ2. How is macroprogramming implemented? Namely, what are the main macroprogramming approaches and abstractions?
RQ3. What opportunities can arise from research on macroprogramming?
RQ4. What are the key challenges in macroprogramming systems?
RQ0 is addressed in Section 3. RQ1 is addressed in Section 4. RQ2 is addressed in Section 5. Finally, RQ3 and RQ4 are addressed in Section 6.
2.1.3 Identification, Selection, and Quality Assessment of Primary Research.
Primary research studies have been identified by searching literature databases (e.g., Google Scholar, DBLP, IEEE Xplore, ACM Digital Library) for keywords such as “macroprogramming”, “global-level programming”, “network-wide programming”, and “swarm programming”. Terminology is fully covered and discussed in Section
2.2. Additional sources include other secondary and primary studies, which are surveyed in Section
7 and Section
5, respectively.
The survey scope is wide and includes Ph.D. theses, technical reports, and papers presented at workshops, at conferences, and in journals as well as across different domains and research communities. Works that are deemed too preliminary (e.g., position papers), not enough “macro” (refer to Section
4), or neglecting the “programming” aspects (e.g., describing a middleware but no programming language) have been excluded, after being manually inspected.
2.1.4 Data Extraction, Synthesis, and Dissemination.
For each primary study, notes are taken regarding its
self-positioning (i.e., how the authors define their contribution), its
programming model (i.e., what main abstractions are provided), its
implementation (i.e., how macro-abstractions are mapped to micro-level operations), and
source-code examples. The data is synthesised using the conceptual framework introduced in Section
4. When covering and summarising the primary works in the survey (Section
5), we tend to keep and report the terminology originally used in the referenced papers, possibly explained and compared with the terminology used in this article. This should help to preserve the richness and nuances of each work, while the common perspective is ensured by proper selection and emphasis of the information included in the descriptions. Examples, either adapted from those already included in the primary studies or composed anew from code snippets described in those papers, are provided when they are reasonably “effective” (that is, brief and simple in conveying how the reviewed approach looks and works) or “diverse” with respect to those already presented.
2.2 A Note on Terminology
A first issue in macroprogramming research is the fragmentation and ambiguity of terminology, which, together with domain fragmentation (see Section 3), leads to (i) difficulty when searching for related work and (ii) obstacles in the formation of a common understanding. Across the literature, multiple terms such as macroprogramming, system-level programming, and global-level programming are used to refer to the same or similar concepts: this does not promote a unified view of the field and hinders progress by preventing the spread of related ideas. At the same time, both overly general and overly specific terms are in use. Overly general terms both reflect the lack of a common ground and prevent its formation. Overly specific terms, mainly due to the domain specificity of research endeavours, fail to recognise the general contributions or to advertise the effort in the context of a bigger picture.
In the following, we list some terms that have been used (or might be used), with more or less good reason, when referring to macroprogramming, and analyse their semantic precision (by reasoning on their etymology and other common uses) as well as their alternative meanings in the literature (to identify conflicts with more or less widespread usages).
Macroprogramming, Macro-Programming, Macro Programming, Macro-Level Programming. These are the premier terms for the subject of this article and may indeed refer to programming macroscopic aspects of systems (often, by leveraging macro-level abstractions). However, these terms are sometimes also used in other contexts related to computer programming. The potential ambiguity stems from the word “macro”, which can abbreviate both “macroscopic” and “macroinstructions”; the latter is often used in the sense of macros, that is, the well-known programming language mechanism for compile-time substitution of program pieces. Indeed, it is common to say that macros are written using a macro (programming) language. As a result, searching for these terms returns a mix of results from both worlds. Unfortunately, with macros being a very common mechanism [Lilis and Savidis 2020], entries related to macroscopic programming remain hard to spot in search results unless other keywords are used to narrow the context; but then, only a fragment of the corpus can be located.
System Programming, System-Level Programming, System-Oriented Programming. All these terms are also ambiguous. Indeed, they strongly and traditionally refer to low-level programming, that is, programming performed at a level close to the (computer) system (i.e., to the machine) [Appelbe and Hansen 1985]. System programming languages include, for example, C, C++, Rust, and Go. A better name for these would probably be, as suggested by Dijkstra, machine-oriented languages, but this sense of “system” is by now entrenched in the field. The imprecision of the term was also somewhat acknowledged by researchers in the object-oriented programming community [Nygaard 1997]. However, in some cases, system-level programming is contrasted with device-level programming, to mean approaches that address “a system as a whole” [Liang et al. 2016].
Centralised Programming. This term [Gude et al. 2008; Lima et al. 2006] commonly refers to programming a distributed system through a single program where distribution is (partially [Waldo et al. 1996]) abstracted away, that is, as if the distributed system were a centralised system, namely a software system deployed on a single computer. An example of centralised programming is multi-tier programming [Weisenburger et al. 2020]. This notion is certainly related to macroprogramming, since a “centralised perspective” where several distributed components can be addressed at once is a macroscopic perspective. However, as discussed in Section 4, programming the macro level often implies more than programming the individual components from a centralised perspective.
High-Level Programming. This term, identifying a style of programming that abstracts many details of the underlying platform, lacks precision. Macroprogramming is a form of high-level programming, but not all high-level programming is macroprogramming (for a conceptual framework for macroprogramming, refer to Section
4).
Examples of Domain-Specific or Alternative Terminology: Global-Level Programming, Network-Wide Programming, Organisational Programming, Swarm Programming, Aggregate Programming, Ensemble Programming, Global-to-Local Programming, Team-Level Programming, Organisation-Oriented Programming. These terms will be explained and properly organised in the following sections. From this list of terms, however, it is already possible to get a sense of (i) an intimate need, from different research communities, to linguistically emphasise a focus on macroscopic aspects of systems, and (ii) the urge for a common conceptual framework where such disparate contributions can be framed.
4 A Conceptual Framework and Taxonomy
In this section, after some preliminaries (Section
4.1), we define macroprogramming, describe its essential elements (Section
4.2), and distinguish it from other related notions like centralised programming (Section
4.3). Then, we propose a taxonomy and conceptual framework (Section
4.4) for classifying and studying the macroprogramming approaches surveyed in Section
5.
4.1 Preliminaries
Consider the problem of programming the behaviour of a computational system \(\mathcal {S}\) composed of multiple computational entities. Let A and B be two different entities of that system. We have three main modes for affecting their behaviour to promote the behaviour or properties ascribable to the overall system \(\mathcal {S}\) (which, as we will shortly see, is essentially the goal of macroprogramming):
(1) Change their context (e.g., inputs). The entities will be indirectly influenced by the different context. For instance, if A is a sensor, it might sense a different value, which may in turn affect B and so on.
(2) Interact with them (e.g., trigger/orchestrate their behaviour). For instance, if A is an actuator, it might be commanded to act upon the environment, which may in turn affect B and so on.
(3) Set their behaviour. Part of the behaviour of A and B may be set or changed such that, when activated (e.g., in a reactive or proactive way), certain global outcomes will be produced.
Let us use the term program to mean an (abstract) description that can be executed by some (abstract) computational entity. Note that modes (1) and (2) allow a program to affect A or B, and hence \(\mathcal {S}\), by having it executed by another entity, say C, assumed to be external to the arbitrary boundary of \(\mathcal {S}\).
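The three modes can be illustrated with a minimal Scala sketch. It is an illustration under stated assumptions rather than part of any surveyed approach: the Entity class, its fields, the external orchestrator c, and the temperature thresholds are all hypothetical.

```scala
// Hypothetical micro-level entity: it reads a context value and reacts
// through a behaviour that can be replaced at run time.
final class Entity(val name: String, var behaviour: Double => String) {
  var context: Double = 0.0                 // e.g., a sensed input
  def step(): String = behaviour(context)   // reactive activation
}

object ModesDemo {
  def main(args: Array[String]): Unit = {
    val a = new Entity("A", t => if (t > 30) "cool" else "idle")
    val b = new Entity("B", t => if (t > 30) "cool" else "idle")
    val system = Seq(a, b)

    // Mode (1): change the context; the entities are influenced indirectly.
    system.foreach(e => e.context = 35.0)
    println(system.map(_.step()))

    // Mode (2): interaction; an entity C, external to the system, triggers them.
    val c: Seq[Entity] => Seq[String] = es => es.map(_.step())
    println(c(system))

    // Mode (3): set (part of) the behaviour of the entities themselves.
    system.foreach(e => e.behaviour = t => if (t > 40) "cool" else "idle")
    println(system.map(_.step()))
  }
}
```

In modes (1) and (2) the code that affects A and B runs outside them, whereas in mode (3) the new behaviour is executed by A and B when they are next activated.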
4.2 Macroprogramming: Definition and Basic Concepts
We define macroprogramming as “an abstract paradigm for programming the macro(scopic) behaviour of systems of computational entities.” As a paradigm (see Section 4.3.2 for a discussion on this), it is “an approach to programming based on a mathematical theory or a coherent set of principles” [Van Roy 2009] (emphasis added). Macroprogramming is based on the following principles, which can be partially extracted from the various definitions given in the literature (cf. Table 1):
P1. Micro-macro distinction: Two main levels of a system are considered: a macro level (of global structures, of state, of behaviour) and a micro level (of computational entities).
P2. Macroscopic perspective: The programming activity tends to focus on macroscopic aspects of a system, which may include summary observations and views whereby micro-level entities are considered from a global (or non-local) and conceptually centralised perspective.
P3. Macroprogram: The output of the macroprogramming activity is a program that is conceptually executed by the system as a whole and whose intended meaning adopts the macroscopic perspective.
P4. Macro-to-micro mapping: A macroprogramming implementation has to define how a macro-program is executed by the system as a whole, which entails defining a macro-to-micro mapping logic, sometimes also known as global-to-local mapping [Hamann 2010]. In other words, from a macroprogram, micro-level programs or behaviours are derived or affected (cf. Section 4.1).
Figure 1 shows the general idea of the approach, graphically.
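As a rough illustration of how principles P1 to P4 fit together, the following Scala sketch derives per-node micro-programs from a single macroprogram. Everything in it is an assumption made for the example: the sensor-network setting, the names Node, AlarmIfMeanAbove, MicroProgram, and Mapping, and the choice of giving every node the same micro-program.

```scala
// P1: micro level (individual nodes) vs. macro level (the network as a whole).
final case class Node(id: Int, temperature: Double)

// P3: a macroprogram, phrased from the macroscopic perspective (P2).
sealed trait MacroProgram
final case class AlarmIfMeanAbove(threshold: Double) extends MacroProgram

// A micro-level program: what a single node contributes in one round.
final case class MicroProgram(report: Node => Double)

object Mapping {
  // P4: macro-to-micro mapping; here every node receives the same
  // "report your own reading" micro-program.
  def compile(p: MacroProgram, nodes: Seq[Node]): Map[Int, MicroProgram] = p match {
    case AlarmIfMeanAbove(_) =>
      nodes.map(n => n.id -> MicroProgram(node => node.temperature)).toMap
  }

  // Conceptual execution by the system as a whole: run the derived
  // micro-programs, then aggregate their results into the macro-level answer.
  def run(p: MacroProgram, nodes: Seq[Node]): Boolean = p match {
    case AlarmIfMeanAbove(threshold) =>
      val micro = compile(p, nodes)
      val readings = nodes.map(n => micro(n.id).report(n))
      readings.sum / readings.size > threshold
  }
}
```

The macroprogram never mentions an individual node; only the mapping logic does, which is where a concrete approach would place its distribution and communication machinery.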
4.2.1 On Micro-Macro and Local-Global Distinction.
The micro-macro levels and the local-global scales are usually used as equivalent concepts to distinguish smaller elements/scopes from larger elements/scopes somewhat “containing” or “being implied by” the former. The micro-macro distinction [Alexander 1987] (sometimes with an intermediate, or meso, level in between) is typical in many scientific areas including social sciences, systemics, and distributed artificial intelligence [Schillo et al. 2000] (cf. multi-agent systems [Wooldridge 2009]). For the sake of programming, just as a system (as an ontological and epistemological element) can be defined according to a boundary condition [Mobus and Kalton 2014], the distinction between the two dimensions, micro and macro, is similarly made through a design-oriented boundary or membership decision defining what belongs to one level or the other.
The intended meaning of macroprograms, and hence the ultimate goal of macroprogramming, seems to be related to the notion of
emergence [Holland
1998; Wolf and Holvoet
2004; Gignoux et al.
2017; Kalantari et al.
2020]. Gignoux et al. [
2017] use graph theory to provide formal definitions of macroscopic states and microscopic states, and characterise emergence by analysing the general relationships between microscopic and macroscopic states.
What can we say, in general, about the entities at the micro and macro levels in macroprogramming? Micro entities have a computational behaviour, which may be autonomous (proactive), active, or reactive, and may or may not interact with other micro entities. So, for instance, data elements do not make for micro entities (they have no behaviour), whereas agents, actors, objects, and microservices do. Regarding the macro level, we can distinguish between macro-level observables and macro-level constructs. A macro-level observable is a high-level observation of the system behaviour (that is, a macro state as defined in the work of Gignoux et al. [2017]), which is associated with the system as a whole and might be difficult to derive from the micro state (the set of observations about the micro-level entities). The intended meaning, or goal, of a macroprogram is generally a function of macro-level observables over some notion of time. A macro-level construct or abstraction is, instead, a description that can be mapped down to affect the behaviour of two or more micro-level entities (cf. Section 4.1). Implementing such a mapping is the macro-to-micro problem of macroprogramming.
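The difference between observables, goals, and constructs can be sketched in a few lines of Scala. The sketch assumes a deliberately simple setting that is not taken from the surveyed literature: the micro state is a sequence of hypothetical Node records, time is a discrete trace of such states, and the names meanReading, staysBelow, and sampleEvery are invented for the example.

```scala
// One observation about a micro-level entity: its reading and sampling period.
final case class Node(id: Int, reading: Double, periodMs: Int)

object MacroLevel {
  // Macro-level observable: a summary associated with the system as a whole.
  def meanReading(nodes: Seq[Node]): Double =
    nodes.map(_.reading).sum / nodes.size

  // Intended meaning (goal): a function of a macro-level observable over time,
  // here "the mean stays below the limit in every state of the trace".
  def staysBelow(limit: Double, trace: Seq[Seq[Node]]): Boolean =
    trace.forall(state => meanReading(state) < limit)

  // Macro-level construct: one description mapped down so that it affects
  // the behaviour (the sampling period) of two or more micro-level entities.
  def sampleEvery(periodMs: Int, nodes: Seq[Node]): Seq[Node] =
    nodes.map(_.copy(periodMs = periodMs))
}
```

Here the observable is trivially derived from the micro state; in realistic systems that derivation is often the hard part.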
4.2.2 On Collectives.
Macroprogramming usually targets so-called collectives (see Section 3). The term collective derives from the Latin colligere, which means “to gather together”. Typically [Masolo et al. 2020], a collective is an entity that gathers multiple congeneric elements together by some notion of membership. The term congeneric means “belonging to the same genus”, namely, of related nature. In other words, a collective is a group of similar individuals or entities that share something (e.g., a trait, a goal, a plan, a reason for unity, an environment, an interface) that justifies seeing them as a collective overall. A group of co-located workers, a swarm of drones, and the cells of an organ are examples of collectives, whereas a gathering of radically different or unrelated entities such as cells, rivers, and monkeys is not, intuitively. Being congeneric, the elements of a collective generally share goals and mechanisms for interaction and hence collaboration. The differences among the elements, which often promote larger collective capabilities through collaboration, may be due to genetic factors, individual historical developments, and the current environmental contexts driving diverse responses to similar inputs.
Heterogeneous collectives also exist (e.g., aggregates involving humans, autonomous robots, and sensors) and can be addressed by macroprogramming [Scekic et al. 2020]. However, heterogeneity tends to complicate macroprogramming by placing more importance on individuals’ perspectives or widening the macro-to-micro gap (see Section 6.4.3 for a discussion).
4.2.3 On Declarativity.
A typical aspect of macroprogramming is
declarativity.
Declarative programming [Lloyd
1994] is a paradigm which focusses on expressing what the goal of computation is rather than how it must be achieved. Common and concrete aspects of a computation that can be abstracted away include the order of function evaluation (cf. functional programming), proving theorems from facts (cf. logic programming), and the specifics of data access (cf. query plans in databases and SQL). The general idea is to provide high-level abstractions capturing system-wide concerns by making assumptions promoting convenient mapping to component-level concerns. As such assumptions tend to be specific to an application domain, macroprogramming languages typically take the form of Domain-Specific Languages (DSLs) [Beal et al.
2012].
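The contrast between “what” and “how” can be made concrete with a small Scala sketch. It is only an analogy at the level of a single machine, under assumed names (Reading, meanInRegionImperative, meanInRegion): a macroprogramming language would apply the same idea to a distributed set of devices, leaving the macro-to-micro mapping to decide how data is actually collected.

```scala
// A hypothetical located temperature reading.
final case class Reading(x: Double, y: Double, temperature: Double)

object Declarativity {
  // "How": explicit traversal order and mutable accumulation.
  def meanInRegionImperative(rs: Seq[Reading], maxX: Double): Double = {
    var sum = 0.0
    var n = 0
    var i = 0
    while (i < rs.length) {
      if (rs(i).x <= maxX) { sum += rs(i).temperature; n += 1 }
      i += 1
    }
    sum / n
  }

  // "What": the mean temperature of the readings inside a region; the region
  // is a predicate and the evaluation strategy is left to the library.
  def meanInRegion(rs: Seq[Reading], inRegion: Reading => Boolean): Double = {
    val ts = rs.filter(inRegion).map(_.temperature)
    ts.sum / ts.size
  }
}
```

Both functions return NaN for an empty region; a real language would need to decide what a mean over no devices should be.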
4.3 What Macroprogramming Is (Not)
Programming essentially always deals with multiple interacting software elements, be they functions, objects, actors, or agents. Even though paradigms are more a matter of mindset and abstractions than of strict demarcation, addressing the demarcation issue helps to better delineate the (nevertheless fuzzy) boundary of macroprogramming. Macroprogramming is often centred around macro-abstractions: informally, constructs that involve, in some abstract way, (the context, state, or activity of) two or more micro-level entities. For instance:
• macro-statements (or macro-instructions), for imperative macroprogramming languages (e.g., “move the entire swarm to that target location”, or “update the WSN state history to record the current temperature of the area”);
• macro-expressions, evaluating to a macro-value (e.g., “the direction vector of the swarm towards the target location”, “the mean temperature of the area covered by the network”).
Other examples of macro-abstractions can be found in Section 6.2.
Consider the following artificial Scala program:
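A minimal stand-in for such a program is sketched here; it is hypothetical and not the original listing. The names Vec, Robot, Swarm, centre, and moveTo are assumptions, and robot motion is reduced to a plain position update instead of real flocking or obstacle avoidance. The layout is chosen so that the library part occupies lines 1 to 9 and the user part lines 11 to 16.

```scala
// Library code: addresses each robot individually (micro level)
final case class Vec(x: Double, y: Double)
final class Robot(var pos: Vec) {
  def moveTo(target: Vec): Unit = { pos = target } // stand-in for low-level motion control
}
final class Swarm(robots: Seq[Robot]) {
  def centre: Vec = Vec(robots.map(_.pos.x).sum / robots.size, robots.map(_.pos.y).sum / robots.size)
  def moveTo(target: Vec): Unit = robots.foreach(_.moveTo(target))
}

// User code: addresses the swarm as a whole (macro level)
object Demo extends App {
  val swarm = new Swarm(Seq(new Robot(Vec(0, 0)), new Robot(Vec(2, 0))))
  println(swarm.centre)    // macro-expression: evaluates to a macro-value
  swarm.moveTo(Vec(5, 5))  // macro-statement: acts on the whole swarm
}
```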
The swarm object provides a macro-abstraction over the set of underlying robots. Indeed, such code might be written to abstract from a series of low-level details: the obstacle avoidance behaviour of individual robots; the fact that the robots of the swarm move collectively in flock formation; and the way sensors and actuators perceive distances to other robots, obstacles, and acceleration to control the stability and speed of each moving robot. The intended meaning of the program may refer to macro-observables that may or may not be accessible by the program (cf. side effects). The library code provides an implementation of the macroprogramming system: it maps the expressions of the user macro-program down to micro-level behaviour. Here, the macro-to-micro approach may be interpreted as an interaction mode (it is the running thread that interacts with the micro-level entities through the program control flow) or an execution mode (the macro-program is executed by the micro-level entities). This simplified example shows a macroprogramming language as a library/API within an existing host language, also called an internal DSL; actual examples of internal macroprogramming DSLs include Chronus [Wada et al. 2010] and ScaFi [Casadei et al. 2020b].
Doing macroprogramming is very much a matter of perspective. If the micro-macro distinction we are considering is robots vs. a swarm, then the library code (lines 1–9), individually addressing each robot of the swarm with a specific instruction, is not macroprogramming, properly; vice versa, the user code (lines 11–16), addressing the swarm as a whole, does represent an example of macroprogramming. However, the library code could be considered macroprogramming under a micro-macro viewpoint of sensors/actuators vs. a robot.
4.3.1 Weak vs. Strong Macroprogramming.
In a nutshell, the central idea of macroprogramming is considering the entire system as the abstract machine for the operations. Notice that adopting a centralised perspective to programming, where a centralised program has access to all the individual entities, is not generally sufficient for effective macroprogramming: there should typically be at least one intermediate level of indirection, where macro-operations turn into micro-operations. In the preceding example, while the library code can directly access the individual robots, the user code indirectly accesses them through the swarm macro-abstraction.
Essentially, directly feeding micro-operations to the micro-level entities or specifying the individual behaviours of the parts breaks the macroprogramming abstraction, or makes it leaky [Spolsky 2004; Kiczales 1992]. This is one reason (in addition to limited emphasis on behaviour) why, for example, formalisms for concurrent systems such as process-algebraic approaches [Baeten 2005], certain component-based approaches, and multi-tier programming [Weisenburger et al. 2020] are not generally considered macroprogramming. However, several approaches in the literature define themselves as macroprogramming despite basically embodying merely a form of centralised programming. Some of these may provide some macroprogramming abstractions (e.g., an object from which individual entities can be dynamically retrieved) but would nevertheless appear as a weak form of macroprogramming. We may consider the macroscopic stance as a matter of degree, and hence define strong macroprogramming approaches as those where only macro-abstractions are provided. For demarcation purposes, we propose to refer to those centralised programming approaches that inherently adopt a macro-level, global perspective but directly address individuals through micro-level instructions as weak macroprogramming or meso-programming. Considering the “macro-ness” as a continuum, and hence admitting that languages can be “more macro” or “less macro”, allows us to be more comprehensive in these early stages.
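The weak/strong distinction can be sketched as two alternative Scala APIs over the same robots. The classes WeakSwarm and StrongSwarm and their methods are hypothetical, introduced only to show where the abstraction leaks; they do not correspond to any surveyed language.

```scala
// Hypothetical micro-level entity.
final class Robot(val id: Int) {
  var heading: Double = 0.0
  def turnTo(h: Double): Unit = { heading = h }
}

// Weak (meso) style: a global handle that still hands out individuals,
// so user code can feed micro-operations directly to single robots.
final class WeakSwarm(val robots: Seq[Robot]) {
  def robot(id: Int): Option[Robot] = robots.find(_.id == id)
}

// Strong style: the robots stay private and only macro-abstractions are exposed.
final class StrongSwarm(robots: Seq[Robot]) {
  def headTowards(h: Double): Unit = robots.foreach(_.turnTo(h)) // macro-statement
  def meanHeading: Double = robots.map(_.heading).sum / robots.size // macro-expression
}
```

With WeakSwarm, a call such as `swarm.robot(2).foreach(_.turnTo(90.0))` is legal user code, which is precisely the kind of direct micro-level instruction that makes the macro-abstraction leaky.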
4.3.2 Macroprogramming as a Paradigm.
Van Roy [
2009] defines a
programming paradigm as “an approach to programming a computer[-based system] based on a mathematical theory
or a coherent set of principles” (emphasis added). Van Roy classifies paradigms according to (i) whether or not they can express observable nondeterminism and (ii) how strongly they support state (e.g., according to whether it is named, deterministic, and concurrent). Also interesting is Van Roy’s view of computer programming as a way to deal with complexity (e.g., number of interacting components) and randomness (non-determinism) to make aggregates (unorganised complexity) and machines (organised simplicity) into systems (organised complexity). Macroprogramming effectively deals with aggregates, turning them into programmable systems.
We argue that the principles outlined in this section form sufficient ground for macroprogramming to be considered a paradigm, and hence to aggregate multiple approaches under its umbrella. It is a paradigm in a way similar to declarative programming [Lloyd 1994], which is “concerned with writing down what should be computed and much less with how it should be computed” [Finkelstein et al. 2003]. Then, paradigms like functional and logic programming are considered as more specific forms of declarative programming. As shown in Section 5, concrete macroprogramming languages can also adopt a specific paradigm (e.g., functional, logic, or object-oriented).
The notion itself of a paradigm has sometimes been criticised in teaching programming [Krishnamurthi and Fisler
2019] for its fuzziness and coarse grain, preferring epistemological devices like notional machines [Fincher et al.
2020]. However, our stance is that the notion of a paradigm may still be useful as a lens or perspective for observing, comparing, and relating several concrete programming approaches, and as a core notion around which researchers on disparate topics can self-identify and connect through shared terms and ideas.
4.4 Taxonomy
We propose to classify and analyse macroprogramming approaches according to the following elements, succinctly represented in Figure
2:
(1)
Target domain: This refers to the application domain explicitly addressed by a macroprogramming approach. It is relevant since domain-specific abstractions and assumptions are typically leveraged to properly deal with the abstraction gap induced by declarativity. The label “General” is used to indicate that an approach addresses distributed systems in general, whereas “Other” means that the approach addresses a specific domain different from the others.
(2)
Approach: We propose to classify macroprogramming languages according to the main approach they follow:
–
Control-oriented: Emphasis is on specifying control flow and instructions for the system.
–
Data-oriented: Emphasis is on specification of data and dataflow.
–
Space-time-oriented: Emphasis is on specification of spatial, geometric, or topological aspects and their evolution over time.
–
Ensemble-oriented: Emphasis is on specification of organisational structures as well as tasks and interaction between groups of components.
–
Ad-hoc: The approach is peculiar and cannot be easily related with the previous ones.
(3a)
Paradigm: The paradigm upon which macroprogramming abstractions are supported (the main one in case of multi-paradigm languages).
(3b)
Macroprogramming design: Elements characterising a particular macroprogramming language:
–
Micro level: The individual components and aspects that collectively make up the system.
–
Macro level: The system as a whole and its macroscopic aspects.
–
Macro-to-micro: The approach followed by macro-programs to affect micro-level behaviour. We distinguish four main modalities based on the discussion in Section
4.1: (i)
context, where global state, inputs, or node parameters are set; (ii)
interaction, where a process is used to orchestrate micro-level entities; (iii)
compilation, where the macroprogram is translated into the micro-programs; (iv)
execution, where the macro-program is executed by the micro-level entities according to some (ad-hoc) execution model.
–
Macro-goals: The objectives that macro-programs are meant to reach (typically, abstraction, flexibility, and optimisability—as a result of declarativity).
–
Macro-abstractions: The abstractions provided by a macroprogramming approach that are instrumental for achieving or capturing macroscopic aspects or goals of the system.
–
Micro-level dependency: The extent to which the macroprogramming language depends on micro-level components or aspects. We consider three levels: (i) Dependent (if micro-level elements are always visible), (ii) Independent (if micro-level elements are abstracted away), or (iii) Scalable (if micro-level elements can be abstracted away but also accessed when needed).
Elements of this taxonomy integrate and are partially inspired by some perspectives of previous work covered in Section
7.
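As a minimal sketch of how the taxonomy dimensions could be recorded for a surveyed language, the following Python fragment encodes the approach classes, the macro-to-micro modalities, and the micro-level dependency levels as enumerations. The class and field names are hypothetical and introduced only for illustration; the example entry uses the characterisation of Kairos given in Section 5.1.

```python
from dataclasses import dataclass
from enum import Enum


class Approach(Enum):
    CONTROL = "control-oriented"
    DATA = "data-oriented"
    SPACE_TIME = "space-time-oriented"
    ENSEMBLE = "ensemble-oriented"
    AD_HOC = "ad-hoc"


class MacroToMicro(Enum):
    CONTEXT = "context"          # global state, inputs, or node parameters are set
    INTERACTION = "interaction"  # a process orchestrates micro-level entities
    COMPILATION = "compilation"  # the macro-program is translated into micro-programs
    EXECUTION = "execution"      # micro-level entities execute the macro-program


class MicroLevelDependency(Enum):
    DEPENDENT = "dependent"
    INDEPENDENT = "independent"
    SCALABLE = "scalable"


@dataclass(frozen=True)
class Classification:
    """One row of a survey table (hypothetical record layout)."""
    name: str
    target_domain: str
    approach: Approach
    paradigm: str
    macro_to_micro: MacroToMicro
    dependency: MicroLevelDependency


# Example entry, following the description of Kairos in Section 5.1.
kairos = Classification(
    name="Kairos",
    target_domain="WSN",
    approach=Approach.CONTROL,
    paradigm="procedural",
    macro_to_micro=MacroToMicro.COMPILATION,
    dependency=MicroLevelDependency.DEPENDENT,
)
```

Using enumerations rather than free-form strings keeps the classification closed: a language can only be assigned one of the classes defined above.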
5 Macroprogramming Approaches
This section provides a survey of macroprogramming languages, which are analysed as per the conceptual framework of Section
4. The contributions are classified and organised as per the approach classes proposed in Section
4.4. A summary of the survey is provided in Table
2.
5.1 Control-Oriented Approaches
Control-oriented approaches emphasise an imperative macroprogramming style where control flow is specified and/or explicitly controlled for the system and instructions are issued to query or act on system components. This contrasts with data-driven approaches where control flow is a consequence of relationships among data. With control orientation, implicit or explicit sequences, conditionals, and loops may be used to describe what the macro-system or its components have to perform.
Representative Example: Kairos [Gummadi et al. 2005]. Kairos is a procedural macroprogramming language for WSNs that assumes loose synchrony and leverages eventual consistency to keep overhead low. The approach is
control-driven and
node-dependent—that is, nodes and node state are explicitly manipulated at the programming level. In Kairos, the programmer writes a centralised program expressing the global specification of a distributed computation, which is compiled to a node-specific program. Kairos exposes three main abstractions: addressing of arbitrary nodes (e.g., by names or iterators like
|node_list|), inspection of one-hop neighbour nodes (e.g., via function
|get_neighbors|), and remote data access at nodes (e.g., with expressions
|variable@node|). As an example, consider a simple self-healing hop-gradient computation—that is, an algorithm that makes each node in the system yield the corresponding hop-by-hop distance towards a root node [Audrito et al.
2017].
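A hop-gradient computation of this kind can be sketched in plain Python (this is not Kairos syntax). The function |get_neighbors| and the list |node_list| below are stand-ins that mimic the Kairos abstractions named above, reading |dist[m]| for a neighbour |m| plays the role of remote data access, and the synchronous round structure is a simplifying assumption.

```python
INF = float("inf")


def get_neighbors(topology, node):
    """One-hop neighbours of `node` (stand-in for the Kairos abstraction)."""
    return topology[node]


def hop_gradient(topology, root, rounds=None):
    """Compute the hop-by-hop distance from every node to `root`.

    topology: dict mapping each node to the list of its one-hop neighbours.
    The centralised loop plays the role of the macro-program.
    """
    node_list = list(topology)  # addressing of arbitrary nodes
    dist = {n: (0 if n == root else INF) for n in node_list}
    # Assumption: as many synchronous rounds as nodes, which is enough for
    # the values to stabilise on a static topology.
    rounds = len(node_list) if rounds is None else rounds
    for _ in range(rounds):
        # Each node takes the minimum neighbour distance plus one hop;
        # dist[m] here stands in for a remote read such as variable@node.
        dist = {
            n: 0 if n == root else min(
                (dist[m] + 1 for m in get_neighbors(topology, n)),
                default=INF,
            )
            for n in node_list
        }
    return dist
```

For a line topology a-b-c-d with root a, the computed distances are 0, 1, 2, and 3; nodes that cannot reach the root keep an infinite distance.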
Concerning macro-to-micro mechanics and implementation, during the translation of the macro-program into node-level programs, references to remote data are expanded into calls to the Kairos runtime, a software component which is assumed to be available in every node of the system. Specifically, the Kairos runtime deals with managed objects (objects owned by a node that are to be made available to remote nodes) and cached objects (local views of managed objects owned by remote nodes), through asynchronous hop-by-hop communication; contrast this with the synchronous data access calls in Kairos programs. Issues at the middleware level include supporting end-to-end reliable routing and management of dynamic topologies.
5.2 Data-Oriented and Database Abstraction Approaches
Data-oriented approaches define the macro-level behaviour of a system in terms of goals and activities of data gathering and processing. Sometimes, this is taken to the extreme, considering the system as a kind of distributed database keeping spatiotemporal or aggregated data.
Representative Example #1: TinyDB [Madden et al. 2002]. TinyDB is a query processing system that considers a WSAN as a database. TinyDB supports an SQL-like language for expressing queries and actuations. A query looks like the following:
Therefore, the approach is fully declarative and the system must itself find a strategy to map the global goal to the local behaviour of the sensor nodes. We remark that the
behaviour of the individual nodes is driven partly by the query-like macro-program and partly by a basic “execution protocol” (providing a structure for the emergence of global behaviour) which is the same for all the nodes. Nodes work in
epochs, corresponding to sampling periods, in a synchronised fashion. They sleep for most of the time; they wake up to sample sensors, gather neighbour data, process data, and send results to their parent node. This execution protocol is very similar to those used by other macroprogramming approaches, such as aggregate computing [Viroli et al.
2019], which is a paradigm for self-organising systems of agents.
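The epoch-based execution protocol described above can be illustrated with a small Python simulation. This is a sketch under stated assumptions and not the TinyDB implementation or API: the function name |run_epoch|, the routing tree given as a parent map, and the choice of an aggregation function are all hypothetical.

```python
def run_epoch(parent, sample, combine=max):
    """Simulate one epoch of in-network aggregation over a routing tree.

    parent: dict mapping each node to its parent (the root maps to None).
    sample: dict mapping each node to the sensor reading taken this epoch.
    combine: binary aggregation function (e.g., max, min).
    Returns the aggregate that reaches the root.
    """
    children = {n: [] for n in parent}
    root = None
    for node, p in parent.items():
        if p is None:
            root = node
        else:
            children[p].append(node)

    def partial(node):
        # A node wakes up, samples its sensor, merges the partial results
        # received from its children, and reports the result to its parent.
        result = sample[node]
        for child in children[node]:
            result = combine(result, partial(child))
        return result

    return partial(root)
```

Every node runs the same protocol; only the query (here, the |combine| function) changes the global outcome, which matches the split between query-driven and protocol-driven behaviour noted above.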
Representative Example #2: Semantic Streams [Whitehouse et al. 2006]. Semantic Streams is a logic-based, declarative language for expressing semantic queries over WSN data. It builds on two main abstractions:
event streams and
inference units (processes on event streams). For instance, the following program can be used to query for and plot
|objectDetected| events in a given area across time:
The macroprogramming system implementation is based on service composition and embedding. The query planner builds a task graph to be deployed to individual nodes, which will dynamically instantiate services, resolve conflicts between tasks and resources, and execute the queries.
5.3 Space-Time-Oriented Approaches
Space-time-oriented macroprogramming approaches are those that leverage spatial and temporal abstractions to organise the behaviour of a system. These approaches work by defining ways to connect devices (or their data, activities, and interactions) to space-time locations or regions.
Reference Example #1: SpatialViews [Ni et al. 2005]. The SpatialViews approach works by abstracting a MANET into
spatial views (i.e., collections of
virtual nodes) of a configurable space-time granularity, that can be iterated on to visit nodes and request services. In detail, the model is as follows. A physical network consists of physical nodes. A physical node has a spatiotemporal location and a set of provided services. A virtual node is the digital twin of a physical node: its programming abstraction. A spatial view defines a virtual network over the physical network which is discovered and instantiated when iterated. Operationally, the system works by migratory execution of the program during iteration. The SpatialViews language is implemented as an extension to Java.
Space-time granularities are used to distinguish virtual nodes, which are visited once per iteration; instead, the underlying physical nodes might be visited more than once (e.g., because of mobility or after a quantum of time granularity). We remark that this work did not use any “macroprogramming”-like term to label SpatialViews, although clearly embracing the paradigm.
Reference Example #2: SpaceTime-Oriented Programming (STOP) [Wada et al. 2007], a.k.a. Chronus [Wada et al. 2010]. This WSN macroprogramming system exposes a spacetime abstraction to support collection and processing of past or future data in arbitrary spatiotemporal resolutions. Architecturally, it consists of a network of battery-powered sensors (where data is gathered) and base stations (where data is processed) linked to a gateway connected to the STOP server, which holds network data in the so-called
spatiotemporal database. Operationally, the system is implemented through mobile agents carrying data to the STOP server, which in turn updates the database:
event agents detect events and replicate themselves to move hop-by-hop towards a base station, where they finally
push data; by contrast,
query agents move across a spatial region to
pull relevant data. The STOP/Chronus language is an object-oriented Ruby DSL enabling on-command and on-demand (event-driven) data collection and processing. An example, selected and adapted from Wada et al. [
2007], is the following:
This program queries data in space-time “slices” that abstract the data generation activity of the underlying collection of sensor nodes. Indeed, it focusses on a macroscopic perspective.
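The notion of a space-time slice can be made concrete with a short Python sketch (not the STOP/Chronus Ruby DSL). The flat list of |(x, y, t, value)| tuples standing in for the spatiotemporal database, the rectangular region, and the function names are assumptions made for illustration only.

```python
def query_slice(readings, region, period):
    """Return the readings inside a rectangular region and a time interval.

    readings: iterable of (x, y, t, value) tuples.
    region: (x_min, y_min, x_max, y_max), bounds inclusive.
    period: (t_start, t_end), bounds inclusive.
    """
    x_min, y_min, x_max, y_max = region
    t_start, t_end = period
    return [
        (x, y, t, v)
        for (x, y, t, v) in readings
        if x_min <= x <= x_max and y_min <= y <= y_max and t_start <= t <= t_end
    ]


def slice_average(readings, region, period):
    """Average value over a space-time slice, or None if the slice is empty."""
    values = [v for (_, _, _, v) in query_slice(readings, region, period)]
    return sum(values) / len(values) if values else None
```

The caller names only a region and a period; which sensor nodes produced the data is not visible in the query.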
5.4 Collective Adaptive Systems and Ensemble-Based Approaches
Macroprogramming is also popular in the fields of both multi-agent system engineering [Wooldridge
2009] and CAS [Ferscha
2015] engineering. CAS approaches are closely related to spatiotemporal approaches since CASs are often situated and space represents a foundational structure for coordination. In these approaches, it is common to consider large, dynamic groups of devices as first-class abstractions, which are commonly referred to as
ensembles,
collectives, or
aggregates. The general idea is to support interaction between (sub)groups of devices by abstracting certain details away (e.g., membership, connections, concurrency, failure). With respect to the network abstraction and other macroprogramming approaches, these works focus more on addressing the specification of dynamic ensembles, do not take an explicit spatial stance, or are not limited to data gathering and processing.
Reference Example on CAS Programming: Aggregate Programming [Viroli et al. 2019]. Aggregate programming is a macroprogramming paradigm, founded on
field calculi [Viroli et al.
2019], for programming CASs. It builds on the
computational field abstraction, a conceptually distributed data structure that maps any device of a system to a value, over time. Then, macroscopic behaviour can be expressed in terms of a single program which manipulates fields through constructs for state management, neighbourhood-based interaction, and domain partitioning (i.e., the ability to run a computation on a subset of the system nodes). Aggregate programming is supported by languages such as the Scala-internal DSL ScaFi [Casadei et al.
2020b] and the stand-alone DSL Protelis [Pianini et al.
2015]. For instance, the problem of counting, in any device, the number of neighbour devices experiencing a high temperature can be expressed in ScaFi as follows:
where |foldhood(init)(acc)(f)| folds over the neighbourhood of each device by aggregating the neighbours’ evaluation of |f| through accumulation function |acc|, starting with |init|. The interesting aspect about aggregate programming is that it is possible to capture collective behaviour into reusable functions (from which libraries of domain-specific features can be defined) and compose functions “from fields to fields” to define increasingly complex behaviour. For instance, the following |channel| functionality reuses functions provided by the ScaFi library to build a minimum-width path field from a source to a destination device, which is—crucially—able to self-adapt to input changes (i.e., different source or destination) and topology changes (e.g., as devices move or leave the system).
Notice how this program abstracts from the individual devices at the micro level: such a |channel| function denotes a macro-level structure that is sustained by repeated computation and interaction from the underlying network of devices. In virtue of this flexibility, aggregate programming can be deemed a scalable macroprogramming approach as it retains the ability to address individual devices but provides tools for raising the abstraction level.
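To make the semantics of |foldhood(init)(acc)(f)| described above concrete, the following Python sketch emulates it from a global viewpoint. It is not ScaFi code: the explicit |neighbours| map, the uncurried signature, and the helper |count_hot_neighbours| are assumptions for illustration. Whether a device counts as its own neighbour is decided by the map supplied by the caller.

```python
def foldhood(neighbours, device, init, acc, f):
    """Fold `acc` over the values that `f` yields at each neighbour of `device`.

    neighbours: dict mapping each device to the list of its neighbours.
    init: starting value of the accumulation.
    acc: binary accumulation function.
    f: function evaluated at each neighbour.
    """
    result = init
    for nbr in neighbours[device]:
        result = acc(result, f(nbr))
    return result


def count_hot_neighbours(neighbours, temperature, threshold):
    """Field mapping every device to its number of high-temperature neighbours."""
    return {
        device: foldhood(
            neighbours,
            device,
            0,
            lambda a, b: a + b,
            lambda n: 1 if temperature[n] > threshold else 0,
        )
        for device in neighbours
    }
```

The returned dictionary maps every device to a value, so it can be read as a snapshot of a computational field in the sense introduced above.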
Reference Example for Ensemble-Based Programming: PaROS (PROgramming Swarm) [Dedousis and Kalogeraki 2018]. PaROS is a framework for programming swarms of robots. It proposes an
abstract swarm abstraction, implemented through a Java API, to promote swarm orchestration and spatial organisation. The API consists of functions for path planning, declaration of points of interest or spatial areas to be inspected, enumeration of the robots in the swarm, task partitioning, and setting handlers for detection events or robot failure. A program in PaROS looks like the following:
Many details regarding the coordination of the swarm are abstracted away. Therefore, PaROS promotes a multi-paradigm approach comprising elements from imperative, declarative, and event-driven programming.
5.5 Ad-Hoc Approaches
Ad-hoc approaches are those that make very peculiar assumptions on the programming model or on the underlying system.
For instance, in MBM (Market-Based Macroprogramming) [Mainland et al.
2004], a sensor network is programmed as a
virtual market. The nodes of the network follow a fixed behaviour protocol where they “sell” actions to get a profit. They choose actions according to a local
utility function that expresses a trade-off between the profit and the cost of performing the action.
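The action-selection step can be sketched in Python as follows. The utility is modelled here simply as price minus cost, which is an assumption: the text does not give the utility function used by MBM, and the names |choose_action|, |prices|, and |costs| are hypothetical.

```python
def choose_action(prices, costs):
    """Pick the action with the highest utility, or None if none is worthwhile.

    prices: dict mapping each action to the payment offered for performing it.
    costs: dict mapping each action to the local cost of performing it.
    Assumption: utility = price - cost.
    """
    utility = {action: prices[action] - costs[action] for action in prices}
    best = max(utility, key=utility.get)
    return best if utility[best] > 0 else None
```

Under this model, changing the prices advertised to the network changes which actions the nodes select, without altering the fixed node protocol.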
Another example is WOSP [Varughese et al.
2020], an approach for swarm-level programming that requires minimalistic communication, inspired by two biological mechanisms: (i) scroll waves in slime mould and (ii) periodic light emission in fireflies. Each robot of the swarm follows a protocol where it is initially
inactive, listening for incoming pings; upon reception of a ping, it runs a “relay code block” and goes into an
active state where it emits a ping; after the emission of a ping, it goes into the
refractory state, where it does nothing and is insensitive to pings, and finally returns to the inactive state after a refractory period.
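The robot protocol just described is a three-state machine, which can be sketched in Python as follows. Discrete time steps, the |Robot| class, and the |refractory_period| parameter counted in steps are assumptions for illustration and are not taken from the WOSP implementation.

```python
from enum import Enum


class State(Enum):
    INACTIVE = "inactive"
    ACTIVE = "active"
    REFRACTORY = "refractory"


class Robot:
    def __init__(self, refractory_period, relay=lambda: None):
        self.state = State.INACTIVE
        self.refractory_period = refractory_period  # in time steps (assumption)
        self.relay = relay  # the "relay code block" run upon reception of a ping
        self._timer = 0

    def step(self, ping_received):
        """Advance one time step; return True if a ping is emitted in this step."""
        if self.state is State.INACTIVE:
            if ping_received:
                self.relay()
                self.state = State.ACTIVE
            return False
        if self.state is State.ACTIVE:
            # Emit the ping, then become insensitive to incoming pings.
            self.state = State.REFRACTORY
            self._timer = self.refractory_period
            return True
        # REFRACTORY: incoming pings are ignored until the timer expires.
        self._timer -= 1
        if self._timer <= 0:
            self.state = State.INACTIVE
        return False
```

A ping received while the robot is refractory has no effect, which is what prevents a wave of pings from travelling backwards through the swarm.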
Other examples are given by languages for SDN, like NetKAT [Anderson et al.
2014] and SNAP (Stateful Network Abstractions for Packet processing) [Arashloo et al.
2016]. These consider the network as “one big switch” [Kang et al.
2013] with state. The NetKAT language is based on KAT (Kleene Algebra with Tests) plus constructs for networking. Conceptually, a macro-program in these languages is a function of a packet and network state (represented through global variables) that produces a set of packets and a new network state as output. In practice, a program consists of the classical imperative constructs (assignment, conditionals, loops) which are however interpreted in the SDN domain. The compiler translates the macro-program into micro-programs for the network devices dealing with
traffic routing and
placement of state variables.
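The conceptual signature of such a macro-program, from a packet and a network state to output packets and a new network state, can be sketched in Python. This is neither NetKAT nor SNAP syntax: the packet layout, the |routes| and |seen| state variables, and the per-source limit are hypothetical, and a list is used in place of a set of packets because Python dictionaries are not hashable.

```python
def macro_program(packet, state, limit=2):
    """One-big-switch program: (packet, state) -> (output packets, new state).

    packet: dict with 'src' and 'dst' fields.
    state: dict of global variables; 'routes' maps destinations to ports and
           'seen' counts the packets observed per source.
    """
    seen = dict(state.get("seen", {}))  # copy, so the input state is untouched
    count = seen.get(packet["src"], 0) + 1
    seen[packet["src"]] = count
    new_state = {**state, "seen": seen}
    if count > limit:
        return [], new_state  # drop: the source exceeded its packet budget
    port = state["routes"].get(packet["dst"], "flood")
    return [dict(packet, port=port)], new_state
```

The program refers only to global variables; deciding on which device the |seen| counter lives and how traffic is routed through it is the kind of task left to the compiler.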
7 Related Work
This work integrates, extends, and differs from other survey papers. The main difference is that the secondary studies presented in the following, while similarly rich and detailed, adopt a narrower perspective (spatial computing, WSN, microelectromechanical systems, and swarm robotics, respectively). By contrast, this survey aims to relate various macroprogramming approaches across disparate domains and adopts a general software engineering viewpoint. Moreover, due to their publication time, other surveys only cover works published before 2012. Indeed, by analysing the 20-year time frame from the early 2000s to 2020, we can also make considerations about trends (see Section
3).
The most related survey is that of Beal et al. [
2012], which focusses on
spatial computing languages. It proposes a conceptual framework where spatial computation can be described in terms of constructs for (i) measuring space-time (sensors), (ii) manipulating space-time (actuators), (iii) computation, and (iv) physical evolution (inherent spatiotemporal dynamics). The device model accounts for the way devices are discretised in space-time (distinguishing between discrete, cellular, and continuous models), the way they are programmed (e.g., by giving them uniform programs, heterogeneous programs, or leveraging mobile code), their communication scope (e.g., through local, neighbourhood, global regions), and their communication granularity (e.g., unicast, multicast, or broadcast). The survey classifies languages in the following groups: (i) amorphous computing (including pattern languages and manifold programming languages), (ii) biological modelling, (iii) agent-based modelling (including multi-agent and distributed systems modelling), (iv) WSNs (distinguishing between region-based, dataflow-based, database abstraction-based, centralised-view, and agent-based languages), (v) pervasive computing, (vi) swarm and modular robotics, (vii) parallel and reconfigurable computing (including dataflow, topological, and field languages), and (viii) formal calculi for concurrency and distribution (i.e., process algebras/calculi). Languages are further analysed based on characteristics of the language (type, DSL implementation pattern, platform, layers), supported spatial computing operators, and abstract device characteristics. Language type ranges over functional, imperative, declarative, graphical, process calculus, and any.
Also closely related is the work of Mottola and Picco [
2011], a survey that covers programming approaches for WSNs. In their taxonomy, the
interaction pattern is classified into (i) one-to-many, (ii) many-to-one, and (iii) many-to-many. Moreover, the extent of distributed processing in space can be (i) global (e.g., in environment monitoring applications) or (ii) regional (e.g., in intrusion detection or HVAC systems in buildings). Other dimensions include goal (sense-only or sense-and-react), mobility (static, mobile), and time (periodic or event-driven). Regarding WSN programming abstractions, they define a taxonomy as follows.
Communication aspects cover scope (system-wide, physical neighbourhood-based, or multi-hop group), addressing (physical or logical), and awareness (implicit or explicit).
Computation aspects include scope of computation (local, group, or global). The
model of data access could be database, data sharing, mobile code, or message passing. Finally, the
paradigm could be imperative (sequential or event-driven), declarative (functional, rule-based, SQL-like, special-purpose), or hybrid.
The review by Brambilla et al. [
2013] of swarm robotics from an engineering perspective neglects the programming viewpoint. However, they provide a taxonomy where
collective behaviour is classified into behaviour for (i) spatial organisation (e.g., pattern formation, morphogenesis), (ii) navigation and mobility (e.g., coordinated motion and transport), (iii) collective decision making (e.g., consensus achievement and task allocation), and (iv) other.
Design methods are categorised into behaviour-based (e.g., finite state machines, virtual physics-based) and automatic (e.g., evolutionary robotics and reinforcement learning-based methods).
Analysis methods are categorised into microscopic models, macroscopic models (e.g., via rate/differential equations or control theory), and real-robot analysis.
Finally, certain works proposed concepts useful for classifying and understanding macroprogramming approaches. These elements have been considered and integrated into the taxonomy provided in Section
4.4. A possible classification of macroprogramming approaches [Choochaisri et al.
2012] distinguishes between
(1)
node-dependent macroprogramming, where the nodes (or, more generally, the components of the micro-level) and their states are referred to explicitly by the macroprogram, and
(2)
node-independent macroprogramming, where the underlying nodes are not visible at all to the programmer.
As per the discussion of Section
4.3, node-dependent approaches tend to enact a weak form of macroprogramming. Examples of node-independent approaches include those that abstract a WSN as a database. Another distinction can be made between
(1)
data-driven macroprogramming [Pathak and Prasanna
2010], where macro-programs define tasks consuming and producing data, and
(2)
control-driven macroprogramming [Bakshi and Prasanna
2005], where macro-programs specify control flow and instructions operating on distributed memory.
The classification in data-driven and control-driven approaches has been applied in other fields such as coordination [Papadopoulos and Arbab
1998], where the latter are also known as
task- or
process-oriented coordination models.
8 Conclusion
For the first time, we provide an explicit and integrated view of research on macroprogramming, the paradigm aimed at expressing and executing the global behaviour of systems of computational entities. The article discusses what macroprogramming is per se, its core application domains, and its main concepts, and analyses and classifies a wide range of works addressing system development from a more-or-less macroscopic perspective. Thus, it provides a more general, comprehensive, and up-to-date coverage of macroprogramming with respect to previous works, which covered it in the context of engineering approaches for WSNs [Mottola and Picco
2011], spatial computing systems [Beal et al.
2012], and swarm robotics [Brambilla et al.
2013].
We argue that a macro-level stance could be beneficial for software engineering especially in forthcoming distributed computing scenarios (cf. swarm robotics, large-scale CPSs, the IoT, and smart cities), and for promoting language-based solutions to collective adaptive behaviour and intelligence [De Nicola et al.
2020]. Indeed, for the
collective computing revolution [Abowd
2016] to fully unfold, tools will be needed to harness the complexity of large ecosystems involving machines as well as humans [Hendler and Berners-Lee
2010]. In particular, the macro-level perspective could represent a
complementary viewpoint for addressing structure, behaviour, and interaction in complex socio-technical systems. However, macroprogramming comes with peculiar challenges, at the border of science and engineering, such as those related to “steering emergent behaviour” (i.e., promoting desired emergents while avoiding undesired emergents [Schmeck
2005]), “guiding self-organisation” [Prokopenko
2014], promoting collective intelligence [Suran et al.
2020], and, in general, formally expressing global/system-level intents, and mapping those to micro-level instructions—possibly with guarantees.
We suggest that macroprogramming can be considered as an
abstract paradigm (e.g., similarly to the notion of declarative programming), for it conveys a distinctive perspective on programming and a coherent set of principles (cf. Section
4). Then, concrete macroprogramming languages can adopt specific programming paradigms (e.g., imperative, functional, logic, or object-oriented), approaches (e.g., control-, data-, space-time-, and ensemble-oriented), and mechanisms (e.g., first-class groups, collective communication interfaces, and distributed state/data structures). Macroprogramming languages tend to be domain-specific (e.g., addressing data collection and transformation in WSANs, or behaviour and actuation in robot swarms), since domain assumptions are generally instrumental to properly and efficiently map high-level abstractions to activity on the low-level platform. However, there is arguably room for recovering
general principles through inter-domain discussion and sharing of ideas; this would require a more integrated and structured view of macroprogramming as a field, which this article aims to cultivate.