1 Introduction

Privacy by design is a conceptual principle that regards data privacy as critical in software development [1, 2]. The central assumption is that privacy by design is an effective way to integrate data privacy principles and regulations (e.g., GDPR [3] or CCPA [4]) into a system’s design, development, and related processes, such as risk management [5]. Nevertheless, even though developers acknowledge the relevance of data privacy and privacy by design, they find the concepts difficult to translate into their practices [1]. This happens for various reasons, including lack of familiarity with privacy principles, primary focus on functionality, conflicts between privacy and functionality, and lack of guidance. This suggests that privacy by design has not yet fully fulfilled its primary purpose as the essential factor of closing the gap between regulation and implementation.

Generally speaking, design concerns the cyclic process from identifying goals and constraints to abstract problem definition, abstract solution development, and validation in concrete settings [6]. This cycle is repeated until the solution is considered satisfactory [7]. To implement data privacy, developers must integrate privacy goals and constraints into the software design process. From a socio-technical perspective, it has been suggested that enactments of privacy by design should become more material, using standard artifacts and templates [8]. This allows developers to increase confidence in their design decisions. The current research responds to this identified need with a decision framework that provides guidance and supports design decisions in implementing data privacy using microservices.

Microservices architectures have significantly changed the software design process [9, 10]. Microservices are implemented and operated as small, independent systems that give access to functionality and data through a well-defined interface. Using microservices, developers can break down complex systems into small, loosely coupled, and conceptually simple parts [11]. Each part is independently developed and tested, usually using DevOps practices and tools, which accelerate development and support continuous deployment and redeployment [12]. Front-end (user-facing) and back-end (resources and processing) functionality can be broken down into microservices [13]. The result is a significant increase in development flexibility and agility, which explains why microservices are so popular nowadays [14, 15].

This study analyzes privacy by design through a microservices architectural lens. We consider three qualities brought by microservices that help build privacy by design: decentralization (of privacy protections), packaging (of functionality and data), and guarding (of perimeters, zones, and components). A decision framework is proposed that helps developers make critical design decisions regarding the implementation of data privacy at the architectural level.

In summary, the argument put forward by the present study articulates the following key points:

  1. 1.

    Setting: The primary purpose of privacy by design is bringing data privacy into software systems through design;

  2. 2.

    Problem: So far, privacy by design has been challenging to translate into implementation practices;

  3. 3.

    Viewpoint: Adapt privacy by design to the qualities brought by microservices at the architecture level;

  4. 4.

    Solution: A decision framework that provides guidance and supports design decisions in the implementation of data privacy using microservices;

  5. 5.

    Contribution: Close the gap between regulation and implementation through design.

The paper is organized in the following way. First, we provide background information regarding privacy regulation, privacy by design, and microservices. This is followed by a discussion of the adopted research approach based on a systematic literature review. Then, we synthesize the review results. A detailed discussion of the framework construction follows this. The framework is then illustrated using a case study related to electronic voting. Finally, we discuss the study contributions and provide some concluding remarks.

2 Background

2.1 Regulation

Privacy and data protection regulations are among the most fashionable types of law worldwide. EU [3], US [4, 16, 17], Canada [18], Brazil [19], China [20], and India’s [21] laws are evidence enough that privacy and data protection in the digital space are universally established.

Independently of the nuances in each regulation’s motivations, some originating from a personal rights perspective, some designed for consumer protection, there are common concepts. First, they share an often implicit assumption that data protection is necessary for assuring privacy. A discussion of the differences between privacy and data protection is out of the scope of this study [22, 23]. So, we will use the terminology ‘data privacy.’ Wherever data privacy is mentioned, we assume data protection is the key issue discussed.

Second, regulations define privacy principles. GDPR adopts the following key principles [24]:

• Lawfulness, fairness, and transparency.

• Purpose limitation.

• Data minimization.

• Accuracy.

• Storage limitation.

• Integrity and confidentiality (security).

• Accountability.

These principles limit how companies collect and process personally identifiable information (PII) [24], with direct implications on the relationships with customers and ways into which PII is collected, processed, stored, and shared. As companies seek to comply with regulations, they urge software developers to embed privacy requirements into software systems and digital services [8].

Privacy requirements have to be handled by software developers at various development stages from analysis, (e.g., requirements elicitation and analysis) to design (e.g., user journeys, software services, application interfaces, systems architectures, and information structures) and coding (e.g., algorithms, components and features) [25]. In this study, we focus specifically on the software design stage.

Third, regulations generally define the individuals’ rights and relationships with the companies supplying digital services. These usually take a contractual nature in the form of “notice and consent” of the conditions stated by the supplying parties. The topic of consent is dense [26]. In this study, we limit ourselves to the primary requirements for user consent pondered at the design stage.

2.2 The gap between regulation and implementation

Prior research shows that software developers find it challenging to bring data privacy regulation into implementation [1, 5, 25, 27,28,29]. Identified reasons include the lack of familiarity with privacy principles, low priority given to privacy issues, being primarily focused on functionality, negative impacts of privacy on functionality, difficulties encoding privacy into software, and lack of guidance. This underlines a significant gap between more abstract or high-level principles, guidelines, protocols, and legal frameworks, and more concrete or low-level methods and technologies developers use to build software systems [30].

Furthermore, as responsibility for privacy permeates from the broader organization towards the individual software developer, there is an increasing understanding that more safeguarded, techno-regulatory, and collective approaches toward privacy implementation are necessary to increase the developers’ confidence in their own decisions [8]. In this study, we take an architectural view of this techno-regulatory problem, considering different privacy models at the design/architecture level.

2.3 Privacy by design

The concept of privacy by design has emerged as a foundation for closing the gap between regulation and implementation [31, 32]. It is based on the idea that data privacy should be the default setting of every software system instead of being an afterthought or a remediation. Design becomes the center stage for integrating privacy and business needs, technical requirements, problems, and solutions.

Privacy by design brings with it a new set of principles, which developers are expected to embrace [33]: being proactive and preventative, with privacy as the default setting, embedding privacy into the design, bringing privacy and legitimate business interests together, complete lifecycle protection, visibility and transparency, and respect for the user. Adopting these principles helps avoid privacy risks, comply with legal requirements, and engage with regulators, rights advocates, users, and other stakeholders [32, 34].

However, privacy by design also has the reputation of being challenging for developers [32]. This happens because the concept is aspirational. It is framed as an organizational rather than a technical goal [8]. It centers on attitudes and beliefs, lacking direct links to real-life development practices [27, 30, 32]. In particular, privacy by design is still distant from developers’ decisions at the systems architecture level [31]. Microservices raise the opportunity to address the problem at this level.

2.4 Microservices

Microservices have gained immense popularity in the last few years [11, 35]. They are recognized to increase flexibility and agility in software development, particularly when combined with new practices such as DevOps [14, 15, 36]. Microservices are characterized by dividing application functionality into many small, decentralized, and autonomous components coping with particular duties [9]. Duties involve dealing with a particular function or representing a specific resource [37]. In DevOps environments, microservices can be independently developed, deployed, tested, and continuously modified [15]. The functionality provided by a microservice may concern an application’s front-end or back-end. Several front-end and back-end microservices can work together to deliver micro-frontends, which can be regarded as semi-independent, reusable mini-applications [38].

Microservice architectures have several distinct features. In particular, the communication with (and between) microservices is standardized, using a lightweight request-response protocol [39]. Another distinctive feature is that microservices can be individually deployed and scaled on demand. Focusing on data, a relevant feature is decentralized data management [9]. Each microservice can be seen as a complete vertical solution, which provides its interface, service logic, and data storage [14]. Hence, to some extent, data privacy is the responsibility of individual microservices.

2.5 Qualities brought by microservices

Microservices architectures have some qualities that can benefit data privacy and privacy by design:

2.5.1 Decentralization

Microservices allow splitting functionality and data throughout fine-grained and loosely-coupled components, affording minimal centralized management, which can benefit data privacy [40, 41]. If a vulnerability is exploited, its reach could be limited to a few components [42]. Furthermore, each affected component is easier to analyze and review. Microservices also allow further splitting of functionality and data into multiple dimensions, applying concepts such as separation of concerns and single responsibility [37]. For instance, functionality can be separated into individual front-end and back-end microservices and between message routing and processing, thus separating user interaction from data processing and storage [40].

2.5.2 Packaging

Wrapping functionality and data within individual microservices can contribute to data privacy if microservices are attentive and protective [42]. Additionally, microservices can be placed into software containers to continuously monitor and control how microservices are deployed and used, detect intrusions, and analyze suspicious behavior [43, 44].

2.5.3 Guarding

Microservices can protect perimeters, zones, and even individual components [41, 45, 46]. Various data privacy features can be built into these microservices [47, 48].

In Fig. 1, we present a diagram that relates microservices qualities to privacy and privacy by design principles [24, 32]. The diagram was built by asking the question: can quality X positively impact the implementation of principle X? The exercise highlights the significant impact of microservices on data privacy.

Fig. 1
figure 1

Diagram relating microservices qualities to data privacy and privacy by design principles

2.6 Related work

Several studies have previously discussed how to close the gap between regulation and implementation in the scope of privacy by design and focusing particularly on architectural issues. We organize the related works into a set of categories and briefly discuss them.

2.6.1 Architectural principles and guidelines

The main concern in this category is translating privacy regulations into architectural principles and design guidelines. They address what to do with application data (e.g., hiding, separating, aggregating, and reducing granularity) and supporting processes (e.g., inform, control, and enforce) [2, 49, 50]. Ontologies and models have been proposed to organize these principles and guidelines [51,52,53]. However, principles and guidelines suffer from the “checklist approach,” directing developers more toward compliance than supporting decision-making [54].

2.6.2 Formal methods

The main focus is applying formal methods in architectural analysis. This requires creating a representation of the targeted architecture and, through formal mechanisms, analyzing if it satisfies certain privacy properties, such as data minimization and integrity [55,56,57,58,59]. Tools have been proposed to support the analytic process [60] and facilitate critical discussions [61]. However, there are significant limitations. One is the restricted set of properties addressed by these methods [55]. Another is that software development flexibility and agility may rapidly outdate these representations, reducing value and accountability over time [58]. Finally, applying these methods involves a significant effort, which has to be weighed against benefits [59].

2.6.3 Architectural tactics

Studies in this category propose goal-oriented architectural tactics, or design patterns, that can be contextualized to specific privacy requirements [62, 63]. Example tactics include batching data, delaying messages [63], and adding a privacy layer between the application interfaces (APIs) and the data layer [48].

2.6.4 Procedural models

This category considers procedural approaches for integrating privacy requirements into system architectures [30, 34, 62, 64,65,66]. They focus on how to analyze privacy requirements, perform impact assessments, and select adequate architectural tactics [62]. As such, this category complements the previous one with decision guidelines.

The present research concerns the last two categories, focusing on tactics and decisions that specifically apply to microservices architectures. This is a unique proposition, given that, as far as we could find, only one prior study has focused on microservices. The study by Mashaly et al. [48] discusses how to implement data privacy with microservices. However, it has significant limitations, proposing a one-size-fits-all privacy layer, thus overlooking the potential advantages of combining various architectural strategies. The present study addresses these limitations.

3 Research approach

A systematic literature review is the centerpiece of the selected research approach. Systematicity in literature reviews contributes to rigor in summarizing existing evidence and provides fairness and transparency regarding what is included and excluded from the review [67]. Consequently, systematicity also contributes to increasing the soundness of the design propositions extracted from the review. Next, we overview the research approach, followed by a detailed discussion of the review procedure.

3.1 Overview

The adopted research approach considers four steps (Fig. 2). The first step systematically searches microservices and data privacy literature to characterize the domain. The second step synthesizes the domain using a set of data privacy models. The third step is focused on decision guidance, for which a decision framework is developed. The fourth and final step aims to illustrate using the decision framework in a case study.

Fig. 2
figure 2

Research approach

This research approach follows a familiar pattern adopted by other studies in the domain. For example, the study by Ataei and Staegemann [45] reviews the literature on the use of microservices in big-data systems, synthesizes a set of big-data requirements and design patterns, and provides guidance for the design of big-data systems. The study by Al-Doghman et al. [43] reviews the literature on the use of AI models in microservices, synthesizes a set of technical approaches from the literature, and builds a framework that provides design guidance on how to secure microservices.

3.2 Review procedure

An initial search was conducted in SCOPUS (Fig. 3, left). This database was selected because it offers comprehensive coverage in computer science and provides quality control over peer-reviewed literature. The initial searches used the strings [‘microservices’ and ‘data privacy’] and [‘microservices’ and ‘data protection’]. These searches considered title, abstract, and keywords. This initial search identified 28 papers.

Fig. 3
figure 3

Review procedure

The 28 papers were analyzed in detail to identify new search keywords and conduct sensitive searches on the topic [68], turning the search process more inclusive. The selected new keywords were: personal identifiable information, sensitive data, trust, privacy by design, and GDPR. Further, an additional search was conducted considering the string [‘microservices’ and ‘privacy’] to avoid false negatives caused by a restrictive use of ‘data privacy.’ Fig. 3 (left) summarizes the results from these sensitive searches, considering the number of items returned from each search, new and duplicate items, and removed items. All removed items were editorials in proceedings, as they did not have research content. The set of sensitive searches increased the dataset from 28 to 163 papers.

As searching one single database may be limiting, additional searches were conducted in Google Scholar (Fig. 3, right). These searches used the strings [‘microservices’ and ‘data privacy’] and [‘microservices’ and ‘data protection’]. However, they contemplated anywhere in the article. Considering this more expansive scope and the broader coverage of Scholar, the results were much higher (2.500 and 2.820 papers, respectively) than those from SCOPUS. A cutoff line was defined, and only the first 50 results from each search were selected (ordered by relevance). These choices reflect a compromise between practicality (manually processing more than 5.000 items would be unfeasible) and the chances of finding relevant papers missing from the SCOPUS results.

The two Scholar searches produced several duplicate items, which were removed to identify 78 unique papers. The 78 papers from the Scholar dataset were merged with the 163 papers from the SCOPUS dataset. The identified duplicates were removed, resulting in a combined dataset with 220 papers. To avoid dealing with too many false positives, no sensitive searches were done in Scholar.

This procedure was supported by the Zotero reference management tool. The search results from SCOPUS were exported in the RIS format and imported into Zotero. The search results from Scholar were imported into Zotero using a browser plugin provided by Zotero.

The 220 papers were screened using a set of exclusion criteria. Only conference papers and journal articles were considered (excluding editorials, work-in-progress, posters, extended abstracts, theses, and non-peer-reviewed preprints). Papers in regional/local conferences and non-English papers were also excluded. Two references to unreachable abstracts were removed from the dataset. Finally, after analyzing the abstracts imported into Zotero, 45 papers were excluded for being unrelated to this research. Papers were excluded if they explicitly did not address the design of systems. For example, papers focused on education, visualization, hardware security, wireless networks, workflow management, high-performance computing, and cryptographic algorithms were excluded. The resulting dataset contained 156 papers.

The 156 papers were reviewed in more detail, considering the full text. Once again, a set of exclusion criteria was applied. Unreachable papers (8 items), papers of inadequate types (7 items, see Fig. 3 for details), and papers not in English (1 item) were removed. Furthermore, three additional exclusion criteria were considered:

  • Papers where microservices were not a significant concern. This included papers where microservices were adopted for implementation but not conceptualization (e.g., intelligent agents and edge resources) and cases where microservices were regarded as subsidiary to another concept (e.g., Internet-of-Things and containerization).

  • Papers where privacy was not a significant concern. For example, cases where the main focus was security, cloud computing, and AI.

  • Papers where design was not a significant concern. This included, in particular, research focused on algorithms, protocols, and computing platforms with a significant focus on implementation, testing, and performance.

While doing the second review, particular attention was committed to doing backward searches, where the references cited by papers in the dataset were checked for relevance [69]. Through this procedure, 18 additional papers were added to the dataset and reviewed in detail.

The resultant dataset comprised 77 papers, relatively equally divided between conference papers and journal articles (35 and 40 papers, respectively), plus two book chapters (Fig. 3, bottom-left). The distribution according to publication date (Fig. 3, bottom-left) indicates that the interest in the topic addressed by this study is recent (starting in 2016) and has been increasing. Next, we synthesize the results.

4 Synthesis from the literature

We identified seven data privacy models in the searched literature, listed in Table 1 according to their relative standing in the analyzed dataset (Annex A links the identified models to specific dataset items). Next, we characterize the distinctive features of the identified models.

Table 1 Identified data privacy models

4.1 Gatekeeper model

This model defines private and public zones separated by a gatekeeper (Fig. 4). If a consumer in the public zone requires data or functionality from the private zone, it has to go through the gatekeeper. The ‘gatekeeper’ term reflects a generic purpose. Depending on the specific functionality, gatekeepers are often designated as gateways and other times as proxies, protection layers/components, and data filters. The gatekeeper can be a microservice, a collection of microservices, or a software layer/component.

Fig. 4
figure 4

Gatekeeper model

Gatekeepers manage policies related to the interactions between public and private zones. For instance, Bugshan et al. [47] developed a gatekeeper that perturbs health-related data gathered by smart applications in the public zone from Internet-of-Things devices in the private zone.

4.2 Service model

The service model adopts a paradigm known as X-as-a-service, where a dedicated service provider implements functionality X. Privacy-as-a-service (PaaS) microservices oversee the exchanges between consumers in public zones and microservices and data in private zones (Fig. 5). PaaS microservices manage service descriptions and policies. They also manage local data stores with sensitive data in case the private zone is not trusted or data has to be filtered or perturbed [47].

Fig. 5
figure 5

Service model

Alic et al. [70] developed two PaaS microservices providing authentication-as-a-service and privacy-as-a-service. The former deals with authentication requests such as sign-in/out, token verification, password management, and authorizations. The latter protects sensitive data by applying anonymization policies to data exchanges, integrations, and analytics.

4.3 Enclave model

The enclave model relies on trusted software enclaves (e.g., Docker Containers) to isolate and secure the boundary of specific microservices (Fig. 6). Enclaves black-box the internal activities of isolated microservices. Enclaves rely on event-based policies to protect microservices. Enclaves can be deployed on untrusted software, e.g., cloud storage and software developed by third parties.

Fig. 6
figure 6

Enclave model

Fetzer et al. [71] developed two enclaves, one that forces microservices to use a secure message bus for communication (with message encryption) and another that secures the storage of sensitive data in untrusted environments.

4.4 Endpoint model

This model is focused on the endpoints provided by microservices to consumers (Fig. 7). Microservices have APIs with multiple endpoints. Each endpoint interacts with the microservice’s internal resources through CRUD (create, read, update, and delete) operations. Since CRUD operations can expose sensitive data, access to specific endpoints may have to be protected using software techniques. These software techniques grant or deny endpoint access based on defined access policies. Policies can use attributes related to the types of data, operations, use cases, and consumers [72].

Fig. 7
figure 7

Endpoint model

Preuvenners and Joosen [73] developed a feature-toggle approach that restricts or allows endpoint access based on policies and dependencies that can be dynamically determined. The dependencies between consumers, requested resources, and operations are checked before access to the endpoint is granted.

4.5 Mesh model

The mesh model fully embraces the idea that microservices should be fine-grained and loosely coupled, where each microservice performs a small, clearly defined, and manageable function [43]. According to this model, data privacy is implemented by a mesh of privacy-preserving microservices standing between application microservices operating as requestors and responders (Fig. 8).

Fig. 8
figure 8

Mesh model

As privacy-preserving microservices stand between application microservices, they can protect specific interconnections and operational workflows. Furthermore, given that privacy-preserving microservices are single-purpose, they can protect specific functions or data.

Privacy-preserving microservices share policies. Policies can contemplate a variety of attributes, including contextual information such as reasons for requesting data, allowed usage, and circumstances [74]. For instance, a particular case could be the location of an application microservice, either locally, in a server, or in the cloud.

Alanezi and Mishra [75] demonstrate the use of this privacy model in Internet-of-Things applications, where a set of system microservices (privacy-preserving) administrate the interactions between application microservices and sensor and actuator microservices. The system microservices may require information about the currently running application microservices to determine what policies to apply.

4.6 Federation model

The federation model adopts the concepts of federation and syndication, where a set of independent entities are integrated, represented, and controlled by a single entity [76]. Federated microservices communicate and share information freely to accomplish their mutually agreed goals [77]. However, their interactions with the outside world depend on the federation microservice, which syndicates the applied policies (Fig. 9).

Fig. 9
figure 9

Federation model

Bhargava et al. [78] developed a system adopting the federation model. The system implements a federation microservice (central monitor) that manages a set of microservices according to domain characteristics, such as running on a mobile device. The central monitor intercepts service requests to federated microservices and performs other unified roles, like anomaly detection.

4.7 Sidecar model

The sidecar model has similarities with the enclave model, as both seek to protect single microservices. However, while the enclave model wraps around microservices through a secure software component, the sidecar model puts that component alongside the microservice.

Sidecars protect microservices from the inside rather than the outside (Fig. 10). Consumers send their requests to the microservice, which communicates with the sidecar for policy checking. By operating alongside the microservice, the sidecar can closely interact with the microservice, therefore applying highly localized and contextualized policies, which can consider the consumers’ requests, the requested operations, and the sensitivity of operations and data. Sidecars operate independently alongside microservices; they can assume that microservices are not trustworthy [79].

Fig. 10
figure 10

Sidecar model

Meadows et al. [79] developed a system that uses two types of sidecars. One type checks the incoming requests (ingress sidecar), while the other type (egress sidecar) checks the post-processing activities of a microservice.

4.8 Discussion

The literature review highlights the wide variety of data privacy models brought by microservices. This has been made possible by decentralization (of privacy protections), packaging (of functionality and data), and guarding (of perimeters, zones, and components). Together, they give developers plenty of options. More so if we consider that developers can combine multiple models in their implementations, this allows developers to implement hybrid solutions. Since microservices operate independently and use common communication standards, they do not pose constraints to hybridization [80]. The “one size fits all” adage does not apply in this area.

Even though developers have many options, a comprehensive decision framework helping developers adopt microservices for privacy protection is missing. Next, we develop such a framework.

5 Decision framework for the implementation of data privacy

The decision framework is derived from the literature review using a set of logical considerations detailed below.

5.1 The granularity of privacy protections

According to the Cambridge Dictionary, granularity is the quality of including many small details. The set of privacy models synthesized from the literature review shows that data privacy can be defined and implemented at different granularity levels. For instance, the gatekeeper model has low granularity because it uniformly protects an undefined, potentially sizable number of microservices. As such, no details related to individual microservices are considered. On the other hand, the endpoint model has high granularity because it targets the attributes of specific microservices, endpoints, and consumers. Therefore, granularity is crucial when deciding what privacy approaches to adopt for a system.

In Fig. 11, we align the identified models according to granularity. The gatekeeper model is regarded as the least granular because protections are defined to manage a sizable number of private microservices uniformly. Next, we find the service model. It is more granular because each PaaS microservice can be specified to address more specific details of managed microservices. The federation model comes next. Its granularity increases because the federation microservice targets a small set of microservices, albeit not considering the details of what happens inside the federation.

Fig. 11
figure 11

Granularity dimension

The mesh model appears in the middle of the scale. Its granularity increases because the mesh targets specific microservices and their interactions. The enclave model follows this. It has higher granularity because it targets the external details of individual microservices. The sidecar model has increased granularity because it targets individual microservices’ external and internal details. Finally, we have the endpoint model, which has the most granularity. This occurs because it targets individual microservices, specific endpoints, and external and internal details.

5.2 Versatility of privacy protection

According to the Cambridge Dictionary, versatility is the ability to change easily or to be used for different purposes. We can also regard the synthesized privacy models from this perspective. To do this, we contrast comparable models (Fig. 12). The gatekeeper model is similar to the service model, as both are intended to manage a large number of microservices. However, the service model offers more versatility, as it can break down data privacy into independent services. That is, there is more separation of concerns.

Fig. 12
figure 12

Versatility dimension

The federation and mesh models are comparable approaches that seek to manage fewer microservices. However, the mesh model is more versatile than the federation model because a single entry point does not constrain it and allows mesh reconfigurations without impacting the privacy-preserving or application microservices.

The enclave and sidecar models have similarities, which can be compared in terms of versatility. Both rely on a software component that protects individual microservices. However, the sidecar model white-boxes those protections, while the enclave model black-boxes them. The white-box approach is more versatile, as it provides more options.

Finally, we do not compare the endpoint model to other models since the focus on endpoints is an exclusive characteristic of this model.

5.3 Putting together the granularity and versatility dimensions

We now bring together the two discussed dimensions (Table 2). We categorize the identified privacy models using three granularity categories: low, medium, and high. Using categories rather than a continuous classification of models (as in Fig. 10) simplifies the decision-making process. Fine-grained decisions do not have to be made about the granularity dimension. They can be made considering the combination of granularity and versatility dimensions.

We consider the two dimensions of versatility already discussed, high and low. This arrangement highlights that gatekeepers and PaaS microservices provide low granularity. The choice for one or the other depends on versatility: implementations requiring more versatility should adopt PaaS microservices; if requiring less, then gatekeepers should be adopted.

Table 2 Arranging privacy models according to granularity and versatility dimensions

Overall, the arrangement of models in these dimensions highlights that to implement data privacy, developers should make two sequential decisions: (1) decide what level of granularity is required; and (2) decide what level of versatility is desired. With these two decisions, the most adequate privacy model to implement is identified.

5.4 Implementing multiple defensive lines

There is no particular reason for implementing one single privacy model. Various models can be integrated to reinforce data privacy. For instance, the gatekeeper model can be adopted to define a private zone, and then, within that zone, multiple federation microservices can be used to protect specific groups of microservices. Indeed, the adoption of multiple privacy models increases data privacy. The reason is supported by the widely acknowledged Swiss Cheese Model (SCM) developed in the safety domain [81]. SCM posits that systems with fewer barriers are more vulnerable than systems with more. SCM explains metaphorically that accidents happen when unsafe events find a trajectory through a series of holes in these barriers (cheese slices). Similarly, systems with fewer privacy defensive lines are more vulnerable than systems with more defensive lines [82, 83].

From Table 3, three defensive lines emerge. The first line operates at low granularity, for which gatekeepers or PaaS microservices can be used. The second defensive line operates at medium granularity, based on federation microservices or privacy-preserving meshes. Finally, the third defensive line operates at high granularity, based on enclaves, sidecars, or endpoints. Table 3, in combination with the models synthesized from the literature review, provides the conceptual infrastructure for integrating multiple defensive lines.

5.5 Considering transparency in the implementation of privacy protections

Transparency is a widely accepted principle of data privacy. It gives individuals the right to understand how their data is managed [1, 30, 84]. The categorization of privacy models in Table 2 provides a foundation for communicating critical privacy by design features in a simple way. In particular, Table 2 can be used to convey how data is handled by a system at the architectural level using the designations adopted for the different privacy models as “nutrition labels” [84].

Table 3 provides such a template. It identifies data sets relevant to the users (which can differ from how they are structured using microservices) and the privacy models adopted to protect them. The granularity categories are presented to the users, indicating the level of detail adopted by privacy protections. Furthermore, having multiple categories also helps communicate whether various defensive lines have been used. The versatility dimension is not considered, as it concerns an internal design decision.

Table 3 Template for using privacy models as “nutrition labels”

5.6 User consent and policy management

Consent is another widely accepted principle of data privacy. It provides individuals with a right to manage privacy [84]. Policies are the centerpiece for defining user consent. They are applied by different components, as specified by the privacy models (Figs. 4, 5, 6, 7, 8, 9 and 10). They also address a variety of attributes, which depend on the model’s granularity. Policies can be applied at a general level, as defined by the gatekeeper and service models. They can also be used at a collective level, as specified by the federation and mesh models. Finally, they can be applied at a component level, as in the enclave, endpoint, and sidecar models. The template shown in Table 4 can be used to delineate how users can intervene at these different levels to provide consent and manage policies.

Table 4 Template for defining privacy policies

5.7 Decision framework

Finally, we synthesize a decision framework integrating the abovementioned elements. The framework defines several decision steps (Fig. 13).

Fig. 13
figure 13

Decision framework

First, developers must make a critical decision about the granularity of privacy protection. Low granularity is easier to manage; however, failures in low-granularity elements may lead to significant privacy breaches. On the other hand, high-granularity elements afford more contextualized and contained protections at the cost of additional complexity. The second decision concerns the number of defensive lines to implement. Having more defensive lines decreases the overall risk of the system at the cost of additional complexity.

The third decision concerns versatility. Developers must decide whether to implement high or low-versatile privacy models for each selected granularity level. This depends on the expected frequency of changes to the system. Even though microservices architectures afford frequent changes, an associated privacy protection cost must be considered. Continuous changes in the system require continuous privacy assessment and adaptation. Therefore, a decision has to be made on whether it is better to adopt more versatile privacy models immediately or later. Since microservices are decentralized and autonomous, changes can be made continuously. Therefore, selecting a particular privacy model to protect specific data or functionality should not be considered fixed. An increased frequency of changes may lead to a shift from low to high versatile models.

After selecting the privacy models, developers must consider how to communicate their choices to the users. This involves identifying the managed data sets and selected privacy models, including indicating adopted granularity levels and defensive lines. Finally, the consent and user-defined policies must be specified for entities providing privacy protection. According to the defined privacy models, entities include individual microservices, collections of microservices, or software components and techniques.

6 Example

We illustrate the use of the framework with an example. The example considers an electronic voting system (Fig. 14). The system comprehends several application microservices divided between front-end and back-end. Each microservice is responsible for specific data, including the electoral registry, list of candidates, choices, ballots, and voting results.

Fig. 14
figure 14

Example: electronic voting system

The voting process starts with voter registration. The registration page interacts with the voter while the registrar checks whether the voter can participate in the poll and records whether the voter has recorded a choice. Then, the process proceeds with the actual voting. The poll manager delivers the list of candidates (e.g., local, regional, and national) to the voting page. The voting page displays the list and allows the voter to choose. The recorder saves the choice while the controller adds the choice to the ballot box. The counter tallies the votes. Finally, the reporter delivers the results.

Figure 14 only shows one instance of each microservice. However, the system is expected to deploy multiple instances. For example, multiple registration and voting pages are needed for parallel voting; the controller and counter could be instantiated locally or nationally to increase performance and resilience.

Regarding this system, we consider the following data privacy requirements:

  • R1: The integrity of the vote must be assured, and the voting process should be resilient.

  • R2: Only registered voters can vote; the voter is irreplaceable and can only vote once.

  • R3: The vote must be kept secret, i.e., the link between registry and choice must be kept anonymous.

  • R4: Any direct or indirect links (e.g., time stamps) between the registrar, recorder, and controller should be removed to avoid breaking anonymity.

  • R5: The voting page should not display the voter’s final choice (to avoid coercing users to take photos).

Developers must make several design decisions regarding these requirements. The first decision concerns granularity. Given the societal sensitivity associated with electronic voting, granularity has to be high. To assure the integrity of the vote (R1), all application microservices can be protected using enclaves like Docker Containers. Enclaves can provide general access control tailored to the voting process: the registrar can only communicate with the registration and voting pages; the poll manager can only communicate with the voting page and recorder; and so on.

Another decision concerns the adoption or not of multiple defensive lines. Given the risks associated with electronic voting systems, implementing all three defensive lines considering low, medium, and high granularity is a sensible choice. The system can implement a gatekeeper, which only gives access to users with unique, valid, and single-use access keys to the election (R2). This provides a first line of defense for the voting process.

A second defensive line may be implemented using federation services and privacy-preserving meshes. A federation service can hide the real information displayed by the registration and voting pages, e.g., through data masking or obfuscation [45] (R5). A privacy-preserving mesh between the registrar and the recorder is necessary to anonymize the link between the voter’s registry and choice (R3). Another privacy-preserving mesh is required to avoid traceability between the registrar, recorder, and controller, e.g., through segmentation or tokenization [85] (R4).

Developers also need to consider the versatility of the voting system. Voting processes do not change frequently. Therefore, the preferred choice is for low-versatility models. This explains the preference for using a gatekeeper instead of PaaS microservices to implement R2. It also explains the preference for a federation microservice to implement R5. However, regarding R3 and R4, privacy-preserving meshes are preferred to federation microservices. This is because the mesh model is more versatile than the federation model in providing the required protections. While the federation microservice would offer a single interface to communicate with the registrar, recorder, and controller, mesh microservices can combine a variety of protections.

The final decisions concern transparency, user consent, and policy management. Regarding transparency, voters should be aware of four data sets: all process data related to voting; private data displayed on pages; the individual votes, including links to registry and choice; and the vote workflows, considering the interactions between registrar, recorder, and controller. The adopted granularity and selected protections should be disclosed for each data set. Considering user consent and policy management, we note that voting processes are peculiar because not much control is usually given to users. The capacity to revoke a vote after it has been cast is impossible in paper-based elections and is not adopted in electronic voting. In this example, users can only revoke unused credentials.

Figure 15 overviews the various decisions to implement data privacy in the voting system. This simplified example illustrates how developers can break down an application into a set of microservices and then add multiple layers of privacy-preserving microservices to ensure privacy requirements.

Fig. 15
figure 15

Decisions made for the voting system

7 Discussion

7.1 Privacy models

One key discussion point concerns the most direct outcome from the literature review, i.e., the privacy models. The identified models define different ways privacy protection can be built into a system using microservices. Each model highlights what is protected, how microservices can provide protection, and how to manage privacy policies. Even though these models have individual traits, they are defined from a common viewpoint. Therefore, the value brought by the privacy models extends beyond the obvious, i.e., a list of options. What they provide is more substantive: a comprehensive blueprint for architecting systems based on microservices.

A system designed using this set of models can have higher specialization/division of labor in satisfying privacy requirements and express a signature of compounded functionalities that can be systematically mapped to technical requirements and principles defined by privacy regulations. These models are not contingent on specific regulations, as privacy principles are addressed from an architectural perspective using three qualities associated with microservices: decentralization, packaging, and guarding. These models are also not contingent on technical requirements and technologies, as these concerns are built into individualized microservices.

7.2 Framework

Another critical discussion point concerns the framework. The framework organizes the identified privacy models using two criteria: granularity and versatility. These two criteria bring together systems architecture and data privacy. They help consider data privacy an integral aspect of systems architecture, which fulfills the privacy by design paradigm. They also help assess the variety of choices offered by privacy models closer to software development, as privacy protections are built into specific system components.

Interestingly, the notions of granularity and versatility concern both the privacy and software viewpoints. From a privacy viewpoint, granularity helps consider which defensive lines a particular system requires, and versatility allows one to assess what will happen if the technical and privacy requirements change. From a software viewpoint, granularity helps to break down data and functionality, and versatility allows one to manage changes.

The framework also helps developers deal with transparency and consent, which are present in most privacy regulations [30]. Developers should communicate design choices impacting privacy, indicating what specific data is managed, which privacy models have been adopted to protect that data, the granularity of selected protections, distinct entities providing those protections, and adopted user consent policies. This level of detail is not necessarily overwhelming or obfuscating for the users. The advantage offered by the framework is taking a design perspective over transparency and consent, where problems and solutions are abstracted at an intermediate level of detail. The framework can be tailored to address communication and visualization requirements [84].

A greater awareness by the users about the developers’ architectural choices may reinforce trust and consent. The trust required for users to consent in data processing emerges from the evidence requirements imposed on developers (see, for example, GDPR [3] Article 24). The proposed framework can help increase users’ awareness of architecting and protecting systems. From this point of view, we may say that microservices are a society of software components, some trustworthy while others not. The proposed framework, along with privacy models, highlights these relationships. In this society, some (if not all) microservices play a role in protecting privacy. Trustworthiness depends on how this society operates as a whole.

7.3 Architecture-level regulation

Regulations on data privacy have evolved from the pronunciation of principles to the privacy by design provision [8, 28]. As the concept of privacy by design develops, regulations are expected to evolve accordingly. This study explores the next step in privacy by design, which goes beyond the enunciation of principles toward architecture-level decisions. Therefore, it may be argued that regulation should evolve towards pronouncing architecture-level rules. This could be accomplished using the framework developed in this study.

To illustrate this point, let us consider GDPR [3]. Article 28 asserts that “the controller shall use only processors providing sufficient guarantees to implement appropriate technical and organizational measures [to] ensure the protection of the rights of the data subject” and also that “[p]rocessing by a processor shall be governed by a contract or other legal act […] that is binding on the processor with regard to the controller […].” These enunciations do not exploit microservices’ architectural characteristics, particularly decentralization, packaging, and guarding. A microservices-savvy enunciation would state that sensitive microservices should be packaged and guarded at low, medium, and high granularity; each sensitive microservice should provide mechanisms for data subjects to manage privacy policies. The pronunciation should also indicate that sensitive data should be decentralized and packed into single-purpose microservices. This pronunciation is closer to development and embraces some design-level concerns revealed by microservices, such as developing systems that use microservices from multiple independent sources and protecting the interactions between microservices (which is imprinted in the guarding concept). In summary, the framework developed in this study can be seen as a foundation for architecture-level regulation. Even though there are risks associated with infrastructure standardization [86], the framework is not closed or prescriptive.

7.4 Research outputs, benefits, and contributions

Table 5 summarizes the abovementioned topics and highlights the study’s main outputs, benefits, and contributions. Overall, the study helps researchers and practitioners engage with the problem of privacy by design through microservices.

Table 5 Research outputs, benefits, and contributions

7.5 Limitations, future work, and future research directions

The present study adopts a research methodology that utilizes the systematic literature review to identify privacy models and a framework that provides guidance and supports design decisions in implementing data privacy using microservices. However, a different research methodology could be adopted based on empirical research, where the framework would be experimentally evaluated, e.g., through field tests with developers. Such an approach would provide empirical evidence about the framework, complementing the logical propositions in this study. We will consider such empirical studies in future work.

The research is centered on the positive characteristics of microservices (decentralization, packaging, and guarding). It does not reflect on the potentially negative impacts of microservices. For instance, microservices increase system complexity, which in turn increases risk. Threats may come from multiple parties, either inside or outside [87]. Future work should also reflect on the negative characteristics of microservices.

Regarding future research directions, the set of privacy models provides a foundation for building a toolkit with standardized and reusable privacy-preserving microservices. Having such a toolkit would promote the framework’s adoption. The framework could be further developed to consider a dynamic view of privacy models. The decision on which models to use does not have to be static; it could be made in runtime based on continuous risk assessment. For instance, a system could rely on gatekeepers if the risk is assessed as low, but federation microservices and privacy-preserving meshes could be added as the risk increases. The collaboration between privacy-preserving microservices should also be considered in this dynamic setting. Finally, future work could also consider using the framework for visualizing how data is handled by a system at the architectural level, e.g., using “nutrition labels.” Empirical validation would indicate the understandability and utility of the templates in Tables 4 and 5.

8 Conclusions

Mapping high-level requirements usually defined in regulatory frameworks onto system components is hard as it requires a conceptual jump. On the other hand, the abundance of tech-related regulations and laws imposes specific system behaviors and functionalities, such as transparency, auditability, fairness, and risk assessment possibilities (see the recent European Union AI Act [88]). This paper adopts privacy as the problem domain, a domain addressed by legislation worldwide [89], and uses microservices to enact privacy by design.

Microservices architectures have brought several advantages to software systems and software development. Developers widely recognize them. This study explores bringing those advantages to the data privacy domain. Microservices support a design-oriented data privacy analysis focused on systems architecture. The present study analyses how to architect systems using microservices, where privacy is built into microservices and in between microservices. Multiple defensive lines can be made using microservices to protect systems.

Microservices are regarded as a society of software components, some trustworthy while others not. Privacy-protecting microservices take the responsibility for protecting the whole system. A variety of privacy models has been extracted from the literature. These models have been consolidated using a common understanding of data privacy, common language, and methodical consideration for three qualities of microservices: decentralization, packaging, and guarding.

The analysis of different ways of organizing microservices supports the decision framework. The framework helps developers make decisions related to privacy protection using two criteria: granularity and versatility. Decisions are made at the architectural level. On the one hand, they are logically separated from technical requirements and implementations. On the other hand, they are also logically separated from broader considerations about privacy principles, regulations, and compliance. Indeed, the primary purpose of the framework is to function as a middle ground, or a bridge, between regulation and implementation. As such, the framework realizes the main goals and desires of privacy by design.