Access Control Technologies For Big Data Management Systems: Literature Review and Future Trends

Colombo and Ferrari Cybersecurity (2019) 2:3
https://doi.org/10.1186/s42400-018-0020-9
Cybersecurity
SURVEY Open Access
Pietro Colombo and Elena Ferrari*
Abstract
Data security and privacy issues are magnified by the volume, the variety, and the velocity of Big Data and by the lack, up to now, of a reference data model and related data manipulation languages. In this paper, we focus on one of the key data security services, that is, access control, by highlighting the differences from traditional data management systems and describing a set of requirements that any access control solution for Big Data platforms should fulfill. We then describe the state of the art and discuss open research issues.
Keywords: Big Data, Access control, Privacy, NoSQL data management systems
Introduction
The term Big Data refers to a phenomenon characterized by the “5 Vs”. By analysing huge Volumes of data with a high Variety of formats, Big Data analytic platforms allow making predictions with high Velocity, that is, in a timely manner, with high Veracity, that is, with low uncertainty, and with a high Value, namely, with an expected significant gain (Jin et al. 2015). As a matter of fact, business strategies are increasingly driven by the integrated analysis of huge volumes of heterogeneous data coming from different sources (e.g., social media, IoT devices). This phenomenon has been pushed by numerous technological advancements. The most significant include the birth of NoSQL datastores (Cattell 2011) and of distributed computational paradigms, like MapReduce (Dean and Ghemawat 2004), which have jointly opened the way to the management and systematic analysis of huge volumes of semi-structured data (e.g., transactions, electronic documents, and emails).
Overall, the support provided by Big Data platforms for the storage and analysis of huge and heterogeneous datasets has no counterpart in traditional data management systems. In addition, the advantages of these new systems are not only related to the outstanding flexibility and efficacy of the analysis services, as Big Data platforms outperform traditional systems even with respect to performance and scalability.
However, Big Data systems do not show the same level of excellence with respect to data protection features (Colombo and Ferrari 2015b). For instance, while a variety of data protection frameworks have been proposed for traditional systems (see e.g., Agrawal et al. (2002); Byun and Li (2008); Colombo and Ferrari (2014a; 2014b; 2015a); Ferrari (2010)), the majority of Big Data platforms integrate quite basic access control enforcement mechanisms (Colombo and Ferrari 2015b). As a result, the unconstrained access to high volumes of data from multiple data sources, the sensitive and private contents of some data resources, and the advanced analysis and prediction capabilities of Big Data analytic platforms might represent a serious threat. For instance, the analysis capabilities can be exploited to derive correlations between sensitive and personal data. As an example, let us consider the domain of fitness apps, which are increasingly deployed on mobile and wearable devices and on gym equipment. The joint analysis of movement data, heart rate, and weight might allow profiling users' lifestyles and inferring their predisposition to pathologies.
As a consequence, although the potential benefits of Big Data analytics are indisputable, the lack of standard data protection tools exposes these services to potential attackers.
*Correspondence: elena.ferrari@uninsubria.it
DiSTA, University of Insubria, Via Mazzini 5, 21100 Varese, Italy
© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons license, and indicate if changes were made.
The definition of proper data protection tools tailored for Big Data platforms is a very ambitious research challenge. State of the art enforcement techniques proposed for traditional systems can neither be used as they are nor be straightforwardly adapted to the Big Data context. This is mainly due to the required support for semi-structured and unstructured data (Variety), the quantity of data to be protected (Volume), and the very strict performance requirements (Velocity) affecting these systems. Therefore, the challenge is protecting privacy and confidentiality without hindering data analytics and information sharing. Additional aspects raise the complexity of this goal, such as the variety of data models and of data analysis and manipulation languages used by Big Data platforms. Indeed, unlike RDBMSs, Big Data platforms are characterized by various data models (Cattell 2011), the most notable being the key-value, wide column, and document-oriented ones.
In this paper, we focus on access control, by first identifying a set of requirements that any access control solution for Big Data platforms should address (see “Requirements” section). Then, we classify and analyze the related literature (“State of the art”, “Platform specific approaches”, “Platform independent approaches”, and “Domain specific Big Data approaches” sections), and discuss key research challenges (“Research issues” section). Finally, we conclude the paper in the “Conclusions” section.
This paper is an invited extended version of a paper published in the proceedings of the 23rd ACM Symposium on Access Control Models and Technologies (SACMAT’18)¹. The current version differs from the original conference paper in providing a wider and updated analysis of state of the art access control solutions for Big Data systems, which also takes into consideration domain specific platforms and the related open research challenges.
Requirements
In this section, we provide an overview of the key requirements behind the definition of an access control mechanism for Big Data platforms.
• Fine-grained access control. Among the features the access control mechanism should support, fine-grained access control (FGAC) has been widely recognized as one of the fundamental components for an effective protection of personal and sensitive data (e.g., see Agrawal et al. (2002); Rizvi et al. (2004)). Since data processed by Big Data analytics platforms often refer to users' personal characteristics, it is important that access control rules can be bound to data at the finest granularity levels. However, the related enforcement mechanisms need to be designed from scratch, as those proposed for traditional systems rely on data conforming to a known schema, while in the context of Big Data, data are heterogeneous and schemaless.
• Context management. Another key aspect that should be considered is the support for context-based access constraints, as these allow highly customized forms of access control. For instance, they can be used to constrain access to specific time periods or geographical locations. When contexts are used to derive access control decisions, access authorizations are granted only if conditions referring to properties of the environment within which an access request has been issued are satisfied.
• Efficiency of access control. The characteristics of the Big Data scenario, such as the distributed nature of the considered platforms, the complexity of the queries, and the focus on performance, require access control enforcement strategies that do not compromise the usability of the hosting analytic frameworks. Indeed, depending on the considered queries, the number of checks to be executed during access control enforcement can match or even exceed the number of data records, and, in the Big Data scenario, data sets can include up to hundreds of millions of such records. This requires efficient policy compliance mechanisms. FGAC has been enforced in traditional relational DBMSs according to two main approaches. The first is view-based: users are only allowed to access a view of the target dataset that satisfies the specified access control restrictions. The second is based on query rewriting: instead of pre-computing the authorized views, the query is modified at run-time by injecting the restrictions imposed by the specified access control rules. It is therefore important to determine to what extent these approaches are suitable for the Big Data scenario and how they can possibly be customized or extended.
As it should be clear from the previous discussion, one of the main difficulties in developing an access control solution for Big Data platforms is the lack of a standard data model and related manipulation languages to which access control rules and the related enforcement monitor can be bound.
State of the art
In the literature, various proposals exist which address the issue of access control for Big Data platforms and satisfy some of the requirements illustrated in the “Requirements” section. These proposals can be classified into three main categories:
• Platform specific approaches. Access control solutions under this category are designed for one
system only (e.g., MongoDB, Hadoop), and possibly leverage the native access control features of the protected platform. The main advantage of this approach is that the devised access control solution can be optimized for the target system; however, its usability and interoperability are greatly limited.
• Platform independent approaches. The approaches falling under this category propose access control solutions which do not target a single specific platform. Platform independent approaches have the advantage of being more general than platform specific solutions; however, they cannot compete with them in terms of efficiency. Existing proposals in this category mainly leverage recent research efforts that aim at defining a unifying query language for NoSQL datastores (e.g., JSONiq (Florescu and Fourny 2013) and SQL++ (Ong et al. 2014)).
• Domain specific Big Data approaches. This complementary category includes platform specific and platform independent approaches that target domain specific Big Data systems, designed to fulfill specific data management requirements of a target scenario. As a matter of fact, a variety of Big Data systems have been designed to handle specific application scenarios, and the literature has shown that in these cases the integration of access control mechanisms has mainly been driven by intrinsic features of these systems. In particular, among the various scenarios that can benefit from Big Data systems, we focus on two of the most relevant ones, namely, data stream analysis and Internet of Things applications, by analyzing the related access control enforcement techniques.
In what follows, we analyze the related literature in view of this classification; then, we discuss the related research challenges.
Platform specific approaches
The great majority of access control frameworks targeting Big Data platforms propose enforcement approaches designed on the basis of platform specific features, which can only be used with the platform for which they have been defined.
In the remainder of this section, we analyze platform specific approaches defined for MapReduce-based analytics platforms² and NoSQL datastores, which together cover the majority of existing Big Data systems.
MapReduce systems
MapReduce is a distributed computational paradigm that allows analyzing very large data sets (Dean and Ghemawat 2004). Within MapReduce systems, data resources are partitioned into multiple chunks of data and distributed in a cluster of commodity hardware nodes. Data are analyzed in parallel by means of MapReduce tasks, characterized by user-defined Map and Reduce functions. These tasks operate by first extracting and then manipulating flows of key-value pairs, each modeling a portion of the target data resource. This computation paradigm allows processing unstructured and semi-structured data resources.
In Ulusoy et al. (2015), a framework called GuardMR has been proposed to enforce fine-grained Role-based Access Control (RBAC) (Ferraiolo et al. 2001) within Hadoop³, a very popular Big Data analytics platform built on top of MapReduce. GuardMR enforces data protection by filtering, and possibly altering, the key-value pairs derived from a target data resource by a MapReduce task, which are then provided as input to the Map function.
Filters are used to generate views of the analyzed resources which are authorized for the subject who requests the execution of the MapReduce task. The views are generated in such a way that any unauthorized content included in the analyzed resource is removed or obfuscated. More precisely, filters specify: i) preconditions to the processing of any key-value pair p extracted from a target resource under analysis, as well as ii) the rationale for deriving from p a new pair p’, which models the authorized content of p. The use of filters had previously been considered in Vigiles (Ulusoy et al. 2014), a fine-grained access control framework for Hadoop. In Ulusoy et al. (2014), authorization filters are handled by means of per-user assignment lists, and filters are coded in Java by security administrators. In contrast, in GuardMR, filters are assigned to subjects on the basis of the roles they cover, and a formal approach to the specification of filters is proposed, which allows specifying selection and modification criteria at a very high level of abstraction using the Object Constraint Language (OCL)⁴ (Warmer and Kleppe 1998; Clark and Warmer 2002). GuardMR relies on automatic tools⁵ to generate Java bytecode from OCL-based filter specifications, as well as to integrate the generated bytecode into the bytecode of the MapReduce task to be executed. GuardMR has been used with MapReduce tasks targeting both textual and binary resources (Ulusoy et al. 2015), showing the flexibility of the approach. GuardMR and Vigiles do not require Hadoop source code customization; however, they rely on platform specific features, such as the Hadoop APIs and the Hadoop control flow for regulating the execution of a MapReduce task. A reasonably low enforcement overhead has been observed with both Vigiles and GuardMR. Neither Vigiles nor GuardMR provides support for context-aware access control policies.
A recent work targeting access control enforcement within MapReduce systems is described in Gupta et al.
(2017). More precisely, Gupta et al. (2017) introduce the foundations of an access control model, called HeAC, which formalizes the authorization model of Apache Ranger⁶ and Apache Sentry⁷, as well as the native access control features of Hadoop. Apache Ranger and Apache Sentry represent state of the art technologies for the enforcement of fine-grained access control in Hadoop ecosystems. Authorization assignments are specified for operations and objects, possibly on the basis of object tags, namely attributes specifying properties like sensitivity, content, or expiration date. Moreover, Gupta et al. (2017) introduce the foundations of Object Tagged RBAC, an RBAC model which, while preserving RBAC role-based permission assignments, introduces support for object attributes. A prototypical implementation of the model has been defined by introducing role support into Apache Ranger. The proposed enforcement approach is again platform specific, as it has been designed on top of Hadoop specific features. No support is given to context-related properties, and no performance evaluation is presented.
NoSQL datastores
NoSQL datastores represent highly flexible, scalable, and efficient data management systems for Big Data, based on different data models. Cattell (2011) classifies NoSQL systems into three classes on the basis of the adopted data model, namely key-value, wide column, and document-oriented datastores, each suited to specific application scenarios. Key-value datastores (e.g., Redis⁸) can be seen as big hash tables with persistent storage services. Data are modeled by means of key-value pairs, where values of primitive or complex type are directly addressed by means of a key. Key-value datastores are suited to application scenarios where efficient look-up operations are required. For instance, they are used to manage web session information and user profile data. Wide column stores (e.g., Cassandra⁹) model data as records with variable structures, which are then grouped into tables with a flexible schema. Wide column stores are a good fit for the data management requirements of blogging platforms and content management systems. Document-oriented datastores (e.g., MongoDB¹⁰) model data as hierarchical records, called documents, whose fields either specify a primitive value or are in turn records composed of multiple fields. Documents are partitioned into collections, which in turn are grouped in a database. Typical applications of document-oriented datastores include event logging systems and content management systems.
Fine-grained access control within NoSQL datastore management systems is still at a very early stage, and only a few access control frameworks have been proposed so far for wide column and document-oriented datastores.
K-VAC (Kulkarni 2013) is among the earliest fine-grained access control frameworks targeting wide-column NoSQL datastores which have been proposed in the literature. K-VAC supports the enforcement of content-based and context-based access control policies, possibly specified at different levels of the data model hierarchy (e.g., for a column or for a row). Two prototypical versions of K-VAC have been released. The first has been specifically designed as an internal module of Cassandra, a popular wide-column datastore whose source code has been modified to host K-VAC’s enforcement monitor. In contrast, the second version has been released as an external library, with the aim of enforcing access control on multiple datastores. However, the use of the proposed library still requires an ad-hoc implementation of binding criteria, which so far have only been defined for Cassandra and HBase¹¹. Overall, the integration of K-VAC requires deep customizations of the hosting platform. Empirical performance evaluations show the efficiency of both the proposed prototypes, with a lower overhead measured with the customized version of Cassandra.
Another work targeting Cassandra has been proposed in Shalabi and Gudes (2017), where an approach to the cryptographic enforcement of RBAC policies has been defined. Predicate encryption (Katz et al. 2013) and second level encryption (Nabeel and Bertino 2014) are used for the definition of an efficient scheme for RBAC enforcement which operates within the Cassandra distributed architecture. The proposed approach is an example of a platform specific solution designed on top of specific features, such as the distributed architecture of Cassandra. Also in this case, no support is given for context-aware policies and, unfortunately, the enforcement overhead is not discussed.
As far as document-oriented datastores are concerned, efficient solutions to the integration of fine-grained purpose-based access control into MongoDB have been proposed in Colombo and Ferrari (2016) and (2017a). In Colombo and Ferrari (2017a), the RBAC model natively integrated in MongoDB has been enhanced with the support for the specification and enforcement of purpose-based policies (Byun and Li 2008) regulating the access up to document level. The proposed approach refines the granularity level at which the native MongoDB RBAC model operates. An enforcement monitor, called Mem (MongoDB enforcement monitor), has been designed, which monitors and possibly manipulates the flow of messages exchanged by MongoDB clients and the MongoDB server, thus acting like a proxy. Once Mem intercepts a message m issued by a MongoDB client on behalf of a subject s, it either forwards m to the server, or temporarily blocks m and issues additional messages aimed at profiling s. If m models a query q, Mem rewrites m as m’ in such a way that m’ encodes a query q’ that only accesses those documents accessed by q which are authorized by the applicable access control policies. Mem’s proxy-based architecture allows the straightforward integration of the enforcement
monitor into existing MongoDB deployments with basic configuration tasks. Experimental evaluations show the efficiency of the proposed approach; however, also in this case, no support is given for context-aware policies.
In Colombo and Ferrari (2016), the framework presented in Colombo and Ferrari (2017a) has been significantly extended, introducing the support for access control policies regulating the access up to field level, and providing support for the specification and enforcement of content- and context-based policies. The proposed enforcement monitor, called ConfinedMem, applies the same logic as Mem, but it operates according to a two-step process, which consists of: 1) the derivation of the authorized views of all documents to be accessed by a submitted query q included in a message m requiring the access to data resources, and 2) the rewriting of m as m’ in such a way that m’ specifies a query q’ which can only access the authorized views of the documents to be accessed by q. Different implementation techniques have been considered for queries specifying different operations (e.g., selection and aggregations), with the aim of minimizing the overhead. Experimental evaluations show that, overall, the enforcement overhead observed with access control policies specified at field level is significantly higher than the one measured for document-level policies.
Platform independent approaches
The great majority of the research contributions in the field of access control for Big Data analytics platforms propose a platform specific solution.
The lack of a reference standard query language and data model has caused the birth of a variety of proprietary solutions. As a matter of fact, numerous NoSQL datastores exist, most of which operate with a platform specific query language (e.g., the query language of MongoDB can only be used with that platform) and adopt a different data model. Even different datastores that nominally refer to the same data model can use different data organization and terminology. For instance, both MongoDB and CouchDB¹² use the document-oriented data model; however, the concept of collection is not supported by CouchDB, whereas collections are basic data organization features of MongoDB. The great heterogeneity of the scenario has significantly raised the complexity of devising enforcement solutions that can work with multiple platforms. Overall, the definition of a general access control enforcement approach represents a very ambitious task.
In recent years, academia and industry have started collaborating on the definition of unifying query languages for NoSQL datastores. To the best of our knowledge, JSONiq (Florescu and Fourny 2013) and SQL++ (Ong et al. 2014) represent the most relevant results that have been so far achieved towards the fulfillment of this goal. JSONiq is an XQuery (Chamberlin 2003) based language that has been defined with the aim of analyzing data handled by NoSQL datastores adopting a JSON-based data model. Unfortunately, at present JSONiq is only supported by Zorba¹³ and Sparksoniq¹⁴, which allow processing data serialized in JSON format, and by a platform called 28msec¹⁵, which supports the execution of JSONiq queries on MongoDB databases.
SQL++ (Ong et al. 2014) is a recent proposal of a unifying query language that allows analysing semi-structured data handled by NoSQL datastores as well as structured data of traditional DBMSs. SQL++ has been recently adopted by Couchbase¹⁶ and AsterixDB¹⁷ (Alsubaiee et al. 2014), whereas Apache Drill¹⁸ is in the process of aligning with SQL++. The diffusion of this language is thus growing, and the adopted SQL-based syntax and the backward compatibility with relational DBMSs promise to further increase its popularity and diffusion.
In Colombo and Ferrari (2017b), an SQL++ based Attribute-based Access Control (ABAC) (Hu et al. 2013; 2015) framework for NoSQL datastores has been proposed. The choice to base the framework on SQL++ allows protecting any NoSQL datastore which provides support for this language. Therefore, the proposal differs from all other work introduced in the “Platform specific approaches” section in its higher generality and applicability, which may even grow with a potential wider diffusion of SQL++ in the future. The framework operates at a very fine-grained level, in that it allows regulating the access up to single data fields. The supported granularity is thus equivalent to cell level within relational DBMSs. Enforcement is based on query rewriting and operates with heterogeneous data with no assumption on data schema, thus overcoming state of the art query rewriting techniques proposed for RDBMSs (Rizvi et al. 2004; LeFevre et al. 2004).
Query rewriting techniques aimed at enforcing cell-level access control within traditional DBMSs operate by projecting or nullifying the value of each cell to be accessed by a query q on the basis of the compliance of the access performed by q with the applicable access control policies (LeFevre et al. 2004). More precisely, a query q submitted for execution is rewritten in such a way as to: i) include a subquery s for each table t accessed by q, which, cell by cell, generates an authorized view of t, and ii) perform the same analysis tasks as q on the result set of s. The subquery s specifies projection criteria conditioned by the compliance of the accesses operated by q with the cell-level access control policies that have been specified for t’s cells. Such an approach can only be used if the schema of any accessed table is known a priori, as the projection criteria of the subqueries need to refer to table columns. The schemaless and highly heterogeneous nature of the data
within Big Data platforms does not allow to use similar designed for specific application domains. In particular,
techniques. we first analyze approaches that target Big Data platforms
In Colombo and Ferrari (2017a) this issue has been han- supporting data stream analytics, and then we focus on
dled by means of SQL++ operators that allow achieving those for Internet of Things ecosystems.
the projection without knowing in advance the accessed
fields. The approach operates by visiting, field by field, Big Data streaming analytics
the data unit19 du of an analyzed resource, and adding In recent years, the number of Big Data platforms that
a visited field f to the authorized view du’ of du only if provide support to data stream management is grow-
the access to f complies with the ABAC policies specified ing. Apache Spark20 is probably the most popular open
for f. The proposed approach allows deriving in-memory source framework which supports the analysis of contin-
authorized views of the data resources to be analyzed, uous streams of data. Apache Storm21 is another open
and executing the analysis tasks of the original queries on source distributed real-time computation system which
such derived views. The ABAC framework proposed in can also be used for real-time analytics and continuous
Colombo and Ferrari (2017a) supports the specification computation. In addition, several commercial solutions
and enforcement of context-aware access control policies. exist, such as, for instance, Amazon Kinesis22 , which is a
Empirical performance assessments show an enforcement service for real-time processing of streaming data on the
overhead that varies with the characteristics of the spec- cloud, and IBM Streaming analytics23 , a platform support-
ified policies and the number of fields of the analyzed ing risk analysis and decision making in real-time. Due to
documents. The overhead is high when field level policies the growing emphasis to real-time analysis of data flows,
cover high percentage of data units fields. access control enforcement mechanisms targeting contin-
Another language-based ABAC approach has been pro- uous flows of data are strongly required. A few results have
posed in Longstaff and Noble (2016), with the goal to be been presented in the past years in the field of Data Stream
usable with traditional data management systems, Mapre- Management Systems (DSMSs) (e.g., Nehme et al. (2010),
duce systems, as well as NoSQL datastores. The work Carminati et al. (2010), and Puthal et al. (2015)).
proposes a query rewriting approach that targets user In Nehme et al. (2010), a framework, called FENCE, has
transactions specified with an SQL-like language. Unfor- been proposed, which supports continuous access con-
tunately, a detailed description of the adopted query lan- trol enforcement. Data and query security restrictions
guage and data model is missing, which makes unclear are modeled as meta-data, denoted security punctua-
how the approach could be used with different platforms, tions, which are embedded into the data streams. Differ-
and how the heterogeneity of schemaless data can be ent enforcement mechanisms have been proposed, which
handled by means of an SQL-like language. operate by analyzing security punctuations, such as spe-
A summary of the access control frameworks discussed cial physical operators which are integrated within query
so far along with the supported access control require- execution plans with the aim to filter the tuples which can
ments (cfr. “Requirements” section) is shown in Table 1. be analyzed, and rewriting mechanisms targeting contin-
uous queries.
Domain specific Big Data approaches The framework in Carminati et al. (2010) assumes that
In this section, we focus on the state of the art approaches data analysis within DSMSs is achieved by continuous
to the integration of access control into Big Data systems queries, and enforces access control by means of query
Table 1 Summary of the surveyed platform specific and platform independent access control frameworks
AC framework Target platform AC model Max granularity Context support Efficiency
GuardMR (Ulusoy et al. 2015) Hadoop RBAC K,V pair No Medium/High
Vigiles (Ulusoy et al. 2014) Hadoop DAC K,V pair No Medium/High
HeAC (Gupta et al. 2017) Hadoop RBAC K,V pair No Not available
K-VAC (Kulkarni 2013) Cassandra/ HBase DAC Cell Yes High
Shalabi and Gudes (2017) Cassandra RBAC Cell No Not available
Mem (Colombo and Ferrari 2017a) MongoDB RBAC Document No High
ConfinedMem (Colombo and Ferrari 2016) MongoDB DAC Field Yes Medium/Low
Colombo and Ferrari (2017b) All those ABAC Cell/Field Yes Medium
supporting
SQL++
Longstaff and Noble (2016) Not clear ABAC Cell/Field Yes Not available
rewriting, where rewritten queries are defined by composition of secure query operators. In contrast, Puthal et al. (2015) present a crypto-based solution to verify the authenticity and integrity of data streams.
Complex event processing (CEP) systems (Cugola and Margara 2015) represent the evolution of DSMSs (Cugola and Margara 2012), and are nowadays used for many different applications, such as Internet of Things applications and Smart Cyber-physical systems (Dayarathna and Perera 2018).
CEPs support the processing of heterogeneous streams from multiple sources, as well as advanced forms of reasoning over such data streams. On the basis of the experience with DSMSs in Carminati et al. (2010), a novel access control model for CEP platforms has been proposed in Carminati et al. (2016). The model assumes an application scenario where users generating continuous flows of data specify how their data can be processed and what cannot be inferred from the data. The compliance of the access performed by a query with the specified user preferences is checked by verifying that each operator in the submitted query complies with the user preferences specified for the accessed attributes of the analyzed data streams. In Migliavacca et al. (2010), a system called DEFCON has been presented to enforce decentralised event flow control. The system, which has been designed targeting the financial trading scenario, applies information flow control principles and leverages security labels assigned to event messages. Event flow control is achieved through a lightweight approach that makes use of application-level virtualisation to separate processing units.

Internet of Things
Internet of Things (IoT) ecosystems are representative cases of Big Data applications. IoT applications are rapidly gaining popularity in a variety of domains thanks to the indisputable improvements they bring to people’s lifestyle. Nowadays a growing number of users cannot do without wearable devices that track their movements, sport activities and health conditions, and a variety of devices and apps exist for this purpose. IoT applications are used to control the safety of the environments where people live, as well as to improve their lifestyle. As a matter of fact, the diffusion of home automation services and smart devices like smart locks, smart meters, and smart lights is growing.
Due to the personal and sensitive nature of the handled information, security and privacy of these systems have become a major concern. Therefore, in recent years, several research efforts have been devoted to security and privacy of IoT applications, and a variety of access control models have been proposed (see, for instance, Ouaddah et al. (2017) for a compendium).
For instance, Gusmeroli et al. (2013) and Hernández-Ramos et al. (2013) propose the use of the Capability based access control model (CapBAC) within IoT ecosystems. CapBAC differs from other models in the literature in that it allows externalizing and distributing the management of access authorizations. However, it does not take context awareness into account, and for this reason it has been criticized (Ouaddah et al. 2015).
RBAC (Ferraiolo et al. 2001) and ABAC (Hu et al. 2013; 2015) have also been proposed to regulate access within IoT ecosystems. For instance, in Zhang and Tian (2010), with the aim to fit IoT dynamicity, an extended version of RBAC supporting contextual constraints has been introduced. However, the resulting enhanced model has been criticized (e.g., see Rajpoot et al. (2015)) as it is affected by shortcomings, like role explosion, which also characterize RBAC.
A few approaches have been based on the ABAC model. For instance, Kaiwen and Lihua (2014) propose an ABAC model that extends RBAC with the dynamic assignment of roles to users. However, the proposed model only partially exploits ABAC features, as it only supports subject attributes. Another ABAC model operating with a predefined set of attributes has been proposed in Hemdi and Deters (2016). The proposed enforcement monitor has been designed for IoT ecosystems that use CoAP²⁴ as communication protocol. Unfortunately, the focus of Hemdi and Deters (2016) is on implementation aspects, and neither the enforcement mechanism nor the supported access control policies are formally specified.
In Marra et al. (2017) and La Marra et al. (2017; 2018), a framework is proposed supporting the enforcement of Usage Control (UCON) (Zhang et al. 2005) within IoT ecosystems. The approach is illustrated by discussing the policy enforcement mechanism within a Smart Home environment. However, the generality of the proposed mechanism is limited by constraining assumptions, such as the use of ad-hoc defined brokers.
A general enforcement mechanism has been proposed in Colombo and Ferrari (2018), which allows enforcing policies of different access control models within MQTT-based IoT ecosystems. The proposed framework provides a monitor that enforces access control by regulating the flow of the exchanged MQTT control packets. The framework is illustrated using ABAC, but other models are also supported.
A recent research line targets the study of access control enforcement for cloud-enabled IoT (see e.g., Alshehri and Sandhu (2016; 2017); Bhatt et al. (2017; 2018); Ahmad et al. (2018)). Alshehri and Sandhu propose an access control oriented (ACO) architecture (Alshehri and Sandhu 2016; 2017), which supports the definition of access control models for cloud-based IoT services. ACO has been
used to define enforcement mechanisms tailored for specific IoT platforms (Bhatt et al. 2017) and applications (Bhatt et al. 2018).
Access control enforcement for cloud-enabled IoT systems has also been investigated by Ahmad et al. (2018), who, starting from a case study related to a smart home environment, have identified a set of key requirements for the enforcement of access control within IoT ecosystems. The authors have also proposed an approach to handle access control as a service, outsourcing policy management to a trusted third party, while relying on the native mechanisms of state of the art IoT platforms for policy enforcement. The feasibility of the approach has been assessed with respect to the satisfiability of the identified requirements.
Finally, some proposals target intelligent transportation systems. Recent research efforts in this field have been devoted to enabling advanced forms of communication among vehicles, road infrastructures, drivers, as well as intra-vehicle devices. The envisaged services rely on a variety of technologies, which range from dedicated hardware and software components to the enabling communication infrastructures, possibly cloud or fog based. In this complex scenario, vehicular security represents a major concern, and the US Department of Transportation has already outlined the strategic goals of an Intelligent Transportation System Program (Barbaresso et al. 2014). Initial research results in this field have been described in Gupta and Sandhu (2018), where an extended version of the ACO architecture presented in Alshehri and Sandhu (2016), called E-ACO, is discussed. The paper also discusses enforcement mechanisms tailored for various E-ACO layers; however, the topic remains an open research field, with room for investigations in manifold directions (see Gupta and Sandhu (2018)).

Research issues
In what follows, we discuss some open research issues in the field of Big Data access control.

Unifying access control models and mechanisms
The state of the art review in the “Platform specific approaches” and “Platform independent approaches” sections has highlighted that, although research in the area of access control for Big Data platforms is progressing, no solution has been proposed so far for a unifying access control framework that combines generality and efficiency of access control. The heterogeneous, schemaless nature of the managed data significantly complicates the definition of such a framework, and so far this has led mainly to ad-hoc platform specific solutions (see “Platform specific approaches” section). In contrast, language centric approaches still suffer from limited applicability (see “Platform independent approaches” section). For instance, although the popularity of the SQL++ (Ong et al. 2014) initiative is growing, the support provided to this language is still limited to a small number of platforms.
One key element that may be instrumental to fill this void is the definition of a unifying data model capable of representing data resources of the different data models currently adopted by Big Data platforms. The ability to represent data resources is a fundamental requirement for binding access control policies to the protected data, as well as for the specification of policies regulating the access on the basis of the protected objects’ attributes. Indeed, in the literature on access control, multiple models allow enforcing content-based access constraints (e.g., Kulkarni (2013); Colombo and Ferrari (2016)), as well as access control rules that refer to various security meta-data related to the protected data resources (e.g., Colombo and Ferrari (2015a)).
The key-value, wide column, and document-oriented models adopt different data modeling criteria; however, in all these models data are hierarchically organized as tree structures, where nodes at different heights of the tree represent resources at different granularity levels of the related data model (e.g., database, table, row, and cell). Data models differ in the height of the tree with which data resources can be represented. This ranges from 2 for key-value datastores (where all key-value pairs, i.e., the leaf nodes, belong to a key-space, i.e., the root node) to a variable height n (n > 2) for document-oriented datastores, where a database (root node) groups a variable number of collections (level 2 nodes), which in turn include a variable number of documents (level 3 nodes), each composed of a variable number of fields, possibly organized hierarchically into a tree structure (levels 4 to n). A data resource of a data model corresponds to a node of the tree representing all the resources handled by a platform, and it can be accessed by traversing the path from the root of the tree to that node. Therefore, we believe that a unifying representation of data resources of multiple data models should take into account the identification of proper modeling strategies for the nodes of the above-mentioned resource tree. In particular, nodes should be specified in such a way as to keep track of: i) any structural property related to the modeled resource, ii) hierarchical relations with other nodes (e.g., a parent-of relationship), iii) possible meta-data, and iv) access control policies specified for the modeled resource. The considered policies may refer to different access control models, specifying context aware access control rules as well as content-based constraints.
Going one step further, the specified unifying model could also be used for enforcement purposes. For instance, enforcement mechanisms can be achieved by means of bidirectional mappings between resources
represented with a platform specific native data model and the unifying data model. Overall, the analysis of related work has revealed that fine grained access control with schemaless data is usually enforced by executing the submitted analysis tasks on authorized views of the accessed resources (e.g., see Colombo and Ferrari (2017b)). Therefore, a platform independent strategy to handle fine grained access control enforcement may consist of a pipeline of operations, supported by any platform, which handle the generation of authorized views by means of the unifying data model. The generated view can then be analyzed by the originally submitted query without additional platform specific rewriting activities. The above-mentioned pipeline is illustrated in Fig. 1. For each accessed resource rs, represented as a tree characterized by different nodes n_i, the process: i) derives a unifying model-based representation urs of rs, ii) derives the authorized view urs’ of urs, where the unauthorized contents have been removed, and, finally, iii) maps the authorized view urs’ back to the native data model, so that the generated view rs’ can be analyzed by the originally submitted analysis task. In order to support such an approach within multiple platforms, the above-mentioned mapping and view generation mechanisms should be defined in such a way that any platform, independently of the supported query language and data model, could handle the execution of this process.

Fig. 1 The pipeline of operations at the basis of an enforcement mechanism leveraging the unifying data model

To the best of our knowledge, the majority of today’s Big Data platforms support the MapReduce computational paradigm, independently of the adopted data model and query language. Therefore, a promising approach could be that of specifying mapping and view generation mechanisms by means of MapReduce operations.
The enforcement overhead of the technique discussed above is expected to depend on the platform hosting the data to be protected, as different behaviors are expected to be observed. For instance, Apache Spark²⁵ integrates a highly efficient computation engine, which promises to be significantly faster than Hadoop³ (up to 100 times faster²⁶). The overhead is expected to be reasonably contained in all those platforms supporting in-memory MapReduce computations, as well as data streams.

Policy analysis tools
The availability of a unifying data model on which access control policies can be specified would also support policy analysis and reasoning at an abstract layer, independent of any specific platform. As a matter of fact, the variety of data models, access control models, and related configuration options, such as policy propagation and conflict resolution criteria, adopted by Big Data platforms can make it really hard for security administrators to understand the effect of a set of access control policies on the data resources managed by their systems, as well as to assess the quality of the specified policies. Most of the research efforts in this field have been devoted to correctness verification, detection of inconsistencies and redundancies, as well as reasoning on the completeness of policy sets. A variety of approaches have been adopted to achieve the analysis, ranging from the use of formal methods to machine learning and data mining techniques. For instance, Datalog-based approaches have been proposed in Pasarella and Lobo (2017) and Tsankov et al. (2014), which respectively target Relationship-based Access Control (ReBAC) policies and decentralized composite access control systems. Approaches based on Answer Set Programming (ASP), such as the ones proposed in Ahn et al. (2010) and Kencana Ramli et al. (2013), allow the derivation of ASP programs from XACML²⁷ policies, and the analysis of the specified policies by means of ASP solvers. Model checking approaches have been proposed in Guelev et al. (2004) and Zhang et al. (2005), whereas SAT solvers and Multi-Terminal Binary Decision Diagram based techniques are used in
Lin et al. (2010) as a basis for reasoning on the permissions granted by access control policies. Graph-based analysis approaches for category based access control policies have been proposed in Alves and Fernández (2015), with the aim to ease verification tasks of security administrators. Finally, data mining techniques have been primarily used for detecting policy anomalies (e.g., Hu et al. (2013)).
In Bertino et al. (2017), provenance techniques have been proposed to check the quality of the specified access control policies in a scenario where collaborations are carried out by autonomous cognitive devices. However, to the best of our knowledge, no proposal has targeted Big Data platforms so far. The model centric approach previously discussed may be exploited as a basis for the definition of such a policy analysis framework. For instance, it may be used to generate views of the protected resources that show the authorized and unauthorized contents when different policies and configuration options are used, as well as to quantify policy coverage for a requesting subject with respect to an execution context.
The definition of a policy reasoning tool is also instrumental to complying with the new EU General Data Protection Regulation (GDPR)²⁸, which is intended to strengthen data protection for all individuals within the European Union. GDPR applies regardless of where a company is located, provided that the company manages data of EU residents. GDPR introduces a set of very important principles for Big Data management, such as privacy by design and by default. The new regulation also emphasizes accountability for data controllers to demonstrate compliance with GDPR, whereas Article 35 requires controllers to carry out Data Protection Impact Assessments in case of potentially high-risk processing activities. All such principles require tools to clearly assess the effect of access control policies on the managed data.
Finally, a policy analysis framework is also required for community centered collaborative systems, such as online social networks and collaborative editing platforms, which may be seen as federated applications that handle Big Data. Recent surveys pointed out that these systems typically provide rudimentary forms of access control (Paci et al. 2018). A key requirement for access control models tailored for collaborative systems is to allow users to understand collaborative decisions, as well as to inspect users’ access preferences and to evaluate their effects (Paci et al. 2018). Paci et al. (2018) claim that, although a few works exist which explain the effect of access decisions (Hu et al. 2013) and the reasons for which certain decisions have been taken (den Hartog and Zannone 2016), the above-mentioned requirements are still largely understudied. Therefore, the definition of a reasoning framework capable of operating within such federated environments with multiparty access control models appears as a research challenge of paramount importance.
Overall, so far research on policy analysis has primarily focused on different properties of policy sets, abstracting from the effects of policy enforcement on the protected resources. In contrast, we believe that frameworks capable of evaluating the effect of policy sets on resource accessibility within different Big Data platforms are required, which may provide support to multiple access control models and configuration options.

Issues related to domain specific Big Data systems
Let us now consider open challenges related to access control enforcement within domain specific Big Data systems. A selection of approaches targeting the enforcement of access control policies within traditional DSMSs and CEP platforms has been briefly presented in the “Big Data streaming analytics” section. A possible strategy to integrate similar enforcement approaches into Big Data analytics platforms may consist of designing the mechanism on top of one of the existing streaming frameworks. However, similar to the platform specific approaches presented in the “Platform specific approaches” section, such a solution would suffer from limited applicability. Moreover, existing solutions operate at tuple level (e.g., Nehme et al. (2010)) and schema level (e.g., Carminati et al. (2016)), whereas cell/field level granularity may be necessary in the Big Data scenario (see “Platform specific approaches” and “Platform independent approaches” sections), requiring a data filtering approach that operates at a finer granularity level. The development of an enforcement mechanism based on language centric approaches still seems impracticable, as no standard continuous query language exists. In contrast, since some of these platforms can implement MapReduce tasks (e.g., Apache Spark, Apache Storm), a model centric approach may be a viable strategy; however, thorough investigations are required to support this intuition.
As far as IoT ecosystems are concerned, the initial efforts briefly summarized in the “Internet of Things” section have mainly produced models adopting centralized enforcement mechanisms (e.g., see Colombo and Ferrari (2018)). However, multiple IoT ecosystems may be connected to each other exchanging data, and federated systems where multiple IoT applications cooperate cannot be handled with centralized enforcement mechanisms. Multiparty access control solutions for IoT ecosystems are thus needed, and they must be suited to operate at Big Data scale. To the best of our knowledge, the definition of such access control frameworks still represents a major open research challenge.

Conclusions
Security services for Big Data represent a key feature instrumental to fostering trust in how data are managed and analyzed by Big Data platforms. This paper has focused
on one of the key security services, that is, access control, by discussing the requirements that an access control solution for Big Data platforms should address, also with reference to specific key application scenarios (i.e., IoT and data streams). Moreover, the paper has provided a review of the state of the art in view of the devised requirements, and it has also discussed future research challenges in the area.

Endnotes
1 Details are omitted due to the blind submission requirements.
2 MapReduce-based analytics platforms are hereafter denoted MapReduce systems for the sake of brevity.
3 http://hadoop.apache.org/
4 https://www.omg.org/spec/OCL
5 Dresden OCL Toolkit, http://st.inf.tu-dresden.de/oclportal
6 https://ranger.apache.org/
7 https://sentry.apache.org/
8 https://redis.io/
9 http://cassandra.apache.org/
10 https://www.mongodb.com
11 HBase is a popular wide-column store, https://hbase.apache.org/
12 http://couchdb.apache.org/
13 http://www.zorba.io
14 http://sparksoniq.org/
15 https://www.28msec.com/
16 https://www.couchbase.com/
17 https://asterixdb.apache.org/
18 https://drill.apache.org/
19 SQL++ can be used with datastores adopting different data models; thus, the term data unit is used to denote a table row or a document.
20 https://spark.apache.org/
21 http://storm.apache.org/
22 https://aws.amazon.com/kinesis/
23 https://www.ibm.com/cloud/streaming-analytics
24 http://coap.technology/
25 https://spark.apache.org/
26 https://www.datamation.com/data-center/hadoop-vs.-spark-the-new-age-of-big-data.html
27 eXtensible Access Control Markup Language (XACML) Version 3.0 http://docs.oasis-open.org/xacml/3.0/xacml-3.0-core-spec-os-en.html
28 https://www.eugdpr.org/

Acknowledgements
Not applicable.
Funding
Not applicable.
Availability of data and materials
Not applicable.
Authors’ contributions
The authors declare that they have contributed equally to the preparation of the article; all authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Received: 27 August 2018 Accepted: 19 December 2018

References
Agrawal R, Kiernan J, Srikant R, Xu Y (2002) Hippocratic Databases. In: Proceedings of the 28th International Conference on Very Large Data Bases, VLDB ’02. pp 143–154
Ahmad T, Morelli U, Ranise S, Zannone N (2018) A Lazy Approach to Access Control As a Service (ACaaS) for IoT: An AWS Case Study. In: Proceedings of the 23rd ACM on Symposium on Access Control Models and Technologies. SACMAT ’18. ACM, New York. pp 235–246
Ahn G, Hu H, Lee J, Meng Y (2010) Representing and Reasoning about Web Access Control Policies. In: 34th Annual Computer Software and Applications Conference. IEEE, Seoul. pp 137–146. https://doi.org/10.1109/COMPSAC.2010.20
Alshehri A, Sandhu R (2016) Access Control Models for Cloud-Enabled Internet of Things: A Proposed Architecture and Research Agenda. In: 2016 IEEE 2nd International Conference on Collaboration and Internet Computing (CIC). pp 530–538
Alshehri A, Sandhu R (2017) Access Control Models for Virtual Object Communication in Cloud-Enabled IoT. In: 2017 IEEE International Conference on Information Reuse and Integration. pp 16–25
Alsubaiee S, Altowim Y, Altwaijry H, Behm A, Borkar V, Bu Y, Carey M, Cetindil I, Cheelangi M, Faraaz K, et al. (2014) AsterixDB: A scalable, open source BDMS. Proc VLDB Endowment 7(14):1905–1916
Alves S, Fernández M (2015) A Framework for the Analysis of Access Control Policies with Emergency Management. Electron Notes Theor Comput Sci 312:89–105. Ninth Workshop on Logical and Semantic Frameworks, with Applications (LSFA 2014)
Barbaresso J, et al. (2014) USDOT’s Intelligent Transportation Systems ITS. In: Strategic Plan 2015-2019
Bertino E, Jabal AA, Calo SB, Makaya C, Touma M, Verma DC, Williams C (2017) Provenance-Based Analytics Services for Access Control Policies. In: 2017 IEEE World Congress on Services, SERVICES 2017, Honolulu, HI, USA, June 25-30, 2017. pp 94–101
Bhatt S, Patwa F, Sandhu R (2017) Access Control Model for AWS Internet of Things. In: Yan Z, Molva R, Mazurczyk W, Kantola R (eds). Network and System Security. Springer, Cham. pp 721–736
Bhatt S, Patwa F, Sandhu R (2018) An Access Control Framework for Cloud-Enabled Wearable Internet of Things. In: 2017 IEEE 3rd International Conference on Collaboration and Internet Computing (CIC). pp 328–338
Byun JW, Li N (2008) Purpose based access control for privacy protection in relational database systems. VLDB J 17(4):603–619
Carminati B, Colombo P, Ferrari E, Sagirlar G (2016) Enhancing User Control on Personal Data Usage in Internet of Things Ecosystems. In: 2016 IEEE International Conference on Services Computing (SCC). pp 291–298
Carminati B, Ferrari E, Cao J, Tan KL (2010) A Framework to Enforce Access Control over Data Streams. ACM Trans Inf Syst Secur 13(3):28:1–28:31
Cattell R (2011) Scalable SQL and NoSQL Data Stores. SIGMOD Rec 39(4):12–27
Chamberlin D (2003) XQuery: A Query Language for XML. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. SIGMOD ’03. ACM, New York (USA). pp 682–682
Clark T, Warmer J (2002) Object Modeling with the OCL. The Rationale behind the Object Constraint Language. LNCS, Volume 2263. Springer, Berlin
Colombo P, Ferrari E (2014a) Enforcement of Purpose Based Access Control within Relational Database Management Systems. IEEE Trans Knowl Data Eng (TKDE) 26(11):2703–2716
Colombo P, Ferrari E (2014b) Enforcing Obligations within Relational Database Management Systems. IEEE Trans Dependable Sec Comput (TDSC) 11(4):318–331
Colombo P, Ferrari E (2015a) Efficient Enforcement of Action aware Purpose Based Access Control within Relational Database Management Systems. IEEE Trans Knowl Data Eng (TKDE) 27(8):2134–2147
Colombo P, Ferrari E (2015b) Privacy Aware Access Control for Big Data: A Research Roadmap. Big Data Res 2(4):145–154
Colombo P, Ferrari E (2016) Towards Virtual Private NoSQL datastores. In: 32nd IEEE International Conference on Data Engineering, ICDE 2016, Helsinki, Finland, May 16-20, 2016. pp 193–204
Colombo P, Ferrari E (2017a) Enhancing MongoDB with purpose-based access control. IEEE Trans Dependable Sec Comput 14(6):591–604
Colombo P, Ferrari E (2017b) Towards a unifying attribute based access control approach for nosql datastores. In: 33rd IEEE International Conference on Data Engineering, ICDE 2017, San Diego, CA, USA, April 19-22, 2017. pp 709–720
Colombo P, Ferrari E (2018) Access Control Enforcement Within MQTT-based Internet of Things Ecosystems. In: 23rd ACM on Symposium on Access Control Models and Technologies. SACMAT ’18. ACM, New York (USA). pp 223–234
Cugola G, Margara A (2012) Processing Flows of Information: From Data Stream to Complex Event Processing. ACM Comput Surv 44(3):1–62
Cugola G, Margara A (2015) The Complex Event Processing Paradigm. In: Colace F, De Santo M, Moscato V, Picariello A, Schreiber FA, Tanca L (eds). Springer, Cham
Dayarathna M, Perera S (2018) Recent Advancements in Event Processing. ACM Comput Surv 51(2):33:1–33:36
Dean J, Ghemawat S (2004) MapReduce: Simplified Data Processing on Large Clusters. In: Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6. OSDI’04. USENIX Association, Berkeley. pp 10–10
den Hartog J, Zannone N (2016) A Policy Framework for Data Fusion and Derived Data Control. In: Proceedings of the 2016 ACM International Workshop on Attribute Based Access Control. ABAC ’16. ACM, New York. pp 47–57
Ferraiolo DF, Sandhu R, Gavrila S, Kuhn DR, Chandramouli R (2001) Proposed NIST Standard for Role-based Access Control. ACM Trans Inf Syst Secur 4(3):224–274
Ferrari E (2010) Access Control in Data Management Systems. Synthesis Lectures on Data Management. Morgan & Claypool Publishers. ISBN: 1608453758 9781608453757
Florescu D, Fourny G (2013) JSONiq: The History of a Query Language. IEEE Internet Comput 17(5):86–90
Guelev DP, Ryan M, Schobbens PY (2004) Model-Checking Access Control Policies. In: Zhang K, Zheng Y (eds). Information Security. Springer, Berlin, Heidelberg. pp 219–230
Gupta M, Patwa F, Sandhu R (2017) Object-tagged RBAC model for the hadoop ecosystem. In: Livraga G, Zhu S (eds). Data and Applications Security and Privacy XXXI. Springer, Cham. pp 63–81
Gupta M, Sandhu RS (2018) Authorization framework for secure cloud assisted connected cars and vehicular internet of things. In: Proceedings of the 23rd ACM on Symposium on Access Control Models and Technologies, SACMAT 2018, Indianapolis, IN, USA, June 13-15, 2018. pp 193–204
Gusmeroli S, Piccione S, Rotondi D (2013) A capability-based security approach to manage access control in the Internet of Things. Math Comput Model 58(5):1189–1205. The Measurement of Undesirable Outputs: Models Development and Empirical Analyses and Advances in mobile, ubiquitous and cognitive computing
Hemdi M, Deters R (2016) Using REST based protocol to enable ABAC within IoT systems. In: 2016 IEEE 7th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON). pp 1–7
Hernández-Ramos JL, Jara AJ, Marin L, Skarmeta AF (2013) Distributed
Hu H, Ahn G, Kulkarni K (2013) Discovery and resolution of anomalies in web access control policies. IEEE Trans Dependable Sec Comput 10(6):341–354
Hu H, Ahn GJ, Jorgensen J (2013) Multiparty Access Control for Online Social Networks: Model and Mechanisms. IEEE Trans Knowl Data Eng 25(7):1614–1627
Hu VC, Cogdell MM (2013) Guide to Attribute Based Access Control (ABAC) Definition and Considerations. National Institute of Standards and Technology, Jan. 2014. http://nvlpubs.nist.gov/nistpubs/specialpublications/NIST.sp.800-162.pdf
Hu VC, Kuhn DR, Ferraiolo DF (2015) Attribute-Based Access Control. Computer 48(2):85–88
Jin X, Wah BW, Cheng X, Wang Y (2015) Significance and Challenges of Big Data Research. Big Data Res 2(2):59–64
Kaiwen S, Lihua Y (2014) Attribute-Role-Based Hybrid Access Control in the Internet of Things. In: Han W, Huang Z, Hu C, Zhang H, Guo L (eds). Web Technologies and Applications. Springer, Cham. pp 333–343
Katz J, Sahai A, Waters B (2013) Predicate encryption supporting disjunctions, polynomial equations, and inner products. J Cryptol 26(2):191–224
Kencana Ramli CDP, Nielson HR, Nielson F (2013) XACML 3.0 in Answer Set Programming. In: Albert E (ed). Logic-Based Program Synthesis and Transformation. Springer, Berlin, Heidelberg. pp 89–105
Kulkarni D (2013) A fine-grained access control model for key-value systems. In: Proceedings of the Third ACM Conference on Data and Application Security and Privacy (CODASPY ’13). ACM, New York. pp 161–164. https://doi.org/10.1145/2435349.2435370
La Marra A, Martinelli F, Mori P, Rizos A, Saracino A (2017) Improving MQTT by Inclusion of Usage Control. In: Wang G, Atiquzzaman M, Yan Z, Choo K-KR (eds). Security, Privacy, and Anonymity in Computation, Communication, and Storage. Springer, Cham. pp 545–560
La Marra A, Martinelli F, Mori P, Rizos A, Saracino A (2018) Introducing Usage Control in MQTT. In: Katsikas SK, Cuppens F, Cuppens N, Lambrinoudakis C, Kalloniatis C, Mylopoulos J, Antón A, Gritzalis S (eds). Computer Security. Springer, Cham. pp 35–43
LeFevre K, Agrawal R, Ercegovac V, Ramakrishnan R, Xu Y, DeWitt D (2004) Limiting disclosure in hippocratic databases. In: Nascimento MA, Özsu MT, Kossmann D, Miller RJ, Blakeley JA, Schiefer KB (eds). Proceedings of the Thirtieth International Conference on Very Large Data Bases, Toronto (Canada), Volume 30 (VLDB ’04). VLDB Endowment. pp 108–119
Lin D, Rao P, Bertino E, Li N, Lobo J (2010) EXAM: a comprehensive environment for the analysis of access control policies. Int J Inf Secur 9(4):253–273
Longstaff JJ, Noble J (2016) Attribute based access control for big data applications by query modification. In: Second IEEE International Conference on Big Data Computing Service and Applications, BigDataService 2016, Oxford, United Kingdom, March 29 - April 1, 2016. pp 58–65
Marra AL, Martinelli F, Mori P, Saracino A (2017) Implementing Usage Control in Internet of Things: A Smart Home Use Case. In: 2017 IEEE Trustcom/BigDataSE/ICESS. pp 1056–1063
Migliavacca M, Papagiannis I, Eyers DM, Shand B, Bacon J, Pietzuch P (2010) DEFCON: High-performance Event Processing with Information Security. In: Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference. USENIXATC’10. USENIX Association, Berkeley, CA, USA. pp 1–1
Nabeel M, Bertino E (2014) Privacy preserving delegated access control in public clouds. IEEE Trans Knowl Data Eng 26(9):2268–2280
Nehme RV, Lim HS, Bertino E (2010) FENCE: Continuous access control enforcement in dynamic data stream environments. In: 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010). pp 940–943
Ong KW, Papakonstantinou Y, Vernoux R (2014) The SQL++ unifying semi-structured query language, and an expressiveness benchmark of SQL-on-Hadoop, NoSQL and NewSQL databases. CoRR abs/1405.3631
Ouaddah A, Bouij-Pasquier I, Elkalam AA, Ouahman AA (2015) Security analysis and proposal of new access control model in the Internet of Thing. In: 2015 International Conference on Electrical and Information Technologies (ICEIT). pp 30–35
Ouaddah A, Mousannif H, Elkalam AA, Ouahman AA (2017) Access control in the Internet of Things: Big challenges and new opportunities. Comput Netw 112:237–262
capability-based access control for the internet of things. J Internet Serv Inf Paci F, Squicciarini A, Zannone N (2018) Survey on Access Control for Community-
Secur (JISIS) 3(3/4):1–16 Centered Collaborative Systems. ACM Comput Surv 51(1):6–1638
Pasarella E, Lobo J (2017) A Datalog Framework for Modeling

Relationship-based Access Control Policies. In: Proceedings of the 22nd
ACM on Symposium on Access Control Models and Technologies
(SACMAT ’17 Abstracts). ACM, New York. pp 91–102. https://doi.org/10.
1145/3078861.3078871
Puthal D, Nepal S, Ranjan R, Chen J (2015) Dpbsv – an efficient and secure
scheme for big sensing data stream. In: 2015 IEEE
Trustcom/BigDataSE/ISPA, vol. 1. pp 246–253
Rajpoot QM, Jensen CD, Krishnan R (2015) Integrating Attributes into
Role-Based Access Control. In: Samarati P (ed). Data and Applications
Security and Privacy XXIX. Springer, Cham. pp 242–249
Rizvi S, Mendelzon A, Sudarshan S, Roy P (2004) Extending query rewriting
techniques for fine-grained access control. In: ACM SIGMOD 2004.
pp 551–562
Shalabi Y, Gudes E (2017) Cryptographically Enforced Role-Based Access
Control for NoSQL Distributed Databases. In: Livraga G, Zhu S (eds). Data
and Applications Security and Privacy XXXI. Springer, Cham. pp 3–19
Tsankov P, Marinovic S, Dashti MT, Basin D (2014) Decentralized Composite
Access Control. In: Abadi M, Kremer S (eds). Principles of Security and Trust.
Springer, Berlin, Heidelberg. pp 245–264
Ulusoy H, Colombo P, Ferrari E, Kantarcioglu M, Pattuk E (2015) GuardMR:
Fine-grained Security Policy Enforcement for MapReduce Systems. In:
Proceedings of the 10th ACM Symposium on Information, Computer and
Communications Security. ASIA CCS ’15. ACM, New York. pp 285–296
Ulusoy H, Kantarcioglu M, Pattuk E, Hamlen K (2014) Vigiles: Fine-Grained
Access Control for MapReduce Systems. In: 2014 IEEE International
Congress on Big Data. pp 40–47
Warmer JB, Kleppe AG (1998) The Object Constraint Language: Precise
Modeling with UML. Addison-Wesley Object Technology Series
Zhang G, Tian J (2010) An extended role based access control model for the
Internet of Things. In: 2010 International Conference on Information,
Networking and Automation (ICINA), vol. 1. pp 319–323
Zhang N, Ryan M, Guelev DP (2005) Evaluating Access Control Policies
Through Model Checking. In: Zhou J, Lopez J, Deng RH, Bao F (eds).
Information Security. Springer, Berlin, Heidelberg. pp 446–460
Zhang X, Parisi-Presicce F, Sandhu R, Park J (2005) Formal Model and Policy
Specification of Usage Control. ACM Trans Inf Syst Secur 8(4):351–387