1 Introduction
An increasing number of real-time pervasive computing applications rely on information gathered from multiple data sources to provide relevant and dependable features to users.
Internet-of-Things (IoT) is a primary enabler in this regard because aggregating and inferencing with big IoT data can provide insights about entities, their environment, and how they interact [
34]. While the 3Vs – Volume, Velocity, and Variety are implicit challenges to any data intensive system [
33], middleware-based
Context Management Systems (CMSs) have to efficiently manage large-scale context retrieval, inferencing, storage, and delivery of highly transient context information for a wide spectrum of quality requirements, under uncertainty attributed to variations in context query features and changes in real-world situations. Unlike data caching techniques, which are based on prior knowledge (e.g., transition probabilities of cache states, file libraries), historical knowledge about situations and context can be scarce or uncertain; hence, there is a critical need to develop heuristic methods. Providing relevant and dependable (i.e., adequately reliable) context information for time-critical applications is further constrained by limited computing resources [
16], e.g., IoT devices are often self-contained, low-powered devices. The dependability and relevancy of context information can significantly suffer under these conditions. For instance, traffic congestion at an upcoming intersection is no longer relevant to a driver using a traffic alerting system after merging with it; the dependability of the application hence suffers. Storing all context information for reuse, as in traditional data caching, is seemingly a viable option where cache memory is available (not all devices in a distributed environment are cache-enabled), but the ephemeral validity period of fast-varying context information results in short cacheable periods unless the context is refreshed or evicted – both of which are expensive operations, either because of the need for frequent retrievals or because of cache misses. The representational and structural challenges, such as maintaining affinity among related situations, and the challenges of maintaining (i) the relationships between the different logical levels of context information and (ii) entity relationships in cache memory without incurring redundant context management costs, are unique to context caching, as described in our previous work [
44,
46]. We summarize the solution to solve the unique research challenges in adaptive context caching by the 5Ws [
44] –
What context to cache,
When to cache context,
Where to cache context,
hoW to cache context, and
Who should cache context? We implement and test this solution in this paper.
1.1 Difficulties and Challenges in Context Management
According to the definition given by Abowd et al. [
1], context is “any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and applications themselves”. We will consider four challenges when caching context information based on this definition as follows.
(1)
Context information can exceed the volume of the raw data because context explains a certain piece of data or an entity in an environment, infusing extrinsic elements into the description. That is, context caching should involve managing a collection of data and derived information along with their metadata, which can describe multiple logical and physical relationships [
46]. Consider higher-level context inferred using the Context Space Theory [
7] such as the situation of a room. The inputs can be the noise level, light intensity, and a video stream. The context, as an output, is a JSON-LD [
37] document. Context information can also have multiple alternate representations [
18], and applications might be interested in different aspects of the same context. Traditional data caching methods are not readily applicable to context caching, because pieces of data are typically treated individually, in contrast to inter-related temporal and spatial context. So, a novel caching approach is urgently needed to handle such large amounts of heterogeneous contextual data efficiently (see the illustrative sketch after this list).
(2)
Context is transient, e.g., the situation of a person can change at any moment. That is, a new context caching approach needs to be very adaptable and flexible in near real-time.
(3)
Context is both time and quality critical. For instance, consider notifying a driver of a potential hazard (i.e., a situation, or in other words the context about the driver in the particular environment). Context information needs to be accurate (e.g., the specific hazard), delivered on time (e.g., giving adequate time to react) to be relevant, and dependable (e.g., save the driver from an accident). While traditional data caching minimizes response time by focusing on storing data items with long cacheable lifetimes, a novel context caching approach additionally needs to manage freshness and the applicable, varying Quality of Context parameters (e.g., accuracy, precision).
(4)
Big IoT data streams used to infer context are heterogeneous, e.g., in size and format. IoT data streams can be inconsistent, so that context data might not show any clear patterns [
29]. There can exist occasional errors in retrieved context data or losses in QoC [
11]. For instance, if a context provider that used to refresh cached information is no longer available, a caching approach would need to make cost- and quality-based decisions between eviction and refreshing from an alternative provider [
47]. That is, a new context caching approach needs to manage the adaptive context caching lifecycle without compromising QoS, QoC, or cost-efficiency.
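To illustrate challenge (1), a hypothetical JSON-LD-style document for the inferred room situation discussed above is sketched below as a Python dictionary; the vocabulary, field names, and values are illustrative assumptions rather than the schema used by the CMS.

```python
import json

# Hypothetical higher-level context inferred from the noise level, light
# intensity, and a video stream of a room. All field names are illustrative.
room_situation = {
    "@context": "https://schema.org/",        # assumed vocabulary
    "@type": "RoomSituation",
    "entityId": "room:12",
    "situation": "meeting-in-progress",       # inferred, e.g., via Context Space Theory
    "inferredFrom": ["noiseLevel", "lightIntensity", "videoStream"],
    "confidence": 0.87,                       # a QoC parameter (accuracy of the inference)
    "observedAt": "2024-03-01T08:15:00Z",     # anchor for freshness/timeliness
}

print(json.dumps(room_situation, indent=2))
```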
In summary, developing a Context Management System that solves all the above four challenges under uncertainty is not trivial. Related examples can be found in the state of the art [
8,
19,
22,
51,
53], which suffer from limited scalability, high complexity, an inability to generalize to a large set of scenarios and situations, and/or high cost.
1.2 Motivating Scenario – a Busy City Circle
Let us consider a real-world scenario surrounding a city circle with multiple intersections. We illustrate this scenario in Figure
1. Assuming all are pull-based context queries [
14], Table
1 lists several context queries that originate from real-time navigation applications, i.e., context consumers, using the
Context Definition and Query Language (CDQL) [
13,
14].
Based on any of the context queries above, the users depend on the CMS to provide valid, reliable context information about the car parks, bike parks, weather, parks, and the like, to make smarter decisions. For instance, in Q7 the rider depends on the CMS to notify them with minimal latency. But the rider will not be satisfied with false-positive hazards based on stale and invalid context about the intersection retrieved from the cache memory. On the contrary, fresher context (which correlates with cost) that takes longer to retrieve and derive would have no relevance if the rider is notified late. So, how can performance- and cost-efficiency be maximized while having minimal effect on freshness, i.e., dependability?
Context queries can be broken down into multiple sub-queries called context requests. Context requests are identified at the execution time. For instance, in our architecture, the query coordinator in the
Context Query Engine (CQE) coordinates the execution of the sub-queries to produce the aggregated context query response from the
Context-as-a-Service (CoaaS) platform [
15]. Figure
2 illustrates the query breakdown for
\({Q_4}\) and
\({Q_5}\). Blue colour is used to indicate the context query, orange colour for sub-queries, green for context entities, and yellow for context attributes. The arrows indicate the logical relationships between the different pieces of context. Context in the immediate child nodes is either (i) required to derive the context depicted in the parent node (e.g., “
get car parks” is required to generate the context response for Q
4), or (ii) used to describe the context in the parent node (e.g., the address and location attributes describe the building entity).
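A minimal sketch of how such a breakdown could be represented programmatically is given below; the node labels paraphrase Figure 2, and the entity and attribute names are illustrative assumptions (the exact CDQL sub-query syntax is not shown).

```python
# Illustrative representation of the Q4 breakdown sketched in Figure 2:
# a context query, its sub-queries, the entities they touch, and the
# context attributes that describe those entities.
q4_breakdown = {
    "query": "Q4",
    "sub_queries": [
        {
            "sub_query": "get car parks",        # required to derive the Q4 response
            "entities": [
                {"entity": "carpark", "attributes": ["availableSlots", "location"]},
            ],
        },
        {
            "sub_query": "get destination building",
            "entities": [
                {"entity": "building", "attributes": ["address", "location"]},
            ],
        },
    ],
}
```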
Notice the sub-query “good for jogging” shares the same entities and context attributes as “good for walking”, i.e., temperature and probability of rain attributes of the weather entity. It is prudent to cache this context for reuse and avoid redundant retrievals. We see this notion being used as the basis in the work of Khargharia et al. [
23]. The other function of context data caching is repurposing. Assume the
\(ca{r_5}\) executes
\({Q_2}\) travelling close to
\(ca{r_3}\) that executes
\({Q_1}\). Retrieved data of nearby car parks for
\({Q_1}\) could be repurposed for
\({Q_2}\) using only a filter. But the quality requirements for each context vary with time. Uncertainty in situations also contributes to variation in context needs. So, the motivation of this paper is to develop a more flexible, fast-adapting context caching approach that takes into consideration the features of context, context queries, situations, and quality requirements under uncertainty. These have been largely ignored in similar works in favour of traditional popularity-based (reusability-oriented) techniques, which cannot guarantee all aspects of relevance and dependability – (i)
Quality of Service (QoS), (ii)
Quality of Context (QoC), and (iii) cost-efficiency.
1.3 Novel Contributions of this Paper
This paper presents a novel approach for adaptive context caching different from traditional data caching. We also present experimental results using our prototype demonstrating how our mechanism is effective in delivering relevant and dependable context information for the context consumers in the motivating scenario using a simulated context sub-query load. The main contributions of this paper are as follows:
•
we propose a novel, scalable reinforcement learning-based approach, the first of its kind, which makes selective context caching decisions aimed at maximizing the cost- and performance-efficiency of a CMS,
•
we propose and evaluate novel policies and heuristic methods for adaptive context caching, such as eviction and dynamic cache memory scaling, that are not available in previous literature, to mitigate uncertainty and the lack of prior knowledge about context, e.g., situations,
•
we show that our approach is
long-term optimal and short-term aware [
44], unlike any previous adaptive caching technique in the literature, and that it adapts faster than existing work even under the uncertainties unique to context information management – important functional requirements of adaptive context caching techniques,
•
we evaluate the cost of responding to context queries and the Quality of Service (QoS) under multiple settings to identify the most effective adaptive caching configuration to achieve the multiple efficiency objectives, and
•
we show that our proposed algorithm performs without prior knowledge and makes adaptive caching decisions based on performance metrics while taking into account features of the dynamic query load in near real-time.
The rest of the paper is structured as follows. Section
2 discusses the related work in
adaptive context caching (ACOCA). Section
3 discusses the concepts foundational to ACOCA and Section
4 presents our ACOCA mechanism including heuristics and cache memory organization. Section
5 describes the implementation of the ACOCA prototype and experimentations. We extensively discuss our observations in this section. Section
6 concludes and highlights the direction for future work in this area.
2 Related Work in Adaptive Context Caching
The relevancy and dependability of context information are essential when users rely on pervasive computing applications to make time-critical decisions. As in the motivating scenario, users such as drivers rely on the application to provide reliable context information about the parking slots, the weather, and their own current state to make their preference choices. For instance, if the context about a selected car parking facility is invalid, let's say,
goodForWalking was inferred incorrectly, the driver would not be able to park and transit to the destination conveniently. The validity and relevance can be a matter of a number of factors, including the selection of Quality of Context (QoC) and quality aware context providers [
21,
27]. But there is a limit to how much selection can improve the dependability and relevance of context. Consider a context service that samples all the car parking slots each second. QoC metrics such as timeliness (i.e., freshness) and accuracy of the number of available slots are significantly high. But what if each data ingestion incurs a cost to the context management middleware? Infrequent variation in availability makes constant retrievals redundant (i.e., a longer context lifetime [
45]), and continuously monitoring the context of car parks that drivers show minimal interest in is irrational. In summary, continuously monitoring and storing context information, dubbed the database mode of operation for Context Management Systems (CMSs), is expensive. Unfortunately, state-of-the-art CMSs such as FIWARE Orion [
53], Nexus [
19],
Context Information System (CIS) [
22], COSMOS [
8], and CA4IoT [
32] leverage the database mode of operation since it facilitates a fast query execution strategy, abstracting the complexities of on demand context retrievals. Translating context queries written in CDQL [
14] or NGSI [
53] into database native queries, e.g., CDQL to MongoDB queries in [
14], executes fast enough to satisfy Quality of Service requirements. Hence, the rationale for using the database mode is understandable; we have already provided an example of how retrieval latency (a QoS metric) can affect the relevance (a QoC metric) of context about the road (i.e., the congestion situation) to a driver.
FIWARE Orion uses specialized databases in Generic Enablers [
54], while Nexus and CIS are designed to manage context information that is unlikely to change in the near future, e.g., the location of places. These underlying assumptions are impractical when pervasive computing applications are instead interested in context that changes frequently, as in the scenario. In fact, we already stated that the size of context about an entity can be much larger since it involves aggregating and inferencing from a number of heterogeneous data streams, e.g., context about a car park can change because of its internal features, such as the occupancy rate at a certain time of the day, and any events that occur around the car park, such as community or sporting events (which could only be derived from a different data stream). Monitoring and storing a large volume of data streams is storage-inefficient and expensive. So, these techniques are not sufficiently scalable in the IoT ecosystem. The cost of context management can be further aggravated if these systems incorporate techniques to verify the validity of the context information in storage [
44] before responding to context queries from storage. Inadequate sampling rates [
47], asynchrony between the lifetime of context and network queuing times [
38,
45], and long retrieval latencies can significantly deteriorate the validity and quality of context [
35]. Ingestion from alternative context providers [
47] increases the need for context storage and leads to redundant retrieval of the same context information – resulting in continuously depleting cost-efficiency, argued in [
46] to be at least
\(O( {M + C} )\) in the best-case scenario, where
M is the number of context information and
C is the number of related context. Hence, the database mode pays a large cost for certain aspects of performance-efficiency (such as context query response latency) although there is no guarantee of quality.
The other end of the spectrum of CMS operations is the redirector mode, where context information is retrieved on demand for each query without storage. Although it guarantees the freshness of context, the asynchrony in network queues, response delays, heterogeneity in IoT specifications (e.g., sampling rates, precision, and accuracy of measurements), and availability of context providers are major limitations of the redirector mode, and the total cost of retrieving context for every context query is an expensive drawback. Moreover, the redirector mode is extremely slow and not scalable. In essence, there is an urgent need to develop a technique that combines the merits of both the database and redirector modes such that CMSs can effectively handle scale, reduce cost, and deliver context information of adequate quality (based on the requirements of the user, which define how relevant the context is) with minimal response latency.
While caching context appears a feasible solution to the above problem based on related literature in traditional data caching, there is limited application of these solutions to CMSs. Firstly, context information is significantly different from data that is discussed in the literature, e.g., transiency. We critically analyse the unique ACOCA challenges compared to data caching in [
46] and further argue the inapplicability of existing data caching techniques to ACOCA in [
44]. Work in context caching is in its infancy, and the few existing works require further development to achieve cost- and performance-efficiency while adhering to QoC. For instance, Khargharia et al. [
23] proposed a prefetching technique based on the expected popularity of low-level context information. While it is tested in the database mode, it provides no evidence of context freshness or cost-efficiency, which is a logical extension of the drawbacks of the database mode. Hence, we argue that using only the popularity of context access, as in data caching, is inadequate because of the loss of validity in cache memory, which minimizes the effective reusability (in contrast to traditional data, which, without a lifetime attributed to it, can be reused for as long as it is cached) [
44]. Schwefel et al. [
35] propose a theoretical model for synchronously retrieving context minimizing the loss of freshness, but provide no evidence about how this model would adapt with respect to transient situations, dynamic query load features, and quality requirements in near-real time. In Section
1, we argued the need for near-real time adaptiveness to situations and other high-level contexts as a functional requirement of ACOCA but [
35] focused only on IoT data. Medvedev [
28] investigates several proactive refreshing techniques to overcome this problem but the CMS has to resort to other processes to profile and negotiate parameters such as lifetime (e.g., as in [
45]) and planning periods when handling the uncertainty of context information. The primary problem is: how can any of these techniques adapt quickly and flexibly to changes in situations and features of query loads when there is a lack of prior knowledge about the observations?
There are several reasons for lack of prior knowledge about context information. They are: (a) the emergence of new context information (unavailability of previously observed statistics), (b) lack of observed data by volume, (c) uncertainty in data due to variability in the environment, e.g.,
\(\mathbb{E}\)[AR] of Q
4 under adverse weather conditions versus a calm day between 7:30-8:30 am, (d) imperfections [
18,
25], and/or (e) incompleteness, e.g., having no observed data on AR of context query Q
5 between 3:00-4:00 am although observations exist in abundance between 7:30-8:30 am. Hence, this is one of the significant and non-trivial problems in adaptive context caching, which Blasco and Gunduz [
6] explicitly state with respect to popularity profiles. Sheng et al. [
36] investigated a strategy using no priors, but in traditional data caching, which is not fully applicable to transient context information, also given its inherent complexities [
46]. So, there is a significant research gap in the literature in mitigating this problem arising from the reasons above.
Utilizing Markov Decision Process and control theory for adaptation in caching has also been studied in the literature. For instance, Zhu et al. [
50], Sheng et al. [
36], and Nasehzadeh et al. [
31] leverage
Reinforcement Learning (RL) but still develop solutions based on known priors such as data libraries, which are not applicable in context caching. Defining a state space for context caching in the same way as previous work in data caching is infeasible because no system can predict all the situations and potentially viable caching actions in advance, e.g., the transitional probability distribution for each context
i - cache action
\({a_i}\) pair is unknown, in contrast to data caching where it is known. The techniques we develop for context caching need to manage these unknowns while being coherent with how the caching decision at any stage of the lifecycle [
44] would impact the cost-efficiency, performance-efficiency, and QoC aspects.
4 Adaptive Context Caching Algorithm
In this section, the remaining aspects of the ACOCA lifecycle in our adaptive context caching approach are discussed in detail. The design specifics of the reinforcement learning agents, the end-to-end process flow of the agents, the proposed heuristics to overcome uncertainty, the eviction policies, the cache organization, and context cache memory scaling are described.
4.1 Actor-Critic Agent
We define the selective caching problem as a
Markov Decision Process (MDP). A state
\(( s )\) is a vector containing 15 features. They are the observed
\(AR{( t )_i}\) and
\(HR{( t )_i}\),
\(\mathbb{E}[ {AR{{( t )}_i}} ]\),
\(\mathbb{E}[ {HR{{( t )}_i}} ]\),
\(\overline {C{L_i}}\), average latency to retrieve, and average retrieval cost as indicated in Table
3. These features were chosen to reflect the (i) popularity, (ii) contribution to efficiency as a result of caching, (iii) ability to reliably retrieve and derive context, e.g., context provider
\(q = 1\) is more reliable than
\(q = 2\) when
\(availabilit{y_{q = 1}} > availabilit{y_{q = 2}}\) and
\(Re{L_{q = 1}} > Re{L_{q = 2}}\), and (iv) ability to maximize the cache lifetime. Observed and expected timelines are discretized using the
\(\langle {short,mid,long} \rangle\) tuple. The state space
\(( \mathcal{S} )\) is, hence, defined using continuous variables. We have not considered the size of context as a feature in this work despite related work in data caching [
24,
49], assuming the low-level context considered in our scenario is uniform in size. The goal is to cache the maximum number of contexts that yield incremental returns (i.e., the reward) to the CMS.
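A minimal sketch of how such a state vector could be assembled is shown below; only the features named above are included, and the remaining fields of the 15-feature vector defined in Table 3 are omitted.

```python
from dataclasses import dataclass, astuple

@dataclass
class ContextState:
    """Partial state vector for a piece of context i (a subset of Table 3)."""
    access_rate: float            # observed AR(t)_i
    hit_rate: float               # observed HR(t)_i
    expected_access_rate: float   # E[AR(t)_i]
    expected_hit_rate: float      # E[HR(t)_i]
    avg_cached_lifetime: float    # mean CL_i
    avg_retrieval_latency: float  # average latency to retrieve
    avg_retrieval_cost: float     # average retrieval cost
    expected_timeline: int        # discretized as 0 = short, 1 = mid, 2 = long
    # ... the remaining features follow Table 3

def to_feature_vector(state: ContextState) -> list:
    """Flatten the state into the vector consumed by the actor-critic network."""
    return list(astuple(state))
```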
A context caching action
\(( {{a_i}} )\) in the action set
\(= \{ {0,1} \}\) corresponds to not-caching or caching a context value
i. The context caching action space is
\(\mathcal{A} = \{ {{a_1},{a_2}, \ldots {a_i}} \}\) for
\(\forall i.\) Given the context caching decision is binary, the output of the agent is the probability to make one of the decisions. Let us take the probability to cache a piece of context
\(P( {{a_i} = 1} ) = 1 - P( {{a_i} = 0} )\) for that matter. The context caching policy is
\(\pi ( {a{\rm{|}}s} )\) accordingly. Consider that
\(Rewar{d_\tau }\) is the cumulative earning to the CMS from responding to context queries after
\(\tau\) number of windows where
\(\gamma\) is the discount factor. Then
\(Rewar{d_\tau } = \mathop \sum \nolimits_{t = 0}^T ({\gamma ^t} \times {r_{t + \tau }})\) and the optimal selective context caching policy
\({\pi ^*} = argma{x_\pi }( {\mathbb{E}[ {Rewar{d_\tau }{\rm{|}}\pi } ]} )\). The value function
\(( {{V^\pi }} )\) and Q-functions
\(( {{Q^\pi }} )\) for a selective context caching policy can be defined as follows given
\(p( {s'{\rm{|}}s,a} )\) is the transition probability to the new state
\(s'\) from
s taking context caching action
a:
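In their standard Bellman form (shown here as a sketch; the numbering and notation of the original equations may differ):

\[
{V^\pi }( s ) = {\mathbb{E}_\pi }[ {Rewar{d_\tau }{\rm{|}}{s_0} = s} ], \qquad
{Q^\pi }( {s,a} ) = r( {s,a} ) + \gamma \mathop \sum \nolimits_{s'} p( {s'{\rm{|}}s,a} ){V^\pi }( {s'} ).
\]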
Figure
4 depicts the system design of the sub-components of the
ACAgn. We propose an event-driven architecture for the agents developed using publisher-subscriber and observer patterns to handle streams of context information to evaluate for caching and trigger reward calculations, respectively. The state translator converts a context to the respective state which is published to the queue. Then the actor-critic network reads from the queue and makes the selective caching decisions. Context selection decisions are logged in the
Decision History Registry (DHR) and
Delay Time Registry (DTR) or
Cache Life Registry (CLR) if the decision was not to cache and cache, respectively. Each piece of cached context is profiled using the
Cache Profiler (CP). The Event Listener listens to two types of events originating from the DHR, CLR, and/or CP. Firstly, to “calculate reward” events. This event is triggered when (i) a piece of context information is prematurely evicted, i.e., before
\(\mathbb{E}[ {C{L_i}} ]\) elapses, (ii)
\(\mathbb{E}[ {C{L_i}} ]\) has elapsed when
\({a_i} = 1\), (iii)
\(\mathbb{E}[ {D{T_i}} ]\) has elapsed when
\({a_i} = 0\), or (iv) the dedicated limited-sized list collecting the sequence of
\(Re{t_i}\) in the cache profiler is full. Secondly, to “extend cache life” events that are triggered when a piece of context information avoids eviction even after the
\(\mathbb{E}[ {C{L_i}} ]\) has elapsed. Accordingly, the first event causes the actor-critic network to learn and the latter to make selective caching decisions.
The output of the actor-critic network
\(( {0 \le {v_i} \le 1} )\) – the probability to cache can be described as a measure of
fitness to cache over the estimated
\(\mathbb{E}[ {C{L_i}} ]\) defined in the feature vector. For instance,
\({v_i} = 1\) denotes the context can be cached for the entire
\(\mathbb{E}[ {C{L_i}} ]\). On the contrary,
\({v_i} = 0\) denotes not to cache the context. So,
\({v_i} = 0.5\) is an ambiguous state. Using this principle, we can map the output value to our binary cache action
\(( {{a_i}} )\) as:
Accordingly, we define \(C{L_i} = ( {{v_i} - 0.5} ) \times \overline {C{L_i}} /0.5\) and \(D{T_i} = ( {0.5 - {v_i}} ) \times mid/0.5\).
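A minimal sketch of this mapping, assuming a simple 0.5 decision threshold consistent with the definitions of \(C{L_i}\) and \(D{T_i}\) above (the agent's exact tie-breaking rule may differ), is:

```python
def map_caching_decision(v_i: float, avg_cache_lifetime: float, mid: float):
    """Map the actor-critic output v_i in [0, 1] to a binary cache action plus
    either a cache lifetime (CL_i) or a delay time (DT_i) before re-evaluation."""
    if v_i > 0.5:
        a_i = 1                                            # cache
        cl_i = (v_i - 0.5) * avg_cache_lifetime / 0.5      # period to keep cached
        return a_i, cl_i
    a_i = 0                                                # do not cache
    dt_i = (0.5 - v_i) * mid / 0.5                         # delay before re-evaluation
    return a_i, dt_i
```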
As indicated in Figure
4, we use a delayed reward calculation scheme using event triggers and listeners to capture the actual return from the cache action as opposed to calculating expected values for the reward at the decision epoch, e.g., as in traditional data caching techniques [
17,
31]. The window during which the reward is calculated is referred to as the
learning epoch. Consider
\({R_{i,t + \tau }}\) is the number of context requests that access context
i between the selective decision epoch
t and the learning epoch after
\(\tau\) windows. Then,
\(\tau = D{T_i}\) when
\({a_i} = 0\) and
\(\tau = \min ( {C{L_i},long,{\tau _{full}}} )\) given
\({a_i} = 1.\) \({\tau _{full}}\) is the time at which the dedicated limited-sized list collecting the sequence of
\(Re{t_i}\) in the cache profiler gets full. Then the reward is calculated as follows:
The function \(f\) assigns a large negative reward to context information that have not recorded any or has recorded infrequent intermittent accesses subsequent to being cached. Such context information is cost-inefficient to cache because they incur hold-up costs – the cost of occupying the cache space for no return. Function \(g\) assigns positive reward for \({a_i} = 0\) decisions for such context. But \(| g | < | f |\) because we encourage the agent to maximize the number of context information that get cached \(( {{a_i} = 1} )\) considering that one of our objectives is to minimize RT for time-critical context queries.
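A minimal sketch of this delayed reward computation is given below; the access threshold and the shapes of \(f\) and \(g\) are placeholder assumptions, not the calibrated functions used by the agent.

```python
# Illustrative delayed reward for a selective caching decision on context i.
ACCESS_THRESHOLD = 1     # accesses at or below this count as "infrequent" (assumed)
F_PENALTY = -10.0        # f: large negative reward for a cost-inefficient cache
G_REWARD = 2.0           # g: smaller positive reward for correctly not caching, |g| < |f|

def delayed_reward(a_i: int, accesses: int, realized_return: float) -> float:
    """accesses = R_{i,t+tau}; realized_return stands in for the observed
    earnings minus caching costs over the learning epoch."""
    if a_i == 1:
        if accesses <= ACCESS_THRESHOLD:
            return F_PENALTY           # cached but (almost) never used -> hold-up cost
        return realized_return         # otherwise reward the observed return
    if accesses <= ACCESS_THRESHOLD:
        return G_REWARD                # correctly avoided caching a cold context
    return -realized_return            # assumed penalty for a missed caching opportunity
```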
4.2 Deep Deterministic Policy Gradient Agent
An actor-critic network that searches for the optimal ACOCA policy was implemented to compare with
ACAgn. DDPGAgn is implemented using the same event driven architecture depicted in Figure
4. Rewards are calculated similarly as in (5), (6), and (7). We implemented four deep neural networks (i.e., Q-networks): an actor network, a target-actor network, a critic network, and a target-critic network to improve self-learning [
26,
30].
Consider the previous agents –
StatAgn, and
ACAgn. The when-to-cache-context decision was made using the caching or not-caching decision as a confidence metric, e.g., as in (4). This solution results in either caching a piece of context for at least the estimated period
\(C{L_i}\) or delaying evaluation for caching by
\(D{T_i}\). Both
\(C{L_i}\) and
\(D{T_i}\) are continuous values of time. On this basis, we define a continuous action space
\(\mathcal{A}{\rm{:= }}[ { - long,long} ]\) for the
DDPGAgn where
\({a_i} \in \mathbb{R}\) (
\(\mathbb{R}\) is the set of real numbers). Then the discrete context selection actions (i.e., cache or not cache) can be defined as follows:
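One plausible reading of this mapping, consistent with the continuous action space defined above (the sign selects the decision and the magnitude is interpreted as a duration; the exact thresholds used by the agent may differ), is:

\[
{a_i} > 0 \Rightarrow \text{cache, with } C{L_i} = {a_i}; \qquad
{a_i} \le 0 \Rightarrow \text{do not cache, with } D{T_i} = - {a_i}.
\]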
Based on the work of Lillicrap et al. [
26] on continuous controls, (3) can be modified as follows to avoid the inner expectation when the optimal policy is deterministic
\(( {\varrho :\mathcal{S} \to \mathcal{A}} )\).
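In the standard deterministic-policy form of [26], the modified equation reads:

\[
{Q^\varrho }( {{s_t},{a_t}} ) = {\mathbb{E}_{{s_{t + 1}}}}[ {r( {{s_t},{a_t}} ) + \gamma {Q^\varrho }( {{s_{t + 1}},\varrho ( {{s_{t + 1}}} )} )} ].
\]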
Accordingly, \({Q^\varrho }\) can be learnt off-policy, i.e., using Q-Learning, using a transitional probability distribution generated from experience gathered at runtime by a different process (which we implement in the form of the Cache Profiler). This enables the agent to overcome the lack-of-prior-knowledge problem when adapting to context and context query variations. The actor uses a parameterized function \(\varrho (s|{\theta ^\varrho })\) where \(\theta\) are the parameterized performance metrics, i.e., AR, HR, retrieval costs, and so on. The critic network implements the Bellman equation above in (8).
A crucial difference between the
DDPGAgn and
ACAgn is the learning process.
ACAgn updates the model for each context
i in the Decision History. This is computationally expensive, and the process does not generalize to heterogeneous context information (a unique feature of context query loads) since the model attempts to optimize for each piece of context information. This is a significant drawback to the scalability of an ACOCA mechanism when the context space is large. So, the
DDPGAgn leverages a mini-batch processing technique that samples
\(\vartheta\) recent decisions, e.g., 16 decisions, and soft target updating from [
26] to update the networks and target-networks in each learning cycle, to minimize bias towards certain situations. The soft target update function is as follows.
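In its standard form from [26], with the soft update rate written here as \({\tau _{soft}} \ll 1\) to avoid confusion with the \(\tau\) used for the number of windows above:

\[
\theta ' \leftarrow {\tau _{soft}}\theta + ( {1 - {\tau _{soft}}} )\theta '.
\]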
4.3 Context Cache States
Figure
5 depicts the state-transition diagram of a piece of context information. The ghost cache state refers to a piece of context being temporarily moved to a buffer memory pending eviction, until any outstanding context queries that use the context in question are responded to before its freshness is completely lost. A piece of context information is evaluated asynchronously for caching either when (i) it is not cached, or (ii)
\(C{L_i}\) has elapsed. We use the term
complete hit to describe access to cached context that also conform to quality requirements of the querying consumer, e.g., validity of the number of available parking slots for the navigation application. A
partial hit (or also called a
partial miss) occurs when a derived high-level context needs to be re-interpreted and refreshed using lower-level context, because at least one of the low-level contexts used to derive it has become invalid, changing the currently cached context value. A complete miss is when neither of the above criteria is met.
According to [
28,
45], context retrievals account for 20-25% of the costs incurred by a CMS, whereas the rest is attributed to penalties incurred using the redirector mode. We contend that our cost-efficiency objective can be achieved by minimizing the latency of responding to context queries. The cost can be further minimized by managing the number of retrieval operations, including refreshing. We develop a three-part solution for this matter. First, we bypass the context selection delays enforced by
\(D{T_i}\) when a registered event is recognized in the continuous event monitor. Secondly, our approach leverages
shared context retrievals – using a single retrieval operation to service a number of parallel queries that share similar context requirements. All cached context, e.g., context attributes, and entities, which can be derived using the data from the same context provider are refreshed from a single retrieval operation. Thirdly, the reactive refreshing strategy [
28,
45] is used in this paper so that the retrievals occur only when the context is requested and invalid, delaying the need for retrieval.
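A minimal sketch of the shared retrieval and reactive refreshing logic is given below; the function and field names are illustrative assumptions, and the retrieval callable is assumed to return fresh values for every context attribute served by the given provider.

```python
from collections import defaultdict

def serve_request(cache, provider_of, retrieve, requested_keys):
    """Reactive refreshing with shared retrievals (illustrative sketch).

    cache:       dict key -> {"value": ..., "fresh": bool}
    provider_of: dict key -> provider id
    retrieve:    callable(provider_id) -> dict key -> fresh value
    """
    # Group stale or missing context by provider so that one retrieval
    # refreshes every cached item sharing that provider (shared retrieval).
    stale_by_provider = defaultdict(list)
    for key in requested_keys:
        entry = cache.get(key)
        if entry is None or not entry["fresh"]:
            stale_by_provider[provider_of[key]].append(key)

    # Reactive refreshing: retrieve only because a request arrived and the
    # cached context is invalid; no retrieval happens otherwise.
    for provider, keys in stale_by_provider.items():
        fresh_values = retrieve(provider)      # single shared retrieval
        for key in keys:
            cache[key] = {"value": fresh_values[key], "fresh": True}

    return {key: cache[key]["value"] for key in requested_keys}
```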
4.4 Handling Uncertainty: Heuristics, and Context State Space Exploration
Consider applying (5) for a piece of context information that lacks previously observed data to accurately calculate the
\(\mathbb{E}\)[HR]. Therefore, we calculate
\(\mathbb{E}\)[
\(C{L_i}\)] based on the estimated
\(\mathbb{E}[ {HR} ]\) in (
11) based on Figure
6. The residual lifetime [
41] remaining to cache,
\(ResiL = ( {{f_{arrive}} - {f_{thresh}}} ) \times \overline {ReL} /( {1 - {f_{arrive}}} )\) given
\({f_{arrive}} = 1 - \overline {ReL} /{L_i}.\) HR
\(\to\)0 when
\({f_{arrive}} \le {f_{thresh}}\) because
\(ResiL \le 0\). Otherwise, we consider a minimum
\(\mathbb{E}[ {HR} ] = 0.5\), which can be translated into the criterion
\(ResiL \ge 2/\mathbb{E}[ \lambda ]\).
\(\mathbb{E}[ {HR} ]\) is defined as:
A similar notion to (10) is found in [
39]: the probability of an access being a hit
\(({P_{hit}}( i )\)), defined in
GoodFetch – a metric for a cached item that balances access and update (defined as
refreshing in our scope) frequencies.
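A minimal sketch of the residual lifetime heuristic described above is given below; only the guard conditions stated in the text are encoded, and the general case of \(\mathbb{E}[ {HR} ]\) follows the definition in (11).

```python
def residual_lifetime(avg_rel: float, lifetime: float, f_thresh: float) -> float:
    """ResiL = (f_arrive - f_thresh) * mean(ReL) / (1 - f_arrive),
    with f_arrive = 1 - mean(ReL) / L_i, as defined in the text."""
    f_arrive = 1.0 - avg_rel / lifetime
    return (f_arrive - f_thresh) * avg_rel / (1.0 - f_arrive)

def meets_min_expected_hit_rate(avg_rel, lifetime, f_thresh, expected_lambda) -> bool:
    """E[HR] -> 0 when ResiL <= 0; the minimum E[HR] = 0.5 criterion
    translates to ResiL >= 2 / E[lambda]."""
    resil = residual_lifetime(avg_rel, lifetime, f_thresh)
    return resil > 0 and resil >= 2.0 / expected_lambda
```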
Another approach to overcoming uncertainty and the lack of prior knowledge about context is to explore the state space using an \(\varepsilon\)-greedy method, which randomly caches context information with a probability of \(\varepsilon\). Although it is simple to implement, completely random exploration cannot minimize the convergence time or the performance- and cost-efficiency degradations if the context state space is large. Instead, we use the Most Frequently Used (MFU) strategy to cache context information greedily based on the popularity factor. The choice of this heuristic was made based on experimental observations comparing it against the Most Recently Used (MRU) and completely random strategies, the discussion of which is beyond the scope of this paper.
A further drawback of
\(\varepsilon\)-greedy exploration in the literature is the lack of adaptiveness [
20]. Context query loads are generally heterogeneous but can exhibit (i) homogeneity, e.g., a higher volume of context queries searching for alternate routes after a roadside accident, and/or (ii) consistency, e.g., recurring spatial-temporal query patterns such as searching for a car parking space during rush hours. A constant
\(\varepsilon\) value could result in unnecessary explorative caching in these scenarios resulting in cache-inefficiencies, especially when the cache memory is limited. It is rather more rational to exploit – relying on the known actions that generate higher reward, e.g., caching context
i which generated a positive reward from caching, for a similar situation. On the contrary, more exploration, i.e., large
\(\varepsilon\), could be beneficial when the queries are highly random. We adapt
\(\varepsilon\) by modifying
\(\varepsilon = \varepsilon \pm \zeta\) after each
\(cycl{e_{learn}}\) [
26] as a result. The dynamism of the context queries is quantified using
\(\overline {Reward}\) for all the context caching decisions since the last
\(\varepsilon\) modification event. For instance, consider a stream of context queries that request a set of novel context information. A trained SCA may decide not to cache due to the lack of statistical observations and negative expected values, e.g., (4) could yield a negative average reward despite the context showing high AR.
\(\varepsilon\) is incremented if
\(\overline {Reward} < Rewar{d_{max}}\) or decremented otherwise to adapt to this situation.
We also introduced an adaptive \(\delta\) that modifies \(\delta = \delta \pm \omega\) after a constant number of decisions referred to as the learning cycle \(( {Cycl{e_{learn}}} )\) to reflect the current level of volatility in the query load. For instance, \(\delta\) is reduced by \(\omega\) in default for each window unless a \(cycl{e_{learn}}\) occurred that increments \(\delta\) by \(\omega\). The rationale is \(cycl{e_{learn}}\) occurs frequently when the queries demand a heterogenous set of context information to be evaluated for caching – a feature of volatility.
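A minimal sketch of these two adaptation rules is shown below; the bounds and the default step behaviour are illustrative assumptions.

```python
def adapt_epsilon(epsilon, zeta, avg_reward, reward_max, lower=0.01, upper=1.0):
    """Increase exploration when the average reward since the last adjustment
    falls short of the maximum observed reward; decrease it otherwise."""
    if avg_reward < reward_max:
        return min(upper, epsilon + zeta)
    return max(lower, epsilon - zeta)

def adapt_delta(delta, omega, learning_cycle_occurred, lower=0.0):
    """delta decays by omega each window by default and is incremented by omega
    whenever a learning cycle occurs, reflecting query-load volatility."""
    if learning_cycle_occurred:
        return delta + omega
    return max(lower, delta - omega)
```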
4.5 Context Cache Memory Organization
In-memory caches are faster yet more expensive compared to persistence-level caches used in literature, e.g., [
28]. Further, devices that share memory units with cache, e.g., routers, suffer from data volatility. Although contemporary in-memory technologies such as Redis [
55] support replication across distributed nodes, transient context cannot be recovered but only reproduced in case of a loss [
46] because transiency means that context recovered from replication instances is not guaranteed to comply with Quality of Context parameters, e.g., freshness. In this paper, we assume that context originating from IoT devices is small (because of the low-level context considered in this paper) and reliably retrieved, so context can be reproduced fast (based on related literature [
4]). We opt for in-memory context caching for the following reasons: (i) low cache seek time, suitable for responding to time-critical queries, and (ii) to test the generalizability of ACOCA in a system where main memory is shared with the cache (as at the network edge).
Medvedev [
28] proposed a logical cache hierarchy which is extended in this paper. Consider Figure
7 below. It illustrates an alternate, simplified context hierarchy. As we will discuss next, context cache scans follow either a top-down approach, e.g., for eviction, or a bottom-up approach, e.g., when performing context cache placement and refreshing.
A unit of cache memory is defined logically in our cache memory organization. We chose this design because of two advantages. Firstly, it separates the complicated physical memory management tasks from the logical cache. For instance, consider a Cloud-based cache memory where the context information is physically fragmented across multiple distributed nodes. The defragmentation routine may result in scaling down the number of physical instances to, e.g., minimize hold-up costs. But the logical cache memory, when accessed by the CMS, is location-transparent and oblivious to scaling and fragmentation – suitable for pervasive context-aware applications to dependably access context information from the context cache memory. Secondly, the logical context cache memory organization maintains the logical affinity of hierarchically related context information [
44]. The context cache memory is, hence, scaled in logical units corresponding to the piece of context information that is highest in the logical hierarchy among the cacheable context, e.g., in terms of the number of context entities in the low-level context cache.
4.6 Eviction Heuristics
One of the features of cache memories in a distributed system is the diversity of capacity and technology. Devices towards the edge are often limited in size and shared [
3] whereas dedicated, scalable, and large sized cache memories can be found in the Cloud [
9]. A critical evaluation of previous literature suggests several advantages of leveraging scalable cache memories to cost-effectively handle volatile context query loads [
44]. Although comprehensive solutions for issues such as post-scaling degradation or performance cliffs [
10,
40], and distributed fragmentation [
55] are yet to be investigated, overall performance gains outweigh the disadvantages. We discuss a criterion to adaptively scale context cache memory as a part of the adaptive context caching and eviction processes in this section based on this rationale.
We assume that the pay-as-you-go scheme of Cloud caches involves making an upfront payment for each unit of cache, e.g., $1.68 per GB per month for a stateful File Cache instance [
52]. The cost of caching is distributed among all the context information that share the cache memory. So, an underutilized Cloud cache is cost inefficient. According to Figure
8, we attempt to evict context(s) only when the remaining cache space is insufficient. Consider the logical hierarchy of the context cache. A top-down approach to eviction can release a large cache memory space in a single eviction cycle.
We define two types of evictions: (i) mandatory, and (ii) selective. Mandatory eviction involves removing a piece of context, e.g., an entity, and all its child context in the logical hierarchy, e.g., context attributes, from the cache. Selective eviction removes only lower-level context which satisfy the decision criteria of the eviction algorithm. For example, only the speed attribute is evicted from an entity car, while the attribute direction remains in the cache. We use a threshold \(( \eta )\) based approach to categorize the types of evictions, the specific definition of which depends on the eviction algorithm.
We compare the performance of windowed (i) random eviction, (ii)
Least Frequently Used (LFU), (iii)
Least Valued First (LVF) [
24], and (iv) our novel
Time-Aware Hierarchical (TAH) eviction policies. TAH combines the advantages of time-awareness in TLRU [
5] with LVF [
2] and LFU as indicated in Figure
9. Hierarchically, TAH applies LVF and LFU for selecting the evitable entities and attributes, respectively. Previous literature in caching specify that access to objects follows a Zipf-like distribution [
39,
48] and popular items are expected to create higher hit rates [
48]. Raw data from context providers are the most frequently accessed in this regard, since they get reused among context attributes having multiple providers. So, context attributes can also share this feature. Context attributes are the lowest in the logical context hierarchy, so evicting attributes based on frequency of access is logical. According to [
14] as well, sub-queries are based on entities. We have also shown above in Section
2 that criticality differs for context queries comparing Q
1 and Q
7. It is more useful to evict entities based on their significance in contributing to queries, which we quantify using the “value”. The TAH policy is therefore designed accordingly.
It is important to note that all eviction policies considered in this work other than the TAH policy are not time-aware, i.e., a piece of context information can be evicted despite the estimated
\(C{L_i}\) not elapsing. The value of a piece of context information to evict is calculated as follows where
\(\hat{I}\) refers to the most popular piece of context in the same logical level,
\(\ddot{I}\) refers to the context that takes the longest latency to retrieve and
\(\Delta C{L_i}\) refers to the remaining cached lifetime:
where,
\(0 \le \kappa ,{\rm{\ }}\mu ,{\rm{\ }}\nu ,\frac{{A{R_i}}}{{A{R_{\hat{I}}}}},{\rm{\ }}\frac{{{\rm{\Delta }}C{L_i}}}{{C{L_i}}},{\rm{\ }}\frac{{\overline {R{L_i}} }}{{\overline {R{L_{\ddot{I}}}} }}{\rm{\ }} \le 1\) and
\(Valu{e_i}{\rm{\ }} \in \mathbb{R}_0^ +\). As an example,
\(Valu{e_i}\) is low and a possible candidate for eviction when an entity is relatively unpopular,
\({\rm{\Delta }}C{L_i} \to 0\) or
\({\rm{\Delta }}C{L_i}\) has elapsed, and the context can be retrieved reliably and fast such that re-derivation is cheap to perform. Context information that avoids being evicted by the end of its
\(\mathbb{E}[ {{\boldsymbol{C}}{{\boldsymbol{L}}_{\boldsymbol{i}}}} ]\) (set at the caching epoch) are reevaluated and the
Cache Life Registry (CLR) is updated with a recalculated
\(\mathbb{E}[ {C{L_i}} ]\). This extends the residence period of a piece of context information in the cache memory.
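A minimal sketch of such a value computation is given below; the additive combination and the weights \(\kappa\), \(\mu\), and \(\nu\) are illustrative assumptions consistent with the qualitative behaviour described above (unpopular, nearly expired, cheap-to-retrieve context receives a low value), not the exact formula.

```python
def eviction_value(ar_i, ar_most_popular,
                   remaining_cl, cl_i,
                   avg_rl_i, avg_rl_slowest,
                   kappa=0.4, mu=0.3, nu=0.3):
    """Illustrative eviction value for context i (a low value marks an eviction
    candidate). The weights and the additive form are assumptions."""
    popularity = ar_i / ar_most_popular           # AR_i relative to the most popular context
    residual = max(remaining_cl, 0.0) / cl_i      # remaining fraction of the cached lifetime
    retrieval = avg_rl_i / avg_rl_slowest         # retrieval latency relative to the slowest
    return kappa * popularity + mu * residual + nu * retrieval
```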
5 Implementation and Evaluation
In this section, our adaptive context caching (ACOCA) approach is evaluated and compared against the benchmarks (explained below). We evaluate whether the cost- and performance-efficiencies of the CMS are maximized using ACOCA.
5.1 Prototype and Experimental Setup
We developed our proposed methodology as a
proof-of-concept (POC) using Python 3.8.2. The architecture of the system is illustrated in Figure
10. All processes other than those in the context query execution path execute asynchronously in an event-driven manner. The response latencies of context queries are, hence, statistically independent of the response latency of the ACOCA agents. The average response latencies to make ACOCA decisions were
\(0.639 \pm 0.07\)ms and
\(2.235 \pm 0.53\)ms using the
ACAgn and
DDPGAgn respectively. In comparison,
StatAgn took
\(0.428 \pm 0.00\) ms.
The architecture comprises six components: (i) Context Query Execution Manager (CQEM), (ii) Storage Query Execution Agent (SQEA), (iii) Selective Caching Agent (SCA), (iv) Cache Operations Module (CaOpM), (v) Cache Memory Module (CMM), and (vi) Context Service Resolver (CSR). CQEM coordinates the context query execution and performs data transformation operations. SQEA executes context cache access operations. SCA makes the context caching decisions. CaOpM executes context data assignment in cache memory, context refreshing, and eviction operations. CMM is a self-contained component that monitors, scales, and stores context information for fast access based on SCA's directives called the context caching plan [
44]. CSR selects relevant context providers and establishes connections. All components are multi-threaded, e.g., data from context providers for a query is retrieved in parallel, and concurrent access to shared resources (i.e., hash tables) is controlled using hierarchical write-locks. Shared services (e.g., the SCA) utilize the publisher-subscriber pattern, reading from First-In-First-Out queues. gRPC is used for inter-service communication. This design strategy focuses on massive scalability in a distributed environment by creating autonomous agents to simplify the complex ACOCA problem.
We used SQLite to handle structured data repositories such as the context services registry, service descriptions (SD) store, and resource lookup. MongoDB was used for maintaining logs, statistics, and historic context repositories.
5.1.1 Implementing the Reinforcement Learning Agents.
The ACAgn and DDPGAgn implement fully connected neural networks for both actor- and critic-network using TensorFlow 2.0 and Keras. We implemented two hidden layers containing 512 and 256 neurons respectively to minimize the overhead ACOCA agents introduce to the CMS. It also enables us to compare the RL agents with the StatAgn reliably as a light-weight solution suitable in limited resourced distributed environments. The hidden layer activation function was the Rectified Linear Unit (ReLu). SoftMax and Tanh were used in the output layer of the ACAgn and in the critic-network of the DDPGAgn for activation. Mean squared error (MSE) was used to calculate the loss between the Q-target and the Q-value at the actor-network during learning whereas Adam optimizer was used in updating. Scipy is used for mathematical operations including extrapolation. The size of the replay buffer used in DDPGAgn was set to 100 recent decisions.
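A minimal sketch of the ACAgn actor network described above is given below; the two-way SoftMax output layout and the input dimension of 15 (following the state vector in Table 3) are assumptions about details not restated here.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_actor(state_dim: int = 15, n_actions: int = 2) -> tf.keras.Model:
    """Fully connected actor network: two hidden layers of 512 and 256 ReLU
    units and a SoftMax output over the cache / not-cache decision."""
    model = models.Sequential([
        tf.keras.Input(shape=(state_dim,)),
        layers.Dense(512, activation="relu"),            # first hidden layer
        layers.Dense(256, activation="relu"),            # second hidden layer
        layers.Dense(n_actions, activation="softmax"),   # P(cache), P(not cache)
    ])
    # MSE between the Q-target and the Q-value, updated with Adam, as stated above.
    model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
    return model
```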
5.1.2 Deployment and Hyper-parameter Setting.
We used a single Unix-based instance having 8GBs of shared memory and 2.4GHz of CPU clock speed to respond to queries, emulating a typical contemporary edge device. Initial values of the hyperparameters are shown in Table
4. The default logical cache size of a cache memory unit is three entities, considering the highest number of entities accessed by a single context query (i.e., Q
4 and Q
7) among those stated in the motivating scenario. So, a limited cache memory can accommodate only 37.5% of all cacheable entities in the scenario.
5.1.3 Simulating the Busy Intersection.
We simulated the context providers using a proprietary carpark simulator from our previous work [
45] and the IoT data simulator [
56]. Context are retrieved via REST APIs from the providers. Response latency of each context provider is normally distributed, e.g., for
\(GET\ :/cars?vin =\),
\(\overline {RL} = 0.997s\), and
\({\sigma ^2} = 0.012.\) Each context provider responds with data that can be mapped to context attributes of the requested entities using the defined schema. The SLA that the CMS has with a context provider defines only the cost per retrieval
\(( {Cos{t_{ret}}} )\).
We consider a static context lifetime
\(({L_i})\) and a linear freshness decay function to setup a controlled environment that focuses on our primary objectives, considering that
\(\mathbb{E}[ {{L_i}} ]\) of each context attribute is known using our lifetime estimation technique [
45] for a planning period (
PlnPrd). So,
\({L_i} = {\rm{max}}( {1/S{R_q},\mathbb{E}[ {{L_i}} ]} )\) for any context data from a context provider.
Our motivating scenario contains eight context query types and 12 sub-query types involving eight entities, seven context services, and 25 context providers which can be found in GitHub
1 . The repository contains all the datasets used to manipulate the behaviours of the entities, SLAs, context provider registry, and context query load. All context queries are pull-based. We defined eight context consumers (real-time pervasive computing applications) that have negotiated at least one SLA with the Context Management System (CMS). These SLAs specify (i) the price per valid response
\(( {Pric{e_{res}}} )\), (ii) freshness threshold
\(( {{f_{thresh}}} )\), (iii) cost of penalty for an invalid response
\(( {Cos{t_{pen}}} )\), and (iv) the maximum tolerated response latency
\(( {R{T_{max}}} )\). The 153,600 different combinations of context query types, sub-query types, and SLAs that request context information about the heterogeneous context entities were executed continuously, with some query instances repeated where necessary. We suggest interested readers refer to our repository
1 for further details about the SLAs,
Service Descriptions (SDs), entities, and attributes used in the experiments. A context consumer can also be a provider, e.g., when related to a car in the scenario. The scenario contains complex situations involving stationary and non-stationary entities where meta-data about the entity is derived from the SD, i.e., VIN of car
1, location of carpark
2. Apache JMeter was set up to generate the query load conforming to a Poisson distribution where
\(\lambda = \ 2.5/s\).
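A minimal sketch of how this load could be reproduced outside Apache JMeter, using the stated Poisson arrival rate and the normally distributed provider latency above, is:

```python
import numpy as np

rng = np.random.default_rng(42)

LAMBDA = 2.5       # context (sub-)query arrivals per second (Poisson)
RL_MEAN = 0.997    # mean provider response latency in seconds
RL_VAR = 0.012     # variance of the provider response latency

def simulate_load(duration_s: float):
    """Yield (arrival_time, provider_latency) pairs for one simulated window."""
    t = 0.0
    while True:
        t += rng.exponential(1.0 / LAMBDA)                    # Poisson inter-arrival times
        if t >= duration_s:
            break
        latency = max(0.0, rng.normal(RL_MEAN, np.sqrt(RL_VAR)))
        yield t, latency

# Example: number of queries and mean latency over a 60-second window.
arrivals = list(simulate_load(60.0))
print(len(arrivals), sum(l for _, l in arrivals) / len(arrivals))
```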
5.1.4 Performance Measurement.
We present context caching occurring in the low-level context cache (refer to Figure
7) in this paper. Each test case was executed at least three times per session (i.e., 4,413 context sub-queries) over two independent sessions and then averaged. We define two metrics to evaluate the cost-efficiency of a CMS using an adaptive context caching mechanism. Firstly, we calculate and store the pessimistic return (
PessiRet).
PessiRet is derived using the most expensive and the least valuable SLAs applied within each window. The most expensive SLA defines the most stringent QoS requirements underpinned by the highest
\(R{T_{max}}\) and
\(Pe{n_{del}}\). The least valuable SLA offers the least
\(Pric{e_{res}}\). These SLAs are selected at the end of each window by observation.
PessiRet is calculated similarly to (3) using observed values instead of expected values. Secondly, the total return from responding to all context queries during the observed period is calculated using
\(TotalRet = \mathop \sum \nolimits_{\forall t} \mathop \sum \nolimits_{\forall i} Ret{( t )_i}.\) To measure the performance-efficiency, variation of response time,
probability of delay (PD), throughput, cache throughput,
hit rate (HR), number of context retrievals from providers, and the number of evictions are utilized.
5.1.5 Benchmarks.
The two agents tested under different combinations of configuration parameters (i.e., cache memory size, eviction policy, hyper-parameter sharing, and parameter update policy) are compared against the (i) short-term optimal
StatAgn [
43] to compare the short-term cost- and performance-efficiency of the agents, (ii) probabilistic context caching (
Prob) proposed by Khargharia et al. [
23], (iii) redirector mode (retrieving context information for each query without storage) to compare the impact of context caching on meeting the objectives, and (iv) the popular database mode of operation used predominantly in the state-of-the-art in CMSs [
6,
8,
12,
19,
22,
32].
5.2 Results and Discussion
In this sub-section, we present a summary of the results gathered from experimenting with our prototype. Errors indicated conform to a 95% confidence interval.
First, we observed the impact adaptive context caching has on the performance-efficiency of the CMS. The response time reduced by ∼85% compared to the redirector mode. All the
selective caching agents (SCAs) minimized caching context that was requested intermittently. However,
ACAgn was 50% slower compared to the
StatAgn to reach this status as we expected since
ACAgn has to gather adequate experience over a longer period of time before it can make accurate value estimations. Table
5 compares the overall performance of our approach using limited and scalable context cache memories for each tested agent and the benchmarks.
There exists a strong negative correlation of -0.9672 between RT and HR using a limited cache for the ACAgn. The correlation coefficient is -0.7621 otherwise. There exists no correlation between RT and PD using a limited cache memory. However, RT and PD are strongly correlated, i.e., 0.8531, using a scalable context cache memory. The corresponding variances of RT for limited-sized and scalable cache memories are 1.098 and 0.694, respectively. Therefore, we contend that the above observations are attributable to the limited-sized cache memory resulting in a higher number of cache evictions. Similar variance can be observed with other metrics as well. Therefore, limited-sized context cache memories do not guarantee consistent QoS, i.e., dependability at any time, compared to a scalable one.
Both
ACAgn and
DDPGAgn show a strong correlation between RT-HR and also RT-PD irrespective of cache size. We summarize those results in Table
6 below. It conforms to our assumption that
\({P_{DC,i}} \propto 1 - HR{( T )_i}\) in (4).
In the next two subsections, we will discuss the performance efficiency of the CMS using ACOCA agents in the short term. It is measured in \(\overline {HR}\), \(\overline {RT}\), throughput, and \(\overline {PessiRet}\) under different configurations. We will then compare the overall cost efficiency. Next, the long-term efficiencies are discussed, comparing all the agents with benchmarks. Finally, we summarize the key observations from our experiments, highlighting the key principles of ACOCA.
5.2.1 Actor-Critic Agent in the Short Term.
First, we investigated exploratory selective caching using the
ACAgn with the adaptive
\(\varepsilon\)-greedy policy initiated at
\(1.0\), so that the
ACAgn can overcome the drawback of longer convergence time, explore the state space, and gather enough experience in a short span of time. Figure
11 summarizes the empirical results.
The agent makes random decisions when the transition probability distribution is non-existent at first. We refer to this default as random exploration. Several other key observations are as follows. Firstly, the impact of explorative caching on context caching is only marginal using a limited-sized cache. This is because of frequent evictions. Secondly, only the LVF policy makes significant performance improvements on caching with exploration. This supports the argument behind our novel eviction policy to combine LVF with LFU. The novel policy shows significantly low PD and RT, greater throughput, and the lowest negative
\(\overline {PessiRet}\) among all the other results using a scalable cache memory, given no explorative caching is performed. We contend that the novel policy suffered a detrimental effect from exploration because the dominant feature in cached context management is then the frequency of use, i.e., the LFU component of the policy and the MFU strategy of exploration. It shows better performance using the LFU policy rather than LVF, as was observed using the
StatAgn [
43]. Thirdly,
\(\overline {PessiRet} < 0\) using scalable context cache memory. Contrarily,
\(\overline {PessiRet} > 0\) using
StatAgn under the same conditions [
43]. Further, these
\(\overline {PessiRet}\) values are worse compared to using limited-sized cache memory. The reason for the reduced
\(PessiRet\) could be found again with explorative caching. It suggests that the RL-based models require further learning (i.e., longer than the short term) to reach an optimal solution with exploration.
Considering that context query loads are generally heterogeneous and that the objective is to minimize the convergence time to an optimal solution, exploration is inevitable. LVF reflects the features of our novel policy in the best-case scenario of a homogeneous context query load, which leads to little or no exploration. Therefore, we find that the LVF policy complements the ACAgn in providing cost-optimal context query management services using a context cache memory.
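To make the combined policy concrete, the following is a minimal sketch of ranking cached context items by a blend of remaining value (LVF) and access frequency (LFU) and evicting the lowest-scoring item first; the scoring formula, weights, and field names are our assumptions for illustration, not the authors' exact eviction policy:

```python
# Sketch: blend Least Value First (LVF) with Least Frequently Used (LFU).
# Items with low remaining validity and low access frequency are evicted first.
from dataclasses import dataclass

@dataclass
class CachedContext:
    key: str
    remaining_validity: float   # seconds until the cached item becomes stale (assumed field)
    access_count: int           # accesses within the current planning period (assumed field)

def eviction_score(item: CachedContext, w_value: float = 0.6, w_freq: float = 0.4) -> float:
    # Higher score = more worth keeping; the weights are illustrative assumptions.
    return w_value * item.remaining_validity + w_freq * item.access_count

def select_victim(cache: list[CachedContext]) -> CachedContext:
    return min(cache, key=eviction_score)

cache = [CachedContext("parking#12", 4.0, 18), CachedContext("weather#3", 55.0, 2)]
print(select_victim(cache).key)   # evicts the item least valuable to retain
```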
5.2.2 Deep Deterministic Policy Gradient Agent.
DDPGAgn was tested using a setting similar to ACAgn. The results are summarized in Figure 12. The striking difference in results between the two agents (i) supports our rationale for parameterizing the policy using the same performance metrics that are used to evaluate all efficiency goals of the CMS, and (ii) supports the argument that DDPGAgn is a faster, more flexible technique for making effective ACOCA decisions – one of the primary objectives of this work.
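A minimal sketch of this parameterization idea follows, assuming PyTorch; the metric set, ordering, network sizes, and threshold are our illustrative assumptions, not the authors' implementation. A DDPG-style actor maps a vector of the same CMS performance metrics to a continuous caching score:

```python
# Sketch: a DDPG-style actor whose state is the vector of CMS performance metrics.
import torch
import torch.nn as nn

class CachingActor(nn.Module):
    """Maps a vector of CMS performance metrics to a caching score in [0, 1]."""
    def __init__(self, n_metrics: int = 5, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_metrics, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),   # bounded continuous action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Hypothetical state built from the metrics used to evaluate the CMS:
# [hit rate, mean response time, probability of delay, pessimistic return, throughput]
state = torch.tensor([[0.82, 0.45, 0.23, -0.10, 0.67]])
actor = CachingActor(n_metrics=5)
cache_score = actor(state).item()   # e.g., cache the evaluated context item if score > 0.5
```

In full DDPG, a critic network scoring (state, action) pairs and soft-updated target networks would train this actor; they are omitted here for brevity.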
LFU performs consistently better than the other policies with DDPGAgn, including \(\overline{PessiRet} > 0\) using scalable cache memory. Importantly, the HR is lower using a scalable context cache than the high HR observed using a limited-sized context cache. As we will show in our discussion of long-term convergence, DDPGAgn is biased towards not caching much context information, given the randomness of the context query load used in testing and the high degree of reliability in retrieval. The query load is random enough that caching a piece of context results in a higher probability of partial misses. Given that the cost of cache memory \(\to 0\), the marginal return from caching a piece of context information diminishes with cache size. The results also suggest that the penalty incurred from seeking the cache and then retrieving or refreshing after a partial hit is costlier. So, only a minimum number of context items potent enough to contribute to the efficiency goals are cached. We observe this explicitly as the mechanism scaled the cache memory only up to 6 entities in size, whereas the ACAgn and StatAgn expanded it up to 9 entities. Two other reasons why DDPGAgn caches a smaller amount of context information are as follows. First, the transiency of the context dealt with in this scenario results in a high penalty. Secondly, the low hold-up cost (i.e., zero, based on our assumption) due to the low marginal cost of cache memory results in the context most potent for caching being cached for longer periods. The high HR using limited-sized context cache memory can be justified using the same arguments. Overall, as we expected from the discussion in Section 3, DDPGAgn has adapted better than ACAgn based on all metrics.
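The reasoning above can be illustrated with a back-of-the-envelope estimate of the expected net benefit of caching a single context item; this is a hedged sketch whose cost terms, parameter names, and values are our assumptions, not the paper's cost model:

```python
# Sketch: expected net benefit of caching one context item over a planning period.
# Positive benefit -> worth caching; negative -> redirecting/retrieving is cheaper.
def expected_caching_benefit(
    expected_hits: float,        # anticipated reuses of the item while cached (assumed)
    retrieval_cost: float,       # cost of one retrieval from the context provider (assumed)
    p_partial_miss: float,       # probability a query only partially hits the cached item (assumed)
    partial_miss_penalty: float, # extra cost of seeking the cache and still retrieving (assumed)
    refresh_cost: float,         # cost of keeping transient context fresh while cached (assumed)
    holdup_cost: float = 0.0,    # memory hold-up cost, ~0 under the paper's assumption
) -> float:
    savings = expected_hits * (1.0 - p_partial_miss) * retrieval_cost
    losses = expected_hits * p_partial_miss * partial_miss_penalty + refresh_cost + holdup_cost
    return savings - losses

# With a random query load, p_partial_miss is high and the benefit shrinks or turns
# negative, consistent with DDPGAgn caching only a handful of entities.
print(expected_caching_benefit(3.0, 0.5, 0.6, 0.8, 0.4))
```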
Figure 13 shows a strong positive correlation for StatAgn using a scalable cache memory, and a negative relationship for Prob, between the number of entities evicted during a PlnPrd and the variance of \(\overline{PessiRet}\), i.e., \(Var(\overline{PessiRet})\). The relationship is weak for ACAgn and DDPGAgn. The relatively constant number of evictions against \(Var(\overline{PessiRet})\) using a limited-sized cache for StatAgn suggests that the realized returns from responding to context queries vary only due to factors other than the selective caching decision. Hence, the caching decisions made by the StatAgn are short-term optimal and can be defined as the short-term goal state. In contrast, there are several reasons for the correlations observed for the two RL agents: (i) both agents are designed to converge for long-term optimality with \(\gamma = 0.9\), (ii) \(\gamma\) and \(Cycle_{learn}\) are defined statically in this work, so the algorithm does not adjust to the variable dynamics of the environment, in contrast to the \(\delta\) of StatAgn, and (iii) the correlation using the limited cache memory is a result of explorative caching, since there are two distinct clusters that would theoretically merge over time once the optimal solution is reached. Considering the dispersion of the results and the variance in \(\overline{PessiRet}\), which is larger than for any of our proposed methods, we contend that Prob does not meet any of the short-term optimality features.
5.2.3 Overall Cost-efficiency.
Next, the total returns of the CMS were measured. Table
7 summarizes this data using the
ACAgn and the
DDPGAgn. Highlighted in bold are the settings for which the maximum total return (i.e., the lowest cost) was achieved.
The total return using the Redirector mode (no caching) was $-274.42, given that the earnings after deducting penalties were $-195.42 and the retrieval cost was $79.
StatAgn [43] was beneficial for selective caching: increases in returns of 47.94% and 34.57% were observed for the best settings with scalable and limited-sized cache memories, respectively. Although the StatAgn is simplistic, the efficiency improvement is significant.
The improvement in returns is only 7.78% for ACAgn using a limited-sized cache memory, and the rest of the ACAgn configurations yield a lower total return. One reason for this observation is the complex dynamics of context queries and data, i.e., transiency, compared to data caching. Because of the random and heuristic exploration of the state space for learning, context that is cost-inefficient to cache gets cached, and vice versa, until the optimal policy \(\pi^*\) is reached. The results using a limited-sized context cache have consistently been worse than the random eviction policy, which confirms our hypothesis that the ACAgn does not converge in the short term.
The improvements in returns compared to the redirector mode were 60.22% and 26.06% using the parameterized DDPGAgn with scalable and limited-sized cache memories, respectively, although the retrieval costs increased by 4.62% and 60.27%. The improvement in returns is greater than StatAgn's 47.94% using scalable memory [43]. We attribute this to the optimal policy's bias towards caching less context under the random context query load used in testing. We identify DDPGAgn as having reached pseudo short-term optimality based on several features of the results: (i) all the metrics are better than those observed using the random policy (which was not the case with ACAgn), and (ii) the best configuration using scalable memory shows higher earnings after deducting penalties compared to the no-eviction policy, i.e., $-109.16 > $-116.95. In the next subsection, we show how the DDPGAgn improves its returns using a sample of sequential results as well. To investigate the reasons for the observed performance metrics, the key components of the returns need to be discussed. Figure 14 compares the total cost, penalty cost, and retrieval cost as ratios of the earnings among the best configurations in the short run.
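For clarity, the return bookkeeping used in this comparison can be expressed as a short sketch; the figures below simply restate the redirector-mode numbers reported above, and the variable names are ours:

```python
# Sketch: decomposing the total return reported for the redirector mode.
earnings_after_penalties = -195.42   # earnings minus penalty costs ($), from the text
retrieval_cost = 79.00               # total cost of context retrievals ($), from the text

total_return = earnings_after_penalties - retrieval_cost
print(total_return)                  # -274.42, matching the reported total return
# The penalty-to-total-cost and cost-to-earnings ratios discussed next follow
# the same bookkeeping applied to each agent's configuration.
```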
Our ACOCA approach aims at minimizing the cost-to-earnings ratios. StatAgn and Prob are advantageous for cost minimization among the agents. StatAgn also shows the highest penalty-to-total-cost ratios of 0.8389 and 0.8257 using scalable and limited-sized context cache memories, respectively, but DDPGAgn using the scalable cache memory is comparable to the StatAgn and Prob. ACAgn using a scalable cache, LVF eviction, and MFU exploration shows the lowest penalty-to-total-cost ratio of 0.7131, followed by 0.7256 for the DDPGAgn using a scalable cache, LFU eviction, and random exploration. There are two reasons for this observation. First, based on our assumption for PD and the divergence in the number of evictions (which we indicate later in Figure 16), the penalties are minimized over time for the RL agents as a result of gradual convergence to an optimal policy; in contrast, StatAgn does not adjust or improve over time. Second, the increase in retrieval costs using the RL agents is a result of (i) additional refreshing due to exploratory caching, and (ii) transient models (RL models that are yet to reach \(\pi^*\)) not caching the relevant context, causing retrievals.
5.2.4 Convergence and Long-term Efficiency.
We investigated the performance of the selective caching agents over the long term. The best configuration for each type of agent using the scalable context cache memory in the short term (highlighted in Table 7) was used in the following experiments. The database and redirector modes of operation are not indicated in the following graphs because they are not adaptive. As indicated in Figure 15, the DDPGAgn performed the best in the long term, as we argued in the methodology. The pessimistic returns (PessiRet) and the hit rate (HR) have been maximized, while the response time (RT) has been minimized, as set out in the objective function – a novel contribution of this work in optimizing a CMS for both cost- and performance-efficiency, unlike any previous work including the probabilistic method (Prob) [23].
We contend that the DDPGAgn converges to an optimal policy within 200–250 windows, considering that the variance of all the performance metrics is minimized during this period. The convergence time is significantly lower than in related work on traditional data caching, where more than a thousand iterations or windows are spent until convergence, e.g., ∼20,000 in [36] and ∼500 in [47]; i.e., our approach is at least two times faster. Hence, our adaptive context caching model is flexible and capable of adapting to real-world situations extremely fast – a feature that is useful in delivering reliable and relevant context information to users without incurring high costs of context access in a highly transient environment such as the motivating scenario.
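A minimal sketch of how such a convergence window can be identified from per-window metrics follows; the window length, variance threshold, and toy series are illustrative assumptions, not the criterion used in the paper:

```python
# Sketch: flag convergence when the variance of a metric (e.g., PessiRet)
# over a span of consecutive windows drops below a threshold.
import numpy as np

def convergence_window(metric: np.ndarray, span: int = 40, threshold: float = 0.05):
    """Return the first window index after which variance over `span` windows < threshold."""
    for t in range(span, len(metric) + 1):
        if np.var(metric[t - span:t]) < threshold:
            return t
    return None

# Toy metric whose oscillations die out over time, mimicking gradual convergence.
pessiret = np.sin(np.arange(200)) * np.exp(-np.arange(200) / 40.0)
print(convergence_window(pessiret))
```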
It should be noted that the variance of any performance metric will not be zero in context cache management because context refreshing events happen in an ad hoc manner (i.e., because situations can change at any time). While the ACAgn and Prob can be seen to perform sub-optimally compared with the DDPGAgn, we argue that they have not been able to reach an optimal policy within our testing period because the ACAgn and Prob show a high level of variance for all performance metrics. This is due to the model still exploring the context state space, which, as we theoretically argued, could make the ACAgn converge later than the DDPGAgn because of the need to gather a large amount of experience to accurately estimate the value functions. Given that ACAgn and Prob almost overlap in Figure 15(c), we may also argue that these agents might have converged to a local optimum. Hence, we conclude that value-function-based reinforcement learning techniques and similar control-theoretic approaches are not well suited to developing fast, flexible adaptive context caching approaches under uncertainty.
DDPGAgn is also highly selection-efficient in the long term, given the smaller number of context entity and attribute evictions compared to the other agents, which arguably results in the consistency of QoS (since we showed in the short term that evictions contribute to inconsistent QoS and incurred costs). Hence, DDPGAgn has been able to select and cache the most reusable and repurposable context. Further to our arguments in [43], we also show that StatAgn is rather a short-term optimal solution that is oblivious to the long-term efficiency objectives because (i) the returns of the StatAgn are lower than those of the DDPGAgn and varied continuously during the testing period, resulting in a variance of 0.7246 in returns, (ii) the HR of the context cache using StatAgn is similar to that achieved in the short term, while DDPGAgn subsequently improves to an average HR of 0.977, and (iii) the large number of evictions suggests low context selection efficiency. We have also shown that our method outperforms the probabilistic method [23] on every parameter. Table 8 summarizes the results obtained in the long term for all agents.
5.2.5 Scalability of Our Approach.
We evaluated the scalability of our approach by gradually increasing the number of concurrent users for all adaptive context caching agents. Since the proof of concept (POC) executes on a single-node configuration, we tested only up to 100 concurrent users. Figure 16 compares the average response time for sub-queries (SQs) when 1, 10, and 100 concurrent users make context queries using the DDPGAgn. The system was stable throughout, while PessiRet was $\(5.278 \pm 1.77\) and $\(63.099 \pm 20.01\) for 10 and 100 concurrent consumers, respectively. PD also reduced from 0.2847 to 0.2324 due to the high AR of cached context information. This suggests that our approach is even more cost- and performance-efficient at larger scale.
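The following is a minimal sketch of the kind of concurrent load test described here; the query function, timing, and user counts are hypothetical placeholders, not the authors' test harness:

```python
# Sketch: measure average sub-query response time under N concurrent users.
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def issue_sub_query(request_id: int) -> float:
    """Placeholder for sending one context sub-query to the CMS and timing it."""
    start = time.perf_counter()
    time.sleep(0.01)            # stand-in for the real CMS call (assumption)
    return time.perf_counter() - start

def load_test(concurrent_users: int, queries_per_user: int = 20) -> float:
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        latencies = list(pool.map(issue_sub_query,
                                  range(concurrent_users * queries_per_user)))
    return mean(latencies)

for users in (1, 10, 100):
    print(users, f"{load_test(users) * 1000:.2f} ms average RT")
```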
There are two reasons for this scalability. Firstly, the DDPGAgn, like the already established lightweight StatAgn, incurs a low overhead on the CMS: DDPGAgn takes only \(2.235 \pm 0.53\) ms to make a selective context caching decision, i.e., ∼450 pieces of context information evaluated for caching per second. Secondly, since we employ adaptive context state-space exploration and parameter sharing between subsequent PlnPrds, our model is capable of adapting quickly from an already learnt optimal state to another when the subsequent PlnPrd shows only minor variations from the previously explored instance of the PlnPrd. Hence, the ACOCA algorithm is able to effectively use its experience, together with our heuristics, to manage any uncertainty without incurring a significant overhead or taking a long time to adapt. We see this in Figure 16, where we randomly modified the timings of the context query load (e.g., the same sub-query was made at randomly varying times, emulating a waiting-time-paradox scenario in which the same driver performs routine tasks at roughly the same time but with random variations) and our model converged within 5 minutes.
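A minimal sketch of the parameter-sharing idea between planning periods follows; the class, field names, and exploration values are illustrative assumptions (the actual agent carries over learned policy weights, not this toy dictionary):

```python
# Sketch: warm-starting the caching agent for a new planning period (PlnPrd)
# by copying the previously learnt parameters and lowering exploration.
import copy

class CachingAgentState:
    def __init__(self):
        self.policy_params = {}      # learnt weights of the selective caching policy (placeholder)
        self.exploration_rate = 1.0  # fully explorative on a cold start

def start_new_planning_period(previous):
    agent = CachingAgentState()
    if previous is not None:
        agent.policy_params = copy.deepcopy(previous.policy_params)  # share learnt experience
        agent.exploration_rate = 0.1  # minor variations only, so explore far less (assumed value)
    return agent

warm_agent = start_new_planning_period(previous=CachingAgentState())
```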
6 Conclusion and Further Work
In this work, we proposed an approach for distributed adaptive context caching that maximizes the cost- and performance-efficiency of Context Management Systems (CMSs), and evaluated it using extensive experiments and simulations. We compared two selective context caching agents that implement reinforcement learning techniques to maximize the cost- and performance-efficiency of a CMS, along with suitable heuristics to overcome the uncertainty and lack of prior knowledge about context information and context query loads, e.g., about real-world situations. The paper also introduced novel eviction policies and a cache organization for caching context information. These are explained and proven using mathematical models, supported by examples from a real-world-inspired motivating scenario. We evaluated the cost of responding to context queries and the Quality of Service (QoS) under multiple settings to identify the most efficient adaptive caching configuration.
We showed that our novel Statistical-based Agent (StatAgn) and the agent implementing the Deep Deterministic Policy Gradient method (DDPGAgn) can minimize costs by up to 47.94% and 60.22%, respectively, compared with the redirector mode, while adapting fast and flexibly in the short term. Least Value First (LVF) and our bespoke Time-Aware Hierarchical (TAH) eviction policy for context information perform well under highly volatile context query loads, whereas Least Frequently Used (LFU) is suitable otherwise, especially when the CMS utilizes limited-sized context cache memory. Particularly in the long-term evaluation, where DDPGAgn outperformed all the benchmarks, ACOCA reduced redundant costs such as hold-up, refreshing, and processing costs in the CMS, which could otherwise have occurred from caching or storing (in database mode) intermittently accessed context information. The DDPGAgn was 8.67x more selection-efficient than the next best approaches, the StatAgn and the probabilistic method [23]. We also showed that the DDPGAgn is at least 2x faster to converge to an optimal policy compared with related work in traditional data caching that leverages similar techniques, despite the additional uncertainties involved with context, e.g., changes in situations, and context query requirements, e.g., quality of context.
In summary, we developed an approach for adaptive context caching that is cost-efficient, performance-efficient, fast adapting, and flexible, such that context consumers can be delivered relevant and dependable context information cheaply and consistently. In fact, by leveraging StatAgn and DDPGAgn in tandem, we contend that the important functional requirement of ACOCA – long-term optimality with short-term awareness [44] – can be guaranteed as well, unlike previous work, which suffers from the cold-start problem and underperforms under circumstances such as context uncertainty and lack of prior knowledge, both of which our approach effectively mitigates. In further work, we intend to develop an edge-computing-based distributed adaptive context caching approach for the Distributed Context Intelligence System (DCIS) to further scale CMSs in the global IoT ecosystem.