Reinforcement Learning Based Approaches to Adaptive Context Caching in Distributed Context Management Systems

Published: 23 April 2024

Abstract

Real-time applications increasingly rely on context information to provide relevant and dependable features. Context queries require large-scale retrieval, inferencing, aggregation, and delivery of context using only limited computing resources, especially in a distributed environment. If accessing context information is slow, inconsistent, or too expensive, the dependability and relevancy of real-time applications may fail to exist. This paper argues that the transiency of context (i.e., its limited validity period), variations in the features of context query loads (e.g., the request rate and differing Quality of Service (QoS) and Quality of Context (QoC) requirements), and the lack of prior knowledge needed to make near real-time adaptations are fundamental challenges that must be addressed to overcome these shortcomings. Hence, we propose a performance-metric-driven, reinforcement learning based adaptive context caching approach that aims to maximize both cost- and performance-efficiency for middleware-based Context Management Systems (CMSs). Although context-aware caching has been thoroughly investigated in the literature, our approach is novel because existing techniques are not fully applicable to caching context: they neither address the underlying fundamental challenges nor the limitations hindering the dependability and consistency of context. Unlike previously tested modes of CMS operation and traditional data caching techniques, our approach can provide real-time pervasive applications with cheaper, faster, and fresher high-quality context information. Compared to existing context-aware data caching algorithms, our technique is bespoke to caching context information, which differs from traditional data. We also show that our full-cycle, context-lifecycle-based approach can maximize both cost- and performance-efficiency while maintaining adequate QoC, relying solely on real-time performance metrics and our heuristic techniques, without depending on any prior knowledge about the context, variations in query features, or quality demands, unlike previous work. Using a real-world inspired scenario and a prototype middleware-based CMS integrated with our adaptive context caching approach, we demonstrate how real-time applications can be 85% faster, and therefore more relevant and dependable to users, while costing 60.22% less than when using existing techniques to access context information. Our model also adapts at least twice as fast and more flexibly than existing benchmarks, even under uncertainty and a lack of prior knowledge about context, transiency, and variable context query loads.

1 Introduction

An increasing number of real-time pervasive computing applications rely on information gathered from multiple data sources to provide relevant and dependable features to users. The Internet-of-Things (IoT) is a primary enabler in this regard because aggregating and inferencing over big IoT data can provide insights about entities, their environment, and how they interact [34]. While the 3Vs – Volume, Velocity, and Variety – are implicit challenges to any data-intensive system [33], middleware-based Context Management Systems (CMSs) also have to efficiently manage large-scale context retrieval, inferencing, storage, and delivery of highly transient context information for a wide spectrum of quality requirements, under uncertainty attributed to variations in context query features and changes in real-world situations. Unlike data caching techniques, which rely on prior knowledge (e.g., transition probabilities of cache states, file libraries), historical knowledge about situations and context can be scarce or uncertain; hence, there is a critical need to develop heuristic methods. Providing relevant and dependable (i.e., adequately reliable) context information for time-critical applications is further constrained by limited computing resources [16], e.g., IoT devices are often self-contained, low-powered devices. The dependability and relevancy of context information can suffer significantly under these conditions. For instance, a traffic congestion at an upcoming intersection is no longer relevant to a driver using a traffic alerting system once the driver has already merged with it; the dependability of the application is therefore reduced. Storing all context information for reuse, as in traditional data caching, seems a viable option where available (not all devices in a distributed environment are cache-enabled), but the ephemeral validity period of fast-varying context information results in short cacheable periods unless the context is refreshed or evicted – both of which are expensive operations, either because of frequent retrievals or cache misses. The representational and structural challenges, such as maintaining affinity among related situations and maintaining (i) the relationships between the different logical levels of context information and (ii) entity relationships in cache memory without incurring redundant context management costs, are unique to context caching and are described in our previous work [44, 46]. We summarize the solution to these unique research challenges in adaptive context caching by the 5Ws [44] – What context to cache, When to cache context, Where to cache context, hoW to cache context, and Who caches context? We implement and test this solution in this paper.

1.1 Difficulties and Challenges in Context Management

According to the definition given by Abowd et al. [1], context is “any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and applications themselves”. We will consider four challenges when caching context information based on this definition as follows.
(1)
Context information can exceed the volume of the raw data because context explains a certain piece of data or an entity in an environment, infusing extrinsic elements into the description. That is, context caching should involve managing a collection of data and derived information along with their metadata, which can describe multiple logical and physical relationships [46]. Consider higher-level context inferred using Context Space Theory [7], such as the situation of a room: the inputs can be the noise level, light intensity, and a video stream, while the context, as an output, is a JSON-LD [37] document. Context information can also have multiple alternate representations [18], and applications might be interested in different aspects of the same context. Traditional data caching methods are not well suited to context caching because pieces of data are typically treated individually, in contrast to inter-related temporal and spatial context. A novel caching approach is therefore urgently needed to handle such large amounts of heterogeneous contextual data efficiently.
(2)
Context is transient, e.g., the situation of a person can change at any moment. That is, a new context caching approach needs to be very adaptable and flexible in near real-time.
(3)
Context is both time and quality critical. For instance, consider notifying a driver of a potential hazard (i.e., a situation, or in other words the context about the driver in the particular environment). Context information needs to be accurate (e.g., identifying the specific hazard) and delivered on time (e.g., giving adequate time to react) to be relevant and dependable (e.g., to save the driver from an accident). While traditional data caching that minimizes response time focuses on storing data items with long cacheable lifetimes, a novel context caching approach additionally needs to manage freshness and the applicable, varying Quality of Context parameters (e.g., accuracy, precision).
(4)
Big IoT data streams used to infer context are heterogeneous, e.g., in size and format. IoT data streams can also be inconsistent, so context data might not show any clear patterns [29]. There can be occasional errors in retrieved context data or losses in QoC [11]. For instance, if a context provider that used to refresh cached information is no longer available, a caching approach would need to make cost- and quality-based decisions between eviction and refreshing from an alternative provider [47]. That is, a new context caching approach needs to cost-efficiently manage the adaptive context caching lifecycle without compromising QoS, QoC, or cost-efficiency.
In summary, developing a Context Management System that solves all four of the above challenges under uncertainty is not trivial. Related examples can be found in the state of the art [8, 19, 22, 51, 53], which suffer from limitations in scalability, complexity, cost, and/or the inability to generalize to a large set of scenarios and situations.

1.2 Motivating Scenario – a Busy City Circle

Let us consider a real-world scenario surrounding a city circle with multiple intersections. We illustrate this scenario in Figure 1. Assuming all are pull-based context queries [14], Table 1 lists several context queries that originate from real-time navigation applications, i.e., context consumers, using the Context Definition and Query Language (CDQL) [13, 14].
Fig. 1. An intersection with multiple context providers and context consumers.
Table 1. Context Queries in the Motivating Scenario

Notation: Query Description
Q1: Search car parks with available slots in the area.
Q2: Search car parks with available slots in the area that charge less than a given amount per hour.
Q3: Search for car parks with available slots that are less than a certain distance from the target location.
Q4: Search car parks with available slots given it is good for walking to the target location.
Q5: Check whether it is good for jogging at the park.
Q6: Search bike parking spots in the near vicinity of the rider.
Q7: Is the rider vulnerable to crashing into an object at the next closest intersection?
Q8: Which way to turn at the next junction to avoid traffic to reach the target location?
Based on any of the context queries above, the users depend on the CMS to provide valid, reliable context information about the car parks, bike parks, weather, parks, and the like, to make smarter decisions. For instance, in Q7 the rider depends on the CMS to deliver notifications with minimal latency. But the rider will not be satisfied with false-positive hazards based on stale and invalid context about the intersection retrieved from the cache memory. Conversely, fresher context (which correlates with cost) that takes longer to retrieve and derive would have no relevance if the rider is notified late. So, how can performance- and cost-efficiency be maximized with minimum effect on freshness, i.e., dependability?
Context queries can be broken down into multiple sub-queries called context requests, which are identified at execution time. For instance, in our architecture, the query coordinator in the Context Query Engine (CQE) coordinates the execution of the sub-queries to produce the aggregated context query response from the Context-as-a-Service (CoaaS) platform [15]. Figure 2 illustrates the query breakdown for \(Q_4\) and \(Q_5\). Blue is used to indicate the context query, orange for sub-queries, green for context entities, and yellow for context attributes. The arrows indicate the logical relationships between the different context items. Context in the immediate child nodes is either (i) required to derive the context depicted in the parent node (e.g., "get car parks" is required to generate the context response for Q4), or (ii) describes the context in the parent node (e.g., the address and location attributes describe the building entity).
Fig. 2. Query breakdown for (a) Q4, and (b) Q5, inspired by [42].
Notice that the sub-query "good for jogging" shares the same entities and context attributes as "good for walking", i.e., the temperature and probability of rain attributes of the weather entity. It is prudent to cache this context for reuse and avoid redundant retrievals. This notion is used as the basis of the work of Khargharia et al. [23]. The other function of context data caching is repurposing. Assume \(car_5\) executes \(Q_2\) while travelling close to \(car_3\), which executes \(Q_1\). Retrieved data of nearby car parks for \(Q_1\) could be repurposed for \(Q_2\) using only a filter. But the quality requirements for each context vary with time, and uncertainty in situations also contributes to variation in context needs. So, the motivation of this paper is to develop a more flexible and fast adaptive context caching approach that takes into consideration the features of context, context queries, situations, and quality requirements under uncertainty, aspects largely ignored in similar works in favour of traditional popularity-based (reusability-oriented) techniques that cannot guarantee all aspects of relevance and dependability: (i) Quality of Service (QoS), (ii) Quality of Context (QoC), and (iii) cost-efficiency.

1.3 Novel Contributions of this Paper

This paper presents a novel approach for adaptive context caching different from traditional data caching. We also present experimental results using our prototype demonstrating how our mechanism is effective in delivering relevant and dependable context information for the context consumers in the motivating scenario using a simulated context sub-query load. The main contributions of this paper are as follows:
we propose a novel, scalable reinforcement learning based approach, first of its kind, which makes selective context caching decisions aimed at maximizing the cost- and performance-efficiency of a CMS,
we propose and evaluate novel policies and heuristic methods not available in previous literature for adaptive context caching approaches such as eviction and dynamic cache memory scaling to mitigate uncertainty and lack of prior knowledge about context, e.g., situations,
we show that our approach is long-term optimal and short-term aware [44], unlike any previous adaptive caching technique in the literature, and adapts faster than existing work even under the uncertainties unique to context information management – important functional requirements of adaptive context caching techniques,
we evaluate the cost of responding to context queries and the Quality of Service (QoS) under multiple settings to identify the most effective adaptive caching configuration to achieve the multiple efficiency objectives, and
we show that our proposed algorithm performs without prior knowledge and makes adaptive caching decisions based on performance metrics while taking into account features of the dynamic query load in near real-time.
The rest of the paper is structured as follows. Section 2 discusses the related work in adaptive context caching (ACOCA). Section 3 discusses the concepts foundational to ACOCA, and Section 4 presents our ACOCA mechanism, including heuristics and cache memory organization. Section 5 describes the implementation of the ACOCA prototype and the experiments; we extensively discuss our observations in this section. Section 6 concludes and highlights directions for future work in this area.

2 Related Work in Adaptive Context Caching

The relevancy and dependability of context information are essential when users rely on pervasive computing applications to make time-critical decisions. As in the motivating scenario, users such as drivers rely on the application to provide reliable context information about the parking slots, the weather, and their own current state to make their choices. For instance, if the context about a selected car parking facility is invalid, say, goodForWalking was inferred incorrectly, the driver would not be able to park and transit to the destination conveniently. Validity and relevance depend on a number of factors, including the selection of Quality of Context (QoC) and quality-aware context providers [21, 27]. But there is a limit to how much provider selection can improve the dependability and reliability of context. Consider a context service that samples all the car parking slots each second. QoC metrics such as timeliness (i.e., freshness) and accuracy of the number of available slots are significantly high. But what if each data ingestion costs the context management middleware? Infrequent variation in availability makes constant retrievals redundant (i.e., longer context lifetime [45]), and continuously monitoring the context of car parks that drivers show minimal interest in is irrational. In summary, continuously monitoring and storing context information, dubbed the database mode of operation for Context Management Systems (CMSs), is expensive. Unfortunately, state-of-the-art CMSs such as FIWARE Orion [53], Nexus [19], Context Information System (CIS) [22], COSMOS [8], and CA4IoT [32] leverage the database mode of operation because it facilitates a fast query execution strategy, abstracting the complexities of on-demand context retrievals. Translating context queries written in CDQL [14] or NGSI [53] into database-native queries, e.g., CDQL to MongoDB queries in [14], executes fast enough to satisfy Quality of Service requirements. The rationale for using the database mode is therefore understandable; we already provided an example of how retrieval latency (a QoS metric) can affect the relevance (a QoC metric) of context about the road (i.e., the congestion situation) to a driver.
FIWARE Orion uses specialized databases in Generic Enablers [54], while Nexus and CIS are designed to manage context information that is unlikely to change in the near future, e.g., the location of places. These underlying assumptions are impractical when pervasive computing applications are instead interested in context that changes frequently, as in the scenario. In fact, we already stated that the size of context about an entity can be much larger than the raw data since it involves aggregating and inferencing from a number of heterogeneous data streams; e.g., context about a car park can change because of its internal features, such as the occupancy rate at a certain time of the day, and any events that occur around the car park, such as community or sporting events (which could only be derived from a different data stream). Monitoring and storing a large volume of data streams is storage-inefficient and expensive. So, these techniques are not sufficiently scalable in the IoT ecosystem. The cost of context management is further aggravated if these systems incorporate techniques to verify the validity of the stored context information [44] before responding to context queries from storage. Inadequate sampling rates [47], asynchrony between the lifetime of context and network queuing times [38, 45], and long retrieval latencies can significantly deteriorate the validity and quality of context [35]. This leads to ingestion from alternative context providers [47], an increased need for context storage, and redundant retrieval of the same context information, resulting in a continuously depleting cost-efficiency argued in [46] to be at least \(O(M + C)\) even in the best case, where M is the number of context information items and C is the number of related context items. Hence, the database mode pays a large cost for certain aspects of performance-efficiency (such as context query response latency) although there is no guarantee of quality.
The other end of the spectrum of CMS operations is the redirector mode, where context information is retrieved on demand for each query without storage. Although it guarantees the freshness of context, the asynchrony in network queues, response delays, heterogeneity in IoT specifications (e.g., sampling rates, precision, and accuracy of measurements), and the availability of context providers are major limitations of the redirector mode, and the total cost of retrieving context for every context query is an expensive drawback. The redirector mode is also extremely slow and not scalable. In essence, there is an urgent need to develop a technique that combines the merits of both the database and redirector modes such that CMSs can effectively handle scale, reduce cost, and deliver context information of adequate quality (based on the requirements of the user, which define how relevant the context is) at the least response latency.
While caching context appears to be a feasible solution to the above problem based on related literature in traditional data caching, these solutions have limited applicability to CMSs. Firstly, context information is significantly different from the data discussed in that literature, e.g., in its transiency. We critically analyse the unique ACOCA challenges compared to data caching in [46] and further argue the inapplicability of existing data caching techniques to ACOCA in [44]. Work in context caching is in its infancy, and the few existing works require further development to achieve cost- and performance-efficiency while adhering to QoC. For instance, Khargharia et al. [23] proposed a prefetching technique based on the expected popularity of low-level context information. While it is tested in database mode, it fails to provide evidence of context freshness or cost-efficiency, which is a logical consequence of the drawbacks of the database mode. Hence, we argue that relying only on the popularity of context access, as in data caching, is inadequate: cached context loses validity in cache memory, minimizing its effective reusability (in contrast to traditional data, which, having no lifetime attributed to it, can be reused as long as it is cached) [44]. Schwefel et al. [35] propose a theoretical model for synchronously retrieving context that minimizes the loss of freshness, but provide no evidence of how this model would adapt to transient situations, dynamic query load features, and quality requirements in near real-time. In Section 1, we argued the need for near real-time adaptiveness to situations and other high-level context as a functional requirement of ACOCA, but [35] focused only on IoT data. Medvedev [28] investigates several proactive refreshing techniques to overcome this problem, but the CMS has to resort to other processes to profile and negotiate parameters such as lifetime (e.g., as in [45]) and planning periods when handling the uncertainty of context information. The primary problem is: how can any of these techniques adapt quickly and flexibly to changes in situations and features of query loads when there is a lack of prior knowledge about the observations?
There are several reasons for the lack of prior knowledge about context information: (a) the emergence of new context information (unavailability of previously observed statistics), (b) lack of observed data by volume, (c) uncertainty in data due to variability in the environment, e.g., \(\mathbb{E}[AR]\) of Q4 under adverse weather conditions versus a calm day between 7:30-8:30 am, (d) imperfections [18, 25], and/or (e) incompleteness, e.g., having no observed data on the AR of context query Q5 between 3:00-4:00 am although abundant data exist for 7:30-8:30 am. Hence, this is one of the significant and non-trivial problems in adaptive context caching, which Blasco and Gunduz [6] explicitly acknowledge by assuming known popularity profiles. Sheng et al. [36] investigated a strategy using no priors, but in traditional data caching, which is not fully applicable to transient context information given its inherent complexities [46]. So, there is a significant research gap in the literature in mitigating this problem arising from the reasons above.
Utilizing Markov Decision Processes and control theory for adaptation in caching has also been studied in the literature. For instance, Zhu et al. [50], Sheng et al. [36], and Nasehzadeh et al. [31] leverage Reinforcement Learning (RL) but still develop solutions based on known priors, such as data libraries, that are not applicable to context caching. Defining a state space for context caching similarly to previous work in data caching is infeasible because no system can predict all future situations and potential viable caching actions in advance; e.g., the transition probability distribution for each context \(i\) and cache action \(a_i\) pair is unknown, in contrast to data caching where it is known. The techniques we need to develop for context caching must be able to manage these unknowns, while being coherent with how a caching decision at any stage of the lifecycle [44] would impact the cost- and performance-efficiency and QoC aspects.

3 Background of Selective Context Caching

In this section, we describe the context selection algorithm and its design specifics. The selective decision is concerned with answering two questions: (1) what context to cache, and (2) when to cache context? The main objective of selectively caching context is to maximize the performance, scalability, and profitability of Context Management Systems (CMSs) by efficiently utilizing the available resources – processing, cache, memory, network, and storage – for context information that has the potential to contribute to the efficiency and quality goals when cached. Service Level Agreements (SLAs) underpin the parameters that establish the desirable levels of cost, QoS, and QoC [28, 47]. These parameters are used as constraints in the objective functions.

3.1 Background of Context Selection

The popularity of a piece of context can be observed under two states – when cached and when not cached. In this paper, Hit Rate (HR) is affected in both of these states. For instance, the HR of a context attribute (CA) is the conditional probability of retrieving values from the cache memory that conform to the desired QoC parameters. A context value i that is either unavailable or obsolete (in other words, invalid) in the cache is a cache miss. Access Rate (AR) is the number of times a piece of context is accessed in context queries, irrespective of its cached status \((n_i)\), divided by the total number of context queries \((N)\) received. They can be written as follows, where i is the index of a context information item, \(i \in \mathcal{I} := \{0, 1, 2, \ldots, I\}\), and \(HR(t)_i\) and \(AR(t)_i\) are the HR and AR of context information i at time t:
\begin{align} HR(t)_i &= P(\mathrm{Fresh} \mid \mathrm{Cached})_i\\ AR(t)_i &= \frac{n_i}{N}\nonumber \end{align}
(1)
The number of accesses to cached and adequately fresh context information is \(m_i = k \times n_i\), where \(0 \le k \le 1\). A sliding time window of size W, corresponding to the last W time units (e.g., W seconds), is used in this paper. T is the index referring to the current window. To summarize the observed and expected values of sequential performance metrics (e.g., HR and AR) in a space-efficient manner, we define a triplet \(\langle short, mid, long \rangle\). It refers to the relative index from the current window at which the corresponding statistic is measured. For example, consider \(\langle 1, 2, 5 \rangle\) and a queue of observed HR values for the past five windows, \([0, 0.67, 0.93, 0.86, 0.73]\). This is summarized as \([0.73, 0.67, 0]\).
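To make the windowed bookkeeping concrete, the following minimal sketch shows how per-window AR/HR observations could be summarized with the \(\langle short, mid, long \rangle\) triplet; the helper names and the most-recent-last ordering are illustrative assumptions rather than the prototype's implementation.

```python
from collections import deque

def summarize(history, triplet=(1, 2, 5)):
    """Summarize a per-window metric history into the <short, mid, long> triplet.
    Assumes the most recent window's value is the last element; windows that have
    not been observed yet default to 0.0."""
    return [history[-k] if k <= len(history) else 0.0 for k in triplet]

def access_rate(n_i, total_queries):
    """AR(t)_i: accesses to item i over the total number of context queries in the window."""
    return n_i / total_queries if total_queries else 0.0

# Example: hit rates observed over the last five windows (oldest first).
hr_history = deque([0.0, 0.67, 0.93, 0.86, 0.73], maxlen=5)
print(summarize(hr_history))   # triplet summary under the ordering assumed above
```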
An important definition in this work is the Probability of Delay (PD). It is the overall probability that a context query exceeds the maximum acceptable response time \((RT_{max})\) set in a specific SLA, resulting in a penalty \((Pen_{del})\). Caching all context information (similar to the database mode used in state-of-the-art CMSs) does not guarantee \(RT < RT_{max}\) due to (i) infeasible SLAs [45], (ii) ephemeral expiry periods \((E_n)\) caused by short residual lifetimes (ResiL) (refer to Section 4.4), and (iii) processing, retrieval, and/or storage overheads, which correlate with the size of the cached context [47]. Identifying and deciding what context not to cache under these circumstances (e.g., when \(Cost \gg RetrievalCost\)) can significantly improve the cost- and performance-efficiency of a CMS.
The cached lifetime of a context item \((CL_i)\) is the period of time, measured in t time units (i.e., seconds), during which i resides in the cache. Delay time \((DT_i)\) is the number of windows for which the same i will not be evaluated for caching. Both of these values are calculated by the selective agent algorithm, which is further described in Sections 4.1 and 4.2. The rest of the symbols used in this paper are summarized in Table 2.
Table 2. Symbols Used in the Discussed Context Selection Approaches

Symbol: Description
\(\lambda\): Poisson request arrival rate, e.g., 1.5/second
\(\varepsilon\): Exploration factor \((0 \le \varepsilon \le 1)\)
\(\zeta\): Exploration factor correction
\(\eta\): Eviction threshold
\(\gamma\): Discount rate of the RL agent, i.e., 0.9 \((0 \le \gamma \le 1)\)
\(\alpha\): Learning rate of the actor-network, i.e., 0.0001
\(\beta\): Learning rate of the critic-network, i.e., 0.001
\(\tau\): Factor for soft target network parameter updates
\(SR_q\): Sampling rate of context provider \(q\), where \(q \in \mathcal{Q} := \{0, 1, 2, \ldots, Q\}\), e.g., 0.33 Hz
\(P_{DC,i}\): \(P(Delay|Cached)_i\), the conditional probability of getting a delayed response even though the item is cached
\(f_{thresh}\): Minimum accepted freshness given an SLA
\(f_{arrive}\): Freshness of an item once retrieved
\(Price_{res}\): Price earned per valid context query response
\(Cost_{re}\): Cost of retrieving context from context providers
\(ReL\): Retrieval latency of a context provider
\(L_i\): Lifetime of a context item
\(Cycle_{learn}\): Learning cycle, e.g., 20 selective caching decisions

3.2 Reinforcement Learning for Adaptive Context Selection

The selective caching agents investigated in this paper are summarized in Figure 3. We presented our work using the statistics-based agent (StatAgn) in [43]. In this paper, we present our solutions implementing reinforcement learning: (a) the actor-critic network agent (ACAgn) and (b) the deep deterministic policy gradient agent (DDPGAgn).
Fig. 3. Selective context caching agents tested in this paper.
As discussed in Section 2, reinforcement learning is a viable basis for implementing an ACOCA approach, although it has not been investigated for this purpose before. Reinforcement learning techniques can be broadly categorized into (i) value function estimation approaches and (ii) policy search approaches [42]. We theoretically argued in [44] that the drawback of utilizing value function estimation approaches for ACOCA is the inability to accurately estimate the value \((V^\pi)\) for a given policy, because it is impossible to pre-train for context caching decisions using prior experience (which does not exist). Policy search methods, in contrast, offer (i) the ability to parameterize the policy \((\pi_\theta)\) by performance metrics, ideal for incorporating the plethora of parameters involved in ACOCA [44], (ii) the ability to cold-start, suitable when prior knowledge about situations and context information is not readily available, and (iii) fast convergence times shown in the literature [42] compared to other reinforcement learning techniques, which is ideal for meeting our flexibility and fast adaptation goals for ACOCA. Since there are no prior works in this area, it is important to study both categories (using the value function estimating ACAgn and the policy searching DDPGAgn) to validate these rationales with empirical results that are not contaminated by bespoke optimizers developed for specific data caching problems in the literature, e.g., [17, 31, 36, 50].
In the next section, we will discuss our proposed adaptive context caching algorithm.

4 Adaptive Context Caching Algorithm

In this section, the remaining aspects of the ACOCA lifecycle in the adaptive context caching approach are discussed in detail. The design specifics of the reinforcement learning agents, the end-to-end process flow of the agents, the proposed heuristics to overcome uncertainty, eviction policies, cache organization, and context cache memory scaling are described.

4.1 Actor-Critic Agent

We define the selective caching problem as a Markov Decision Process (MDP). A state \((s)\) is a vector containing 15 features: the observed \(AR(t)_i\) and \(HR(t)_i\), \(\mathbb{E}[AR(t)_i]\), \(\mathbb{E}[HR(t)_i]\), \(\overline{CL_i}\), the average latency to retrieve, and the average retrieval cost, as indicated in Table 3. These features were chosen to reflect (i) popularity, (ii) contribution to efficiency as a result of caching, (iii) the ability to reliably retrieve and derive context, e.g., context provider \(q = 1\) is more reliable than \(q = 2\) when \(availability_{q=1} > availability_{q=2}\) and \(ReL_{q=1} < ReL_{q=2}\), and (iv) the ability to maximize the cached lifetime. Observed and expected timelines are discretized using the \(\langle short, mid, long \rangle\) tuple. The state space \((\mathcal{S})\) is, hence, defined using continuous variables. We have not considered the size of context as a feature in this work, despite related work in data caching [24, 49], assuming the low-level context considered in our scenario is uniform in size. The goal is to cache the maximum number of context items that yield incremental returns (i.e., reward) to the CMS.
Table 3. Features Used in Context Selection in the ACAgn

Notation: Description
\(AR(long)_i\): Observed AR at \(T - long\) for i.
\(AR(mid)_i\): Observed AR at \(T - mid\) for i.
\(AR(short)_i\): Observed AR at \(T - short\) for i.
\(\mathbb{E}[AR(long)_i]\): Expected AR at \(T + long\) for i.
\(\mathbb{E}[AR(mid)_i]\): Expected AR at \(T + mid\) for i.
\(\mathbb{E}[AR(short)_i]\): Expected AR at \(T + short\) for i.
\(HR(long)_i\): Observed HR at \(T - long\) for i.
\(HR(mid)_i\): Observed HR at \(T - mid\) for i.
\(HR(short)_i\): Observed HR at \(T - short\) for i.
\(\mathbb{E}[HR(long)_i]\): Expected HR at \(T + long\) for i.
\(\mathbb{E}[HR(mid)_i]\): Expected HR at \(T + mid\) for i.
\(\mathbb{E}[HR(short)_i]\): Expected HR at \(T + short\) for i.
\(\overline{CL_i}\): Average cached lifetime of i.
\(\overline{RL_i}\): Average context retrieval latency of i.
\(\overline{Cost_{Ret,i}}\): Average cost of context retrieval of i.
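As an illustration only (not the prototype's data model), the 15 features in Table 3 could be assembled into a state vector along these lines:

```python
from dataclasses import dataclass
from typing import Tuple, List

@dataclass
class ItemStats:
    """Per-item statistics kept by the cache profiler (illustrative structure)."""
    ar_obs: Tuple[float, float, float]   # observed AR at T - (long, mid, short)
    ar_exp: Tuple[float, float, float]   # expected AR at T + (long, mid, short)
    hr_obs: Tuple[float, float, float]   # observed HR at T - (long, mid, short)
    hr_exp: Tuple[float, float, float]   # expected HR at T + (long, mid, short)
    avg_cached_lifetime: float           # average CL_i
    avg_retrieval_latency: float         # average RL_i
    avg_retrieval_cost: float            # average Cost_Ret,i

def to_state(s: ItemStats) -> List[float]:
    """Flatten the statistics of a context item into the 15-feature MDP state."""
    return [*s.ar_obs, *s.ar_exp, *s.hr_obs, *s.hr_exp,
            s.avg_cached_lifetime, s.avg_retrieval_latency, s.avg_retrieval_cost]
```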
A context caching action \((a_i)\) in the action set \(\{0, 1\}\) corresponds to not caching or caching a context value i. The context caching action space is \(\mathcal{A} = \{a_1, a_2, \ldots, a_i\}\) for \(\forall i\). Given that the context caching decision is binary, the output of the agent is the probability of making one of the decisions; we take the probability of caching a piece of context, \(P(a_i = 1) = 1 - P(a_i = 0)\). The context caching policy is \(\pi(a|s)\) accordingly. Consider that \(Reward_\tau\) is the cumulative earning of the CMS from responding to context queries after \(\tau\) windows, where \(\gamma\) is the discount factor. Then \(Reward_\tau = \sum_{t=0}^{T} (\gamma^t \times r_{t+\tau})\) and the optimal selective context caching policy is \(\pi^* = \mathrm{argmax}_\pi(\mathbb{E}[Reward_\tau | \pi])\). The value function \((V^\pi)\) and Q-function \((Q^\pi)\) for a selective context caching policy can be defined as follows, given that \(p(s'|s,a)\) is the transition probability to the new state \(s'\) from s when taking context caching action a:
\begin{equation} V^\pi(s) = \sum_a \pi(a|s) \sum_{s'} p(s'|s,a)\left[ r + \gamma\, V^\pi(s') \right] \end{equation}
(2)
\begin{equation} Q^\pi(s,a) = \sum_{s'} p(s'|s,a)\left[ r + \gamma \sum_{a'} \pi(a'|s')\, Q^\pi(s',a') \right] \end{equation}
(3)
Figure 4 depicts the system design of the sub-components of the ACAgn. We propose an event-driven architecture for the agents, developed using the publisher-subscriber and observer patterns to handle streams of context information to evaluate for caching and to trigger reward calculations, respectively. The state translator converts a context item to its respective state, which is published to the queue. The actor-critic network then reads from the queue and makes the selective caching decisions. Context selection decisions are logged in the Decision History Registry (DHR), and in the Delay Time Registry (DTR) or Cache Life Registry (CLR) if the decision was not to cache or to cache, respectively. Each piece of cached context is profiled using the Cache Profiler (CP). The Event Listener listens to two types of events originating from the DHR, CLR, and/or CP. The first are "calculate reward" events, triggered when (i) a piece of context information is prematurely evicted, i.e., before \(\mathbb{E}[CL_i]\) elapses, (ii) \(\mathbb{E}[CL_i]\) has elapsed when \(a_i = 1\), (iii) \(\mathbb{E}[DT_i]\) has elapsed when \(a_i = 0\), or (iv) the dedicated limited-size list collecting the sequence of \(Ret_i\) in the cache profiler is full. The second are "extend cache life" events, triggered when a piece of context information avoids eviction even after \(\mathbb{E}[CL_i]\) has elapsed. Accordingly, the first event causes the actor-critic network to learn and the latter to make selective caching decisions.
Fig. 4. Process of the reinforcement learning based context selection agents.
The output of the actor-critic network \((0 \le v_i \le 1)\), the probability to cache, can be interpreted as a measure of fitness to cache over the estimated \(\mathbb{E}[CL_i]\) defined in the feature vector. For instance, \(v_i = 1\) denotes that the context can be cached for the entire \(\mathbb{E}[CL_i]\), whereas \(v_i = 0\) denotes not to cache the context; \(v_i = 0.5\) is therefore an ambiguous state. Using this principle, we map the output value to our binary cache action \((a_i)\) as:
\begin{equation} a_i = \begin{cases} 1 & \text{if } v_i > 0.5 \\ 0 & \text{otherwise} \end{cases} \end{equation}
(4)
Accordingly, we define \(CL_i = (v_i - 0.5) \times \overline{CL_i} / 0.5\) and \(DT_i = (0.5 - v_i) \times mid / 0.5\).
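A short sketch of the decision rule in (4) together with the derived \(CL_i\) and \(DT_i\) values (assuming `avg_cl_i` holds \(\overline{CL_i}\) and `mid` is expressed in windows; names are illustrative):

```python
def select_action(v_i, avg_cl_i, mid):
    """Map the actor output v_i in [0, 1] to the binary cache action of Eq. (4),
    returning (a_i, CL_i, DT_i); exactly one of CL_i / DT_i is set."""
    if v_i > 0.5:
        cl_i = (v_i - 0.5) * avg_cl_i / 0.5   # fraction of the average cached lifetime
        return 1, cl_i, None
    dt_i = (0.5 - v_i) * mid / 0.5            # windows to wait before re-evaluating
    return 0, None, dt_i

# Example: a confident output caches the item for its full average cached lifetime.
print(select_action(1.0, avg_cl_i=30.0, mid=2))   # (1, 30.0, None)
```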
As indicated in Figure 4, we use a delayed reward calculation scheme, using event triggers and listeners to capture the actual return from the cache action, as opposed to calculating expected values for the reward at the decision epoch as in traditional data caching techniques [17, 31]. The window during which the reward is calculated is referred to as the learning epoch. Consider \(R_{i,t+\tau}\) to be the number of context requests that access context i between the selective decision epoch t and the learning epoch after \(\tau\) windows. Then, \(\tau = DT_i\) when \(a_i = 0\) and \(\tau = \min(CL_i, long, \tau_{full})\) when \(a_i = 1\), where \(\tau_{full}\) is the time at which the dedicated limited-size list collecting the sequence of \(Ret_i\) in the cache profiler becomes full. The reward is then calculated as follows:
\begin{equation} Reward = \begin{cases} f & a_i = 1 \\ g & \text{otherwise} \end{cases} \end{equation}
(5)
\begin{equation} f = \begin{cases} \dfrac{\sum_{j=0}^{R_{i,t+\tau}} Ret_{i,j}}{R_{i,t+\tau}} & \text{if } R_{i,t+\tau} > 0 \\ -10 & \text{otherwise} \end{cases} \end{equation}
(6)
\begin{equation} g = \begin{cases} \dfrac{\sum_{j=0}^{R_{i,t+\tau}} \left( Ret(t)_i - \mathbb{E}[Ret(t)_i] \right)}{R_{i,t+\tau}} & \text{if } R_{i,t+\tau} > 0 \\ 5 & \text{otherwise} \end{cases} \end{equation}
(7)
The function \(f\) assigns a large negative reward to context information that has recorded no accesses, or only infrequent intermittent accesses, after being cached. Such context information is cost-inefficient to cache because it incurs hold-up costs – the cost of occupying cache space for no return. Function \(g\) assigns a positive reward for \(a_i = 0\) decisions on such context. But \(|g| < |f|\) because we encourage the agent to maximize the number of context information items that get cached \((a_i = 1)\), considering that one of our objectives is to minimize RT for time-critical context queries.
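A minimal sketch of the delayed reward in (5) to (7), computed at the learning epoch from the per-access returns \(Ret_i\) collected by the cache profiler (the penalty and bonus constants follow the equations above; function and argument names are illustrative):

```python
def delayed_reward(cached, returns, expected_returns=None):
    """Reward for a selective caching decision (Eqs. 5-7).
    returns: observed returns Ret_i for accesses between the decision and learning epochs.
    expected_returns: matching E[Ret_i] estimates, used only when the item was not cached."""
    n = len(returns)
    if cached:                                   # f in Eq. (6)
        return sum(returns) / n if n > 0 else -10.0
    if n == 0:                                   # g in Eq. (7): no demand, so not caching was right
        return 5.0
    if expected_returns is None:
        expected_returns = [0.0] * n
    return sum(r - e for r, e in zip(returns, expected_returns)) / n
```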

4.2 Deep Deterministic Policy Gradient Agent

An actor-critic network that searches for the optimal ACOCA policy was implemented to compare with the ACAgn. The DDPGAgn is implemented using the same event-driven architecture depicted in Figure 4, and rewards are calculated as in (5), (6), and (7). We implemented four deep neural networks (i.e., Q-networks): an actor network, a target-actor network, a critic network, and a target-critic network, to improve self-learning [26, 30].
Consider the previous agents, StatAgn and ACAgn. The when-to-cache decision was made using the caching or not-caching decision as a confidence metric, e.g., as in (4). This results in either caching a piece of context for at least the estimated period \(CL_i\) or delaying its evaluation for caching by \(DT_i\). Both \(CL_i\) and \(DT_i\) are continuous values of time. On this basis, we define a continuous action space \(\mathcal{A} := [-long, long]\) for the DDPGAgn, where \(a_i \in \mathbb{R}\) (\(\mathbb{R}\) is the set of real numbers). The discrete context selection actions (i.e., cache or not cache) can then be defined as follows:
\begin{equation} discreteAction = \begin{cases} cache,\ CL_i = a_i & \text{if } a_i > 0 \\ not\ cache,\ DT_i = \dfrac{a_i}{W} & \text{otherwise} \end{cases} \end{equation}
(8)
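The continuous-to-discrete mapping in (8) can be sketched as follows; taking the magnitude of a non-positive action when computing \(DT_i\) is our reading of the rule, and W is the window size in time units:

```python
def ddpg_to_discrete(a_i, window_size):
    """Map the DDPG agent's continuous action a_i in [-long, long] to a discrete
    decision: cache for CL_i = a_i, or defer re-evaluation for DT_i windows."""
    if a_i > 0:
        return ("cache", a_i)                        # CL_i in time units
    return ("not_cache", abs(a_i) / window_size)     # DT_i in windows
```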
Based on the work of Lillicrap et al. [26] on continuous control, (3) can be modified as follows to avoid the inner expectation when the optimal policy is deterministic \((\varrho: \mathcal{S} \to \mathcal{A})\).
\begin{equation} Q^\varrho(s,a) = \sum_{s'} p(s'|s,a)\left[ r + \gamma\, Q^\varrho\!\left(s', \varrho(s')\right) \right] \end{equation}
(9)
Accordingly, \(Q^\varrho\) can be learnt off-policy, i.e., using Q-Learning, with a transition probability distribution generated from experience gathered at runtime by a different process (which we implement in the form of the Cache Profiler). This enables the agent to overcome the lack-of-prior-knowledge problem when adapting to context and context query variations. The actor uses a parameterized function \(\varrho(s|\theta^\varrho)\), where \(\theta\) are the parameterized performance metrics, i.e., AR, HR, retrieval costs, and so on. The critic network implements the Bellman equation in (9).
A crucial difference between the DDPGAgn and the ACAgn is the learning process. The ACAgn updates the model for each context i in the Decision History. This is computationally expensive, and the process does not generalize for heterogeneous context information (a unique feature of context query loads) since the model attempts to optimize for each piece of context information separately. This is a significant drawback for the scalability of an ACOCA mechanism when the context space is large. The DDPGAgn therefore leverages a mini-batch processing technique that samples \(\vartheta\) recent decisions, e.g., 16 decisions, and the soft target updating from [26] to update the networks and target-networks in each learning cycle, minimizing bias towards certain situations. The soft target update function is:
\begin{equation} \theta ' = \tau \theta + \left( {1 - \tau } \right)\theta ' \end{equation}
(10)
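A sketch of the mini-batch learning cycle with the soft target update of (10), shown on plain parameter lists to stay framework-agnostic (batch and history sizes are illustrative assumptions):

```python
import random

def soft_update(target_params, params, tau=0.001):
    """theta' <- tau * theta + (1 - tau) * theta', applied element-wise (Eq. 10)."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target_params, params)]

def sample_minibatch(decision_history, batch_size=16):
    """Sample recent decisions so one learning cycle is not biased to a single situation."""
    recent = decision_history[-10 * batch_size:]   # assumed bound on "recent"
    return random.sample(recent, min(batch_size, len(recent)))
```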

4.3 Context Cache States

Figure 5 depicts the state-transition diagram of a piece of context information. The ghost cache state refers to being temporarily moved to a buffer memory pending eviction, until any outstanding context queries are answered using the context in concern before its freshness is completely lost. A piece of context information is evaluated asynchronously for caching either when (i) it is not cached, or (ii) \(CL_i\) has elapsed. We use the term complete hit to describe an access to cached context that also conforms to the quality requirements of the querying consumer, e.g., validity of the number of available parking slots for the navigation application. A partial hit (also called a partial miss) occurs when a derived high-level context needs to be re-interpreted and refreshed using lower-level context because at least one of the low-level context items used to derive it has become invalid, changing the currently cached context value. A complete miss is when neither of the above criteria is met.
Fig. 5. State transition diagram of a context item.
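The hit taxonomy above could be expressed as in the sketch below, assuming each cached item exposes its own QoC compliance and the validity of the low-level context it was derived from (illustrative, not the prototype's data model):

```python
from enum import Enum

class Access(Enum):
    COMPLETE_HIT = 1    # cached and QoC-compliant
    PARTIAL_HIT = 2     # cached, but must be re-derived from refreshed low-level context
    COMPLETE_MISS = 3   # not cached, or cached but invalid

def classify_access(cached, qoc_ok, low_level_valid):
    if not cached:
        return Access.COMPLETE_MISS
    if qoc_ok and low_level_valid:
        return Access.COMPLETE_HIT
    if not low_level_valid:
        return Access.PARTIAL_HIT
    return Access.COMPLETE_MISS
```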
According to [28, 45], context retrievals account for 20-25% of the costs incurred by a CMS, whereas the rest is attributed to penalties incurred in the redirector mode. We contend that our cost-efficiency objective can be achieved by minimizing the latency of responding to context queries. The cost can be further minimized by managing the number of retrieval operations, including refreshing. We develop a three-part solution for this. First, we bypass the context selection delays enforced by \(DT_i\) when a registered event is recognized by the continuous event monitor. Second, our approach leverages shared context retrievals – using a single retrieval operation to service a number of parallel queries that share similar context requirements. All cached context, e.g., context attributes and entities, that can be derived from the data of the same context provider is refreshed by a single retrieval operation. Third, the reactive refreshing strategy [28, 45] is used in this paper so that retrievals occur only when the context is requested and invalid, delaying the need for retrieval.
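A sketch of the reactive refreshing step under the assumption that staleness is checked at access time and a single provider retrieval refreshes every cached item it feeds (data structures and names are illustrative):

```python
import time

def read_context(item_id, cache, ttl, served_by, retrieve):
    """Serve item_id from cache when fresh; otherwise perform one shared retrieval
    from its provider and refresh all sibling items derived from the same payload."""
    now = time.time()
    entry = cache.get(item_id)
    if entry and now - entry["retrieved_at"] < ttl[item_id]:
        return entry["value"]                        # fresh enough: no retrieval cost
    payload = retrieve(served_by[item_id])           # dict of item_id -> value from the provider
    for dep_id, value in payload.items():            # shared refresh of related cached items
        cache[dep_id] = {"value": value, "retrieved_at": now}
    return cache[item_id]["value"]
```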

4.4 Handling Uncertainty: Heuristics, and Context State Space Exploration

Consider applying (5) to a piece of context information that lacks previously observed data with which to accurately calculate \(\mathbb{E}[HR]\). In this case, we calculate \(\mathbb{E}[CL_i]\) based on the \(\mathbb{E}[HR]\) estimated in (11), following Figure 6. The residual lifetime [41] remaining for caching is \(ResiL = (f_{arrive} - f_{thresh}) \times \overline{ReL} / (1 - f_{arrive})\), given \(f_{arrive} = 1 - \overline{ReL} / L_i\). HR\(\to\)0 when \(f_{arrive} \le f_{thresh}\) because \(ResiL \le 0\). Otherwise, we consider a minimum \(\mathbb{E}[HR] = 0.5\), which translates into the criterion \(ResiL \ge 2/\mathbb{E}[\lambda]\). \(\mathbb{E}[HR]\) is defined as:
\begin{equation} \mathbb{E}[HR] = \begin{cases} 1 & \text{when } L_i \to \infty \\ \dfrac{\mathbb{E}[ReL] \times \mathbb{E}[\lambda] \times \mathbb{E}[AR]}{\left(\mathbb{E}[ReL] \times \mathbb{E}[\lambda] \times \mathbb{E}[AR]\right) + 1} & \text{otherwise} \end{cases} \end{equation}
(11)
Fig. 6. Estimating \(\mathbb{E}[HR]\) using known parameters.
A similar notion to (11) is found in [39]: the probability of an access being a hit \((P_{hit}(i))\) defined in GoodFetch, which balances the access and update (defined as refreshing in our scope) frequencies of a cached item.
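A minimal sketch of this heuristic, computing the residual lifetime and the \(\mathbb{E}[HR]\) of (11) from the symbols in Table 2 (the infinite-lifetime and zero-latency corner cases are guarded explicitly as assumptions):

```python
import math

def residual_lifetime(lifetime, avg_ret_latency, f_thresh):
    """ResiL = (f_arrive - f_thresh) * avg_ReL / (1 - f_arrive), with f_arrive = 1 - avg_ReL / L_i."""
    f_arrive = 1.0 - avg_ret_latency / lifetime
    if f_arrive >= 1.0:                  # negligible retrieval latency
        return math.inf
    return (f_arrive - f_thresh) * avg_ret_latency / (1.0 - f_arrive)

def expected_hit_rate(lifetime, avg_ret_latency, exp_lambda, exp_ar, f_thresh):
    """E[HR] per Eq. (11); tends to 0 when the item arrives already below the freshness threshold."""
    if math.isinf(lifetime):
        return 1.0
    if 1.0 - avg_ret_latency / lifetime <= f_thresh:   # f_arrive <= f_thresh, so ResiL <= 0
        return 0.0
    x = avg_ret_latency * exp_lambda * exp_ar
    return x / (x + 1.0)
```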
Another approach to overcoming uncertainty and the lack of prior knowledge about context is to explore the state space using an \(\varepsilon\)-greedy method, which randomly caches context information with probability \(\varepsilon\). Although simple to implement, completely random exploration cannot minimize convergence time or the performance- and cost-efficiency degradation when the context state space is large. Instead, we use the Most Frequently Used (MFU) strategy to cache context information greedily based on the popularity factor. The choice of this heuristic was made based on experimental observations comparing it against the Most Recently Used (MRU) and completely random strategies, the discussion of which is beyond the scope of this paper.
A further drawback of \(\varepsilon\)-greedy exploration in the literature is the lack of adaptiveness [20]. Context query loads are generally heterogeneous but can exhibit (i) homogeneity, e.g., a higher volume of context queries searching for alternate routes after a roadside accident, and/or (ii) consistency, e.g., recurring spatial-temporal query patterns such as searching for a car parking space during rush hours. A constant \(\varepsilon\) value could result in unnecessary explorative caching in these scenarios, causing cache inefficiencies, especially when the cache memory is limited. It is more rational to exploit – to rely on known actions that generate higher reward, e.g., caching context i, which previously generated a positive reward from caching, in a similar situation. On the contrary, more exploration, i.e., a large \(\varepsilon\), can be beneficial when the queries are highly random. We therefore adapt \(\varepsilon\) by modifying \(\varepsilon = \varepsilon \pm \zeta\) after each \(Cycle_{learn}\) [26]. The dynamism of the context queries is quantified using \(\overline{Reward}\) over all the context caching decisions since the last \(\varepsilon\) modification event. For instance, consider a stream of context queries that request a set of novel context information. A trained SCA may decide not to cache due to the lack of statistical observations and negative expected values, e.g., (6) could yield a negative average reward despite the context showing a high AR. \(\varepsilon\) is incremented if \(\overline{Reward} < Reward_{max}\), or decremented otherwise, to adapt to this situation.
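The adaptive exploration factor described above can be sketched as follows (the reward ceiling `reward_max` and the correction \(\zeta\) are configuration values; clipping to [0, 1] is an added safeguard, not stated in the text):

```python
def adapt_epsilon(epsilon, zeta, avg_reward, reward_max):
    """Increase exploration when recent decisions under-perform the best observed
    reward (novel or volatile load); decrease it when exploitation is paying off."""
    if avg_reward < reward_max:
        epsilon += zeta
    else:
        epsilon -= zeta
    return min(1.0, max(0.0, epsilon))   # keep 0 <= epsilon <= 1
```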
We also introduce an adaptive \(\delta\) that is modified as \(\delta = \delta \pm \omega\) after a constant number of decisions, referred to as the learning cycle \((Cycle_{learn})\), to reflect the current level of volatility in the query load. For instance, \(\delta\) is reduced by \(\omega\) by default in each window, unless a \(Cycle_{learn}\) occurred, which increments \(\delta\) by \(\omega\). The rationale is that \(Cycle_{learn}\) occurs frequently when the queries demand a heterogeneous set of context information to be evaluated for caching – a feature of volatility.

4.5 Context Cache Memory Organization

In-memory caches are faster yet more expensive than the persistence-level caches used in the literature, e.g., [28]. Further, devices that share memory units with the cache, e.g., routers, suffer from data volatility. Although contemporary in-memory technologies such as Redis [55] support replication across distributed nodes, transient context cannot be recovered, only reproduced, in case of a loss [46], because transiency does not guarantee that context recovered from replication instances complies with Quality of Context parameters, e.g., freshness. In this paper, we assume that context originating from IoT devices is small (because of the low-level context considered in this paper) and reliably retrieved, so context can be reproduced quickly (based on related literature [4]). We opt for in-memory context caching for the following reasons: (i) low cache seek time, suitable for responding to time-critical queries, and (ii) to test the generalizability of ACOCA in a system where main memory is shared with the cache (as at the network edge).
Medvedev [28] proposed a logical cache hierarchy, which is extended in this paper. Consider Figure 7 below, which illustrates an alternative, simplified context hierarchy. As we will discuss next, the context cache processes follow either top-down cache scans, e.g., for eviction, or bottom-up cache scans, e.g., when performing context cache placement and refreshing.
Fig. 7. Simplified context cache hierarchy [43].
A unit of cache memory is defined logically in our cache memory organization. We chose this design for two reasons. Firstly, it separates the complicated physical memory management tasks from the logical cache. For instance, consider a Cloud-based cache memory where the context information is physically fragmented across multiple distributed nodes. The defragmentation routine may result in scaling down the number of physical instances to, e.g., minimize hold-up costs. But the logical cache memory, when accessed by the CMS, is location transparent and oblivious to scaling and fragmentation – suitable for pervasive context-aware applications to dependably access context information from the context cache memory. Secondly, the logical context cache memory organization maintains the logical affinity of hierarchically related context information [44]. The context cache memory is, hence, scaled in logical units corresponding to the piece of context information that is highest in the logical hierarchy among the cacheable context, e.g., in terms of the number of context entities in the low-level context cache.

4.6 Eviction Heuristics

One of the features of cache memories in a distributed system is the diversity of capacity and technology. Devices towards the edge are often limited in size and shared [3], whereas dedicated, scalable, and large cache memories can be found in the Cloud [9]. A critical evaluation of previous literature suggests several advantages of leveraging scalable cache memories to cost-effectively handle volatile context query loads [44]. Although comprehensive solutions for issues such as post-scaling degradation or performance cliffs [10, 40] and distributed fragmentation [55] are yet to be investigated, the overall performance gains outweigh the disadvantages. Based on this rationale, we discuss in this section a criterion to adaptively scale the context cache memory as part of the adaptive context caching and eviction processes.
We assume that the pay-as-you-go scheme of Cloud caches involves making an upfront payment for each unit of cache, e.g., $1.68 per GB per month for a stateful File Cache instance [52]. The cost of caching is distributed among all the context information that shares the cache memory, so an underutilized Cloud cache is cost-inefficient. As shown in Figure 8, we attempt to evict context only when the remaining cache space is insufficient. Considering the logical hierarchy of the context cache, a top-down approach to eviction can release a large cache memory space in a single eviction cycle.
Fig. 8. Context cache selection, placement, eviction, and scaling.
We define two types of evictions: (i) mandatory, and (ii) selective. Mandatory eviction involves removing a piece of context, e.g., an entity, and all its child context in the logical hierarchy, e.g., context attributes, from the cache. Selective eviction removes only lower-level context which satisfy the decision criteria of the eviction algorithm. For example, only the speed attribute is evicted from an entity car, while the attribute direction remains in the cache. We use a threshold \(( \eta )\) based approach to categorize the types of evictions, the specific definition of which depends on the eviction algorithm.
We compare the performance of windowed (i) random eviction, (ii) Least Frequently Used (LFU), (iii) Least Valued First (LVF) [24], and (iv) our novel Time-Aware Hierarchical (TAH) eviction policies. TAH combines the advantages of time-awareness in TLRU [5] with LVF [2] and LFU, as indicated in Figure 9. Hierarchically, TAH applies LVF and LFU for selecting the evictable entities and attributes, respectively. Previous literature on caching specifies that access to objects follows a Zipf-like distribution [39, 48] and that popular items are expected to create higher hit rates [48]. Raw data from context providers is the most frequently accessed in this regard, since it gets reused across context attributes with multiple providers, so context attributes can also share this feature. Context attributes are the lowest level in the logical context hierarchy, so evicting attributes based on the frequency of access is logical. According to [14] as well, sub-queries are based on entities. We have also shown above in Section 2 that criticality differs between context queries, comparing Q1 and Q7. It is more useful to evict entities based on the significance of their contribution to queries, which we quantify using the "value". The TAH policy is designed accordingly.
Fig. 9. Time-aware hierarchical eviction policy considering a piece of context information.
It is important to note that all eviction policies considered in this work other than the TAH policy are not time-aware, i.e., a piece of context information can be evicted despite the estimated \(CL_i\) not having elapsed. The value of a piece of context information considered for eviction is calculated as follows, where \(\hat{I}\) refers to the most popular piece of context in the same logical level, \(\ddot{I}\) refers to the context that takes the longest latency to retrieve, and \(\Delta CL_i\) refers to the remaining cached lifetime:
\begin{equation} Value_i = \left( \kappa \times \frac{AR_i}{AR_{\hat{I}}} \right) + \left( \mu \times \frac{\Delta CL_i}{CL_i} \right) + \left( \nu \times \frac{\overline{RL_i}}{\overline{RL_{\ddot{I}}}} \right) \tag{12} \end{equation}
where \(0 \le \kappa, \mu, \nu, \frac{AR_i}{AR_{\hat{I}}}, \frac{\Delta CL_i}{CL_i}, \frac{\overline{RL_i}}{\overline{RL_{\ddot{I}}}} \le 1\) and \(Value_i \in \mathbb{R}_0^+\). As an example, \(Value_i\) is low, making the context a possible candidate for eviction, when an entity is relatively unpopular, \(\Delta CL_i \to 0\) or \(\Delta CL_i\) has elapsed, and the context can be retrieved reliably and fast such that re-derivation is cheap to perform. Context information that avoids being evicted by the end of its \(\mathbb{E}[CL_i]\) (set at the caching epoch) is reevaluated, and the Cache Life Registry (CLR) is updated with a recalculated \(\mathbb{E}[CL_i]\). This extends the residence period of a piece of context information in the cache memory.
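The following is a direct transcription of Eq. (12) for scoring eviction candidates; the parameter names are spelled out, and the default weights follow Table 4.

```python
# Eviction-candidate scoring per Eq. (12).
# ar, delta_cl, cl, rl: access rate, remaining cached lifetime, cached
# lifetime, and mean retrieval latency of the candidate. ar_top and rl_max
# come from the most popular and the slowest-to-retrieve context at the same
# logical level. kappa, mu, nu are the weights from Table 4.
def eviction_value(ar, ar_top, delta_cl, cl, rl, rl_max,
                   kappa=1.0, mu=1.0, nu=1.0):
    popularity = ar / ar_top if ar_top else 0.0              # AR_i / AR of most popular
    residual_life = max(delta_cl, 0.0) / cl if cl else 0.0   # remaining share of CL_i
    retrieval_cost = rl / rl_max if rl_max else 0.0          # relative retrieval latency
    return kappa * popularity + mu * residual_life + nu * retrieval_cost
```

A low score (unpopular, nearly expired, and cheap to re-retrieve) marks a likely eviction candidate, matching the example above.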

5 Implementation and Evaluation

In this section, our adaptive context caching (ACOCA) approach is evaluated and compared against the benchmarks (explained below). We evaluate whether the cost- and performance-efficiencies of the CMS are maximized using ACOCA.

5.1 Prototype and Experimental Setup

We developed our proposed methodology as a proof-of-concept (POC) using Python 3.8.2. The architecture of the system is illustrated in Figure 10. All processes other than those in the context query execution path execute asynchronously in an event-driven manner. The response latencies of context queries are, hence, statistically independent of the response latency of the ACOCA agents. The average response latency to make ACOCA decisions was \(0.639 \pm 0.07\) ms and \(2.235 \pm 0.53\) ms using the ACAgn and DDPGAgn, respectively. In comparison, the StatAgn took \(0.428 \pm 0.00\) ms.
Fig. 10. Architecture of the ACOCA-A prototype.
The architecture comprises six components: (i) Context Query Execution Manager (CQEM), (ii) Storage Query Execution Agent (SQEA), (iii) Selective Caching Agent (SCA), (iv) Cache Operations Module (CaOpM), (v) Cache Memory Module (CMM), and (vi) Context Service Resolver (CSR). CQEM coordinates the context query execution and performs data transformation operations. SQEA executes context cache access operations. SCA makes the context caching decisions. CaOpM executes context data assignment in cache memory, context refreshing, and eviction operations. CMM is a self-contained component that monitors, scales, and stores context information for fast access based on SCA's directives, called the context caching plan [44]. CSR selects relevant context providers and establishes connections. All components are multi-threaded, e.g., data from context providers for a query is retrieved in parallel, and concurrent access to shared resources (i.e., hash tables) is controlled using hierarchical write-locks. Shared services (e.g., SCA) utilize the publisher-subscriber pattern, reading from First-In-First-Out (FIFO) queues. gRPC is used for inter-service communication. This design strategy targets massive scalability in a distributed environment by creating autonomous agents that simplify the complex ACOCA problem.
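As a rough illustration of this decoupling, the sketch below shows a consumer thread reading caching candidates from an in-process FIFO queue; the prototype communicates across services via gRPC instead, and the event fields shown are hypothetical.

```python
# Publisher-subscriber sketch: the query path publishes caching-candidate
# events to a FIFO queue and the Selective Caching Agent consumes them
# asynchronously, keeping query latency independent of agent latency.
import queue
import threading

decision_queue = queue.Queue()                  # FIFO queue of candidate events

def selective_caching_agent():
    while True:
        event = decision_queue.get()            # blocks until a candidate arrives
        if event is None:                       # shutdown sentinel
            break
        # evaluate the candidate and emit a caching plan for the CMM here
        print("evaluating", event["entity"], event["attribute"])
        decision_queue.task_done()

threading.Thread(target=selective_caching_agent, daemon=True).start()
decision_queue.put({"entity": "car1", "attribute": "speed"})
decision_queue.join()                           # wait until the event is processed
```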
We used SQLite to handle structured data repositories such as the context services registry, service descriptions (SD) store, and resource lookup. MongoDB was used for maintaining logs, statistics, and historic context repositories.

5.1.1 Implementing the Reinforcement Learning Agents.

The ACAgn and DDPGAgn implement fully connected neural networks for both the actor- and critic-networks using TensorFlow 2.0 and Keras. We implemented two hidden layers containing 512 and 256 neurons, respectively, to minimize the overhead the ACOCA agents introduce to the CMS. It also enables us to reliably compare the RL agents with the StatAgn as a light-weight solution suitable for resource-limited distributed environments. The hidden layer activation function was the Rectified Linear Unit (ReLU). SoftMax and Tanh were used for activation in the output layer of the ACAgn and in the critic-network of the DDPGAgn, respectively. Mean squared error (MSE) was used to calculate the loss between the Q-target and the Q-value during learning, and the Adam optimizer was used for updating. SciPy is used for mathematical operations including extrapolation. The size of the replay buffer used in the DDPGAgn was set to 100 recent decisions.
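A minimal sketch of these network shapes in TensorFlow 2/Keras is shown below; the state and action sizes are placeholders, the critic is shown over state features only for brevity, and the exact placement of activations and losses may differ from the authors' implementation.

```python
# Hedged sketch of the described networks: two hidden layers of 512 and 256
# ReLU units, softmax output for the discrete ACAgn policy, tanh output for
# a bounded continuous action, and an MSE-trained Q-value head with Adam.
from collections import deque
import tensorflow as tf
from tensorflow.keras import layers

state_dim, n_actions = 16, 2          # illustrative sizes, not the prototype's

def build_network(output_units, output_activation):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(state_dim,)),
        layers.Dense(512, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dense(output_units, activation=output_activation),
    ])

acagn_policy = build_network(n_actions, "softmax")   # ACAgn output layer
ddpg_actor = build_network(1, "tanh")                # bounded caching action
ddpg_critic = build_network(1, None)                 # Q-value head (state features only here)
ddpg_critic.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                    loss="mse")                      # 1e-4 assumed to match alpha in Table 4

replay_buffer = deque(maxlen=100)                    # 100 most recent decisions
```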

5.1.2 Deployment and Hyper-parameter Setting.

We used a single Unix-based instance with 8 GB of shared memory and a 2.4 GHz CPU clock speed to respond to queries, emulating a typical contemporary edge device. Initial values of the hyperparameters are shown in Table 4. The default logical size of a cache memory unit is three entities, corresponding to the highest number of entities accessed by a single context query (i.e., Q4 and Q7) among those stated in the motivating scenario. So, a limited cache memory can accommodate only 37.5% of all cacheable entities in the scenario.
| Parameter | Value | Parameter | Value | Parameter | Value |
|---|---|---|---|---|---|
| \(\varepsilon\) | 0.5 | \(\varepsilon_{min}\) | 0.001 | \(\vartheta\) | 16 decisions |
| \(\varphi\) | 0.5 | \(\varepsilon_{max}\) | 0.95 | \(x\) | 10 decisions |
| \(\gamma\) | 0.9 | \(\beta, \tau\) | 0.001 | short | 1 window |
| \(\alpha\) | 0.0001 | \(\omega, \zeta\) | \(\pm\)0.005 | mid | 5 windows |
| \(\kappa, \mu, \nu, \eta\) | 1.0 | W | 5 seconds | long | 10 windows |
| \(Reward_{min}\) | 1.0 | \(Cycle_{learn}\) | 20 decisions | | |

Table 4. Parameters and Values used in the Experimental Setup

5.1.3 Simulating the Busy Intersection.

We simulated the context providers using a proprietary carpark simulator from our previous work [45] and the IoT data simulator [56]. Context is retrieved via REST APIs from the providers. The response latency of each context provider is normally distributed, e.g., for \(GET\ :/cars?vin =\), \(\overline {RL} = 0.997s\) and \({\sigma ^2} = 0.012\). Each context provider responds with data that can be mapped to context attributes of the requested entities using the defined schema. The SLA that the CMS has with a context provider defines only the cost per retrieval \(( {Cos{t_{ret}}} )\).
We consider a static context lifetime \(({L_i})\) and a linear freshness decay function to set up a controlled environment that focuses on our primary objectives, considering that \(\mathbb{E}[ {{L_i}} ]\) of each context attribute is known using our lifetime estimation technique [45] for a planning period (PlnPrd). So, \({L_i} = {\rm{max}}( {1/S{R_q},\mathbb{E}[ {{L_i}} ]} )\) for any context data from a context provider.
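A minimal sketch of this lifetime rule, together with one assumed form of a linear freshness decay, is given below.

```python
def effective_lifetime(expected_lifetime: float, sr_q: float) -> float:
    # L_i = max(1/SR_q, E[L_i]) as defined above; sr_q is the sampling/request rate.
    return max(1.0 / sr_q, expected_lifetime)

def freshness(age: float, lifetime: float) -> float:
    # One simple linear decay form (an assumption, not necessarily the exact
    # function used in the prototype): fully fresh at age 0, fully stale once
    # the lifetime has elapsed.
    return max(0.0, 1.0 - age / lifetime)
```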
Our motivating scenario contains eight context query types and 12 sub-query types involving eight entities, seven context services, and 25 context providers, which can be found on GitHub1. The repository contains all the datasets used to manipulate the behaviours of the entities, SLAs, context provider registry, and context query load. All context queries are pull-based. We defined eight context consumers (real-time pervasive computing applications) that have negotiated at least one SLA with the CMS. These SLAs specify (i) the price per valid response \(( {Pric{e_{res}}} )\), (ii) the freshness threshold \(( {{f_{thresh}}} )\), (iii) the cost of penalty for an invalid response \(( {Cos{t_{pen}}} )\), and (iv) the maximum tolerated response latency \(( {R{T_{max}}} )\). The 153,600 different combinations of context query types, sub-query types, and SLAs that request context information about the heterogeneous context entities were executed continuously, and some query instances were repeated where necessary. We suggest interested readers refer to our repository1 for further details about the SLAs, Service Descriptions (SDs), entities, and attributes used in the experiments. A context consumer can also be a provider, e.g., when related to a car in the scenario. The scenario contains complex situations involving stationary and non-stationary entities where meta-data about the entity is derived from the SD, e.g., the VIN of car1 or the location of carpark2. Apache JMeter was set up to generate the query load conforming to a Poisson distribution with \(\lambda = \ 2.5/s\).
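The load generation can be approximated as below: a Poisson arrival process with λ = 2.5/s realized through exponential inter-arrival times; the query and SLA identifiers are placeholders for the datasets in the repository.

```python
# Sketch of a Poisson query-load generator (what JMeter was configured to do):
# inter-arrival times are exponentially distributed with rate LAMBDA.
import random
import time

LAMBDA = 2.5                                  # mean arrival rate per second
QUERY_TYPES = [f"Q{i}" for i in range(1, 9)]  # placeholder identifiers
SLAS = [f"SLA{i}" for i in range(1, 9)]       # placeholder identifiers

def generate_load(duration_s: float, issue):
    end = time.time() + duration_s
    while time.time() < end:
        time.sleep(random.expovariate(LAMBDA))          # Poisson inter-arrival
        issue(random.choice(QUERY_TYPES), random.choice(SLAS))

# Example: generate_load(60, lambda q, sla: print("issue", q, "under", sla))
```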

5.1.4 Performance Measurement.

We present context caching occurring in the low-level context cache (refer to Figure 7) in this paper. Each test case was executed at least three times per session (i.e., 4,413 context sub-queries) over two independent sessions and then averaged. We define two metrics to evaluate the cost-efficiency of a CMS using an adaptive context caching mechanism. Firstly, we calculate and store the pessimistic return (PessiRet). PessiRet is derived using the most expensive and the least valuable SLAs applied within each window. The most expensive SLA defines the most stringent QoS requirements, underpinned by the highest \(R{T_{max}}\) and \(Pe{n_{del}}\). The least valuable SLA offers the lowest \(Pric{e_{res}}\). These SLAs are selected at the end of each window by observation. PessiRet is calculated similarly to (3) using observed values instead of expected values. Secondly, the total return from responding to all context queries during the observed period is calculated as \(TotalRet = \mathop \sum \nolimits_{\forall t} \mathop \sum \nolimits_{\forall i} Ret{( t )_i}\). To measure the performance-efficiency, the variation of response time, probability of delay (PD), throughput, cache throughput, hit rate (HR), number of context retrievals from providers, and the number of evictions are utilized.
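The sketch below illustrates how these two metrics could be aggregated per window; the record and SLA field names are assumptions, and the per-response return calculation (Eq. (3) in the paper) is left as a placeholder.

```python
# Illustrative per-window aggregation of TotalRet and PessiRet.
# Each record holds the observed return of one response and the SLA terms
# that applied to it; ret_fn stands in for the return calculation of Eq. (3)
# evaluated on observed values.
def window_metrics(records, ret_fn):
    total_ret = sum(r["return"] for r in records)                  # TotalRet
    worst_price = min(r["sla"]["price_res"] for r in records)      # least valuable SLA
    strictest = max(records, key=lambda r: (r["sla"]["rt_max"],
                                            r["sla"]["cost_pen"])) # most expensive SLA
    pessi_ret = sum(ret_fn(r, price=worst_price, sla=strictest["sla"])
                    for r in records)                              # PessiRet
    return total_ret, pessi_ret
```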

5.1.5 Benchmarks.

The two agents tested under different combinations of configuration parameters (i.e., cache memory size, eviction policy, hyper-parameter sharing, and parameter update policy) are compared against (i) the short-term optimal StatAgn [43], to compare the short-term cost- and performance-efficiency of the agents, (ii) the probabilistic context caching approach (Prob) proposed by Khargharia et al. [23], (iii) the redirector mode (retrieving context information for each query without storage), to compare the impact of context caching on meeting the objectives, and (iv) the popular database mode of operation used predominantly in state-of-the-art CMSs [6, 8, 12, 19, 22, 32].

5.2 Results and Discussion

In this sub-section, we present a summary of the results gathered from experimenting with our prototype. Errors indicated conform to a 95% confidence interval.
First, we observed the impact adaptive context caching has on the performance-efficiency of the CMS. The response time reduced by ∼85% compared to the redirector mode. All the selective caching agents minimized the caching of context that was requested only intermittently. However, the ACAgn was 50% slower than the StatAgn to reach this state, which we expected, since the ACAgn has to gather adequate experience over a longer period of time before it can make accurate value estimations. Table 5 compares the overall performance of our approach using limited and scalable context cache memories for each tested agent and the benchmarks.
| Agent | Cache Size/Mode | RT | HR | PD |
|---|---|---|---|---|
| None | Redirector mode | \(1.3843 \pm 0.01\) | \(0 \pm 0.00\) | \(0.6712 \pm 0.00\) |
| None | Database mode | \(0.7922 \pm 0.12\) | \(0.8755 \pm 0.07\) | \(0.5068 \pm 0.27\) |
| Probabilistic [23] | Limited | \(0.4547 \pm 0.04\) | \(0.7563 \pm 0.04\) | \(0.3407 \pm 0.18\) |
| Probabilistic [23] | Scalable | \(0.5347 \pm 0.04\) | \(0.6230 \pm 0.05\) | \(0.3927 \pm 0.20\) |
| StatAgn [43] | Limited | \(0.9903 \pm 0.09\) | \(0.7031 \pm 0.04\) | \(0.5197 \pm 0.05\) |
| StatAgn [43] | Scalable | \(0.6288 \pm 0.05\) | \(0.9007 \pm 0.02\) | \(0.4382 \pm 0.01\) |
| ACAgn | Limited | \(1.2262 \pm 0.18\) | \(0.4522 \pm 0.08\) | \(0.6346 \pm 0.02\) |
| ACAgn | Scalable | \(1.1719 \pm 0.17\) | \(0.4471 \pm 0.07\) | \(0.6077 \pm 0.05\) |
| DDPGAgn | Limited | \(1.1843 \pm 0.18\) | \(0.6030 \pm 0.10\) | \(0.6207 \pm 0.02\) |
| DDPGAgn | Scalable | \(0.9728 \pm 0.15\) | \(0.7283 \pm 0.09\) | \(0.5526 \pm 0.03\) |

Table 5. Comparison of Agent Performance using Scalable and Limited-sized Cache Memories
There exists a strong negative correlation of −0.9672 between RT and HR using a limited cache for the ACAgn; the correlation coefficient is −0.7621 otherwise. There exists no correlation between RT and PD using a limited cache memory, although RT and PD are strongly correlated, i.e., 0.8531, using a scalable context cache memory. The corresponding variances of RT for limited-sized and scalable cache memories are 1.098 and 0.694, respectively. Therefore, we contend that the above observations are attributed to the limited-sized cache memory resulting in a higher number of cache evictions. Similar variance can be observed with the other metrics as well. Therefore, limited-sized context cache memories do not guarantee consistent QoS, i.e., dependability at any time, compared to a scalable one.
Both the ACAgn and DDPGAgn show strong correlations between RT and HR, and between RT and PD, irrespective of cache size. We summarize these results in Table 6 below. This conforms to our assumption in (4) that \({P_{DC,i}} \propto 1 - HR{( T )_i}\).
| Agent | Cache Size | RT-HR | RT-PD | HR-PD |
|---|---|---|---|---|
| ACAgn | Limited | −0.7598 | 0.8107 | −0.5895 |
| ACAgn | Scalable | −0.7710 | 0.9094 | −0.7088 |
| DDPGAgn | Limited | −0.8639 | 0.8349 | −0.6888 |
| DDPGAgn | Scalable | −0.8650 | 0.9322 | −0.8144 |

Table 6. Correlations of RT, HR, and PD for RL-based Agents
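These coefficients are plain Pearson correlations over the per-window metric series and can be reproduced as sketched below (numpy shown; the series names are placeholders for the logged values).

```python
import numpy as np

def pearson(xs, ys):
    # Pearson correlation coefficient between two windowed metric series.
    return float(np.corrcoef(xs, ys)[0, 1])

# e.g., pearson(rt_per_window, hr_per_window) should reproduce roughly the
# -0.76 reported for the ACAgn with a limited cache in Table 6.
```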
In the next two subsections, we discuss the performance-efficiency of the CMS using the ACOCA agents in the short term, measured in \(\overline {HR}\), \(\overline {RT}\), throughput, and \(\overline {PessiRet}\) under different configurations. We then compare the overall cost-efficiency. Next, the long-term efficiencies are discussed, comparing all the agents with the benchmarks. Finally, we summarize the key observations from our experiments, highlighting the key principles of ACOCA.

5.2.1 Actor-Critic Agent in the Short Term.

First, we investigated exploratory selective caching using the ACAgn with the adaptive \(\varepsilon\)-greedy policy initiated with \(\eta = 1.0\), to help the ACAgn overcome the drawback of a longer convergence time and to support exploring the state space and gathering enough experience in a short span of time. Figure 11 summarizes the empirical results.
Fig. 11. Performance metrics – (a) HR, (b) RT, (c) PD, and (d) PessiRet of the CMS utilizing ACAgn for ACOCA.
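A minimal sketch of an adaptive ε-greedy exploration scheme of this kind is shown below; the bounds follow Table 4, while the adaptation rule (decaying ε while returns improve and raising it otherwise) and the ±0.005 step are assumptions rather than the prototype's exact update.

```python
# Adaptive epsilon-greedy exploration sketch. An explore() == True step caches
# either randomly or by most-frequently-used (MFU) context, i.e., the two
# exploration modes compared in Figure 11.
import random

class AdaptiveEpsilonGreedy:
    def __init__(self, eps=0.5, eps_min=0.001, eps_max=0.95, step=0.005):
        self.eps, self.eps_min, self.eps_max, self.step = eps, eps_min, eps_max, step

    def explore(self) -> bool:
        return random.random() < self.eps

    def adapt(self, improving: bool) -> None:
        # Decay epsilon while the observed returns improve; raise it otherwise.
        self.eps = (max(self.eps_min, self.eps - self.step) if improving
                    else min(self.eps_max, self.eps + self.step))
```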
The agent makes random decisions when the transition probability distribution is non-existent at first. We refer to this default as random exploration. Several other key observations are as follows. Firstly, the impact of explorative caching on context caching is only marginal using a limited-sized cache. This is because of frequent evictions. Secondly, only the LVF policy makes significant performance improvements on caching with exploration. This supports the argument behind our novel eviction policy to combine LVF with LFU. The novel policy shows significantly lower PD and RT, greater throughput, and the lowest negative \(\overline {PessiRet}\) among all the other results using a scalable cache memory, given no explorative caching is performed. We contend that the novel policy suffered a detrimental effect from exploration because the dominant feature in cached context management is then the frequency of use, i.e., LFU of the policy and MFU of exploration. It shows better performance using the LFU policy rather than LVF, as was observed using the StatAgn [43]. Thirdly, \(\overline {PessiRet} < 0\) using scalable context cache memory. Contrarily, \(\overline {PessiRet} > 0\) using the StatAgn under the same conditions [43]. Further, these \(\overline {PessiRet}\) values are worse compared to using limited-sized cache memory. The reason for the reduced \(PessiRet\) can again be traced to explorative caching. It suggests that the RL-based models require further learning (i.e., longer than the short term) to reach an optimal solution with exploration.
Considering that context query loads are generally heterogeneous, and the objective is to minimize the convergence time to an optimal solution, exploration is inevitable. LVF reflects the features of our novel policy during the best-case scenario of a homogeneous context query load leading to no or low exploration. Therefore, we find that the LVF policy complements the ACAgn to provide cost-optimal context query management services using a context cache memory.

5.2.2 Deep Deterministic Policy Gradient Agent.

The DDPGAgn was tested using a similar setting to the ACAgn. The results are summarized in Figure 12. The striking difference in results between the two agents (i) supports our rationale as to why the policy should be parameterized using the same performance metrics that are used to evaluate all efficiency goals of the CMS, and (ii) supports the argument as to why the DDPGAgn is a faster, more flexible technique to make effective ACOCA decisions – one of the primary objectives of this work.
Fig. 12. Performance metrics – (a) HR, (b) RT, (c) PD, and (d) PessiRet, of the CMS using DDPGAgn for ACOCA.
LFU performs consistently better compared to the other policies with the DDPGAgn, including \(\overline {PessiRet} > 0\) using scalable cache memory. Importantly, the HR is lower using a scalable context cache compared to the high HR using a limited-sized context cache. As we will show in our discussion about long-term convergence, the DDPGAgn is biased towards not caching much context information, given the randomness of the context query load used in testing and the high degree of reliability in retrieval. The query load is random enough that caching a piece of context results in a higher probability of partial misses. Given that the cost of cache memory \(\to 0\), the marginal return from caching a piece of context information is minimized with size. The results also suggest that the penalty incurred as a result of cache seeking and then retrieving or refreshing following a partial hit is costlier. So, only a minimal amount of context potent to contribute to the efficiencies is cached. We observe this explicitly as the mechanism scaled the cache memory only up to 6 entities in size. In comparison, the ACAgn and StatAgn expanded it up to 9 entities in size. Two other reasons why the DDPGAgn caches a smaller amount of context information are as follows. First, the transiency of context dealt with in this scenario results in a high penalty. Secondly, the low hold-up cost (i.e., zero based on our assumption) due to the low marginality in cache memory cost results in the context most potent for caching being cached for longer periods. The high HR can be similarly justified using these same arguments for the limited-sized context cache memory. Overall, as we expected in the discussion in Section 3, the DDPGAgn has adapted better compared to the ACAgn based on all metrics.
Figure 13 shows a strong positive correlation, when the StatAgn uses a scalable cache memory, and a negative relationship for Prob, between the number of entities evicted during a PlnPrd and the variance of \(\overline {PessiRet}\ ( {Var( {\overline {PessiRet} \ } )} )\). The relationship is weak for the ACAgn and DDPGAgn. The relatively constant number of evictions against \(Var( {\overline {PessiRet} \ } )\) using a limited-sized cache for the StatAgn suggests that the realized returns from responding to context queries vary only based on factors other than the selective caching decision. Hence, the caching decisions made by the StatAgn are short-term optimal and can be defined as the short-term goal state. In contrast, there are several reasons for the correlations observed for the two RL agents: (i) both agents are designed to converge for long-term optimality where \(\gamma = 0.9\), (ii) \(\gamma\) and \(Cycl{e_{learn}}\) are defined statically in this work and the algorithm does not adjust to the variable dynamics of the environment, in contrast to \(\delta\) of the StatAgn, and (iii) the correlation using the limited cache memory is a result of explorative caching, since there are two distinct clusters that would theoretically merge over time when the optimal solution is reached. Considering the dispersion of the results and the variance in \(\overline {PessiRet}\) that is larger than that of any of our proposed methods, we contend that Prob does not meet any of the short-term optimality features.
Fig. 13. Correlation of entity evictions to variance in PessiRet.

5.2.3 Overall Cost-efficiency.

Next, the total returns of the CMS were measured. Table 7 summarizes this data using the ACAgn and the DDPGAgn. Highlighted in bold are the settings used for which the maximum total return (lowest cost) was achieved.
Scalable Cache Memory

| Policy | Exploration | ACAgn: Earnings − Penalties | ACAgn: Retrievals Cost | ACAgn: Total Return | DDPGAgn: Earnings − Penalties | DDPGAgn: Retrievals Cost | DDPGAgn: Total Return |
|---|---|---|---|---|---|---|---|
| Database mode | Random | −157.63 | 63.17 | −220.80 | −116.95 | 56.50 | −173.45 |
| Database mode | MFU | −138.33 | 65.67 | −203.92 | −108.15 | 59.25 | −167.4 |
| LFU | Random | −146.97 | 86.33 | −233.30 | −109.16 | 82.56 | −191.72 |
| LFU | MFU | −191.29 | 89.67 | −280.96 | −127.78 | 93.21 | −220.98 |
| LVF | Random | −193.95 | 80.75 | −274.70 | −157.14 | 93.31 | −250.45 |
| LVF | MFU | −179.42 | 95.42 | −274.83 | −176.32 | 109.29 | −285.61 |
| Novel | Random | −115.125 | 58.7 | −173.82 | −160.88 | 70 | −230.88 |
| Novel | MFU | −200.55 | 87.62 | −288.18 | −186.90 | 91.44 | −278.34 |

Limited Sized Cache Memory

| Policy | Exploration | ACAgn: Earnings − Penalties | ACAgn: Retrievals Cost | ACAgn: Total Return | DDPGAgn: Earnings − Penalties | DDPGAgn: Retrievals Cost | DDPGAgn: Total Return |
|---|---|---|---|---|---|---|---|
| Random | Random | −171.42 | 81.25 | −252.67 | −197.83 | 114.25 | −312.08 |
| Random | MFU | −190.97 | 106.21 | −297.18 | −184.52 | 106.62 | −291.15 |
| LFU | Random | −178.68 | 84.33 | −261.76 | −155.80 | 101.88 | −257.68 |
| LFU | MFU | −180.53 | 72.54 | −253.07 | −161.30 | 88.60 | −249.90 |
| LVF | Random | −177.85 | 78.25 | −256.10 | −202.90 | 126.62 | −219.68 |
| LVF | MFU | −201.62 | 86.21 | −287.83 | −173.65 | 104.19 | −277.84 |
| Novel | Random | −206.41 | 127.08 | −333.62 | −186.05 | 113.12 | −299.18 |
| Novel | MFU | −185.725 | 105.54 | −291.27 | −173.56 | 101.79 | −275.35 |

Table 7. Total Returns for the Context Management System using the ACAgn and the DDPGAgn in Dollars
The total return using the redirector mode (no caching) was −$274.42, given that the earnings less penalties were −$195.42 and the retrieval cost was $79. The StatAgn [43] was beneficial for selective caching, where 47.94% and 34.57% increases in returns were observed for the best settings of scalable and limited-sized cache memories, respectively. Although the StatAgn was simplistic, the efficiency improvement is significant.
The improvement in returns is only 7.78% for the ACAgn using a limited-sized cache memory. The rest of the configurations of the ACAgn yield a lower total return. One of the reasons for this observation is the complex dynamics of the context queries and data, i.e., transiency, compared to data caching. Because of the random and heuristic exploration of the state space for learning, context that is cost-inefficient to cache gets cached, and vice versa, until \({\pi ^*}\) is reached. The results using a limited-sized context cache have always been worse than those of the random eviction policy. So, it confirms our hypothesis that the ACAgn does not converge in the short term.
The improvement in returns compared to the redirector mode was 60.22% and 26.06%, respectively, using the parameterized DDPGAgn with scalable and limited-sized cache memories. But the retrieval costs increased by 4.62% and 60.27%. The improvement in returns is greater than the StatAgn's 47.94% using scalable memory [43]. We attribute this to the optimal policy's bias towards caching less context under the random context query load used in testing. We identify the DDPGAgn to have reached pseudo short-term optimality based on several features of the results: (i) all the metrics are better than those observed using the random policy (which was not the case with the ACAgn), and (ii) the best configuration using scalable memory shows higher earnings less penalties compared to the no-eviction policy, i.e., −$109.16 > −$116.95. In the next sub-section, we show how the DDPGAgn improves in returns using a sample of sequential results as well. In order to investigate the reasons for the observed performance metrics, the key components of the returns need to be discussed. Figure 14 compares the total cost, penalty cost, and retrieval costs as a ratio of the earnings among the best configurations in the short run.
Fig. 14. Comparison of Per Earnings for best configurations.
Our ACOCA approach aims at minimizing the cost-to-earnings ratios. The StatAgn and Prob are advantageous in cost minimization among the agents. The StatAgn also shows the highest penalty-to-total-cost ratios of 0.8389 and 0.8257 using scalable and limited-sized context cache memories, respectively. But the DDPGAgn using the scalable cache memory is comparable to the StatAgn and Prob. The ACAgn using a scalable cache, LVF eviction, and MFU exploration shows the lowest penalty-to-total-cost ratio of 0.7131, followed by 0.7256 for the DDPGAgn using a scalable cache, LFU eviction, and random exploration. There are two reasons for this observation. First, based on our assumption for PD and the divergence in the number of evictions, which we indicate later in Figure 16, the penalties are minimized over time for the RL agents. It is a result of gradual convergence to an optimal policy. In contrast, the StatAgn does not adjust or improve over time. Second, the increase in retrieval costs using the RL agents is a result of (i) additional refreshing due to exploratory caching, and (ii) transient models (RL models that are yet to reach \({\pi ^*}\)) not caching the relevant context, causing retrievals.

5.2.4 Convergence and Long-term Efficiency.

We investigated the performance of the selective caching agents over the long term. The best configurations for each type of agent (highlighted in Table 7) using the scalable context cache memory in the short term were used in the following experiments. The database and redirector modes of operation are not indicated in the following graphs because they are not adaptive. As indicated in Figure 15, the DDPGAgn performed the best in the long term, as we argued in the methodology. The pessimistic returns (PessiRet) and the hit rate (HR) have been maximized, while the response time (RT) has been minimized, as set out in the objective function – a novel contribution of this work in optimizing a CMS for both cost- and performance-efficiency, unlike any previous work including the probabilistic method (Prob) [23].
Fig. 15. Comparison of (a) PessiRet, (b) Variance of PessiRet, (c) Response Time, and (d) Hit Rate over the long-term.
We contend that the DDPGAgn converges to an optimal policy within 200–250 windows, considering that the variance of all the performance metrics is minimized within this period. The convergence time is significantly lower compared to related work in traditional data caching, where more than a thousand iterations or windows are spent until convergence, e.g., ∼20,000 in [36] and ∼500 in [47] (i.e., our approach is at least two times faster). Hence, our adaptive context caching model is flexible and capable of adapting to real-world situations extremely fast – a feature that is useful in delivering reliable and relevant context information to users without having to incur high costs of context access in a highly transient environment such as in the motivating scenario.
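One simple, assumed way to operationalize this convergence criterion is to flag convergence once the rolling variance of each monitored metric stays small over a sustained run of windows, as sketched below; the run length and threshold are illustrative, and the 200–250 window figure above was determined by inspection rather than by this rule.

```python
import statistics

def converged(metric_windows, run=50, threshold=0.01):
    # Treat the policy as converged when the variance of the most recent
    # `run` windowed observations of a metric stays below `threshold`.
    if len(metric_windows) < run:
        return False
    return statistics.pvariance(metric_windows[-run:]) < threshold
```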
It should be noted that the variance of any performance metric will not be zero in context cache management because of context refreshing events that happen in an ad hoc manner (i.e., because situations can change at any time). While the ACAgn and Prob could be seen to perform sub-optimally compared to the DDPGAgn, we argue that they have not been able to reach an optimal policy within our testing period because the ACAgn and Prob show a high level of variance for all performance metrics. It is due to the model exploring the context state space, which we theoretically argued could make the ACAgn converge later than the DDPGAgn due to the need to gather a large amount of experience to accurately estimate the value functions. Given that the ACAgn and Prob almost overlap in Figure 15(c), we may also argue that these agents might have converged to a local optimum. Hence, we confirm that value function based reinforcement learning techniques or similar control theories are not applicable when developing fast, flexible adaptive context caching approaches under uncertainty. The DDPGAgn is extremely selection efficient in the long term given the smaller number of context entity and attribute evictions compared to the other agents, which can be argued to have resulted in the consistency in QoS (because we showed in the short term that evictions contribute to inconsistent QoS and incurred costs). Hence, the DDPGAgn has been able to select and cache the most reusable and repurposeful context. Further to our arguments in [43], we also show that the StatAgn is rather a short-term optimal solution that is oblivious of the long-term efficiency objectives because (i) the returns of the StatAgn are lower than those of the DDPGAgn and have been continuously varying during the period of testing, resulting in a variance of 0.7246 in returns, (ii) the HR of the context cache using the StatAgn is similar to that achieved in the short term, while the DDPGAgn improves subsequently to an average HR of 0.977, and (iii) the large number of evictions suggests low context selection efficiency. Using every parameter, we have shown that our method outperforms the probabilistic method as well [23]. Table 8 summarizes the results obtained over the long term for the agents.
| Agent | \(\overline {PessiRet}\) | \(\overline {Var( {PessiRet} )}\) | \(\overline {RT}\) | \(\overline {HR}\) | \(\overline {PD}\) | Attribute Evictions | Entity Evictions |
|---|---|---|---|---|---|---|---|
| **DDPGAgn** | **0.8139** | **0.1901** | **0.4142** | **0.9767** | **0.2676** | **85** | **1** |
| StatAgn [43] | 0.6277 | 0.7246 | 0.4211 | 0.9354 | 0.2726 | 574 | 36 |
| ACAgn | 0.0558 | 0.7769 | 0.5544 | 0.8082 | 0.384 | 1017 | 639 |
| Probabilistic [23] | 0.0124 | 0.9416 | 0.4972 | 0.7859 | 0.3467 | 864 | 566 |

Table 8. Performance Summary of the Agents in the Long-term
Bold values indicate the best results among the compared agents.

5.2.5 Scalability of Our Approach.

We evaluated the scalability of our approach by gradually increasing the number of concurrent users for all adaptive context caching agents. Since the POC is executed using a single-node configuration, we tested only up to 100 concurrent users. Figure 16 compares the average response time for sub-queries (SQs) when 1, 10, and 100 concurrent users make context queries using the DDPGAgn. The system was stable throughout, while the PessiRet was $\(5.278 \pm 1.77\) and $\(63.099 \pm 20.01\) when 10 and 100 concurrent consumers made queries, respectively. PD also reduced from 0.2847 to 0.2324 due to the high AR of cached context information. This suggests that our approach is more cost- and performance-efficient at large scale.
Fig. 16. Comparison of (a) RT, (b) PD, and (c) PessiRet when 1, 10, and 100 concurrent consumers make context queries.
It should be noted that there are two reasons for this scalability. Firstly, we indicated that the DDPGAgn, like the already established lightweight StatAgn, incurs a low overhead on the CMS. The DDPGAgn takes only \(2.235 \pm 0.53\) ms to make a selective context caching decision, i.e., ∼450 pieces of context information can be evaluated for caching in a second. Secondly, since we employ adaptive context state-space exploration and parameter sharing between subsequent PlnPrds, our model is capable of adapting fast from an already learnt optimal state to another when the subsequent PlnPrd shows only minor variations from the previously explored instance of the PlnPrd. Hence, the ACOCA algorithm is able to effectively use the experience, together with our heuristics, to manage any uncertainty without incurring a significant overhead or taking a long time to adapt. We see this in Figure 16: we randomly modified the timing of the context query load (e.g., the same sub-query randomly varied in the time it was made, emulating a waiting-time paradox scenario where the same driver performs routine tasks at relatively the same time but with random variations), and our model converged much faster, within 5 minutes.

6 Conclusion and Further Work

In our work, we proposed an approach for distributed adaptive context caching that maximizes the cost- and performance-efficiency of Context Management Systems (CMSs), and evaluated it using extensive experiments and simulations. We compared two selective context caching agents that implement reinforcement learning techniques to maximize the cost- and performance-efficiency of a CMS, along with suitable heuristics to overcome the uncertainty and lack of prior knowledge about context information and context query loads, e.g., about real-world situations. The paper also introduced novel eviction policies and a cache organization for caching context information. These are explained and proven using mathematical models, supported by examples from a real-world inspired motivating scenario. We evaluated the cost of responding to context queries and the Quality of Service (QoS) under multiple settings to identify the most efficient adaptive caching configuration.
We showed that our novel Statistical-based Agent (StatAgn) and the agent that implements the Deep Deterministic Policy Gradient method (DDPGAgn) can minimize costs by up to 47.94% and 60.22%, respectively, in comparison to the redirector mode, while adapting fast and flexibly in the short term. The Least Valued First (LVF) and our bespoke Time-Aware Hierarchical (TAH) eviction policies for context information perform well under highly volatile context query loads, whereas Least Frequently Used (LFU) was suitable otherwise, especially when the CMS utilizes a limited-sized context cache memory. Especially in the long-term efficiency measurements, where the DDPGAgn outperformed all the benchmarks, ACOCA reduced redundant costs such as hold-up, refreshing, and processing costs in the CMS, which could otherwise have been incurred by caching or storing (in database mode) intermittently accessed context information. Hence, the DDPGAgn was 8.67 times more selection efficient than the next best approaches, the StatAgn and the probabilistic method [23]. We also showed that the DDPGAgn is at least 2x faster to converge to an optimal policy compared to related work in traditional data caching that leverages similar techniques, despite the additional uncertainties involved with context, e.g., changes in situations, and context query requirements, e.g., quality of context.
In summary, we developed an approach for adaptive context caching that is cost-efficient, performance-efficient, fast adapting, and flexible, such that context consumers can be delivered relevant and dependable context information cheaply and consistently. In fact, by leveraging the StatAgn and DDPGAgn in tandem, we contend that an important functional requirement of ACOCA – long-term optimality with short-term awareness [44] – can be guaranteed as well, unlike in previous work, which suffers from the cold start problem and underperformance in circumstances such as context uncertainty and lack of prior knowledge, which our approach effectively mitigates. In further work, we intend to develop an edge computing based distributed adaptive context caching approach for the Distributed Context Intelligence System (DCIS) to further scale CMSs in the global IoT eco-system.

Footnote

References

[1]
Gregory D. Abowd, Anind K. Dey, Peter J. Brown, Nigel Davies, Mark Smith, and Pete Steggles. 1999. Towards a better understanding of context and context-awareness. In Handheld and Ubiquitous Computing, Hans-W. Gellersen (Ed.). Springer Berlin, Berlin, 304–307.
[2]
Fadi M. Al-Turjman, Ashraf E. Al-Fagih, and Hossam S. Hassanein. 2013. A value-based cache replacement approach for information-centric networks. In 38th Annual IEEE Conference on Local Computer Networks - Workshops, October 2013. IEEE, Sydney, Australia, 874–881.
[3]
Marica Amadeo, Claudia Campolo, Giuseppe Ruggeri, and Antonella Molinaro. 2023. Edge caching in IoT smart environments: Benefits, challenges, and research perspectives toward 6G. In IoT Edge Solutions for Cognitive Buildings. Franco Cicirelli, Antonio Guerrieri, Andrea Vinci and Giandomenico Spezzano (Eds.). Springer International Publishing, Cham, 53–73.
[4]
Thepparit Banditwattanawong and Putchong Uthayopas. 2013. Improving cloud scalability, economy and responsiveness with client-side cloud cache. In 2013 10th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, May 2013. IEEE, Krabi, Thailand, 1–6.
[5]
Muhammad Bilal and Shin-Gak Kang. 2014. Time aware least recent used (TLRU) cache management policy in ICN. In 16th International Conference on Advanced Communication Technology, February 2014. Global IT Research Institute (GIRI), Pyeongchang, Korea (South), 528–532.
[6]
Pol Blasco and Deniz Gunduz. 2014. Learning-based optimization of cache content in a small cell base station. In 2014 IEEE International Conference on Communications (ICC), June 2014. IEEE, Sydney, NSW, Australia, 1897–1903.
[7]
Andrey Boytsov and Arkady Zaslavsky. 2010. Extending context spaces theory by proactive adaptation. In Smart Spaces and Next Generation Wired/Wireless Networking, Sergey Balandin, Roman Dunaytsev and Yevgeni Koucheryavy (Eds.). Springer Berlin, Berlin, 1–12.
[8]
Sophie Chabridon, Denis Conan, Zied Abid, and Chantal Taconet. 2013. Building ubiquitous QoC-aware applications through model-driven software engineering. Sci. Comput. Program. 78, 10 (2013), 1912–1929.
[9]
Jinhwan Choi, Yu Gu, and Jinoh Kim. 2020. Learning-based dynamic cache management in a cloud. J. Parallel Distrib. Comput. 145, (2020), 98–110.
[10]
Asaf Cidon, Assaf Eisenman, Mohammad Alizadeh, and Sachin Katti. 2016. Cliffhanger: Scaling performance cliffs in web memory caches. NSDI16 Proc. 13th Usenix Conf. Networked Syst. Des. Implement. (2016), 14.
[11]
Mario Fanelli, Luca Foschini, Antonio Corradi, and Azzedine Boukerche. 2011. QoC-based context data caching for disaster area scenarios. In 2011 IEEE International Conference on Communications (ICC), June 2011. IEEE, Kyoto, Japan, 1–5.
[12]
FIWARE. FIWARE Platform. Retrieved June 7, 2021 from https://www.fiware.org
[13]
Alireza Hassani, Pari Delir Haghighi, Prem Prakash Jayaraman, Arkady Zaslavsky, Sea Ling, and Alexey Medvedev. 2016. CDQL: A generic context representation and querying approach for internet of things applications. In Proceedings of the 14th International Conference on Advances in Mobile Computing and Multi Media - MoMM ’16, 2016. ACM Press, Singapore, Singapore, 79–88.
[14]
Alireza Hassani, Alexey Medvedev, Pari Delir Haghighi, Sea Ling, Arkady Zaslavsky, and Prem Prakash Jayaraman. 2019. Context definition and query language: Conceptual specification, implementation, and evaluation. Sensors 19, 6 (2019), 1478.
[15]
Alireza Hassani, Alexey Medvedev, Pari Delir Haghighi, Sea Ling, Maria Indrawan-Santiago, Arkady Zaslavsky, and Prem Prakash Jayaraman. 2018. Context-as-a-service platform: Exchange and share context in an IoT ecosystem. In 2018 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), March 2018. IEEE, Athens, 385–390.
[16]
Alireza Hassani, Alexey Medvedev, Arkady Zaslavsky, Pari Delir Haghighi, Prem Prakash Jayaraman, and Sea Ling. 2019. Efficient execution of complex context queries to enable near real-time smart IoT applications. Sensors 19, 24 (2019), 5457.
[17]
Xiaoming He, Kun Wang, and Wenyao Xu. 2019. QoE-driven content-centric caching with deep reinforcement learning in edge-enabled IoT. IEEE Comput. Intell. Mag. 14, 4 (2019), 12–20.
[18]
Karen Henricksen, Jadwiga Indulska, and Andry Rakotonirainy. 2002. Modeling context information in pervasive computing systems. In Pervasive Computing. Friedemann Mattern and Mahmoud Naghshineh (Eds.). Springer Berlin, Berlin, 167–180.
[19]
Fritz Hohl, Uwe Kubach, Alexander Leonhardi, Kurt Rothermel, and Markus Schwehm. 1999. Next century challenges: Nexus—an open global infrastructure for spatial-aware applications. In Proceedings of the 5th Annual ACM/IEEE International Conference on Mobile Computing and Networking, August 1999. ACM, Seattle Washington, USA, 249–255.
[20]
Lu Hou, Lei Lei, Kan Zheng, and Xianbin Wang. 2019. A Q-learning-based proactive caching strategy for non-safety related services in vehicular networks. IEEE Internet Things J. 6, 3 (2019), 4512–4520.
[21]
Kanaka Sai Jagarlamudi, Arkady Zaslavsky, Seng W. Loke, Alireza Hassani, and Alexey Medvedev. 2022. Requirements, limitations and recommendations for enabling end-to-end quality of context-awareness in IoT middleware. Sensors 22, 4 (2022), 1632.
[22]
G. Judd and P. Steenkiste. 2003. Providing contextual information to pervasive computing applications. In Proceedings of the First IEEE International Conference on Pervasive Computing and Communications, 2003. (PerCom 2003), 2003. IEEE Comput. Soc, Fort Worth, TX, USA, 133–142.
[23]
H. S. Khargharia, P. P. Jayaraman, A. Banerjee, A. Zaslavsky, A. Hassani, A. Abken, and A. Kumar. 2022. Probabilistic analysis of context caching in Internet of Things applications. In 2022 IEEE International Conference on Services Computing (SCC), July 2022. IEEE, Barcelona, Spain, 93–103.
[24]
Vadim Kirilin, Aditya Sundarrajan, Sergey Gorinsky, and Ramesh K. Sitaraman. 2019. RL-Cache: Learning-based cache admission for content delivery. In Proceedings of the 2019 Workshop on Network Meets AI & ML - NetAI’19, 2019. ACM Press, Beijing, China, 57–63.
[25]
Michael Krause and Iris Hochstatter. 2005. Challenges in modelling and using quality of context (QoC). In Mobility Aware Technologies and Applications. Thomas Magedanz, Ahmed Karmouch, Samuel Pierre and Iakovos Venieris (Eds.). Springer Berlin, Berlin, 324–333.
[26]
Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2019. Continuous control with deep reinforcement learning. arXiv:1509.02971 [cs, stat] (July 2019). Retrieved November 14, 2021 from http://arxiv.org/abs/1509.02971
[27]
Atif Manzoor, Hong-Linh Truong, and Schahram Dustdar. 2014. Quality of context: Models and applications for context-aware systems in pervasive environments. Knowl. Eng. Rev. 29, 2 (2014), 154–170.
[28]
Alexey Medvedev. 2020. Performance and Cost Driven Data Storage and Processing for IoT Context Management Platforms. Doctoral Thesis. Monash University, Australia. Retrieved July 27, 2021 from https://bridges.monash.edu/articles/Performance_and_Cost_Driven_Data_Storage_and_Processing_for_IoT_Context_Management_Platforms/12423479
[29]
Alexey Medvedev, Alireza Hassani, Pari Delir Haghighi, Sea Ling, Maria Indrawan-Santiago, Arkady Zaslavsky, Ulrich Fastenrath, Florian Mayer, Prem Prakash Jayaraman, and Niklas Kolbe. 2018. Situation modelling, representation, and querying in context-as-a-service IoT platform. In 2018 Global Internet of Things Summit (GIoTS), June 2018. IEEE, Bilbao, 1–6.
[30]
Luke Metz, Julian Ibarz, Navdeep Jaitly, and James Davidson. 2019. Discrete sequential prediction of continuous actions for deep RL. arXiv:1705.05035 [cs, stat] (June 2019). Retrieved November 14, 2021 from http://arxiv.org/abs/1705.05035
[31]
Ali Nasehzadeh and Ping Wang. 2020. A deep reinforcement learning-based caching strategy for internet of things. In 2020 IEEE/CIC International Conference on Communications in China (ICCC), 2020. IEEE, Chongqing, China, 969–974.
[32]
Charith Perera, Arkady Zaslavsky, Peter Christen, and Dimitrios Georgakopoulos. 2012. CA4IOT: Context awareness for internet of things. In 2012 IEEE International Conference on Green Computing and Communications, November 2012. IEEE, Besancon, France, 775–782.
[33]
Charith Perera, Arkady Zaslavsky, Peter Christen, and Dimitrios Georgakopoulos. 2014. Context aware computing for the internet of things: A survey. IEEE Commun. Surv. Tutor 16, 1 (2014), 414–454.
[34]
Muhammad Habib ur Rehman, Ibrar Yaqoob, Khaled Salah, Muhammad Imran, Prem Prakash Jayaraman, and Charith Perera. 2019. The role of big data analytics in industrial Internet of Things. Future Gener. Comput. Syst. 99, (2019), 247–259.
[35]
Hans-Peter Schwefel, Martin Bogsted Hansen, and Rasmus L. Olsen. 2007. Adaptive caching strategies for context management systems. In 2007 IEEE 18th International Symposium on Personal, Indoor and Mobile Radio Communications, IEEE, Athens, Greece, 1–6.
[36]
Shuran Sheng, Peng Chen, Zhimin Chen, Lenan Wu, and Hao Jiang. 2020. Edge caching for IoT transient data using deep reinforcement learning. In IECON 2020 The 46th Annual Conference of the IEEE Industrial Electronics Society, 2020. IEEE, Singapore, Singapore, 4477–4482.
[37]
Manu Sporny, Gregg Kellogg, and Markus Lanthaler. 2014. JSON-LD 1.0 - A JSON-based serialization for linked data. W3C Recomm. (2014).
[38]
Yin Sun, Elif Uysal-Biyikoglu, Roy Yates, C. Emre Koksal, and Ness B. Shroff. 2016. Update or wait: How to keep your data fresh. In IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, April 2016. IEEE, San Francisco, CA, USA, 1–9.
[39]
Arun Venkataramani, Praveen Yalagandula, Ravindranath Kokku, Sadia Sharif, and Mike Dahlin. 2002. The potential costs and benefits of long-term prefetching for content distribution. Comput. Commun. 25, 4 (2002), 367–375.
[40]
Shveta Verma and Anju Bala. 2021. Auto-scaling techniques for IoT-based cloud applications: A review. Clust. Comput. 24, 3 (2021), 2425–2459.
[41]
Serdar Vural, Ning Wang, Pirabakaran Navaratnam, and Rahim Tafazolli. 2017. Caching transient data in internet content routers. IEEEACM Trans. Netw. 25, 2 (2017), 1048–1061.
[42]
Yantong Wang and Vasilis Friderikos. 2020. A survey of deep learning for data caching in edge network. Informatics 7, 4 (2020), 43.
[43]
Shakthi Weerasinghe, Arkady Zaslavsky, Seng W. Loke, Amin Abken, Alireza Hassani, and Alexey Medvedev. 2023. Adaptive context caching for efficient distributed context management systems. In ACM Symposium on Applied Computing, March 2023. ACM, Tallinn, Estonia, 10.
[44]
Shakthi Weerasinghe, Arkady Zaslavsky, Seng W. Loke, Alireza Hassani, Amin Abken, and Alexey Medvedev. 2022. From traditional adaptive data caching to adaptive context caching: A survey. arXiv:2211.11259 [cs.HC] (November 2022), 35.
[45]
Shakthi Weerasinghe, Arkady Zaslavsky, Seng W. Loke, Alexey Medvedev, and Amin Abken. 2022. Estimating the lifetime of transient context for adaptive caching in IoT applications. In ACM Symposium on Applied Computing, April 2022. ACM, Brno, Czech Republic, 10.
[46]
Shakthi Weerasinghe, Arkady Zaslavsky, Seng W. Loke, Alexey Medvedev, Amin Abken, and Alireza Hassani. 2023. Context caching for IoT-based applications: Opportunities and challenges. IEEE Internet Things M. 6, 4 (2023), 96–102.
[47]
Shakthi Weerasinghe, Arkady Zaslavsky, Seng Wai Loke, Alireza Hassani, Alexey Medvedev, and Amin Abken. 2023. Adaptive context caching for IoT-based applications: A reinforcement learning approach. Sensors 23, 10 (2023), 4767.
[48]
B. Wu and A. D. Kshemkalyani. 2006. Objective-optimal algorithms for long-term Web prefetching. IEEE Trans. Comput. 55, 1 (2006), 2–17.
[49]
Chao Xu and Xijun Wang. 2019. Transient content caching and updating with modified harmony search for Internet of Things. Digit. Commun. Netw. 5, 1 (2019), 24–33.
[50]
Hao Zhu, Yang Cao, Xiao Wei, Wei Wang, Tao Jiang, and Shi Jin. 2019. Caching transient data for internet of things: A deep reinforcement learning approach. IEEE Internet Things J. 6, 2 (2019), 2074–2083.
[51]
2020. Context Information Management (CIM): Application Programming Interface (API). ETSI Industry Specification Group (ISG). Retrieved from https://www.etsi.org/deliver/etsi_gs/CIM/001_099/004/01.01.02_60/gs_CIM004v010102p.pdf
[52]
2023. Amazon File Cache-Pricing. Amazon Web Services. Retrieved June 19, 2023 from https://aws.amazon.com/filecache/pricing/
[53]
FIWARE-Orion. Retrieved September 30, 2021 from https://github.com/telefonicaid/fiware-orion
[54]
FIWARE Stream Oriented Generic Enabler - Overview. FIWARE Stream Oriented Generic Enabler - Overview - FIWARE-Stream-Oriented-GE documentation. Retrieved December 20, 2022 from https://kurento.readthedocs.io/en/stable/
[55]
Redis. Retrieved September 10, 2021 from https://redis.io
