US20180276143A1 - Dynamic cache balancing - Google Patents
- Publication number
- US20180276143A1 (application US 15/990,816)
- Authority
- US
- United States
- Prior art keywords
- cache
- partition
- normalized
- data elements
- performance metric
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/128—Replacement control using replacement algorithms adapted to multidimensional cache systems, e.g. set-associative, multicache, multiset or multilevel
- G06F12/0811—Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
- G06F12/0808—Multiuser, multiprocessor or multiprocessing cache systems with cache invalidating means
- G06F12/0846—Cache with multiple tag or data arrays being simultaneously accessible
- G06F12/0284—Multiple user address space allocation, e.g. using different base addresses
- G06F2212/1016—Performance improvement
- G06F2212/152—Virtualized environment, e.g. logically partitioned system
- G06F2212/154—Networked environment
- G06F2212/502—Control mechanisms for virtual memory, cache or TLB using adaptive policy
- G06F2212/601—Reconfiguration of cache memory
- G06F2212/621—Coherency control relating to peripheral accessing, e.g. from DMA or I/O device
- G06F2212/69—
- G06F2212/70—Details relating to dynamic memory management
Definitions
- This disclosure relates to distributed data storage, and more particularly to techniques for dynamic sizing of multiple cache partitions.
- The use of virtual machines (VMs) to improve the utilization of computing resources continues to increase. Such VMs can be characterized as software-based computing "machines" implemented in a virtualization environment comprising various hardware resources (e.g., CPU, memory, etc.).
- The VMs can operate based at least in part on the computer architecture and/or functions (e.g., operating system) of a real or hypothetical computer.
- Multiple VMs can operate on one physical machine (e.g., on a physical computer), with each VM sharing the resources of that physical machine across multiple environments. Various VMs can run multiple operating systems and/or multiple applications on the physical machine. Flexibility for such sharing can be facilitated at least in part by a hypervisor, which allocates hardware resources dynamically and transparently to the running VMs.
- The high storage I/O (input/output or IO) demands of VMs have precipitated an increase in distributed storage systems that are implemented in virtualization environments.
- Such distributed storage systems can aggregate various physical storage facilities to create a logical storage pool in which certain data may be efficiently distributed according to various metrics and/or objectives.
- Metadata describing the storage pool and/or its virtualized representations may also be distributed any number of times among various nodes in the distributed storage system.
- Many of these distributed storage systems implement various caching techniques to reduce the latency in accessing the foregoing data and/or metadata by the VMs and/or by certain components (e.g., storage I/O controllers) of the distributed storage systems.
- Specifically, multi-tier caching systems can improve the overall latency of operations by storing frequently used data elements in storage areas that correspond to observed access patterns.
- As an example, a dynamic random access memory (DRAM) can be used to cache items that are stored on hard disk drives (HDDs) to facilitate low-latency access.
- In some cases, a given cache might be allocated across multiple tiers of storage facilities such as DRAM local to a storage I/O controller, solid-state drives (SSDs) local to a node, cluster SSDs, local hard disk drives (HDDs), networked HDDs, and/or other storage tiers.
- In other cases, a cache in a distributed storage system might be partitioned into multiple cache partitions for various purposes.
- Unfortunately, legacy techniques for sizing multiple cache partitions in a distributed storage system are limited at least in their ability to address the performance by and between the multiple cache partitions.
- The performance of a cache can be determined at least in part by the data access latency improvement resulting from the data being stored in the cache as compared to the access latency if the data were stored in a non-cache storage facility (e.g., a networked HDD).
- Some legacy approaches for sizing a cache might merely facilitate allocating a fixed amount of memory for the cache and applying certain cache management techniques (e.g., least recently used or LRU, least frequently used or LFU, adaptive replacement cache or ARC, etc.) to determine whether or not to store and/or to retain a particular data element in the cache.
- For example, the foregoing legacy techniques might allocate 1 GB of memory to a given cache and, when the cache is full, evict data elements from the cache based at least in part on a data element access time, access frequency, a combination of access time and access frequency, and/or other criteria.
- When a cache is partitioned for two or more data element types, or partitioned over two or more memory types, applying such legacy cache management techniques to the cache partitions might not achieve optimal overall performance by and between the cache partitions. This is because, if a cache configuration/management policy is applied independently to the two or more partitions, the relative importance to the overall system of the data items in each individual cache partition is not considered, which can result in overall cache performance that is less than optimal.
- The present disclosure provides a detailed description of techniques used in systems, methods, and computer program products for dynamic sizing of multiple cache partitions, which techniques advance the relevant technologies to address technological issues with legacy approaches. Certain embodiments are directed to technological solutions for implementing a heuristic cache partition balancing technique to dynamically size multiple cache partitions based at least in part on cost calculations.
- The disclosed embodiments modify and improve over legacy approaches. In particular, the herein-disclosed techniques provide technical solutions that address the technical problems attendant to balancing the sizes of multiple cache partitions to improve overall caching performance. Such technical solutions can serve to reduce the demand for computer memory, reduce the demand for computer processing power, reduce network bandwidth use, and reduce the demand for inter-component communication.
- Some embodiments disclosed herein use techniques to improve the functioning of multiple systems within the disclosed environments, and some embodiments advance peripheral technical fields as well. Use of the disclosed techniques and devices within the shown environments as depicted in the figures provides advances in the technical field of high-performance computing as well as advances in various technical fields related to data storage.
- FIG. 1A presents a dynamic cache sizing technique used in systems that implement dynamic sizing of multiple cache partitions, according to an embodiment.
- FIG. 1B presents a normalized cache performance comparison technique.
- FIG. 1C illustrates a dynamic cache balancing technique used in systems that implement dynamic sizing of multiple cache partitions, according to an embodiment.
- FIG. 2 presents an environment in which embodiments of the present disclosure can operate.
- FIG. 3A depicts examples of implementation techniques as used in systems that implement dynamic sizing of multiple cache partitions, according to an embodiment.
- FIG. 3B presents a dynamic cache partition balancing technique as implemented in systems that perform dynamic sizing of multiple cache partitions, according to some embodiments.
- FIG. 4A and FIG. 4B depict system components as arrangements of computing modules that are interconnected so as to implement certain of the herein-disclosed embodiments.
- FIG. 5A and FIG. 5B depict virtualized controller architectures comprising collections of interconnected components suitable for implementing embodiments of the present disclosure and/or for use in the herein-described environments.
- Some embodiments of the present disclosure address the problem of balancing the sizes of multiple cache partitions to improve overall caching performance and some embodiments are directed to approaches for implementing a heuristic cache partition balancing technique to dynamically size multiple cache partitions based at least in part on cost calculations.
- The accompanying figures and discussions herein present example environments, systems, methods, and computer program products for dynamic sizing of multiple cache partitions.
- As an example, one data element type in a first cache might have a high "miss cost" (e.g., the latency incurred to bring a data element into the cache) as compared to another data element type in a second cache. In this case, allocating an equal amount of the total cache memory to the aforementioned two caches might result in the performance of the high miss cost cache partition being less than its maximum performance if the sizes of the caches were adjusted using the herein-disclosed techniques.
- Further, the cache performance associated with caches having a priori apportioned (or fixed) sizes can vary with real-time conditions, such that the cache sizes might need to change frequently. Strictly as one example, the relative sizes of multiple caches can be adjusted with respect to ongoing changes in measurable characteristics (e.g., I/O activity, storage utilization, etc.).
- According to the herein disclosed techniques, attributes pertaining to the structure and/or content of the caches and/or their partitions or sub-partitions can be collected to derive normalized cache performance metrics that can be analyzed to determine cache size adjustments.
- For example, certain cache partition attributes (e.g., then-current cache size, etc.) and data element attributes (e.g., miss cost, timestamp, etc.) can be used in deriving the normalized cache performance metrics and a cache reallocation amount (CRA).
- Such balancing can be implemented by adjusting the cache size of one or more of the cache partitions according to a cache reallocation amount derived from comparing the normalized cache performance metrics for the cache partitions.
- The normalized cache performance metric can be derived from a cache partition "tail" to represent the incremental value of adding more data elements to the cache partition.
- The cache size adjustments can be dynamically determined and/or applied in response to various triggers such as a change in the cache partition attributes or a change in the constitution of, and/or the access patterns pertaining to, the data elements stored in the cache.
- The phrase "at least one of A or B" means at least one of A, or at least one of B, or at least one of both A and B; in other words, the phrase is disjunctive.
- The articles "a" and "an" as used in this application and the appended claims should generally be construed to mean "one or more" unless specified otherwise or clear from the context to be directed to a singular form.
- FIG. 1A presents a dynamic cache sizing technique 1A00 used in systems that implement dynamic sizing of multiple cache partitions. One or more variations of dynamic cache sizing technique 1A00 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. The dynamic cache sizing technique 1A00 or any aspect thereof may be implemented in any environment.
- The dynamic cache sizing technique 1A00 shown in FIG. 1A depicts a distributed storage system 104 comprising multiple storage controllers implemented as a set of virtualized controllers 126. The virtualized controllers 126 can access a set of non-cache storage 175 comprising, for example, remote and/or networked storage facilities. The virtualized controllers 126 can further manage a cache memory space 150.
- The cache memory space 150 might comprise a portion of the dynamic random access memory (DRAM) and/or solid-state drives (SSDs) local to the virtualized controllers 126 to facilitate low-latency access to certain data elements.
- The cache memory space 150 can be partitioned into multiple cache partitions (e.g., cache partition 152 1 , . . . , cache partition 152 K ), each holding a respective type of data element.
- In some cases, the cache memory space 150 might be allocated to such cache partitions in equal portions, as indicated by cache partition size 154 11 and cache partition size 154 K1 . The cache partitions might implement certain cache management techniques (e.g., LRU, LFU, ARC, etc.) to determine whether or not to store and/or retain a particular data element in the cache.
- However, when a cache is partitioned for differing data element types, applying such cache sizing and/or management techniques to the cache partitions might not maximize the combined performance of the cache partitions.
- For example, one data element type in cache partition 152 1 might have a high "miss cost" (e.g., the latency incurred to bring a data element into the cache) as compared to another data element type in cache partition 152 K . In this case, allocating an equal amount of the total cache memory to the cache partitions might result in the performance of the high miss cost cache partition being less than its maximum performance.
- Further, the cache partition performance associated with cache partitions with fixed sizes can vary with respect to varying characteristics (e.g., I/O activity, storage utilization, etc.) and/or real-time conditions of the distributed storage system 104.
- According to the herein disclosed techniques, various attributes (e.g., A 11 , A 12 , . . . , A K1 , A K2 , . . . ) pertaining to the cache partitions and the data elements they hold can be collected and used to generate a set of normalized cache performance metrics (e.g., normalized cache performance metric 156 1 , . . . , normalized cache performance metric 156 K ), which metrics can in turn be used to determine a cache reallocation amount (CRA).
- In some embodiments, the normalized cache performance metric for each cache partition can be derived from a cache partition "tail" (e.g., tail 153 1 , . . . , tail 153 K ) to represent the incremental value of adding more data elements to the cache partition.
- The normalized cache performance metrics can be compared using a formulaic expression to determine a cache reallocation amount 158 (e.g., where the cache reallocation amount is a size or quantity of memory in bytes or Kbytes, or pages, etc.). The resulting CRA can be used to balance the sizes of the cache partitions to achieve improved overall performance.
- For example, the CRA might be removed from the cache partition 152 K and added to the cache partition 152 1 , resulting in updated instances of a cache partition size 154 K2 and a cache partition size 154 12 , respectively.
- Such cache size adjustments facilitate dynamic cache balancing 160, where the adjustments are determined and/or applied in response to various triggers such as a change in the cache partition attributes or a change in the constitution of the data elements stored in the cache.
- Further details related to comparing the performance of various cache partitions are shown and described as pertaining to FIG. 1B.
- FIG. 1B presents a normalized cache performance comparison technique 1B00. The normalized cache performance comparison technique 1B00 may be implemented in the context of the architecture and functionality of the embodiments described herein. The normalized cache performance comparison technique 1B00 or any aspect thereof may be implemented in any environment.
- FIG. 1B depicts the representative cache partitions (e.g., cache partition 152 1 , . . . , cache partition 152 K ) earlier shown and described as pertaining to FIG. 1A . Further shown are a set of data elements 122 1 comprising cache partition 152 1 and a set of data elements 122 K comprising cache partition 152 K .
- One or more of the earlier described cache management techniques (e.g., LRU, LFU, ARC, etc.) can be applied to manage the data elements in each cache partition. New data elements that are first accessed (e.g., first access data element 124 1 , . . . , first access data element 124 K ) are admitted into the respective cache partitions. Certain data elements can also be evicted from a cache partition.
- For example, evicted data element 126 1 and evicted data element 126 K might be evicted from cache partition 152 1 and cache partition 152 K , respectively, to make room for first access data element 124 1 and first access data element 124 K , respectively. The evicted data elements can be selected, for example, based on queue position (e.g., least hit elements are evicted).
- A measure of the performance of a cache system or individual cache partition can be derived from the incremental "value" of caching one more data element. Since the purpose of the cache partition is at least in part to reduce the data element access latency, such value can be based at least in part on the "cost" (e.g., access latency) of retrieving a data element not in the cache. Such a cost can be referred to as a "miss cost".
- The miss cost can vary for respective data element types. For example, a data element from cache partition 152 1 might have a miss cost 128 1 of 500 μs, and a data element from cache partition 152 K might have a miss cost 128 K of 50 μs.
- Performance metrics normalized by such attributes as miss cost can be used to compare the performance of various cache partitions. For example, cache partition 152 1 might have a normalized cache performance 132 11 and cache partition 152 K might have a normalized cache performance 132 K1 .
- As shown, the performance of the two cache partitions is unbalanced. Such an imbalance can be due to the earlier described miss cost differences, relative cache partition sizes, total cache queries, and/or other factors. The unbalanced cache partitions can result in an overall cache performance 134 1 (e.g., for the entire allocated cache memory space) that is not maximized.
- The herein disclosed techniques can address the foregoing issues attendant to balancing the sizes of the cache partitions to improve overall caching performance, as shown and described as pertaining to FIG. 1C.
- FIG. 1C illustrates a dynamic cache balancing technique 1C00 used in systems that implement dynamic sizing of multiple cache partitions. The dynamic cache balancing technique 1C00 may be implemented in the context of the architecture and functionality of the embodiments described herein. The dynamic cache balancing technique 1C00 or any aspect thereof may be implemented in any environment.
- According to the herein disclosed techniques, a "tail" of the cache partition can be analyzed to facilitate balancing the performance of various cache partitions in a cache memory space. In some cases, such balancing can further improve the overall cache performance.
- The cache partition tail can represent the least accessed (or least hit) data elements in the cache partition when determining the value of adding more space to the cache partition. More specifically, by monitoring the hit rate of the tail data element or data elements in the cache partition, the cache hit rate increase precipitated by adding more data elements (e.g., by increasing the cache partition size) can be predicted.
- The cache partition tail might be defined as a percentage of some cache partition attribute, such as a total number of cache partition elements or a total number of cache partition data element hits. For example, the cache partition tail might be defined as the least hit 10% of the cache partition data elements.
- Several normalization factors can be applied when comparing cache partitions. Normalizing by miss cost accounts for the latency of retrieving a data element not in the cache; for example, the time to retrieve a metadata element from non-cache storage can be long as compared to the time to retrieve an extent data element from non-cache storage. In other cases, one cache partition can be queried more frequently than another cache partition; such query frequency differences can be addressed with a normalization factor that adjusts for total cache partition queries. Further, data element sizes can vary by and between cache partitions; the tail size (e.g., in MB) for each respective cache partition can normalize for such data element size differences. Other normalization factors are possible.
- The foregoing techniques and/or normalization factors can be combined in a linear formula that approximates an objective function for maximizing the performance of a set of cache partitions.
- Specifically, the normalized cache performance metric 156 1 and the normalized cache performance metric 156 K for cache partition 152 1 and cache partition 152 K can be determined according to a formula that combines the foregoing normalization factors (EQ. 1, not reproduced here).
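- Strictly as an illustration, and not necessarily the exact form of EQ. 1 in the issued patent, a normalized cache performance metric consistent with the normalization factors described above (tail hits weighted by miss cost, normalized by total partition queries and tail size) might take the form:

$$
\mathrm{NCPM}_i = \frac{\mathit{tailHits}_i \times \mathit{missCost}_i}{\mathit{totalQueries}_i \times \mathit{tailSize}_i}
$$

- Here, NCPM_i denotes the assumed normalized cache performance metric of cache partition i, and tailHits, missCost, totalQueries, and tailSize are the attributes described elsewhere in this disclosure; a higher value suggests a greater incremental benefit from granting the partition additional space.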
- The formula in EQ. 1 can be used to balance two cache partitions by equating a first instance of EQ. 1 with +CRA, corresponding to a first cache partition, to a second instance of EQ. 1 with −CRA, corresponding to a second cache partition.
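- As a sketch of that balancing step, using the illustrative notation above (an assumption, not the patent's exact formulation), the cache reallocation amount can be obtained by solving

$$
\mathrm{NCPM}_1(S_1 + \mathrm{CRA}) = \mathrm{NCPM}_K(S_K - \mathrm{CRA})
$$

for CRA, where S 1 and S K are the then-current sizes of the two cache partitions. If each metric is linearized around its current operating point as \(\mathrm{NCPM}_i(S) \approx a_i + b_i S\), the closed form is

$$
\mathrm{CRA} = \frac{(a_K + b_K S_K) - (a_1 + b_1 S_1)}{b_1 + b_K}
$$

which is consistent with the linearization techniques mentioned later in connection with FIG. 3B.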
- For example, cache partition 152 1 can be increased by 10 MB and cache partition 152 K can be decreased by 10 MB to balance the performance of the cache partitions. Such balanced performance can be represented by a normalized cache performance 132 12 and a normalized cache performance 132 K2 .
- As shown, the performance of the two cache partitions is balanced. While the normalized cache performance 132 K2 of cache partition 152 K may be reduced as compared to the normalized cache performance 132 K1 shown in FIG. 1B, the overall cache performance 134 2 has increased as compared to the overall cache performance 134 1 shown in FIG. 1B.
- The herein disclosed techniques can thereby address the problems attendant to balancing the sizes of multiple cache partitions in distributed storage systems.
- One embodiment of an environment comprising such a distributed storage system is shown and described as pertains to FIG. 2 .
- FIG. 2 presents an environment 200 in which embodiments of the present disclosure can operate.
- The environment 200 may be implemented in the context of the architecture and functionality of the embodiments described herein. The environment 200 or any aspect thereof may be implemented in any environment.
- The environment 200 shows various components associated with one instance of a distributed storage system 104 that can be used to implement the herein disclosed techniques. Specifically, the environment 200 can comprise multiple nodes (e.g., node 230 1 , . . . , node 230 M ) that have multiple tiers of storage in a storage pool 270 .
- Each node can be associated with one server, multiple servers, or portions of a server. A group of such nodes can be called a cluster.
- The multiple tiers of storage can include storage that is accessible through the network 214 , such as a networked storage 275 (e.g., a storage area network or SAN, network attached storage or NAS, etc.). The storage pool 270 can also comprise one or more instances of local storage (e.g., local storage 272 1 , . . . , local storage 272 M ) that is within or directly attached to a server and/or appliance associated with the nodes. Such local storage can include solid-state drives (SSD 273 1 , . . . , SSD 273 M ), hard disk drives (HDD 274 1 , . . . , HDD 274 M ), and/or other storage devices.
- Each node can implement at least one instance of a virtualized controller (e.g., virtualized controller 226 1 , . . . , virtualized controller 226 M ) to facilitate access to the storage pool 270 by one or more user virtual machines (VMs) (e.g., user VM 224 11 , . . . , user VM 224 1N , . . . , user VM 224 M1 , . . . , user VM 224 MN ) that run client software.
- The hardware of the node can be emulated for the user VMs by various hypervisors. Such hypervisors can be implemented using virtualization software (e.g., VMware ESXi, Microsoft Hyper-V, RedHat KVM, Nutanix AHV, etc.).
- Multiple instances of such virtualized controllers can coordinate within a cluster to form the distributed storage system 104 which can, among other operations, manage the storage pool 270 .
- This architecture further facilitates efficient scaling of the distributed computing and/or storage platform (e.g., see scale 282 ).
- The foregoing virtualized controllers can be implemented in environment 200 using various techniques. As one example, containers (e.g., Docker containers) can be used to implement a virtualized controller at a given node. In this case, the user VMs can access the storage pool 270 by interfacing with a controller container through a hypervisor and/or the kernel of the node host operating system.
- As another example, an instance of a virtual machine at a given node can be used as a virtualized controller to manage storage and I/O activities. In this case, the user VMs at the node can interface with a controller virtual machine (e.g., a controller VM) through a hypervisor to access the storage pool 270 .
- In this architecture, the controller VMs are not formed as part of specific implementations of the hypervisors. Instead, the controller VMs can run as virtual machines above the hypervisors on the various servers. When the controller VMs run above the hypervisors, varying virtual machine architectures and/or hypervisors can operate with the distributed storage system 104 . For example, a hypervisor at one node in the distributed storage system 104 might correspond to VMware ESXi software, while a hypervisor at another node in the distributed storage system 104 might correspond to Nutanix AHV software.
- In certain embodiments, cache memory spaces can be implemented in various storage facilities in the nodes associated with the distributed storage system 104 . For example, the cache memory space might be allocated using a portion of memory (e.g., DRAM of the node), a portion of the SSD memory, and/or a portion of the HDD memory.
- A cache can be formed from any portions of the foregoing components and can take on physical characteristics (e.g., megabytes of DRAM) and/or logical characteristics (e.g., single-touch entries, multi-touch entries, large entries, small entries, etc.). A cache can be formed using purely hardware components, or a cache can be formed or addressed using software components, or any combination of hardware and software components.
- Strictly as examples, a cache might be composed of some DRAM, some SSD, and some HDD space (or combinations thereof), or a cache might be composed of a first data element type entry (e.g., comprising smaller entries) and a second data element type entry (e.g., comprising larger entries).
- Any cache of any type can host two or more partitions that can be bounded by hardware boundaries and/or bounded by software boundaries, or both.
- As an example, a read request (e.g., a query) for a first access data element might place the data element for access by the virtualized controller (e.g., in a single-touch pool), where it can be managed by various cache management techniques (e.g., LRU) until it is evicted. Upon one or more subsequent accesses, the data element might be moved to SSD memory (e.g., in a multi-touch pool).
- The various pools of cache might be managed in part by separate management algorithms (e.g., independent LRU counters). Multiple cache partitions comprising multiple pools are possible.
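- The following is a minimal sketch of such a single-touch/multi-touch arrangement, with each pool kept under its own independent LRU ordering. It is an illustration only: the class name, capacities expressed as element counts, and the promotion-on-second-access policy are assumptions rather than details taken from this disclosure.

```python
from collections import OrderedDict

class TwoPoolCache:
    """Illustrative sketch (not the patented implementation): a single-touch
    pool (e.g., DRAM-backed) and a multi-touch pool (e.g., SSD-backed), each
    managed by an independent LRU ordering."""

    def __init__(self, single_touch_capacity, multi_touch_capacity):
        self.single = OrderedDict()   # elementID -> data, in LRU order
        self.multi = OrderedDict()    # elementID -> data, in LRU order
        self.single_cap = single_touch_capacity
        self.multi_cap = multi_touch_capacity

    def read(self, element_id, fetch_from_storage):
        # Hit in the multi-touch pool: refresh its LRU position.
        if element_id in self.multi:
            self.multi.move_to_end(element_id)
            return self.multi[element_id]
        # Hit in the single-touch pool: promote the element to the multi-touch pool.
        if element_id in self.single:
            data = self.single.pop(element_id)
            self._admit(self.multi, self.multi_cap, element_id, data)
            return data
        # Miss: fetch from non-cache storage, then admit into the single-touch pool.
        data = fetch_from_storage(element_id)
        self._admit(self.single, self.single_cap, element_id, data)
        return data

    @staticmethod
    def _admit(pool, capacity, element_id, data):
        pool[element_id] = data
        pool.move_to_end(element_id)
        while len(pool) > capacity:
            pool.popitem(last=False)  # evict the least recently used element
```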
- One or more instances of a cache memory manager can be implemented in the distributed storage system 104 to facilitate the herein disclosed techniques. Specifically, an instance of the cache memory manager 260 1 can be implemented in the virtualized controller 226 1 , and another instance of the cache memory manager 260 M can be implemented in the virtualized controller 226 M .
- Such instances of the cache memory manager can be implemented in any node in any cluster. Further details regarding one implementation of the cache memory manager for facilitating the herein disclosed techniques are shown and described as pertaining to FIG. 3A.
- FIG. 3A depicts examples of implementation techniques 3A00 as used in systems that implement dynamic sizing of multiple cache partitions. The implementation techniques 3A00 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein, or in any environment.
- As shown, the cache memory manager 260 1 earlier described can interact with various components in a distributed storage platform to implement the herein disclosed techniques. Specifically, the cache memory manager 260 1 might comprise a cache monitor 362 to receive various attributes pertaining to a certain cache memory space (e.g., cache memory space 150 ).
- Such attributes can comprise a set of data element attributes 352 and/or a set of cache partition attributes 354 .
- The data element attributes 352 can comprise certain attributes corresponding to the data elements in a given cache partition represented by a partition identifier or partitionID, such as a data element identifier or elementID, a data element type or elementType (e.g., metadata, extent data, etc.), a miss cost or missCost associated with the data element, a timestamp or timeStamp corresponding to the last access time of the data element at the head of the queue, and/or other attributes.
- The cache partition attributes 354 can comprise certain attributes corresponding to a given cache partition represented by a partition identifier or partitionID, such as a cache partition size or partitionSize, a total number of cache partition queries or totalQueries, a total number of cache partition data elements or totalElements, a tail size or tailSize, and/or other attributes.
- The foregoing attributes can be stored in a tabular structure, with the shown attributes representing the table columns and each of the respective items (e.g., data elements, cache partitions) representing the table rows. In other cases, the attributes can be stored as key-value pairs in computer programming objects.
- A cache balancing engine 364 in the cache memory manager 260 1 can receive one or more of the data element attributes 352 and/or the cache partition attributes 354 to generate a set of cache partition tail attributes 356 . For example, the cache partition tail attributes 356 might describe the tail 153 1 of cache partition 152 1 and/or the tail 153 K of cache partition 152 K .
- The cache partition tail attributes 356 generated by the cache balancing engine 364 can comprise certain attributes corresponding to the tail of a given cache partition represented by a partition identifier or partitionID, such as a tail size or tailSize, a number of hits on data elements in the tail or tailHits, and/or other attributes. The timeStamp attribute of the data element attributes 352 can be used to efficiently determine one or more of the cache partition tail attributes 356 .
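- As a concrete, hedged sketch of the key-value representation mentioned above, the attributes named in this disclosure might be grouped as follows. The field names follow the description; the types, units, and the use of Python dataclasses are assumptions.

```python
from dataclasses import dataclass

@dataclass
class DataElementAttributes:
    partitionID: str     # cache partition holding the element
    elementID: str       # identifier of the data element
    elementType: str     # e.g., "metadata", "extentData"
    missCost: float      # assumed unit: microseconds to fetch from non-cache storage
    timeStamp: float     # last access time of the data element

@dataclass
class CachePartitionAttributes:
    partitionID: str
    partitionSize: int   # assumed unit: bytes currently allocated to the partition
    totalQueries: int    # total number of queries issued to the partition
    totalElements: int   # number of data elements currently held
    tailSize: int        # assumed unit: bytes occupied by the tail data elements

@dataclass
class CachePartitionTailAttributes:
    partitionID: str
    tailSize: int        # size of the tail region (e.g., bytes or MB)
    tailHits: int        # hits observed on data elements in the tail
```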
- Cache balancing engine 364 can further use the foregoing attributes and/or other information to determine the normalized cache performance metrics 156 corresponding to the cache partition analyzed, which normalized cache performance metrics can be used to determine a cache reallocation amount 158 for one or more cache partitions according to the herein disclosed techniques.
- In certain embodiments, the cache balancing engine 364 and/or other components of the cache memory manager 260 1 can access a set of cache sizing rules 366 for various purposes. For example, the cache sizing rules 366 might indicate certain thresholds that, when breached, can trigger execution of one or more cache balancing operations.
- Specifically, continual monitoring of the cache memory space 150 by the cache monitor 362 can facilitate dynamic sizing of the cache partitions when, for example, certain data element and/or cache partition attributes breach their respective thresholds (operation 372 ). The thresholds can be set such that cache size adjustments do not occur too frequently (e.g., providing a hysteresis effect).
- The cache sizing rules 366 might also indicate certain constraints imposed by the underlying system (e.g., the distributed storage system) and/or by a user 302 (e.g., a system administrator). For example, a total cache memory size might be allocated in the cache sizing rules 366 based on system capabilities and/or user specification.
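- A hedged sketch of what such a rule set might look like is shown below. The rule names, values, and units are hypothetical; the disclosure does not enumerate a specific schema.

```python
# Hypothetical cache sizing rules; names and values are illustrative only.
cache_sizing_rules = {
    "totalCacheSize": 4 * 1024**3,            # overall budget (e.g., 4 GB), set by system/admin
    "rebalanceTriggerDelta": 0.20,            # rebalance when normalized metrics diverge by >20%
    "minRebalanceIntervalSecs": 300,          # hysteresis: at most one adjustment per 5 minutes
    "minReallocationBytes": 64 * 1024**2,     # ignore adjustments smaller than 64 MB
    "minPartitionSizeBytes": 128 * 1024**2,   # never shrink a partition below this floor
}
```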
- A technique for monitoring cache partitions to perform dynamic cache balancing is shown and described as pertaining to FIG. 3B.
- FIG. 3B presents a dynamic cache partition balancing technique 3B00 as implemented in systems that perform dynamic sizing of multiple cache partitions. The dynamic cache partition balancing technique 3B00 may be implemented in the context of the architecture and functionality of the embodiments described herein. The dynamic cache partition balancing technique 3B00 or any aspect thereof may be implemented in any environment.
- The dynamic cache partition balancing technique 3B00 presents one embodiment of certain steps and/or operations for facilitating dynamic sizing of multiple cache partitions according to the herein disclosed techniques. In one or more embodiments, the steps and underlying operations comprising the dynamic cache partition balancing technique 3B00 can be executed by an instance of the cache memory manager 260 1 as shown and described in FIG. 3A and herein.
- The dynamic cache partition balancing technique 3B00 can perform a set of operations for each cache partition managed (e.g., at a node in a distributed storage system), commencing with collecting cache partition attributes and/or data element attributes pertaining to a given cache partition (at step 332 ). Using the collected attributes and/or other information, various cache partition tail attributes can be determined (at step 334 ). A normalized cache performance metric can then be generated for the cache partition (at step 336 ). When the foregoing attributes and/or metrics have been collected and/or generated for each cache partition, a determination can be made whether to perform any cache balancing (at decision 338 ).
- If balancing is to be performed, the cache partitions can be compared using the normalized cache performance metrics earlier determined (at step 340 ).
- Various approaches can be implemented to compare more than two cache partitions. For example, the cache partitions having the minimum and maximum normalized cache performance metrics might be compared first. The next pair of cache partitions having the broadest spread between their respective normalized cache performance metrics might then be compared. This process can be iterated until all cache partitions have been compared and/or a spread threshold has been reached. Certain linearization techniques can also be implemented to determine an equilibrium point (e.g., balanced performance) for the functions representing the normalized performance of the cache partitions.
- A set of cache reallocation amounts can then be determined from such comparison techniques (at step 342 ). The cache reallocation amounts can be added to and/or subtracted from the appropriate cache partitions to balance the partitions (at step 344 ). In any case, the monitoring of the cache partitions can continue (operation 382 ).
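- One pass of the loop described above might be sketched as follows. This is a simplified illustration, not the claimed method: the function names, the dictionary layout of the collected attributes, and the use of a heuristic reallocation fraction in place of the equilibrium solution are assumptions.

```python
def normalized_metric(p):
    # Illustrative metric consistent with the normalization factors described
    # above (tail hits, miss cost, total queries, tail size); the published
    # EQ. 1 may differ in form. Assumes totalQueries and tailSize are nonzero.
    return (p["tailHits"] * p["missCost"]) / (p["totalQueries"] * p["tailSize"])

def balance_once(partitions, spread_threshold, reallocation_fraction=0.10,
                 min_partition_size=128 * 1024**2):
    """One monitoring pass of the balancing technique sketched in FIG. 3B
    (hypothetical helper). `partitions` maps a partitionID to a dict of
    collected cache partition and tail attributes."""
    metrics = {pid: normalized_metric(p) for pid, p in partitions.items()}
    donor = min(metrics, key=metrics.get)      # lowest incremental value per added space
    recipient = max(metrics, key=metrics.get)  # highest incremental value per added space
    if metrics[recipient] - metrics[donor] <= spread_threshold:
        return None  # spread threshold not breached; leave the partition sizes unchanged
    # Cache reallocation amount: here a fraction of the donor partition, bounded
    # so that the donor never drops below a minimum size.
    cra = int(partitions[donor]["partitionSize"] * reallocation_fraction)
    cra = min(cra, partitions[donor]["partitionSize"] - min_partition_size)
    if cra <= 0:
        return None
    partitions[donor]["partitionSize"] -= cra
    partitions[recipient]["partitionSize"] += cra
    return {"from": donor, "to": recipient, "cra": cra}
```

- In a deployment resembling FIG. 3A, such a routine might be invoked by the cache monitor whenever a threshold in the cache sizing rules is breached, which provides the hysteresis effect noted earlier.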
- FIG. 4A depicts a system 4A00 as an arrangement of computing modules that are interconnected so as to operate cooperatively to implement certain of the herein-disclosed embodiments. The partitioning of system 4A00 is merely illustrative and other partitions are possible. The system 4A00 may be implemented in the context of the architecture and functionality of the embodiments described herein, or in any desired environment.
- The system 4A00 comprises at least one processor and at least one memory, the memory serving to store program instructions corresponding to the operations of the system. As shown, an operation can be implemented in whole or in part using program instructions accessible by a module. The modules are connected to a communication path 4A05 , and any operation can communicate with other operations over communication path 4A05 . The modules of the system can, individually or in combination, perform method operations within system 4A00 . Any operations performed within system 4A00 may be performed in any order unless as may be specified in the claims.
- The shown embodiment implements a portion of a computer system, presented as system 4A00 , comprising a computer processor to execute a set of program code instructions (module 4A10 ) and modules for accessing memory to hold program code instructions to perform: defining a cache having two or more cache partitions, at least two of the cache partitions having respective cache partition attributes characterizing respective sizes of the cache partitions (module 4A20 ); receiving one or more data element attributes corresponding to one or more data elements that are stored in at least one of the cache partitions (module 4A30 ); generating at least one normalized cache performance metric for a respective at least one of the cache partitions, wherein the normalized cache performance metric is based at least in part on a combination of the cache partition attributes and the data element attributes (module 4A40 ); using the normalized cache performance metric to compare the cache partitions (module 4A50 ); and determining at least one cache reallocation amount by equating a first normalized cache performance metric with a second normalized cache performance metric.
- Variations of the foregoing may include more or fewer of the shown modules, and variations may perform more or fewer (or different) steps and/or may use data elements in more or in fewer (or different) operations.
- FIG. 4B depicts a system 4B00 as an arrangement of computing modules that are interconnected so as to operate cooperatively to implement certain of the herein-disclosed embodiments. The partitioning of system 4B00 is merely illustrative and other partitions are possible. The system 4B00 may be implemented in the context of the architecture and functionality of the embodiments described herein, or in any desired environment.
- The system 4B00 comprises at least one processor and at least one memory, the memory serving to store program instructions corresponding to the operations of the system. As shown, an operation can be implemented in whole or in part using program instructions accessible by a module. The modules are connected to a communication path 4B05 , and any operation can communicate with other operations over communication path 4B05 . The modules of the system can, individually or in combination, perform method operations within system 4B00 . Any operations performed within system 4B00 may be performed in any order unless as may be specified in the claims.
- The shown embodiment implements a portion of a computer system, presented as system 4B00 , comprising a computer processor to execute a set of program code instructions (module 4B10 ) and modules for accessing memory to hold program code instructions to perform: defining a first cache having a first cache size and a second cache having a second cache size (module 4B20 ); receiving one or more data element attributes corresponding to one or more data elements that are stored in at least one of the caches (module 4B30 ); defining a first tail portion and a first head portion of the first cache and defining a second tail portion and a second head portion of the second cache, wherein incoming data elements are initially stored in a respective head portion and wherein evicted data elements are evicted from a respective tail portion (module 4B40 ); and generating a first normalized cache performance metric for the first cache, wherein the first normalized cache performance metric is based at least in part on a predicted miss cost to be incurred after evicting one or more of the data elements from the first tail portion.
- FIG. 5A depicts a virtualized controller as implemented by the shown virtual machine architecture 5A00 . The virtual machine architecture comprises a collection of interconnected components suitable for implementing embodiments of the present disclosure and/or for use in the herein-described environments. The shown virtual machine architecture 5A00 includes a virtual machine instance in a configuration 501 that is further described as pertaining to the controller virtual machine instance 530 .
- As shown, a controller virtual machine instance receives block I/O (input/output or IO) storage requests as network file system (NFS) requests in the form of NFS requests 502 , internet small computer storage interface (iSCSI) block IO requests in the form of iSCSI requests 503 , and/or Samba file system (SMB) requests in the form of SMB requests 504 . The controller virtual machine (CVM) instance publishes and responds to an internet protocol (IP) address (e.g., see CVM IP address 510 ).
- I/O or IO can be handled by one or more IO control handler functions (see IOCTL functions 508 ) that interface to other functions such as data IO manager functions 514 and/or metadata manager functions 522 . The data IO manager functions can include communication with a virtual disk configuration manager 512 and/or can include direct or indirect communication with any of various block IO functions (e.g., NFS IO, iSCSI IO, SMB IO, etc.).
- In addition to block IO functions, the configuration 501 supports IO of any form (e.g., block IO, streaming IO, packet-based IO, HTTP traffic, etc.) through either or both of a user interface (UI) handler such as UI IO handler 540 and/or through any of a range of application programming interfaces (APIs), possibly through the shown API IO manager 545 .
- The communications link 515 can be configured to transmit (e.g., send, receive, signal, etc.) any type of communications packets comprising any organization of data elements. The data elements can comprise payload data, a destination address (e.g., a destination IP address), and a source address (e.g., a source IP address), and can include various packet processing techniques (e.g., tunneling), encodings (e.g., encryption), and/or formatting of bit fields into fixed-length blocks or into variable length fields used to populate the payload. In some cases, packet characteristics include a version identifier, a packet or payload length, a traffic class, a flow label, etc. In some cases, the payload comprises a data structure that is encoded and/or formatted to fit into byte or word boundaries of the packet.
- In some embodiments, hard-wired circuitry may be used in place of, or in combination with, software instructions to implement aspects of the disclosure. Thus, embodiments of the disclosure are not limited to any specific combination of hardware circuitry and/or software. In embodiments, the term "logic" shall mean any combination of software or hardware that is used to implement all or part of the disclosure.
- Non-volatile media includes any non-volatile storage medium, for example, solid-state storage devices (SSDs) or optical or magnetic disks such as disk drives or tape drives.
- Volatile media includes dynamic memory such as a random access memory.
- The controller virtual machine instance 530 includes a content cache manager facility 516 that accesses storage locations, possibly including local dynamic random access memory (DRAM) (e.g., through the local memory device access block 518 ) and/or possibly including accesses to local solid-state storage (e.g., through local SSD device access block 520 ).
- Computer readable media includes any non-transitory computer readable medium, for example, floppy disk, flexible disk, hard disk, magnetic tape, or any other magnetic medium; CD-ROM or any other optical medium; punch cards, paper tape, or any other physical medium with patterns of holes; or any RAM, PROM, EPROM, FLASH-EPROM, or any other memory chip or cartridge.
- Any data can be stored, for example, in any form of external data repository 531 , which in turn can be formatted into any one or more storage areas, and which can comprise parameterized storage accessible by a key (e.g., a filename, a table name, a block address, an offset address, etc.).
- An external data repository 531 can store any forms of data, and may comprise a storage area dedicated to storage of metadata pertaining to the stored forms of data.
- In some embodiments, metadata can be divided into portions. Such portions and/or cache copies can be stored in the external storage data repository and/or in a local storage area (e.g., in local DRAM areas and/or in local SSD areas). Such local storage can be accessed using functions provided by a local metadata storage access block 524 . The external data repository 531 can be configured using a CVM virtual disk controller 526 , which can in turn manage any number or any configuration of virtual disks.
- Execution of the sequences of instructions to practice certain embodiments of the disclosure is performed by one or more instances of a processing element such as a data processor, or such as a central processing unit (e.g., CPU 1 , CPU 2 ).
- According to certain embodiments of the disclosure, two or more instances of a configuration 501 can be coupled by a communications link 515 (e.g., backplane, LAN, PSTN, wired or wireless network, etc.), and each instance may perform respective portions of sequences of instructions as may be required to practice embodiments of the disclosure.
- The shown computing platform 506 is interconnected to the Internet 548 through one or more network interface ports (e.g., network interface port 523 1 and network interface port 523 2 ). The configuration 501 can be addressed through one or more network interface ports using an IP address. Any operational element within computing platform 506 can perform sending and receiving operations using any of a range of network protocols, possibly including network protocols that send and receive packets (e.g., see network protocol packet 521 1 and network protocol packet 521 2 ).
- The computing platform 506 may transmit and receive messages that can be composed of configuration data and/or any other forms of data and/or instructions organized into a data structure (e.g., communications packets). In some cases, the data structure includes program code instructions (e.g., application code) communicated through the Internet 548 and/or through any one or more instances of communications link 515 .
- Received program code may be processed and/or executed by a CPU as it is received and/or program code may be stored in any volatile or non-volatile storage for later execution.
- Program code can be transmitted via an upload (e.g., an upload from an access device over the Internet 548 to computing platform 506 ). Further, program code and/or results of executing program code can be delivered to a particular user via a download (e.g., a download from the computing platform 506 over the Internet 548 to an access device).
- The configuration 501 is merely one sample configuration. Other configurations or partitions can include further data processors, and/or multiple communications interfaces, and/or multiple storage devices, etc. within a partition.
- For example, a partition can bound a multi-core processor (e.g., possibly including embedded or co-located memory), or a partition can bound a computing cluster having a plurality of computing elements, any of which computing elements are connected directly or indirectly to a communications link. A first partition can be configured to communicate with a second partition. A particular first partition and a particular second partition can be congruent (e.g., in a processing element array) or can be different (e.g., comprising disjoint sets of components).
- A module as used herein can be implemented using any mix of any portions of the system memory and any extent of hard-wired circuitry, including hard-wired circuitry embodied as a data processor. Some embodiments include one or more special-purpose hardware components (e.g., power control, logic, sensors, transducers, etc.). A module may include one or more state machines and/or combinational logic used to implement or facilitate the operational and/or other characteristics of performing dynamic sizing of multiple cache partitions.
- Various implementations of the data repository comprise storage media organized to hold a series of records or files such that individual records or files are accessed using a name or key (e.g., a primary key or a combination of keys and/or query clauses).
- Such files or records can be organized into one or more data structures (e.g., data structures used to implement or facilitate aspects of dynamic sizing of multiple cache partitions).
- Such files or records can be brought into and/or stored in volatile or non-volatile memory.
- FIG. 5B depicts a virtualized controller implemented by a containerized architecture 5B00 . The containerized architecture comprises a collection of interconnected components suitable for implementing embodiments of the present disclosure and/or for use in the herein-described environments. The shown containerized architecture 5B00 includes a container instance in a configuration 551 that is further described as pertaining to the container instance 550 .
- The configuration 551 includes an operating system layer (as shown) that performs addressing functions such as providing access to external requestors via an IP address (e.g., "P.Q.R.S", as shown). Providing access to external requestors can include implementing all or portions of a protocol specification (e.g., "http:") and possibly handling port-specific functions. The operating system layer can perform port forwarding to any container (e.g., container instance 550 ).
- A container instance can be executed by a processor. Runnable portions of a container instance sometimes derive from a container image, which in turn might include all, or portions of any of, a Java archive repository (JAR) and/or its contents, a script or scripts and/or a directory of scripts, a virtual machine configuration, and may include any dependencies therefrom. In some cases, a configuration within a container might include an image comprising a minimum set of runnable code.
- In some cases, start-up time for a container instance can be much faster than start-up time for a virtual machine instance, at least inasmuch as the container image might be much smaller than a respective virtual machine instance and/or might have many fewer code and/or data initialization steps to perform than a respective virtual machine instance.
- A container instance can serve as an instance of an application container. Any container of any sort can be rooted in a directory system and can be configured to be accessed by file system commands (e.g., "ls" or "ls -a", etc.).
- The container might optionally include operating system components 578 ; however, such a separate set of operating system components need not be provided. As an alternative, a container can include a runnable instance 558 , which is built (e.g., through compilation and linking, or just-in-time compilation, etc.) to include all of the library and OS-like functions needed for execution of the runnable instance. In some cases, a runnable instance can be built with a virtual disk configuration manager, any of a variety of data IO management functions, etc. In some cases, a runnable instance includes code for, and access to, a container virtual disk controller 576 . Such a container virtual disk controller can perform any of the functions that the aforementioned CVM virtual disk controller 526 can perform, yet such a container virtual disk controller does not rely on a hypervisor or any particular operating system in order to perform its range of functions.
- multiple containers can be collocated and/or can share one or more contexts.
- multiple containers that share access to a virtual disk can be assembled into a pod (e.g., a Kubernetes pod).
- Pods provide sharing mechanisms (e.g., when multiple containers are amalgamated into the scope of a pod) as well as isolation mechanisms (e.g., such that the namespace scope of one pod does not share the namespace scope of another pod).
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Description
- The present application is a continuation of U.S. patent application Ser. No. 15/214,264, filed on Jul. 19, 2016, issued on May 29, 2018 as U.S. Pat. No. 9,984,004 and entitled “DYNAMIC CACHE BALANCING”, which is hereby incorporated by reference in its entirety.
- This disclosure relates to distributed data storage, and more particularly to techniques for dynamic sizing of multiple cache partitions.
- The use of virtual machines (VMs) to improve the utilization of computing resources continues to increase. Such VMs can be characterized as software-based computing “machines” implemented in a virtualization environment comprising various hardware resources (e.g., CPU, memory, etc.). The VMs can operate based at least in part on the computer architecture and/or functions (e.g., operating system) of a real or hypothetical computer. Multiple VMs can operate on one physical machine (e.g., on a physical computer), with each VM sharing the resources of that physical machine across multiple environments. Various VMs can run multiple operating systems and/or multiple applications on the physical machine. Flexibility for such sharing can be facilitated at least in part by a hypervisor, which hypervisor allocates hardware resources dynamically and transparently to the running VM.
- The high storage I/O (input/output or IO) demands of VMs have precipitated an increase in distributed storage systems that are implemented in virtualization environments. Specifically, such distributed storage systems can aggregate various physical storage facilities to create a logical storage pool where certain data may be efficiently distributed according to various metrics and/or objectives. Metadata describing the storage pool and/or its virtualized representations may also be distributed any number of times among various nodes in the distributed storage system. Many of these distributed storage systems implement various caching techniques to reduce the latency in accessing the foregoing data and/or metadata by the VMs and/or by certain components (e.g., storage I/O controllers) of the distributed storage systems. Specifically, multi-tier caching systems can improve the overall latency of operations by storing frequently used data elements in storage areas that correspond to observed access patterns. As an example, a dynamic random access memory (DRAM) can be used to cache items that are stored on hard disk drives (HDDs) to facilitate low latency access. In some cases, a given cache might be allocated across multiple tiers of storage facilities such as local (e.g., to a storage I/O controller) DRAM, local (e.g., to a node) solid-state drives (SSDs), cluster SSDs, local hard disk drives (HDDs), networked HDDs, and/or other storage tiers. In other cases, a cache in a distributed storage system might be partitioned into multiple cache partitions for various purposes.
- Unfortunately, legacy techniques for sizing multiple cache partitions in a distributed storage system can be limited at least in their ability to address the performance by and between each of the multiple cache partitions. Specifically, the performance of a cache can be determined at least in part by the data access latency improvement resulting from the data being stored in the cache as compared to the access latency if the data were stored in a non-cache storage facility (e.g., networked HDD). Some legacy approaches for sizing a cache might merely facilitate allocating a fixed amount of memory for the cache and applying certain cache management techniques (e.g., least recently used or LRU, least frequently used or LFU, adaptive replacement cache or ARC, etc.) to determine whether or not to store and/or to retain a particular data element in the cache.
- For example, the foregoing legacy techniques might allocate 1 GB of memory to a given cache and, when the cache is full, evict data elements from the cache based at least in part on a data element access time, access frequency, a combination of access time and access frequency, and/or other criteria. However, when a cache is partitioned for two or more data element types, or partitioned over two or more memory types, applying such legacy cache management techniques to the cache partitions might not achieve optimal overall performance by and between the cache partitions. This is because if a cache configuration/management policy is applied independently to the two or more partitions, then the relative importance of data items with respect to each individual cache partition to the overall system is not considered, which results in overall cache performance that may be less than optimal.
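- Strictly as an illustrative point of reference (the class, method, and parameter names below are assumptions for this sketch and are not part of the disclosure), the kind of fixed-capacity, least-recently-used cache that the foregoing legacy approach describes might be modeled as follows. Note that the eviction decision considers only recency; element type and miss cost play no role, which is precisely the limitation addressed by the herein-disclosed techniques.

```python
from collections import OrderedDict

class FixedSizeLRUCache:
    """Legacy-style cache: fixed capacity allocated up front, evicts the least recently used element."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes   # e.g., a fixed 1 GB allocation
        self.used_bytes = 0
        self.entries = OrderedDict()           # key -> (value, size_bytes); most recently used last

    def get(self, key):
        if key not in self.entries:
            return None                        # miss: caller fetches from non-cache storage
        self.entries.move_to_end(key)          # refresh recency on a hit
        return self.entries[key][0]

    def put(self, key, value, size_bytes):
        if key in self.entries:
            self.used_bytes -= self.entries.pop(key)[1]
        self.entries[key] = (value, size_bytes)
        self.used_bytes += size_bytes
        while self.used_bytes > self.capacity_bytes:
            _, (_, evicted_size) = self.entries.popitem(last=False)  # evict least recently used
            self.used_bytes -= evicted_size
```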
- What is needed is a technique or techniques to improve over legacy and/or over other considered approaches. Some of the approaches described in this background section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
- The present disclosure provides a detailed description of techniques used in systems, methods, and in computer program products for dynamic sizing of multiple cache partitions, which techniques advance the relevant technologies to address technological issues with legacy approaches. More specifically, the present disclosure provides a detailed description of techniques used in systems, methods, and in computer program products for dynamic sizing of multiple cache partitions. Certain embodiments are directed to technological solutions for implementing a heuristic cache partition balancing technique to dynamically size multiple cache partitions based at least in part on cost calculations.
- The disclosed embodiments modify and improve over legacy approaches. In particular, the herein-disclosed techniques provide technical solutions that address the technical problems attendant to balancing the sizes of multiple cache partitions to improve overall caching performance. Such technical solutions can serve to reduce the demand for computer memory, reduce the demand for computer processing power, reduce network bandwidth use, and reduce the demand for inter-component communication. Some embodiments disclosed herein use techniques to improve the functioning of multiple systems within the disclosed environments, and some embodiments advance peripheral technical fields as well. As one specific example, use of the disclosed techniques and devices within the shown environments as depicted in the figures provide advances in the technical field of high-performance computing as well as advances in various technical fields related to data storage.
- Further details of aspects, objectives, and advantages of the technological embodiments are described herein and in the drawings and claims.
- The drawings described below are for illustration purposes only. The drawings are not intended to limit the scope of the present disclosure.
- FIG. 1A presents a dynamic cache sizing technique used in systems that implement dynamic sizing of multiple cache partitions, according to an embodiment.
- FIG. 1B presents a normalized cache performance comparison technique.
- FIG. 1C illustrates a dynamic cache balancing technique used in systems that implement dynamic sizing of multiple cache partitions, according to an embodiment.
- FIG. 2 presents an environment in which embodiments of the present disclosure can operate.
- FIG. 3A depicts examples of implementation techniques as used in systems that implement dynamic sizing of multiple cache partitions, according to an embodiment.
- FIG. 3B presents a dynamic cache partition balancing technique as implemented in systems that perform dynamic sizing of multiple cache partitions, according to some embodiments.
- FIG. 4A and FIG. 4B depict system components as arrangements of computing modules that are interconnected so as to implement certain of the herein-disclosed embodiments.
- FIG. 5A and FIG. 5B depict virtualized controller architectures comprising collections of interconnected components suitable for implementing embodiments of the present disclosure and/or for use in the herein-described environments.
- Some embodiments of the present disclosure address the problem of balancing the sizes of multiple cache partitions to improve overall caching performance, and some embodiments are directed to approaches for implementing a heuristic cache partition balancing technique to dynamically size multiple cache partitions based at least in part on cost calculations. The accompanying figures and discussions herein present example environments, systems, methods, and computer program products for dynamic sizing of multiple cache partitions.
- In situations involving multiple caches, one data element type in a first cache might have a high “miss cost” (e.g., the latency to bring a data element into the cache) as compared to another data element type in a second cache. In such cases, allocating an equal amount of the total cache memory to the aforementioned two caches might result in the performance of the high miss cost cache partition being less than the maximum performance that could be achieved if the sizes of the caches were adjusted using the herein-disclosed techniques. Furthermore, caches having sizes that are apportioned a priori (or fixed) might need those sizes to be changed frequently based on real-time conditions. Strictly as one example, relative sizes of multiple caches can be adjusted with respect to ongoing changes in measurable characteristics (e.g., I/O activity, storage utilization, etc.).
- Disclosed herein are techniques for implementing a normalized balancing technique to dynamically size multiple caches in a distributed storage system, while still observing constraints pertaining to a finite amount of memory being allocated to the multiple caches. In various embodiments, attributes pertaining to the structure and/or content of the caches and/or their partitions or sub-partitions can be collected to derive normalized cache performance metrics that can be analyzed to determine cache size adjustments. For example, in some embodiments, certain cache partition attributes (e.g., then-current cache size, etc.) and/or data element attributes (e.g., miss cost, timestamp, etc.) might be collected and combined with a variable representing a cache reallocation amount (CRA) to determine a normalized cache performance metric for each of the cache partitions, and then further, to balance the sizes of the cache partitions to achieve improved overall performance. Such balancing can be implemented by adjusting the cache size of one or more of the cache partitions according to a cache reallocation amount derived from comparing the normalized cache performance metrics for the cache partitions. In certain embodiments, the normalized cache performance metric can be derived from a cache partition “tail” to represent the incremental value of adding more data elements to the cache partition. In some embodiments, the cache size adjustments can be dynamically determined and/or applied in response to various triggers, such as a change in the cache partition attributes, or a change in the constitution of, or access patterns pertaining to, the data elements stored in the cache.
- Various embodiments are described herein with reference to the figures. It should be noted that the figures are not necessarily drawn to scale and that elements of similar structures or functions are sometimes represented by like reference characters throughout the figures. It should also be noted that the figures are only intended to facilitate the description of the disclosed embodiments—they are not representative of an exhaustive treatment of all possible embodiments, and they are not intended to impute any limitation as to the scope of the claims. In addition, an illustrated embodiment need not portray all aspects or advantages of usage in any particular environment.
- An aspect or an advantage described in conjunction with a particular embodiment is not necessarily limited to that embodiment and can be practiced in any other embodiments even if not so illustrated. Also, references throughout this specification to “some embodiments” or “other embodiments” refer to a particular feature, structure, material, or characteristic described in connection with the embodiments as being included in at least one embodiment. Thus, the appearances of the phrases “in some embodiments” or “in other embodiments” in various places throughout this specification are not necessarily referring to the same embodiment or embodiments.
- Some of the terms used in this description are defined below for easy reference. The presented terms and their respective definitions are not rigidly restricted to these definitions—a term may be further defined by the term's use within this disclosure. The term “exemplary” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word exemplary is intended to present concepts in a concrete fashion. As used in this application and the appended claims, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise, or is clear from the context, “X employs A or B” is intended to mean any of the natural inclusive permutations. That is, if X employs A, X employs B, or X employs both A and B, then “X employs A or B” is satisfied under any of the foregoing instances. As used herein, at least one of A or B means at least one of A, or at least one of B, or at least one of both A and B. In other words, this phrase is disjunctive. The articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or is clear from the context to be directed to a singular form.
- Reference is now made in detail to certain embodiments. The disclosed embodiments are not intended to be limiting of the claims.
-
FIG. 1A presents a dynamic cache sizing technique 1A00 used in systems that implement dynamic sizing of multiple cache partitions. As an option, one or more variations of dynamic cache sizing technique 1A00 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. The dynamic cache sizing technique 1A00 or any aspect thereof may be implemented in any environment. - The dynamic cache sizing technique 1A00 shown in
FIG. 1A depicts a distributedstorage system 104 comprising multiple storage controllers implemented as a set ofvirtualized controllers 126. Thevirtualized controllers 126 can access a set ofnon-cache storage 175 comprising, for example, remote and/or networked storage facilities. As shown,virtualized controllers 126 can further manage acache memory space 150. For example, thecache memory space 150 might comprise a portion of the dynamic random access memory (DRAM) and/or solid state drives (SSDs) local tovirtualized controllers 126 to facilitate low latency access to certain data elements. - In some cases, the
cache memory space 150 can be partitioned into multiple cache partitions (e.g., cache partition 152 1, . . . , cache partition 152 K) to hold a respective type of data element. For example, one cache partition might be used to manage extent data while another cache partition might be used to manage metadata. In some cases, thecache memory space 150 might be allocated to such cache partitions in equal portions, such as indicated by cache partition size 154 11 and cache partition size 154 K1. Further, the cache partitions might implement certain cache management techniques (e.g., LRU, LFU, ARC, etc.) to determine whether or not to store and/or retain a particular data element in the cache. - However, when a cache is partitioned for differing data element types, applying such cache sizing and/or management techniques to the cache partitions might not maximize the combined performance of the cache partitions. For example, one data element type in cache partition 152 1 might have a high “miss cost” (e.g., the latency incurred to bring a data element into the cache) as compared to another data element type in cache partition 152 K. In such cases, allocating an equal amount of the total cache memory to the cache partitions might result in the performance of the high miss cost cache partition being less than its maximum performance. Further, the cache partition performance associated with cache partitions with fixed sizes can vary with respect to varying characteristics (e.g., I/O activity, storage utilization, etc.) and/or real time conditions of distributed
storage system 104. - The herein disclosed techniques can address the foregoing issues attendant to balancing the sizes of multiple cache partitions to improve overall caching performance. Specifically, attributes (e.g., A11, A12, . . . , AK1, AK2, . . . ) pertaining to the structure and/or content of the cache partitions can be collected and combined with a variable representing a cache reallocation amount (CRA) to derive normalized cache performance metrics (e.g., normalized
cache performance metric 156 1, . . . , normalized cache performance metric 156 K) that can be analyzed to determine cache size adjustments. - More specifically, as shown, the normalized cache performance metric for each cache partition can be derived from a cache partition “tail” (e.g., tail 153 1, . . . , tail 153 K) to represent the incremental value of adding more data elements to the cache partition. The normalized cache performance metrics can be compared using a formulaic expression to determine a cache reallocation amount 158 (e.g., where the cache reallocation amount is a size or quantity of memory in bytes or Kbytes, or pages, etc.). The resulting CRA can be used to balance the sizes of the cache partitions to achieve improved overall performance. For example, as shown, the CRA might be removed from the cache partition 152 K and added to the cache partition 152 1, resulting in updated instances of a cache partition size 154 K2 and a cache partition size 154 12, respectively. In some embodiments, the caches size adjustments can perform dynamic cache balancing 160, where caches size adjustments are determined and/or applied in response to various triggers such as a change in the cache partition attributes or such as a change in the constitution of data elements stored in the cache.
- Further details related to comparing the performance of various cache partitions is shown and described as pertaining to
FIG. 1B . -
FIG. 1B presents a normalized cache performance comparison technique 1B00. As an option, one or more variations of normalized cache performance comparison technique 1B00 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. The normalized cache performance comparison technique 1B00 or any aspect thereof may be implemented in any environment. -
FIG. 1B depicts the representative cache partitions (e.g., cache partition 152 1, . . . , cache partition 152 K) earlier shown and described as pertaining toFIG. 1A . Further shown are a set of data elements 122 1 comprising cache partition 152 1 and a set of data elements 122 K comprising cache partition 152 K. One or more of the earlier described cache management techniques (e.g., LRU, LFU, ARC, etc.) might logically arrange the data elements in the cache partition according to a data element queue. For example, the data elements at the top of the queue might correspond to those data elements accessed or “hit” most often, and the data elements at the bottom of the queue might correspond to data elements accessed or “hit” least often. New data elements that are first accessed (e.g., first access data element 124 1, . . . , first access data element 124 K) can enter into the top of the queue fromnon-cache storage 175. - Certain data elements can also be evicted from the cache partition. For example, evicted
data element 126 1 and evicteddata element 126 K might be evicted from cache partition 152 1 and cache partition 152 K, respectively, to make room for first access data element 124 1 and first access data element 124 K, respectively. The evicted data elements can be selected, for example, based on queue position (e.g., least hit elements are evicted). - In one approach, a measure of the performance of a cache system or individual cache partition can be derived from the incremental “value” of caching one more data element. Since the purpose of the cache partition is at least in part to reduce the data element access latency, such value can be based at least in part from the “cost” (e.g., access latency) of retrieving a data element not in cache. Such a cost can be referred to as a “miss cost”. The miss cost can vary for respective data element types. For example, a data element from cache partition 152 1 might have a miss cost 128 1 of 500 μS and a data element from cache partition 152 K might have a miss cost 128 K of 50 μS.
- Performance metrics normalized by such attributes as miss cost can be used to compare the performance of various cache partitions. For example, as shown in
FIG. 1B , cache partition 152 1 might have a normalized cache performance 132 11 and cache partition 152 K might have a normalized cache performance 132 K1. As shown, the performance of the two cache partitions is unbalanced. Such imbalance can be due to earlier described miss cost differences, relative cache partition sizes, total cache queries, and/or other factors. In some cases, the unbalanced cache partitions can result in an overall cache performance 134 1 (e.g., for the entire allocated cache memory space) that is not maximized. The herein disclosed techniques can address the foregoing issues attendant to balancing the sizes of the cache partitions to improve overall caching performance as shown and described as pertaining toFIG. 1C . -
FIG. 1C illustrates a dynamic cache balancing technique 1C00 used in systems that implement dynamic sizing of multiple cache partitions. As an option, one or more variations of dynamic cache balancing technique 1C00 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. The dynamic cache balancing technique 1C00 or any aspect thereof may be implemented in any environment. - In certain embodiments, a “tail” of the cache partition can be analyzed to facilitate balancing the performance of various cache partitions in a cache memory space. In some cases, such balancing can further improve the overall cache performance. Specifically, the cache partition tail can represent the least accessed or hit data elements in the cache partition when determining the value of adding more space to the cache partition. More specifically, by monitoring the hit rate to the tail data element or data elements in the cache partition, the cache hit rate increase precipitated by adding more data elements (e.g., by increasing the cache partition size) can be predicted. Since cache accesses or queries can be stochastic, the cache partition tail might be defined as a percentage of some cache partition attribute such as a total number of cache partition elements, or a total number of cache partition data element hits. For example, the cache partition tail might be defined as the least hit 10% of the cache partition data elements.
- Certain normalization factors can be used to facilitate comparison of various cache partitions comprising different types of data elements. One normalization factor earlier mentioned is miss cost. Miss cost can normalize for the latency in retrieving a data element not in cache. For example, in some cases, the time to retrieve a metadata element from non-cache storage can be long as compared to the time to retrieve an extent data element from non-cache storage. In other cases, one cache partition can be queried more frequently as compared to another cache partition. Such query frequency differences can be addressed with a normalization factor to adjust for total cache partition queries. Further, data element sizes can vary by and between cache partitions. The tail size (e.g., in MB) for each respective cache partition can normalize for such data element size differences. Other normalization factors are possible.
- According to some embodiments, the foregoing techniques and/or normalization factors can be combined in a linear formula that approximates an objective function for maximizing the performance of a set of cache partitions. For example, the normalized
cache performance metric 156 1 and the normalizedcache performance metric 156 K for cache partition 152 1 and cache partition 152 K, respectively, can be determined according to the following formula: -
CPMN_X = [tailHits_X ÷ totalQueries_X] · missCost_X · [tailSize_X ± CRA]   (EQ. 1)
- CPMNx=cache performance metric for cache N and partition X
- tailHitsx=number of data element hits in tail of cache partition X
- totalQueriesx=total number of data element queries for cache partition X
- missCostx=miss cost (e.g., in μS) of data element in cache partition X
- tailSizex=tail size (e.g., in MB) of tail of cache partition X
- CRA=cache reallocation amount.
- The formula in EQ. 1can be used to balance two cache partitions by equating a first instance of EQ. 1 with +CRA corresponding to a first cache partition with a second instance of EQ. 1 with −CRA corresponding to a second cache partition. As an example,
-
CPMN1=CPMNK -
[2÷100]·500 μs·[50 MB+CRA]=[5÷100]·50 μs·[250 MB−CRA] -
CRA=10 MB - In the foregoing example, cache partition 152 1 can be increased by 10 MB and cache partition 152 K can be decreased by 10 MB to balance the performance of the cache partitions. As illustrated in
FIG. 1C, such balanced performance can be represented by a normalized cache performance 132 12 and a normalized cache performance 132 K2. As shown, the performance of the two cache partitions is balanced. Further, while the normalized cache performance 132 K2 of cache partition 152 K may be reduced as compared to the normalized cache performance 132 K1 shown in FIG. 1B, the overall cache performance 134 2 has increased as compared to the overall cache performance 134 1 shown in FIG. 1B.
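- Because EQ. 1 is linear in CRA, equating the two instances yields a closed-form reallocation amount. The following sketch (Python used purely for illustration; the helper and key names are assumptions) reproduces the 10 MB result of the example above:

```python
def cache_reallocation_amount(p1, p2):
    """Solve CPMN_1(+CRA) = CPMN_2(-CRA) for CRA, per EQ. 1.

    Each argument is a dict with tail_hits, total_queries, miss_cost_us, and tail_size_mb.
    A positive result means partition p1 grows by CRA and partition p2 shrinks by CRA.
    """
    a = (p1["tail_hits"] / p1["total_queries"]) * p1["miss_cost_us"]
    b = (p2["tail_hits"] / p2["total_queries"]) * p2["miss_cost_us"]
    # a * (s1 + CRA) = b * (s2 - CRA)  =>  CRA = (b*s2 - a*s1) / (a + b)
    return (b * p2["tail_size_mb"] - a * p1["tail_size_mb"]) / (a + b)

# Worked example from the description above:
p1 = {"tail_hits": 2, "total_queries": 100, "miss_cost_us": 500, "tail_size_mb": 50}
pK = {"tail_hits": 5, "total_queries": 100, "miss_cost_us": 50, "tail_size_mb": 250}
print(cache_reallocation_amount(p1, pK))   # -> 10.0 (MB moved from partition K to partition 1)
```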
FIG. 2 . -
FIG. 2 presents anenvironment 200 in which embodiments of the present disclosure can operate. As an option, one or more variations ofenvironment 200 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. Theenvironment 200 or any aspect thereof may be implemented in any environment. - The
environment 200 shows various components associated with one instance of a distributedstorage system 104 that can be used to implement the herein disclosed techniques. Specifically, theenvironment 200 can comprise multiple nodes (e.g., node 230 1, . . . , node 230 M) that have multiple tiers of storage in astorage pool 270. For example, each node can be associated with one server, multiple servers, or portions of a server. A group of such nodes can be called a cluster. The multiple tiers of storage can include storage that is accessible through thenetwork 214, such as a networked storage 275 (e.g., a storage area network or SAN, network attached storage or NAS, etc.). Thestorage pool 270 can also comprise one or more instances of local storage (e.g., local storage 272 1, . . . , local storage 272 M) that is within or directly attached to a server and/or appliance associated with the nodes. Such local storage can include solid state drives (SSD 273 1, . . . , SSD 273 M), hard disk drives (HDD 274 1, . . . , HDD 274 M), and/or other storage devices. - Each node can implement at least one instance of a virtualized controller (e.g., virtualized controller 226 1, . . . , virtualized controller 226 M) to facilitate access to the
storage pool 270 by one or more user virtual machine or VMs (e.g., user VM 224 11, . . . , user VM 224 1N, . . . , user VM 224 M1, . . . , user VM 224 MN) that run client software. The hardware of the node can be emulated for the user VMs by various hypervisors. For example, such hypervisors can be implemented using virtualization software (e.g., VMware ESXi, Microsoft Hyper-V, RedHat KVM, Nutanix AHV, etc.) that includes a hypervisor. Multiple instances of such virtualized controllers can coordinate within a cluster to form the distributedstorage system 104 which can, among other operations, manage thestorage pool 270. This architecture further facilitates efficient scaling of the distributed computing and/or storage platform (e.g., see scale 282). - The foregoing virtualized controllers can be implemented in
environment 200 using various techniques. Specifically, containers (e.g., Docker containers) can be used to implement a virtualized controller at the node. In this case, the user VMs can access thestorage pool 270 by interfacing with a controller container through a hypervisor and/or the kernel of the node host operating system. As another virtualized controller implementation example, an instance of a virtual machine at a given node can be used as a virtualized controller to manage storage and I/O activities. In this case, the user VMs at the node can interface with a controller virtual machine (e.g., controller VM) through a hypervisor to access thestorage pool 270. In such cases, the controller VMs are not formed as part of specific implementations of the hypervisors. Instead, the controller VMs can run as virtual machines above the hypervisors on the various servers. When the controller VMs run above the hypervisors, varying virtual machine architectures and/or hypervisors can operate with the distributedstorage system 104. For example, a hypervisor at one node in the distributedstorage system 104 might correspond to VMware ESXi software, and a hypervisor at another node in the distributedstorage system 104 might correspond to Nutanix AHV software. - As shown in
environment 200, cache memory spaces (e.g., cache memory space 250 1, . . . , cache memory space 250 M) can be implemented in various storage facilities in the nodes associated with the distributedstorage system 104. Specifically, in one embodiment, the cache memory space might be allocated using a portion of memory (e.g., DRAM of the node) and/or a portion of the SSD memory, and/or a portion of the HDD memory. A cache can be formed from any portions of the foregoing components and/or take on physical characteristics (e.g., megabytes of DRAM) and/or can take on logical characteristics (e.g., single-touch entries, multi-touch entries, large entries, small entries, etc.). Strictly as examples, a cache can be formed using purely hardware components, or a cache can be formed or addressed using software components or any combinations of hardware and software components. A cache might be composed of some DRAM, and some SDD and some HDD space (or combinations thereof), or a cache might be composed of a first data element type entry (e.g., comprising smaller entries) and a second data element type entry (e.g., comprising larger entries). Any cache of any type can host two or more partitions that can be bounded by hardware boundaries and/or bounded by software boundaries, or both. - In one example of a cache, a read request (e.g., query) for a first access data element might place the data element for access by the virtualized controller (e.g., in a single-touch pool) where it can be managed by various cached management techniques (e.g., LRU) until it is evicted. At some moment in time prior to eviction, the data element might be moved to SSD memory (e.g., in a multi-touch pool). In some cases, the various pools of cache might be managed in part by separate management algorithms (e.g., independent LRU counters). Multiple cache partitions comprising multiple pools are possible.
- Further details regarding general approaches to allocation of memory to caches are described in U.S. Pat. No. 9,910,774 titled, “SPONTANEOUS RECONFIGURATION OF DATA STRUCTURES USING BALLOON MEMORY ALLOCATION” issued on Mar. 6, 2018, which is hereby incorporated by reference in its entirety.
- In certain embodiments, one or more instances of a cache memory manager can be implemented in the distributed
storage system 104 to facilitate the herein disclosed techniques. Specifically, an instance of the cache memory manager 260 1 can be implemented in the virtualized controller 226 1, and another instance of the cache memory manager 260 M can be implemented in the virtualized controller 226 M. Such instances of the cache memory manager can be implemented in any node in any cluster. Further details regarding one implementation of the cache memory manager for facilitating the herein disclosed techniques is shown and described as pertaining toFIG. 3A . -
FIG. 3A depicts examples of implementation techniques 3A00 as used in systems that implement dynamic sizing of multiple cache partitions. As an option, one or more variations of implementation techniques 3A00 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. The implementation techniques 3A00 or any aspect thereof may be implemented in any environment. - As shown in
FIG. 3A , the cache memory manager 260 1 earlier described can interact with various components in a distributed storage platform to implement the herein disclosed techniques. Specifically, the cache memory manager 260 1 might comprise acache monitor 362 to receive various attributes pertaining to a certain cache memory space (e.g., cache memory space 150). Such attributes can comprise a set of data element attributes 352 and/or a set of cache partition attributes 354. For example, the data element attributes 352 can comprise certain attributes corresponding to the data elements in a given cache partition represented by a partition identifier or partitionID such as a data element identifier or elementID, a data element type or elementType (e.g., metadata, extent data, etc.), a miss cost or missCost associated with the data element, a timestamp or timeStamp corresponding to the last access time of the data element at the head of the queue, and/or other attributes. Also, for example, the cache partition attributes 354 can comprise certain attributes corresponding to a given cache partition represented by a partition identifier or partitionID such as a cache partition size or partitionSize, a total number of cache partition queries or totalQueries, a total number of cache partition data elements or totalElements, a tail size or tailSize, and/or other attributes. In some embodiments, the foregoing attributes can be stored in a tabular structure with the shown attribute representing the table columns and the each of the respective items (e.g., data elements, cache partitions) representing the table rows. In other embodiments, the attributes can be stored as key-value pairs in computer programming objects. - Referring to
FIG. 3A , acache balancing engine 364 in the cache memory manager 260 1 can receive one or more of the data element attributes 352 and/or the cache partition attributes 354 to generate a set of cache partition tail attributes 356. For example, the cache partition tail attributes 356 might described the tail 153 1 of cache partition 152 1 and/or the tail 153 K of cache partition 152 K. For example, the cache partition tail attributes 356 generated by thecache balancing engine 364 can comprise certain attributes corresponding to the tail of a given cache partition represented by a partition identifier or partitionID such as a tail size or tailSize, a number of hits on data elements in the tail or tailHits, and/or other attributes. Specifically, for example, when there is a uniform distribution of data elements in a cache partition queue, the timeStamp attribute of the data element attributes 352 can be used to efficiently determine one or more of the cache partition tail attributes 356. -
Cache balancing engine 364 can further use the foregoing attributes and/or other information to determine the normalizedcache performance metrics 156 corresponding to the cache partition analyzed, which normalized cache performance metrics can be used to determine acache reallocation amount 158 for one or more cache partitions according to the herein disclosed techniques. In some cases, thecache balancing engine 364 and/or other components of the cache memory manager 260 1 can access a set ofcache sizing rules 366 for various purposes. For example,cache sizing rules 366 might indicate certain thresholds that, when breached, can trigger execution of one or more cache balancing operations. In such cases, continual monitoring of thecache memory space 150 by cache monitor 362 can facilitate a dynamic sizing of the cache partitions when, for example, certain data element and/or cache partition attributes breach their respective thresholds (operation 372). In some cases, the thresholds can be set such that cache size adjustments do not occur too frequently (e.g., providing a hysteresis effect). The cache sizing rules 366 might also indicate certain constraints imposed by the underlying system (e.g., distributed storage system) and/or by a user 302 (e.g., system administrator). For example, a total cache memory size might be allocated in thecache sizing rules 366 based on system capabilities and/or user specification. - A technique for monitoring cache partitions to perform dynamic cache balancing is shown and described as pertaining to
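- A minimal sketch of how such cache sizing rules might gate a balancing operation (the rule names, the spread threshold, and the hysteresis interval are assumptions for illustration) follows:

```python
def should_rebalance(metrics, rules, last_rebalance_ts, now_ts):
    """Decide whether to trigger a balancing pass.

    metrics: dict of partition_id -> normalized cache performance metric
    rules: dict with an assumed metric-spread threshold and a minimum interval (hysteresis)
    """
    if now_ts - last_rebalance_ts < rules["min_interval_s"]:
        return False                                  # hysteresis: avoid resizing too frequently
    spread = max(metrics.values()) - min(metrics.values())
    return spread > rules["spread_threshold"]         # only rebalance when the imbalance is material
```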
FIG. 3B . -
FIG. 3B presents a dynamic cache partition balancing technique 3B00 as implemented in systems that perform dynamic sizing of multiple cache partitions. As an option, one or more variations of dynamic cache partition balancing technique 3B00 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. The dynamic cache partition balancing technique 3B00 or any aspect thereof may be implemented in any environment. - The dynamic cache partition balancing technique 3B00 presents one embodiment of certain steps and/or operations for facilitating dynamic sizing of multiple cache partitions according to the herein disclosed techniques. In one or more embodiments, the steps and underlying operations comprising the dynamic cache partition balancing technique 3B00 can be executed by an instance of the cache memory manager 260 1 as shown and described in
FIG. 3A and herein. - As shown, the dynamic cache partition balancing technique 3B00 can perform a set of operations for each cache partition managed (e.g., at a node in a distributed storage system) commencing with collecting cache partition attributes and/or data element attributes pertaining to a given cache partition (at step 332). Using the collected attributes and/or other information, various cache partition tail attributes can be determined (at step 334). A normalized cache performance metric can then be generated for the cache partition (at step 336). When the foregoing attributes and/or metrics have been collected and/or generated for each cache partition, a determination can be made whether to perform any cache balancing (at decision 338). For example, in some cases, there might be no change to the aforementioned attributes and/or metric since an earlier balancing operation such that another balancing operation need not be invoked. Specifically, one or more of the attributes and/or metrics might be compared to various thresholds and/or rules to make such a determination. If no cache partition balancing is to be performed (see “No” path of decision 338), then monitoring of the cache partitions can continue (operation 382).
- If cache partition balancing is invoked (see “Yes” path of decision 338), the cache partitions can be compared using the normalized cache performance metrics earlier determined (at step 340). Various approaches can be implemented to compare more than two cache partitions. For example, the cache partitions having the minimum and maximum normalized cache performance metrics might be first compared. The next pair of cache partitions having the broadest spread between respective normalized cache performance metrics might then be compared. This process can be iterated until all have been compared and/or a spread threshold has been reached. Certain linearization techniques can also be implemented to determine an equilibrium point (e.g., balanced performance) for the functions representing the normalized performance of the cache partitions. A set of cache reallocation amounts determined from such comparison techniques can be determined (at step 342). The cache reallocation amounts can be added to and/or subtracted from the appropriate cache partitions to balance the partitions (at step 344). When the cache partition resizing is complete, the monitoring of the cache partitions can continue (operation 382).
-
FIG. 4A depicts a system 4A00 as an arrangement of computing modules that are interconnected so as to operate cooperatively to implement certain of the herein-disclosed embodiments. The partitioning of system 4A00 is merely illustrative and other partitions are possible. As an option, the system 4A00 may be implemented in the context of the architecture and functionality of the embodiments described herein. Of course, however, the system 4A00 or any operation therein may be carried out in any desired environment. The system 4A00 comprises at least one processor and at least one memory, the memory serving to store program instructions corresponding to the operations of the system. As shown, an operation can be implemented in whole or in part using program instructions accessible by a module. The modules are connected to a communication path 4A05, and any operation can communicate with other operations over communication path 4A05. The modules of the system can, individually or in combination, perform method operations within system 4A00. Any operations performed within system 4A00 may be performed in any order unless as may be specified in the claims. The shown embodiment implements a portion of a computer system, presented as system 4A00, comprising a computer processor to execute a set of program code instructions (module 4A10) and modules for accessing memory to hold program code instructions to perform: defining a cache having two or more cache partitions, and at least two of the cache partitions having respective one or more cache partition attributes characterizing respective sizes of the cache partitions (module 4A20); receiving one or more data element attributes corresponding to one or more data elements that are stored in at least one of the cache partitions (module 4A30); generating at least one normalized cache performance metric for a respective at least one of the cache partitions, wherein the normalized cache performance metric is based at least in part on a combination of the cache partition attributes and the data element attributes, and (module 4A40); using the normalized cache performance metric to compare the cache partitions (module 4A50); determining at least one cache reallocation amount by equating a first normalized cache performance metric with a second normalized cache performance metric, the first normalized cache performance metric corresponding to a first cache partition from the cache partitions and the second normalized cache performance metric corresponding to a second cache partition from the cache partitions (module 4A60); and balancing the cache partitions by increasing a first cache partition size by the cache reallocation amount and decreasing a second cache partition size by the cache reallocation amount, the first cache partition size corresponding to the first cache partition and the second cache partition size corresponding to the second cache partition (module 4A70). - Variations of the foregoing may include more or fewer of the shown modules and variations may perform more or fewer (or different) steps, and/or may use data elements in more, or in fewer (or different) operations.
- Strictly as examples, some embodiments include:
-
- Variations where determining the cache reallocation amount is responsive to a change to at least one of, the cache partition attributes, or the data element attributes.
- Variations where the first cache partition comprises a first portion of the data elements characterized by a first data element type and a second cache partition from the cache partitions comprises a second portion of the data elements characterized by a second data element type.
- Variations where the first portion of the data elements comprises extent data and the second portion of the data elements comprises metadata.
- Variations where the data element attributes comprise at least one of, a partition identifier, a data element identifier, a data element type, a miss cost, or a timestamp.
- Variations where the cache partition attributes comprise at least one of, a partition identifier, a cache partition size, or a total number of cache partition queries.
- Variations where the normalized cache performance metric is derived at least in part from one or more cache partition tail attributes characterizing a cache partition tail corresponding to a respective one of the cache partitions.
- Variations where the cache partition tail attributes comprise at least one of, a partition identifier, a tail size, or a number of hits on data elements in the cache partition tail.
- Variations where the cache partition tail is defined based at least in part on at least one of, a number of cache partition data elements, or a number of cache partition data element hits.
- Variations where generating the normalized cache performance metric is based at least in part on a set of cache sizing rules.
-
FIG. 4B depicts a system 4B00 as an arrangement of computing modules that are interconnected so as to operate cooperatively to implement certain of the herein-disclosed embodiments. The partitioning of system 4B00 is merely illustrative and other partitions are possible. As an option, the system 4B00 may be implemented in the context of the architecture and functionality of the embodiments described herein. Of course, however, the system 4B00 or any operation therein may be carried out in any desired environment. The system 4B00 comprises at least one processor and at least one memory, the memory serving to store program instructions corresponding to the operations of the system. As shown, an operation can be implemented in whole or in part using program instructions accessible by a module. The modules are connected to a communication path 4B05, and any operation can communicate with other operations over communication path 4B05. The modules of the system can, individually or in combination, perform method operations within system 4B00. Any operations performed within system 4B00 may be performed in any order unless as may be specified in the claims. The shown embodiment implements a portion of a computer system, presented as system 4B00, comprising a computer processor to execute a set of program code instructions (module 4B10) and modules for accessing memory to hold program code instructions to perform: defining a first cache having a first cache size and a second cache having a second cache size (module 4B20); receiving one or more data element attributes corresponding to one or more data elements that are stored in at least one of the caches (module 4B30); defining a first tail portion and a first head portion of the first cache and defining a second tail portion and a second head portion of the second cache wherein incoming data elements are initially stored in a respective head portion and wherein evicted data elements are evicted from a respective tail portion (module 4B40); generating a first normalized cache performance metric for the first cache wherein the first normalized cache performance metric is based at least in part on a predicted miss cost to be incurred after evicting one or more of the evicted data elements from the first tail portion (module 4B50); generating a second normalized cache performance metric for the second cache wherein the second normalized cache performance metric is based at least in part on a predicted miss cost to be incurred after evicting one or more of the evicted data elements from the second tail portion (module 4B60); determining at least one cache reallocation amount by equating the first normalized cache performance metric with the second normalized cache performance metric (module 4B70); and balancing the caches by increasing the first cache size by the cache reallocation amount and decreasing the second cache size by the cache reallocation amount (module 4B80). -
FIG. 5A depicts a virtualized controller as implemented by the shown virtual machine architecture 5A00. The virtual machine architecture comprises a collection of interconnected components suitable for implementing embodiments of the present disclosure and/or for use in the herein-described environments. Moreover, the shown virtual machine architecture 5A00 includes a virtual machine instance in a configuration 501 that is further described as pertaining to the controllervirtual machine instance 530. A controller virtual machine instance receives block I/O (input/output or IO) storage requests as network file system (NFS) requests in the form ofNFS requests 502, and/or internet small computer storage interface (iSCSI) block IO requests in the form ofiSCSI requests 503, and/or Samba file system (SMB) requests in the form of SMB requests 504. The controller virtual machine (CVM) instance publishes and responds to an internet protocol (IP) address (e.g., see CVM IP address 510). Various forms of input and output (I/O or IO) can be handled by one or more IO control handler functions (see IOCTL functions 508) that interface to other functions such as data IO manager functions 514 and/or metadata manager functions 522. As shown, the data IO manager functions can include communication with a virtual disk configuration manager 512 and/or can include direct or indirect communication with any of various block IO functions (e.g., NFS IO, iSCSI IO, SMB IO, etc.). - In addition to block IO functions, the configuration 501 supports IO of any form (e.g., block IO, streaming IO, packet-based IO, HTTP traffic, etc.) through either or both of a user interface (UI) handler such as
UI IO handler 540 and/or through any of a range of application programming interfaces (APIs), possibly through the shownAPI IO manager 545. - The communications link 515 can be configured to transmit (e.g., send, receive, signal, etc.) any types of communications packets comprising any organization of data elements. The data elements can comprise a payload data, a destination address (e.g., a destination IP address) and a source address (e.g., a source IP address), and can include various packet processing techniques (e.g., tunneling), encodings (e.g., encryption), and/or formatting of bit fields into fixed-length blocks or into variable length fields used to populate the payload. In some cases, packet characteristics include a version identifier, a packet or payload length, a traffic class, a flow label, etc. In some cases, the payload comprises a data structure that is encoded and/or formatted to fit into byte or word boundaries of the packet.
- In some embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement aspects of the disclosure. Thus, embodiments of the disclosure are not limited to any specific combination of hardware circuitry and/or software. In embodiments, the term “logic” shall mean any combination of software or hardware that is used to implement all or part of the disclosure.
- The term “computer readable medium” or “computer usable medium” as used herein refers to any medium that participates in providing instructions to a data processor for execution. Such a medium may take many forms including, but not limited to, non-volatile media and volatile media. Non-volatile media includes any non-volatile storage medium, for example, solid-state storage devices (SSDs) or optical or magnetic disks such as disk drives or tape drives. Volatile media includes dynamic memory such as a random access memory. As shown, the controller
virtual machine instance 530 includes a contentcache manager facility 516 that accesses storage locations, possibly including local dynamic random access memory (DRAM) (e.g., through the local memory device access block 518) and/or possibly including accesses to local solid-state storage (e.g., through local SSD device access block 520). - Common forms of computer readable media includes any non-transitory computer readable medium, for example, floppy disk, flexible disk, hard disk, magnetic tape, or any other magnetic medium; CD-ROM or any other optical medium; punch cards, paper tape, or any other physical medium with patterns of holes; or any RAM, PROM, EPROM, FLASH-EPROM, or any other memory chip or cartridge. Any data can be stored, for example, in any form of
external data repository 531, which in turn can be formatted into any one or more storage areas, and which can comprise parameterized storage accessible by a key (e.g., a filename, a table name, a block address, an offset address, etc.). Anexternal data repository 531 can store any forms of data, and may comprise a storage area dedicated to storage of metadata pertaining to the stored forms of data. In some cases, metadata, can be divided into portions. Such portions and/or cache copies can be stored in the external storage data repository and/or in a local storage area (e.g., in local DRAM areas and/or in local SSD areas). Such local storage can be accessed using functions provided by a local metadatastorage access block 524. Theexternal data repository 531 can be configured using a CVMvirtual disk controller 526, which can in turn manage any number or any configuration of virtual disks. - Execution of the sequences of instructions to practice certain embodiments of the disclosure are performed by a one or more instances of a processing element such as a data processor, or such as a central processing unit (e.g., CPU1, CPU2). According to certain embodiments of the disclosure, two or more instances of a configuration 501 can be coupled by a communications link 515 (e.g., backplane, LAN, PTSN, wired or wireless network, etc.) and each instance may perform respective portions of sequences of instructions as may be required to practice embodiments of the disclosure.
- The shown
computing platform 506 is interconnected to the Internet 548 through one or more network interface ports (e.g., network interface port 523 1 and network interface port 523 2). The configuration 501 can be addressed through one or more network interface ports using an IP address. Any operational element within computing platform 506 can perform sending and receiving operations using any of a range of network protocols, possibly including network protocols that send and receive packets (e.g., see network protocol packet 521 1 and network protocol packet 521 2). - The
computing platform 506 may transmit and receive messages that can be composed of configuration data, and/or any other forms of data and/or instructions organized into a data structure (e.g., communications packets). In some cases, the data structure includes program code instructions (e.g., application code) communicated through the Internet 548 and/or through any one or more instances of communications link 515. Received program code may be processed and/or executed by a CPU as it is received, and/or program code may be stored in any volatile or non-volatile storage for later execution. Program code can be transmitted via an upload (e.g., an upload from an access device over the Internet 548 to computing platform 506). Further, program code and/or results of executing program code can be delivered to a particular user via a download (e.g., a download from the computing platform 506 over the Internet 548 to an access device). - The configuration 501 is merely one sample configuration. Other configurations or partitions can include further data processors, and/or multiple communications interfaces, and/or multiple storage devices, etc. within a partition. For example, a partition can bound a multi-core processor (e.g., possibly including embedded or co-located memory), or a partition can bound a computing cluster having a plurality of computing elements, any of which computing elements are connected directly or indirectly to a communications link. A first partition can be configured to communicate with a second partition. A particular first partition and a particular second partition can be congruent (e.g., in a processing element array) or can be different (e.g., comprising disjoint sets of components).
- A module as used herein can be implemented using any mix of any portions of the system memory and any extent of hard-wired circuitry including hard-wired circuitry embodied as a data processor. Some embodiments include one or more special-purpose hardware components (e.g., power control, logic, sensors, transducers, etc.). A module may include one or more state machines and/or combinational logic used to implement or facilitate the operational and/or other characteristics of performing dynamic sizing of multiple cache partitions.
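- One plausible way such a module could drive dynamic sizing of multiple cache partitions is sketched below. The heuristic shown (resizing partitions in proportion to an observed hit rate) is an editorial assumption used only for illustration and is not a statement of the claimed method.

```python
# Hedged sketch of one way a module might rebalance multiple cache partitions
# using a per-partition performance metric (here, observed hit rate). This is
# an illustrative heuristic only, not the claimed or implemented method.
def rebalance_partitions(partition_hits, partition_lookups, total_size):
    """Return new partition sizes proportional to each partition's hit rate."""
    hit_rates = {
        name: partition_hits[name] / max(partition_lookups[name], 1)
        for name in partition_hits
    }
    total_rate = sum(hit_rates.values()) or 1.0
    return {
        name: int(total_size * rate / total_rate)
        for name, rate in hit_rates.items()
    }

# Example: the metadata partition is observing more hits, so it grows.
sizes = rebalance_partitions(
    partition_hits={"content": 200, "metadata": 600},
    partition_lookups={"content": 1000, "metadata": 1000},
    total_size=4096,  # e.g., megabytes available to the cache as a whole
)
# sizes -> {"content": 1024, "metadata": 3072}
```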
- Various implementations of the data repository comprise storage media organized to hold a series of records or files such that individual records or files are accessed using a name or key (e.g., a primary key or a combination of keys and/or query clauses). Such files or records can be organized into one or more data structures (e.g., data structures used to implement or facilitate aspects of dynamic sizing of multiple cache partitions). Such files or records can be brought into and/or stored in volatile or non-volatile memory.
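- A minimal sketch of such key-addressed record access, assuming an in-memory dictionary keyed by a composite key, is shown below; the key names and record fields are hypothetical.

```python
# Illustrative only: records accessed by a primary key or by a combination of
# keys, loosely mirroring the description above. The key names and record
# layout are assumptions made for this sketch.
records = {
    ("vm-001", "vdisk-0"): {"owner": "tenant-a", "size_gb": 64},
    ("vm-001", "vdisk-1"): {"owner": "tenant-a", "size_gb": 128},
    ("vm-002", "vdisk-0"): {"owner": "tenant-b", "size_gb": 32},
}

# Access by the full (composite) key...
record = records[("vm-001", "vdisk-1")]

# ...or select by a query-like clause over part of the key.
vm1_disks = {key: value for key, value in records.items() if key[0] == "vm-001"}
```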
-
FIG. 5B depicts a virtualized controller implemented by a containerized architecture 5B00. The containerized architecture comprises a collection of interconnected components suitable for implementing embodiments of the present disclosure and/or for use in the herein-described environments. Moreover, the shown containerized architecture 5B00 includes a container instance in a configuration 551 that is further described as pertaining to the container instance 550. The configuration 551 includes an operating system layer (as shown) that performs addressing functions such as providing access to external requestors via an IP address (e.g., "P.Q.R.S", as shown). Providing access to external requestors can include implementing all or portions of a protocol specification (e.g., "http:") and possibly handling port-specific functions. - The operating system layer can perform port forwarding to any container (e.g., container instance 550). A container instance can be executed by a processor. Runnable portions of a container instance sometimes derive from a container image, which in turn might include all, or portions of any of, a Java archive repository (JAR) and/or its contents, a script or scripts and/or a directory of scripts, a virtual machine configuration, and may include any dependencies therefrom. In some cases, a configuration within a container might include an image comprising a minimum set of runnable code. Contents of larger libraries, and/or code or data that would not be accessed during runtime of the container instance, can be omitted to form a smaller library composed of only the code or data actually accessed at runtime. Start-up time for a container instance can be much faster than start-up time for a virtual machine instance, both because the container image might be much smaller than a respective virtual machine instance and because the container image might have many fewer code and/or data initialization steps to perform.
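- As a hedged illustration of trimming a library down to the code actually referenced at runtime, the Python standard-library modulefinder can approximate the set of modules an entry point imports; the entry-point name below is hypothetical, and the disclosure does not prescribe any particular tool.

```python
# Hedged sketch: one way to approximate "only the code accessed at runtime"
# for a Python application, using the standard-library modulefinder. The
# script name is a hypothetical entry point, not one named by the disclosure.
from modulefinder import ModuleFinder

finder = ModuleFinder()
finder.run_script("app.py")   # hypothetical entry point of the containerized service

# Modules the entry point actually imports; everything else in the larger
# library could be omitted when assembling a smaller container image.
needed_modules = sorted(finder.modules.keys())
for name in needed_modules:
    print(name)
```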
- A container instance (e.g., a Docker container) can serve as an instance of an application container. Any container of any sort can be rooted in a directory system, and can be configured to be accessed by file system commands (e.g., "ls" or "ls -a", etc.). The container might optionally include
operating system components 578; however, such a separate set of operating system components need not be provided. As an alternative, a container can include a runnable instance 558, which is built (e.g., through compilation and linking, or just-in-time compilation, etc.) to include all of the library and OS-like functions needed for execution of the runnable instance. In some cases, a runnable instance can be built with a virtual disk configuration manager, any of a variety of data IO management functions, etc. In some cases, a runnable instance includes code for, and access to, a container virtual disk controller 576. Such a container virtual disk controller can perform any of the functions that the aforementioned CVM virtual disk controller 526 can perform, yet such a container virtual disk controller does not rely on a hypervisor or any particular operating system so as to perform its range of functions. - In some environments, multiple containers can be collocated and/or can share one or more contexts. For example, multiple containers that share access to a virtual disk can be assembled into a pod (e.g., a Kubernetes pod). Pods provide sharing mechanisms (e.g., when multiple containers are amalgamated into the scope of a pod) as well as isolation mechanisms (e.g., such that the namespace scope of one pod does not share the namespace scope of another pod).
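- A sketch of such a pod-style grouping, expressed here as a plain Python dictionary that mirrors a Kubernetes-style pod specification, is shown below. The field names follow the Kubernetes Pod API, while the image names, volume name, and mount paths are assumptions made for illustration.

```python
# Sketch of a pod-style grouping in which two containers share one volume
# (standing in for a shared virtual disk). Field names follow the Kubernetes
# Pod API; the image names, volume name, and mount paths are assumptions.
shared_disk_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "shared-vdisk-pod"},
    "spec": {
        "volumes": [
            {"name": "shared-vdisk", "emptyDir": {}}
        ],
        "containers": [
            {
                "name": "writer",
                "image": "example/writer:latest",
                "volumeMounts": [{"name": "shared-vdisk", "mountPath": "/data"}],
            },
            {
                "name": "reader",
                "image": "example/reader:latest",
                "volumeMounts": [{"name": "shared-vdisk", "mountPath": "/data"}],
            },
        ],
    },
}
```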
- In the foregoing specification, the disclosure has been described with reference to specific embodiments thereof. It will however be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the disclosure. For example, the above-described process flows are described with reference to a particular ordering of process actions. However, the ordering of many of the described process actions may be changed without affecting the scope or operation of the disclosure. The specification and drawings are to be regarded in an illustrative sense rather than in a restrictive sense.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/990,816 US20180276143A1 (en) | 2016-07-19 | 2018-05-28 | Dynamic cache balancing |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/214,264 US9984004B1 (en) | 2016-07-19 | 2016-07-19 | Dynamic cache balancing |
US15/990,816 US20180276143A1 (en) | 2016-07-19 | 2018-05-28 | Dynamic cache balancing |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/214,264 Continuation US9984004B1 (en) | 2016-07-19 | 2016-07-19 | Dynamic cache balancing |
Publications (1)
Publication Number | Publication Date |
---|---|
US20180276143A1 true US20180276143A1 (en) | 2018-09-27 |
Family
ID=62165991
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/214,264 Active 2036-11-25 US9984004B1 (en) | 2016-07-19 | 2016-07-19 | Dynamic cache balancing |
US15/990,816 Abandoned US20180276143A1 (en) | 2016-07-19 | 2018-05-28 | Dynamic cache balancing |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/214,264 Active 2036-11-25 US9984004B1 (en) | 2016-07-19 | 2016-07-19 | Dynamic cache balancing |
Country Status (1)
Country | Link |
---|---|
US (2) | US9984004B1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11243718B2 (en) * | 2019-12-20 | 2022-02-08 | SK Hynix Inc. | Data storage apparatus and operation method thereof |
US11625325B2 (en) | 2020-06-01 | 2023-04-11 | Oracle International Corporation | Partitioned mid-tier cache based on user type |
US11663143B2 (en) * | 2019-11-26 | 2023-05-30 | Oracle International Corporation | Multi-state midtier dynamic cache replacement |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9612961B2 (en) * | 2013-08-29 | 2017-04-04 | Empire Technology Development Llc | Cache partitioning in a multicore processor |
US11243707B2 (en) | 2014-03-12 | 2022-02-08 | Nutanix, Inc. | Method and system for implementing virtual machine images |
US11775341B2 (en) | 2016-02-05 | 2023-10-03 | Sas Institute Inc. | Automated job flow generation to provide object views in container-supported many task computing |
US10482019B2 (en) * | 2016-02-24 | 2019-11-19 | Hitachi, Ltd. | Storage apparatus and control method thereof |
US9984004B1 (en) * | 2016-07-19 | 2018-05-29 | Nutanix, Inc. | Dynamic cache balancing |
US10034407B2 (en) * | 2016-07-22 | 2018-07-24 | Intel Corporation | Storage sled for a data center |
WO2019142153A1 (en) * | 2018-01-19 | 2019-07-25 | Hossein Asadi | Cache allocation to a virtual machine |
US20190278715A1 (en) * | 2018-03-12 | 2019-09-12 | Nutanix, Inc. | System and method for managing distribution of virtual memory over multiple physical memories |
US11748158B2 (en) * | 2018-09-30 | 2023-09-05 | Sas Institute Inc. | Data object preparation for execution of multiple task routine instances in many task computing |
US11113192B2 (en) * | 2019-11-22 | 2021-09-07 | EMC IP Holding Company LLC | Method and apparatus for dynamically adapting cache size based on estimated cache performance |
US20240232086A1 (en) * | 2022-04-06 | 2024-07-11 | Dell Products L.P. | Using cache loss signal as a basis to optimize hit rate and utilization through cache partitioning |
Citations (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4691277A (en) * | 1984-10-24 | 1987-09-01 | International Business Machines Corp. | Small instruction cache using branch target table to effect instruction prefetch |
US5717893A (en) * | 1989-03-22 | 1998-02-10 | International Business Machines Corporation | Method for managing a cache hierarchy having a least recently used (LRU) global cache and a plurality of LRU destaging local caches containing counterpart datatype partitions |
US5732267A (en) * | 1995-08-02 | 1998-03-24 | Microsoft Corporation | Caching/prewarming data loaded from CD-ROM |
US6209058B1 (en) * | 1999-01-27 | 2001-03-27 | Quantum Corp. | Cache management for data transfer control from target disk areas |
US20010001872A1 (en) * | 1998-06-10 | 2001-05-24 | International Business Machines Corp. | Data caching with a partially compressed cache |
US20020169784A1 (en) * | 2001-03-05 | 2002-11-14 | Cha Sang K. | Compression scheme for improving cache behavior |
US6486803B1 (en) * | 2000-09-22 | 2002-11-26 | Digital Fountain, Inc. | On demand encoding with a window |
US20030028728A1 (en) * | 2001-07-31 | 2003-02-06 | Mitsubishi Denki Kabushiki Kaisha | Cache memory control device |
US20030093622A1 (en) * | 2001-11-09 | 2003-05-15 | International Business Machines Corporation | Caching memory contents into cache partitions based on memory locations |
US6671766B1 (en) * | 2000-01-07 | 2003-12-30 | Storage Technology Corporation | Method and system for implementing memory efficient track aging |
US6698015B1 (en) * | 2000-06-13 | 2004-02-24 | Cisco Technology, Inc. | Apparatus and method for improving performance of critical code execution |
US6708330B1 (en) * | 2000-06-13 | 2004-03-16 | Cisco Technology, Inc. | Performance improvement of critical code execution |
US20040098541A1 (en) * | 2002-11-14 | 2004-05-20 | International Business Machines Corporation | System and method for implementing an adaptive replacement cache policy |
US20040186862A1 (en) * | 2003-03-18 | 2004-09-23 | Garthwaite Alexander T. | Parallel caching of insertions into remembered-sets |
US20040268051A1 (en) * | 2002-01-24 | 2004-12-30 | University Of Washington | Program-directed cache prefetching for media processors |
US20050268037A1 (en) * | 2004-05-27 | 2005-12-01 | International Business Machines Corporation | Cache hit ratio estimating apparatus, cache hit ratio estimating method, program, and recording medium |
US20060069876A1 (en) * | 2004-09-30 | 2006-03-30 | Sorav Bansal | Method and system of clock with adaptive cache replacement and temporal filtering |
US20060090036A1 (en) * | 2004-10-27 | 2006-04-27 | Ofir Zohar | Method for differential discarding of cached data in distributed storage systems |
US20060181953A1 (en) * | 2005-02-11 | 2006-08-17 | Eric Rotenberg | Systems, methods and devices for providing variable-latency write operations in memory devices |
US20060184737A1 (en) * | 2005-02-17 | 2006-08-17 | Hideshi Yamada | Data stream generation method for enabling high-speed memory access |
US20070033341A1 (en) * | 2005-08-04 | 2007-02-08 | Akiyoshi Hashimoto | Storage system for controlling disk cache |
US20070050603A1 (en) * | 2002-08-07 | 2007-03-01 | Martin Vorbach | Data processing method and device |
US20070083730A1 (en) * | 2003-06-17 | 2007-04-12 | Martin Vorbach | Data processing device and method |
US20070174550A1 (en) * | 2003-07-16 | 2007-07-26 | Matsushita Electric Industrial Co., Ltd. | Data area managing method in information recording medium and information processor employing data area managing method |
US20070226422A1 (en) * | 2006-03-08 | 2007-09-27 | Matsushita Electric Industrial Co., Ltd. | Multi-master system and data transfer system |
US20080177975A1 (en) * | 2007-01-23 | 2008-07-24 | Nobuo Kawamura | Database management system for controlling setting of cache partition area in storage system |
US20090055496A1 (en) * | 2002-10-08 | 2009-02-26 | Gaurav Garg | Advanced processor with credit based scheme for optimal packet flow in a multi-processor system on a chip |
US20100095057A1 (en) * | 2008-10-15 | 2010-04-15 | Seagate Technology Llc | Non-volatile resistive sense memory on-chip cache |
US20100115240A1 (en) * | 2008-11-05 | 2010-05-06 | Ohad Falik | Optimizing performance of instructions based on sequence detection or information associated with the instructions |
US20100153654A1 (en) * | 2002-08-07 | 2010-06-17 | Martin Vorbach | Data processing method and device |
US7800632B2 (en) * | 2003-03-18 | 2010-09-21 | Qualcomm Incorporated | Triangle rendering using direct evaluation |
US20100318744A1 (en) * | 2009-06-15 | 2010-12-16 | International Business Machines Corporation | Differential caching mechanism based on media i/o speed |
US20110191522A1 (en) * | 2010-02-02 | 2011-08-04 | Condict Michael N | Managing Metadata and Page Replacement in a Persistent Cache in Flash Memory |
US20110191547A1 (en) * | 2009-11-19 | 2011-08-04 | Hitachi, Ltd. | Computer system and load equalization control method for the same |
US20110219208A1 (en) * | 2010-01-08 | 2011-09-08 | International Business Machines Corporation | Multi-petascale highly efficient parallel supercomputer |
US20110238948A1 (en) * | 2002-08-07 | 2011-09-29 | Martin Vorbach | Method and device for coupling a data processing unit and a data processing array |
US20120159074A1 (en) * | 2011-12-23 | 2012-06-21 | Sodhi Inder M | Method, apparatus, and system for energy efficiency and energy conservation including dynamic cache sizing and cache operating voltage management for optimal power performance |
US20120198174A1 (en) * | 2011-01-31 | 2012-08-02 | Fusion-Io, Inc. | Apparatus, system, and method for managing eviction of data |
US20120221774A1 (en) * | 2011-02-25 | 2012-08-30 | Fusion-Io, Inc. | Apparatus, system, and method for managing contents of a cache |
US20130094318A1 (en) * | 2011-10-12 | 2013-04-18 | Nam Sung Kim | Energy Efficient Processor Having Heterogeneous Cache |
US20130297655A1 (en) * | 2012-05-02 | 2013-11-07 | Microsoft Corporation | Performance service level agreements in multi-tenant database systems |
US20130318196A1 (en) * | 2012-05-23 | 2013-11-28 | Hitachi, Ltd. | Storage system and storage control method for using storage area based on secondary storage as cache area |
US20140115569A1 (en) * | 2012-10-23 | 2014-04-24 | Yong-Kyu Jung | Adaptive instruction prefetching and fetching memory system apparatus and method for microprocessor system |
US20150106596A1 (en) * | 2003-03-21 | 2015-04-16 | Pact Xpp Technologies Ag | Data Processing System Having Integrated Pipelined Array Data Processor |
US20150128007A1 (en) * | 2013-11-06 | 2015-05-07 | Wisconsin Alumni Research Foundation | Dynamic Error Handling For On-Chip Memory Structures |
US9317379B2 (en) * | 2014-01-24 | 2016-04-19 | International Business Machines Corporation | Using transactional execution for reliability and recovery of transient failures |
US20170161193A1 (en) * | 2015-12-02 | 2017-06-08 | International Business Machines Corporation | Hybrid cache |
US20170220719A1 (en) * | 2016-02-01 | 2017-08-03 | King Fahd University Of Petroleum And Minerals | Multi-core compact executable trace processor |
US20180046583A1 (en) * | 2016-08-12 | 2018-02-15 | Advanced Micro Devices, Inc. | Updating least-recently-used data for greater persistence of higher generality cache entries |
US20180052772A1 (en) * | 2015-05-14 | 2018-02-22 | Hitachi, Ltd. | Storage system and storage control method |
US9984004B1 (en) * | 2016-07-19 | 2018-05-29 | Nutanix, Inc. | Dynamic cache balancing |
US10503792B1 (en) * | 2019-05-10 | 2019-12-10 | Georgetown University | Cache optimization via topics in web search engines |
-
2016
- 2016-07-19 US US15/214,264 patent/US9984004B1/en active Active
-
2018
- 2018-05-28 US US15/990,816 patent/US20180276143A1/en not_active Abandoned
Patent Citations (55)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4691277A (en) * | 1984-10-24 | 1987-09-01 | International Business Machines Corp. | Small instruction cache using branch target table to effect instruction prefetch |
US5717893A (en) * | 1989-03-22 | 1998-02-10 | International Business Machines Corporation | Method for managing a cache hierarchy having a least recently used (LRU) global cache and a plurality of LRU destaging local caches containing counterpart datatype partitions |
US5732267A (en) * | 1995-08-02 | 1998-03-24 | Microsoft Corporation | Caching/prewarming data loaded from CD-ROM |
US20010001872A1 (en) * | 1998-06-10 | 2001-05-24 | International Business Machines Corp. | Data caching with a partially compressed cache |
US6209058B1 (en) * | 1999-01-27 | 2001-03-27 | Quantum Corp. | Cache management for data transfer control from target disk areas |
US6671766B1 (en) * | 2000-01-07 | 2003-12-30 | Storage Technology Corporation | Method and system for implementing memory efficient track aging |
US6698015B1 (en) * | 2000-06-13 | 2004-02-24 | Cisco Technology, Inc. | Apparatus and method for improving performance of critical code execution |
US6708330B1 (en) * | 2000-06-13 | 2004-03-16 | Cisco Technology, Inc. | Performance improvement of critical code execution |
US6486803B1 (en) * | 2000-09-22 | 2002-11-26 | Digital Fountain, Inc. | On demand encoding with a window |
US20020169784A1 (en) * | 2001-03-05 | 2002-11-14 | Cha Sang K. | Compression scheme for improving cache behavior |
US20030028728A1 (en) * | 2001-07-31 | 2003-02-06 | Mitsubishi Denki Kabushiki Kaisha | Cache memory control device |
US20030093622A1 (en) * | 2001-11-09 | 2003-05-15 | International Business Machines Corporation | Caching memory contents into cache partitions based on memory locations |
US20040268051A1 (en) * | 2002-01-24 | 2004-12-30 | University Of Washington | Program-directed cache prefetching for media processors |
US20070050603A1 (en) * | 2002-08-07 | 2007-03-01 | Martin Vorbach | Data processing method and device |
US20100153654A1 (en) * | 2002-08-07 | 2010-06-17 | Martin Vorbach | Data processing method and device |
US20110238948A1 (en) * | 2002-08-07 | 2011-09-29 | Martin Vorbach | Method and device for coupling a data processing unit and a data processing array |
US20090055496A1 (en) * | 2002-10-08 | 2009-02-26 | Gaurav Garg | Advanced processor with credit based scheme for optimal packet flow in a multi-processor system on a chip |
US20040098541A1 (en) * | 2002-11-14 | 2004-05-20 | International Business Machines Corporation | System and method for implementing an adaptive replacement cache policy |
US20040186862A1 (en) * | 2003-03-18 | 2004-09-23 | Garthwaite Alexander T. | Parallel caching of insertions into remembered-sets |
US7800632B2 (en) * | 2003-03-18 | 2010-09-21 | Qualcomm Incorporated | Triangle rendering using direct evaluation |
US20150106596A1 (en) * | 2003-03-21 | 2015-04-16 | Pact Xpp Technologies Ag | Data Processing System Having Integrated Pipelined Array Data Processor |
US20070083730A1 (en) * | 2003-06-17 | 2007-04-12 | Martin Vorbach | Data processing device and method |
US20070174550A1 (en) * | 2003-07-16 | 2007-07-26 | Matsushita Electric Industrial Co., Ltd. | Data area managing method in information recording medium and information processor employing data area managing method |
US20050268037A1 (en) * | 2004-05-27 | 2005-12-01 | International Business Machines Corporation | Cache hit ratio estimating apparatus, cache hit ratio estimating method, program, and recording medium |
US20060069876A1 (en) * | 2004-09-30 | 2006-03-30 | Sorav Bansal | Method and system of clock with adaptive cache replacement and temporal filtering |
US20060090036A1 (en) * | 2004-10-27 | 2006-04-27 | Ofir Zohar | Method for differential discarding of cached data in distributed storage systems |
US20060181953A1 (en) * | 2005-02-11 | 2006-08-17 | Eric Rotenberg | Systems, methods and devices for providing variable-latency write operations in memory devices |
US20060184737A1 (en) * | 2005-02-17 | 2006-08-17 | Hideshi Yamada | Data stream generation method for enabling high-speed memory access |
US20070033341A1 (en) * | 2005-08-04 | 2007-02-08 | Akiyoshi Hashimoto | Storage system for controlling disk cache |
US20070226422A1 (en) * | 2006-03-08 | 2007-09-27 | Matsushita Electric Industrial Co., Ltd. | Multi-master system and data transfer system |
US20080177975A1 (en) * | 2007-01-23 | 2008-07-24 | Nobuo Kawamura | Database management system for controlling setting of cache partition area in storage system |
US20100095057A1 (en) * | 2008-10-15 | 2010-04-15 | Seagate Technology Llc | Non-volatile resistive sense memory on-chip cache |
US20100115240A1 (en) * | 2008-11-05 | 2010-05-06 | Ohad Falik | Optimizing performance of instructions based on sequence detection or information associated with the instructions |
US20100318744A1 (en) * | 2009-06-15 | 2010-12-16 | International Business Machines Corporation | Differential caching mechanism based on media i/o speed |
US20110191547A1 (en) * | 2009-11-19 | 2011-08-04 | Hitachi, Ltd. | Computer system and load equalization control method for the same |
US20110219208A1 (en) * | 2010-01-08 | 2011-09-08 | International Business Machines Corporation | Multi-petascale highly efficient parallel supercomputer |
US20110191522A1 (en) * | 2010-02-02 | 2011-08-04 | Condict Michael N | Managing Metadata and Page Replacement in a Persistent Cache in Flash Memory |
US9678874B2 (en) * | 2011-01-31 | 2017-06-13 | Sandisk Technologies Llc | Apparatus, system, and method for managing eviction of data |
US20120198174A1 (en) * | 2011-01-31 | 2012-08-02 | Fusion-Io, Inc. | Apparatus, system, and method for managing eviction of data |
US8825937B2 (en) * | 2011-02-25 | 2014-09-02 | Fusion-Io, Inc. | Writing cached data forward on read |
US20120221774A1 (en) * | 2011-02-25 | 2012-08-30 | Fusion-Io, Inc. | Apparatus, system, and method for managing contents of a cache |
US20130094318A1 (en) * | 2011-10-12 | 2013-04-18 | Nam Sung Kim | Energy Efficient Processor Having Heterogeneous Cache |
US20120159074A1 (en) * | 2011-12-23 | 2012-06-21 | Sodhi Inder M | Method, apparatus, and system for energy efficiency and energy conservation including dynamic cache sizing and cache operating voltage management for optimal power performance |
US20130297655A1 (en) * | 2012-05-02 | 2013-11-07 | Microsoft Corporation | Performance service level agreements in multi-tenant database systems |
US20130318196A1 (en) * | 2012-05-23 | 2013-11-28 | Hitachi, Ltd. | Storage system and storage control method for using storage area based on secondary storage as cache area |
US20140115569A1 (en) * | 2012-10-23 | 2014-04-24 | Yong-Kyu Jung | Adaptive instruction prefetching and fetching memory system apparatus and method for microprocessor system |
US20150128007A1 (en) * | 2013-11-06 | 2015-05-07 | Wisconsin Alumni Research Foundation | Dynamic Error Handling For On-Chip Memory Structures |
US9317379B2 (en) * | 2014-01-24 | 2016-04-19 | International Business Machines Corporation | Using transactional execution for reliability and recovery of transient failures |
US20160196193A1 (en) * | 2014-01-24 | 2016-07-07 | International Business Machines Corporation | Using transactional execution for reliability and recovery of transient failures |
US20180052772A1 (en) * | 2015-05-14 | 2018-02-22 | Hitachi, Ltd. | Storage system and storage control method |
US20170161193A1 (en) * | 2015-12-02 | 2017-06-08 | International Business Machines Corporation | Hybrid cache |
US20170220719A1 (en) * | 2016-02-01 | 2017-08-03 | King Fahd University Of Petroleum And Minerals | Multi-core compact executable trace processor |
US9984004B1 (en) * | 2016-07-19 | 2018-05-29 | Nutanix, Inc. | Dynamic cache balancing |
US20180046583A1 (en) * | 2016-08-12 | 2018-02-15 | Advanced Micro Devices, Inc. | Updating least-recently-used data for greater persistence of higher generality cache entries |
US10503792B1 (en) * | 2019-05-10 | 2019-12-10 | Georgetown University | Cache optimization via topics in web search engines |
Also Published As
Publication number | Publication date |
---|---|
US9984004B1 (en) | 2018-05-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9984004B1 (en) | Dynamic cache balancing | |
US11740818B2 (en) | Dynamic data compression | |
US11562091B2 (en) | Low latency access to physical storage locations by implementing multiple levels of metadata | |
US10785299B2 (en) | Generating cloud-hosted storage objects from observed data access patterns | |
US10474656B1 (en) | Repurposing log files | |
Kourtis et al. | Reaping the performance of fast NVM storage with uDepot | |
US10567009B2 (en) | Dynamic erasure coding | |
US11734040B2 (en) | Efficient metadata management | |
US10824369B2 (en) | Elastic method of remote direct memory access memory advertisement | |
US7529867B2 (en) | Adaptive, scalable I/O request handling architecture in virtualized computer systems and networks | |
US10990441B2 (en) | Multi-level job processing queues | |
US20190370043A1 (en) | Cooperative memory management | |
US11347558B2 (en) | Security-aware scheduling of virtual machines in a multi-tenant infrastructure | |
US20230280944A1 (en) | Tiering Data Strategy for a Distributed Storage System | |
US20080104589A1 (en) | Adaptive, Scalable I/O Request Handling Architecture in Virtualized Computer Systems and Networks | |
WO2021218038A1 (en) | Storage system, memory management method, and management node | |
US11216420B2 (en) | System and method for high replication factor (RF) data replication | |
US11016676B2 (en) | Spot coalescing of distributed data concurrent with storage I/O operations | |
US20210397345A1 (en) | Managing i/o amplification in log-structured merge trees | |
US20080104590A1 (en) | Adaptive, Scalable I/O Request Handling Architecture in Virtualized Computer Systems and Networks | |
US20200026659A1 (en) | Virtualized memory paging using random access persistent memory devices | |
US12001338B2 (en) | Method and system for implementing metadata compression in a virtualization environment | |
US20190258420A1 (en) | Managing multi-tiered swap space | |
US11733894B2 (en) | Dynamically formatted storage allocation record | |
US20230132493A1 (en) | Importing workload data into a sharded virtual disk |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: NUTANIX, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LITTLE, GARY JEFFREY;YUAN, HUAPENG;GUPTA, KARAN;AND OTHERS;SIGNING DATES FROM 20180130 TO 20180201;REEL/FRAME:045912/0339 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |