
CN114860783B - Graph database caching method and device, electronic equipment and readable storage medium - Google Patents


Info

Publication number
CN114860783B
CN114860783B (application CN202210781029.2A)
Authority
CN
China
Prior art keywords
point
objects
edge
sampling
maximum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210781029.2A
Other languages
Chinese (zh)
Other versions
CN114860783A (en)
Inventor
高昆仑
陈国宝
赵保华
乔贵邠
张亮
周飞
林剑超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fangtu Data Beijing Software Co ltd
State Grid Smart Grid Research Institute Co ltd
Original Assignee
Fangtu Data Beijing Software Co ltd
State Grid Smart Grid Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fangtu Data Beijing Software Co ltd, State Grid Smart Grid Research Institute Co ltd filed Critical Fangtu Data Beijing Software Co ltd
Priority to CN202210781029.2A priority Critical patent/CN114860783B/en
Publication of CN114860783A publication Critical patent/CN114860783A/en
Application granted granted Critical
Publication of CN114860783B publication Critical patent/CN114860783B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G06F16/24564 Applying rules; Deductive queries
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G06F3/061 Improving I/O performance
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses a graph database caching method and device, electronic equipment, and a readable storage medium, wherein the method comprises the following steps: acquiring a plurality of sampled point objects and sampled edge objects; acquiring the memory space occupied by each sampled point object and each sampled edge object; calculating the average memory occupancy of the sampled point objects and the average memory occupancy of the sampled edge objects; determining the maximum number of point objects and the maximum number of edge objects that the cache space can accommodate, based on the average memory occupancy of the sampled point objects, the average memory occupancy of the sampled edge objects, and the size of the cache space; and caching based on the maximum point object number and the maximum edge object number. The technical scheme provided by the invention can improve the utilization of the cache space and thereby the efficiency of subsequent data access.

Description

Graph database caching method and device, electronic equipment and readable storage medium
Technical Field
The invention relates to the technical field of data processing, in particular to a graph database caching method, a graph database caching device, electronic equipment and a readable storage medium.
Background
With the explosive development of the internet, the mobile internet, social networks, the internet of things, and associated networks in industrial fields such as the power network, there is great demand for storing relationship graphs and for network topology analysis and function analysis based on them, which has also driven the development of graph databases.
A traditional relational database can stably serve real-time On-Line Transaction Processing (OLTP) demands: for example, a user's purchase in an online mall is recorded by the system after the purchase occurs, and the behavior data is written into the relational database, either incrementally or as an update in place. However, when the mall operator wants to recommend a commodity to another user based on the purchase history of users with the same or similar buying habits, or to predict whether a class of users will buy a class of commodities, i.e. On-Line Analytical Processing (OLAP) data analysis over a relationship graph, the traditional relational database faces a complex Schema, uneconomical computation time and space, and long response latency. In a traditional relational database, characterizing relationships between entities requires creating association tables. When the relationships are numerous and the queried entity relationships are deep, the association tables built on keys during the modeling stage cannot accelerate relationship queries between entities, which is exactly where a graph database excels.
In order to query and analyze massive graph network data quickly and improve the performance of large-scale graph data query and analysis, a disk-based graph database needs to introduce graph data logic objects (namely point objects and edge objects) to cache disk data in memory, so as to accelerate the acquisition and manipulation of object data and reduce disk read/write requests, thereby preventing disk I/O from limiting data access capability. For such a cache, the management of the cache space, the cache algorithm (also called the cache replacement algorithm or cache eviction algorithm), and cache loading are all important.
Common object-based cache management mostly caches a fixed number of objects. This is simple, direct, and efficient to implement; however, for memory objects of variable size it can cause the occupied cache space to fluctuate dynamically, so the physical cache space cannot be used effectively.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, an electronic device, and a readable storage medium for caching a graph database, so as to solve the problem of low utilization rate of a cache space.
According to a first aspect, an embodiment of the present invention provides a graph database caching method, where the method includes:
acquiring a plurality of sampling point objects and sampling edge objects obtained by sampling;
acquiring memory space occupation of each sampling point object and each sampling edge object;
calculating the average memory space occupation of the sampling point objects and the average memory space occupation of the sampling edge objects;
determining the maximum point object number and the maximum edge object number which can be accommodated in the cache space based on the average memory space occupation of the sampling point objects, the average memory space occupation of the sampling edge objects and the size of the cache space;
and caching based on the maximum point object number and the maximum edge object number.
Optionally, the obtaining memory space occupation of each sampling point object and each sampling edge object includes:
for each sampling object, acquiring the memory space occupation of at least one of the following items, and summing up to obtain the memory space occupation of the sampling object:
the object header of the sampled object in memory;
the identification of the sampling object in the memory;
a reference to the sample object;
an alignment space of the sampling object;
a property of the sampling object;
wherein the sampling object comprises the sampling point object and the sampling edge object.
Optionally, the caching based on the maximum number of point objects and the maximum number of edge objects includes:
selecting a first point object with the largest degree;
acquiring and caching an edge object of the first point object, N layers of neighbor nodes of the first point object and edge objects of the N layers of neighbor nodes, wherein N is a positive integer;
judging whether the cached point objects and edge objects reach the maximum point object number and the maximum edge object number;
if not, acquiring a second point object, an edge object of the second point object, M-layer neighbor nodes of the second point object and edge objects of the M-layer neighbor nodes, and caching; wherein M is a positive integer, and the second point object is a point object with the largest degree among point objects which are not yet cached, or a point object with the largest degree among all point objects except the first point object.
Optionally, before performing the caching based on the maximum point object number and the maximum edge object number, the method further includes:
when a point object and/or an edge object is changed, adjusting the degree index of the point objects involved in the change according to their changed degrees;
wherein the change includes addition, modification, and deletion.
Optionally, after the caching is performed based on the maximum point object number and the maximum edge object number, at least one of the following is further included:
if the first data object targeted by a read or write operation is not cached, caching the first data object after the operation;
if the second data object targeted by a read or write operation does not exist, placing a marker for the second data object in the cache space to indicate that it does not exist;
wherein the first data object and the second data object comprise a point object and an edge object.
Optionally, point objects and edge objects each have a corresponding hot data cache sequence and cold data cache sequence;
the method further comprises at least one of:
adding a point object or edge object that has just been cached to the corresponding cold data cache sequence;
for a point object or an edge object in the cold data cache sequence, if accessed, moving into the corresponding hot data cache sequence;
moving a point object or edge object into the corresponding cold data cache sequence after it leaves the hot data cache sequence according to a first preset rule;
removing a point object or edge object from the cache after it leaves the cold data cache sequence according to a second preset rule;
for point objects in the cold data cache sequence, if an adjacent edge object in the hot data cache sequence exists, the position of the point object is moved forward in the corresponding cold data cache sequence;
and for the edge object in the cold data cache sequence, if the adjacent point object in the hot data cache sequence exists, the position is moved forward in the corresponding cold data cache sequence.
Optionally, the first preset rule and the second preset rule are Least Recently Used (LRU).
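As an illustration only (the patent gives no implementation, and all class and method names below are assumptions), the hot and cold data cache sequences described above can be sketched in Python with two LRU-ordered queues:

```python
from collections import OrderedDict

class TwoQueueCache:
    """Sketch of the hot/cold cache sequences: a newly cached object
    enters the cold sequence; an access promotes it to the hot
    sequence; LRU eviction demotes hot -> cold and cold -> out."""

    def __init__(self, hot_cap, cold_cap):
        self.hot = OrderedDict()    # key -> object, most recently used last
        self.cold = OrderedDict()
        self.hot_cap, self.cold_cap = hot_cap, cold_cap

    def add(self, key, obj):
        # Rule: a just-cached object joins the cold data cache sequence.
        self.cold[key] = obj
        self._evict()

    def get(self, key):
        # Rule: an accessed cold object moves into the hot sequence.
        if key in self.cold:
            self.hot[key] = self.cold.pop(key)
        elif key in self.hot:
            self.hot.move_to_end(key)   # refresh its LRU position
        self._evict()
        return self.hot.get(key)

    def _evict(self):
        # Rule: the LRU object leaving the hot sequence enters the cold one.
        while len(self.hot) > self.hot_cap:
            k, v = self.hot.popitem(last=False)
            self.cold[k] = v
        # Rule: the LRU object leaving the cold sequence leaves the cache.
        while len(self.cold) > self.cold_cap:
            self.cold.popitem(last=False)
```

In a full implementation there would be one such pair of sequences for point objects and another for edge objects, plus the position-advancing rules for objects whose neighbors are hot, which are omitted here.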
According to a second aspect, an embodiment of the present invention provides a graph database caching apparatus, including:
the sampling module is used for acquiring a plurality of sampling point objects and sampling edge objects obtained by sampling;
the first calculation module is used for acquiring the memory space occupation of each sampling point object and each sampling edge object;
the second calculation module is used for calculating the average memory space occupation of the sampling point object and the average memory space occupation of the sampling edge object;
the determining module is used for determining the maximum point object number and the maximum edge object number which can be accommodated in the cache space based on the average memory space occupation of the sampling point objects, the average memory space occupation of the sampling edge objects and the size of the cache space;
and the caching module is used for caching based on the maximum point object number and the maximum edge object number.
According to a third aspect, an embodiment of the present invention provides an electronic device, including:
a memory and a processor, the memory and the processor being communicatively connected to each other, the memory being used for storing a computer program, and the computer program, when executed by the processor, implementing any of the graph database caching methods described above in relation to the first aspect.
According to a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing a computer program, which when executed by a processor, implements any of the graph database caching methods described in the first aspect above.
In the embodiment of the invention, the point and edge data in the graph database are sampled, and the storage space (in bytes) that the sampled point and edge objects occupy in memory is calculated to obtain their actual memory footprint; the average memory occupancy required by the point and edge objects of the whole graph can then be estimated from the memory occupancy of the sampled objects. This provides a quantifiable mechanism for managing the size of the cache space for graph database point and edge data: calculation of object memory footprints is introduced quantitatively, so the cache space is managed finely, its utilization is improved, and subsequent data access is further accelerated.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and not to be construed as limiting the invention in any way, and in which:
FIG. 1 is a flow chart illustrating a graph database caching method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a structure of a graph database caching apparatus according to an embodiment of the present invention;
fig. 3 shows a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
It should be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element. Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features. In the description of the following embodiments, "plurality" means two or more unless specifically limited otherwise.
Referring to fig. 1, an embodiment of the present invention provides a method for caching a graph database, where the method includes:
s101: acquiring a plurality of sampling point objects and sampling edge objects obtained by sampling;
s102: acquiring memory space occupation of each sampling point object and each sampling edge object;
s103: calculating the average memory space occupation of the sampling point object and the average memory space occupation of the sampling edge object;
s104: determining the maximum point object number and the maximum edge object number which can be accommodated in the cache space based on the average memory space occupation of the sampling point objects, the average memory space occupation of the sampling edge objects and the size of the cache space;
the size of the cache space can be configured when the server is started. When determining the maximum number of point objects and the maximum number of edge objects that can be accommodated in the cache space, the ratio of the point objects and the edge objects that enter the cache according to the plan load may also be, for example, 1: and 2, respectively calculating the maximum point object number and the maximum edge object number.
S105: and caching based on the maximum point object number and the maximum edge object number.
In the embodiment of the invention, the point and edge data in the graph database are sampled, and the storage space (in bytes) that the sampled point and edge objects occupy in memory is calculated to obtain their actual memory footprint; the average memory occupancy required by the point and edge objects of the whole graph can then be estimated from the memory occupancy of the sampled objects. This provides a quantifiable mechanism for managing the size of the cache space for graph database point and edge data: calculation of object memory footprints is introduced quantitatively, so the cache space is managed finely, its utilization is improved, and subsequent data access is further accelerated.
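The sizing in steps S101 to S105 reduces to a small calculation. A hedged sketch follows; the function name is illustrative, and the 1:2 default follows the point:edge ratio given as an example for step S104:

```python
def max_cache_counts(avg_point_bytes, avg_edge_bytes, cache_bytes,
                     point_edge_ratio=(1, 2)):
    """Split the cache between point and edge objects according to a
    planned point:edge ratio, given the average per-object sizes
    estimated from sampling and the configured cache size."""
    p, e = point_edge_ratio
    # Bytes consumed by one "bundle" of p point objects and e edge objects.
    bundle_bytes = p * avg_point_bytes + e * avg_edge_bytes
    bundles = cache_bytes // bundle_bytes
    return int(bundles * p), int(bundles * e)
```

For instance, with 100-byte points, 50-byte edges, and a 10,000-byte cache at a 1:2 ratio, the cache holds at most 50 point objects and 100 edge objects.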
Before acquiring the sampled point objects and edge objects, the number of point objects and edge objects to sample can be calculated, so that a certain number of each are selected for computing the average memory occupancy (also called the mean memory size). Specifically, the sample size formula is n = z² · p · (1 − p) / e², where the confidence level may be chosen as 95%, the population proportion p as 50%, and the margin of error e as 0.05. For a 95% confidence level the z value is 1.96 according to Table 1 below, so the calculation gives a sample size of 385.
TABLE 1
Confidence level Z value
85% 1.44
90% 1.645
95% 1.96
99% 2.576
99.9% 3.29
99.99% 3.89
According to this sample size, when the number of points and edges in the graph is less than 385 × 3, all point and edge objects are selected for memory space calculation to obtain the average memory occupancy; if the number of points and edges is greater than or equal to 385 × 3, at most 385 × 3 point and edge objects are sampled for the calculation, and the sampling may specifically be performed pseudo-randomly.
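The sample-size formula and Table 1 can be checked directly. This sketch (the function and table names are illustrative) reproduces the 385 figure quoted above:

```python
import math

# z values from Table 1, keyed by confidence level.
Z_TABLE = {0.85: 1.44, 0.90: 1.645, 0.95: 1.96,
           0.99: 2.576, 0.999: 3.29, 0.9999: 3.89}

def sample_size(confidence=0.95, p=0.5, e=0.05):
    """n = z^2 * p * (1 - p) / e^2, rounded up to a whole sample."""
    z = Z_TABLE[confidence]
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)
```

With the defaults (95% confidence, p = 0.5, e = 0.05) this yields 385 samples, matching the text.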
In some optional specific embodiments, the obtaining memory space occupation of each sampling point object and each sampling edge object includes:
for each sampling object, acquiring the memory space occupation of at least one of the following items, and summing up to obtain the memory space occupation of the sampling object:
the object header of the sampled object in memory, which is 8, 12, or 16 bytes depending on whether the Java Virtual Machine (JVM) is 32-bit or 64-bit and on the available memory size;
the identification of the sampled object in memory; taking a point object as an example, its Identification (ID) in memory is of type Long and occupies a Long-sized memory space. In addition, the ID of the first edge associated with the point object is also of type Long;
a reference of the sampled object; taking a point object as an example, it holds references to two memory objects, a schema object and a properties object, each occupying a reference-sized memory space (different JVMs, e.g. 32-bit vs 64-bit, use different reference sizes: either 4 bytes or 8 bytes);
the alignment space of the sampled object; specifically, the padding introduced by rounding the object's byte count up to a multiple of 8 (8-byte alignment);
the properties of the sampled object; if the attributes of the sampled object are loaded in memory (i.e. the properties reference points to non-empty memory), the memory space of each attribute must be calculated, specifically including at least one of the following:
the header part of the attribute object in the memory (8/12/16 bytes according to 32/64 bit JVM and available memory size);
the ID of the attribute object in the memory is of the Long type, and occupies the memory space of the Long type;
calculating the size of the memory space of the key object for the key (key) of the attribute;
calculating the memory space size of the value object for the value (value) of the attribute;
extra alignment memory space introduced by rounding the attribute object's byte count up to a multiple of 8 (8-byte alignment);
wherein the sampling object comprises the sampling point object and the sampling edge object.
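To make the accounting above concrete, here is a hedged Python sketch of the per-object estimate. The default header and reference sizes are assumptions for a 64-bit JVM (the real objects are Java structures, and the text notes these sizes vary with the JVM):

```python
def align8(n):
    """Round n up to a multiple of 8 bytes (JVM 8-byte alignment)."""
    return (n + 7) & ~7

def estimate_point_bytes(properties, header=16, ref=8, long_size=8):
    """Estimate one point object's memory footprint as described above:
    an object header, a Long ID, a Long first-edge ID, two references
    (schema object and properties), 8-byte alignment padding, and one
    entry per loaded property (header + Long ID + key + value bytes,
    each padded to 8-byte alignment).

    `properties` is a list of (key_bytes, value_bytes) pairs giving the
    already-computed sizes of each attribute's key and value objects."""
    size = align8(header + long_size + long_size + 2 * ref)
    for key_bytes, value_bytes in properties:
        size += align8(header + long_size + key_bytes + value_bytes)
    return size
```

Summing this estimate over the sampled objects and dividing by the sample count gives the average memory occupancy used in step S103.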
In some optional specific embodiments, the caching based on the maximum point object number and the maximum edge object number includes:
selecting a first point object with the largest degree;
acquiring and caching the edge objects of the first point object, N layers of neighbor nodes of the first point object, and the edge objects of those N layers of neighbor nodes, wherein N is a positive integer; specifically, for the first point object, the corresponding edges and a certain number of layers of points can be obtained by breadth-first search (BFS);
judging whether the cached point objects and edge objects reach the maximum point object number and the maximum edge object number;
if not, acquiring and caching a second point object, its edge objects, M layers of neighbor nodes of the second point object, and the edge objects of those M layers of neighbor nodes; wherein M is a positive integer, and the second point object is the point object with the largest degree among the point objects not yet cached, or the point object with the largest degree among all point objects except the first point object. Similarly, when acquiring the edge objects and M layers of neighbor nodes of the second point object, the corresponding edges and a certain number of layers of points may be obtained by breadth-first search (BFS).
Of course, after caching the second point object, its M layers of neighbor nodes, and their edge objects, it must again be determined whether the cached point and edge objects have reached the maximum point object number and the maximum edge object number. If not, a third point object, its edge objects, its L layers of neighbor nodes, and their edge objects are obtained and cached, where L is a positive integer and the third point object is the point object with the largest degree among those not yet cached, or among all point objects except the first and second point objects. This continues until the cached point and edge objects reach the maximum point object number and the maximum edge object number.
In addition, the values of N and M may be the same or different, and may be chosen according to actual needs, empirical values, and the like, also taking into account the number of point and edge objects that can still be cached. Data objects already present in the cache space are not cached again.
Consider that in a given graph instance, the more edges a point joins, the larger its degree, and the more likely it is to be traversed in an edge-based graph data network; such points naturally become potential "hot spot" data. For this situation, the embodiment of the present invention proposes an "approximate minimum dominating set": the points with the largest degrees are selected, and from these points several layers of their neighbor nodes are obtained layer by layer to form the set. This approximate minimum dominating set approximates the smallest set of points and edges from which other points and edges are most likely to be reached (or from which traversals starting at other points and edges are most likely to arrive). The set can therefore be used to identify relatively frequently used point and edge objects.
In the embodiment of the invention, point degrees are taken into account comprehensively: the point and edge objects of the approximate minimum dominating set are determined from the degrees of the points, and this set is preloaded into the cache when the graph database starts, i.e. degree serves as the loading criterion for hot data. This improves the hit rate of potentially hot cache data during preloading, accelerates the cache warm-up process, improves warm-up efficiency, and speeds up later data access.
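A minimal sketch of the degree-driven preloading described above, using breadth-first search from the highest-degree points. The adjacency-dict layout and the function name are assumptions for illustration:

```python
from collections import deque

def preload(graph, degree, max_points, max_edges, layers=2):
    """graph: dict point -> [(edge_id, neighbor), ...];
    degree: dict point -> degree.
    Repeatedly start a BFS from the highest-degree point not yet
    cached, caching points and their incident edges layer by layer
    until the quotas computed from the sampling step are full (the
    'approximate minimum dominating set' of the text)."""
    cached_points, cached_edges = set(), set()
    for start in sorted(degree, key=degree.get, reverse=True):
        if start in cached_points:
            continue
        queue = deque([(start, 0)])
        while queue:
            point, depth = queue.popleft()
            if len(cached_points) >= max_points:
                return cached_points, cached_edges
            if point in cached_points or depth > layers:
                continue
            cached_points.add(point)
            for edge_id, neighbor in graph.get(point, []):
                if len(cached_edges) < max_edges:
                    cached_edges.add(edge_id)
                if depth < layers:
                    queue.append((neighbor, depth + 1))
        if len(cached_points) >= max_points and len(cached_edges) >= max_edges:
            break
    return cached_points, cached_edges
```

Here the per-start layer count plays the role of N, M, L in the text; a real implementation could vary it per start point and would read objects from disk rather than from an in-memory dict.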
In some optional specific embodiments, before performing caching based on the maximum point object number and the maximum edge object number, the method further includes:
when a point object and/or an edge object is changed, adjusting the degree index of the point objects involved in the change according to their changed degrees;
wherein the change includes addition, modification, and deletion.
In the embodiment of the present invention, any add, modify, or delete operation on point or edge data may change the degrees of points (including out-degree and in-degree). Therefore, an ordered inverted index of point degrees is maintained as point and edge data are added, modified, and deleted. Specifically, a B+ tree may be introduced to index the degrees of point objects, i.e. a "degree" → "point ID" correspondence, arranged in descending order of degree. When an add, modify, or delete operation on point or edge data changes a point's degree, the point's entry in the degree index is updated and reordered accordingly. Thus, in the above embodiment, the first point object, the second point object, and the third point object can be selected in turn according to the degree index.
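The degree index can be illustrated with a simple bucket structure standing in for the B+ tree the text suggests; the class and its internal layout are assumptions for illustration, not the patent's design:

```python
from collections import defaultdict

class DegreeIndex:
    """Maintain a degree -> {point IDs} mapping that is updated as
    edges are added or removed, so points can be enumerated in
    descending degree order (the role of the B+ tree in the text)."""

    def __init__(self):
        self.degree = defaultdict(int)    # point ID -> current degree
        self.buckets = defaultdict(set)   # degree -> set of point IDs

    def _move(self, point, delta):
        # Move a point from its old degree bucket to the new one.
        d = self.degree[point]
        self.buckets[d].discard(point)
        self.degree[point] = d + delta
        self.buckets[d + delta].add(point)

    def add_edge(self, u, v):
        self._move(u, +1)
        self._move(v, +1)

    def remove_edge(self, u, v):
        self._move(u, -1)
        self._move(v, -1)

    def by_degree_desc(self):
        # Yield point IDs from the highest-degree bucket downward.
        for d in sorted(self.buckets, reverse=True):
            yield from self.buckets[d]
```

Cache preloading would then take the first, second, third, ... start points directly from `by_degree_desc()`.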
In some specific embodiments, after the caching based on the maximum point object number and the maximum edge object number, at least one of the following is further included:
if the first data object targeted by a read or write operation is not cached, caching the first data object after the operation;
if the second data object targeted by a read or write operation does not exist, placing a marker for the second data object in the cache space to indicate that it does not exist;
wherein the first data object and the second data object comprise a point object and an edge object.
In the embodiment of the invention, after the graph database starts and the approximate-minimum-dominating-set data has been preloaded into the cache, when a read or write operation on a specified data object (namely the first data object) misses the cache, a cache entry is created for the object in the cache space one by one, using a mapping from the object's ID to the in-memory object. Of course, if the data object targeted by the read or write operation already exists in the cache space, it is updated in the cache space according to the operation; that is, only the updated data object is retained in the cache space.
If the second data object targeted by a read or write operation does not exist anywhere in the graph database, a marker is recorded in the cache space, for example a null value, so that subsequent read or write operations targeting the same data object (the second data object) are answered from the cache space instead of going back to the disk, thereby reducing the number of disk reads and writes.
In the embodiment of the invention, every read or write operation on a point or edge object goes through the cache. When a read misses the cache, the corresponding object is loaded into the cache space, and the loading and reading of objects support atomic operations; every write operation pushes the latest object data to the cache space, maintaining the freshness of objects in the cache. In addition, the expiration time of cached objects may be set to a long term.
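The cache-miss loading and negative caching just described can be sketched as a read-through cache with a "does not exist" sentinel. The class and the `load_from_disk` callback are stand-ins for the real storage layer, not the patented implementation:

```python
_MISSING = object()  # sentinel recorded for IDs absent from the database

class ReadThroughCache:
    """ID -> in-memory-object cache with negative caching: a lookup
    that misses both the cache and the backing store records a
    sentinel, so repeated reads of a nonexistent ID never hit disk."""

    def __init__(self, load_from_disk):
        self._load = load_from_disk
        self._cache = {}
        self.disk_reads = 0

    def get(self, obj_id):
        if obj_id in self._cache:
            hit = self._cache[obj_id]
            return None if hit is _MISSING else hit
        self.disk_reads += 1
        obj = self._load(obj_id)  # returns None if the object is absent
        self._cache[obj_id] = obj if obj is not None else _MISSING
        return obj

    def put(self, obj_id, obj):
        # Writes always refresh the cached copy (write-through view).
        self._cache[obj_id] = obj
```

The sentinel object (rather than a plain `None` entry) lets the cache distinguish "known to be absent" from "never looked up", which is what cuts the repeated disk reads.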
In some specific embodiments, the point objects and the edge objects each correspond to a hot data cache sequence (also called a hot data sequence or a hot cache sequence; a sequence may also be called a queue) and a cold data cache sequence (also called a cold data sequence or a cold cache sequence); that is, the point objects correspond to one hot data cache sequence and one cold data cache sequence, and the edge objects correspond to another hot data cache sequence and another cold data cache sequence;
the method further comprises at least one of:
adding a point object or an edge object that has just been cached to the corresponding cold data cache sequence;
if a point object or an edge object in a cold data cache sequence is accessed, moving it into the corresponding hot data cache sequence; that is, for point objects, if an object in their cold data cache sequence is accessed, it is moved into the corresponding hot data cache sequence, and likewise for edge objects;
after a point object or an edge object in a hot data cache sequence is dequeued according to a first preset rule, moving it into the corresponding cold data cache sequence; that is, for point objects, an object dequeued from their hot data cache sequence moves into the corresponding cold data cache sequence, and likewise for edge objects;
after a point object or an edge object in a cold data cache sequence is dequeued according to a second preset rule, removing it from the cache; that is, for point objects, an object dequeued from their cold data cache sequence leaves the cache, and likewise for edge objects;
for a point object in a cold data cache sequence, if it has an adjacent edge object in a hot data cache sequence, moving its position forward in the cold data cache sequence;
and for an edge object in a cold data cache sequence, if it has an adjacent point object in a hot data cache sequence, moving its position forward in the cold data cache sequence.
When the graph database starts, the hot and cold data cache sequences may both be empty, one of them may be filled, or both may be filled. When the hot data cache sequence of point objects is filled, point objects may be selected in descending order of degree, with the highest-degree point placed at the front of the sequence and dequeued last; when the cold data cache sequence of point objects is filled, point objects may be selected in ascending order of degree, with the lowest-degree point placed at the rear of the sequence and dequeued first. When the hot and cold data cache sequences of edge objects are filled, edges may be selected according to the degrees of their associated points.
Specifically, once the cache space is fully occupied, that is, it already holds the maximum number of point objects and/or the maximum number of edge objects, a new object to be cached is added following the conventional LRU process: the object is added to the cold data cache sequence, and moved into the hot data cache sequence when it is accessed again while in the cold sequence; objects dequeued from the hot data cache sequence by LRU move into the cold data cache sequence, and objects dequeued from the cold data cache sequence by LRU leave the cache.
In addition, the points and edges in the cold data queues may be scanned periodically at a fixed interval; if an adjacent edge (or point) of a point (or edge) in a cold data queue is in the corresponding hot data sequence, the position of that object is moved forward in the cold sequence. That is, the neighboring points and edges of hot data enjoy a longer cache lifetime.
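The queue transitions above can be sketched as a two-queue LRU with neighbor promotion. This is a minimal single-queue-pair illustration (the method keeps separate pairs for points and for edges), and the class name and capacities are assumptions:

```python
from collections import OrderedDict

class ColdHotCache:
    """Two-queue cache sketch: new objects enter the cold queue, a cold
    hit promotes to hot, both queues evict by LRU, and hot evictions
    are demoted back to cold rather than dropped."""

    def __init__(self, hot_cap, cold_cap):
        self.hot = OrderedDict()   # LRU order: front = next to evict
        self.cold = OrderedDict()
        self.hot_cap, self.cold_cap = hot_cap, cold_cap

    def add(self, key, value):
        self.cold[key] = value
        if len(self.cold) > self.cold_cap:
            self.cold.popitem(last=False)  # cold LRU leaves the cache

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        if key in self.cold:               # cold hit: promote to hot
            value = self.cold.pop(key)
            self.hot[key] = value
            if len(self.hot) > self.hot_cap:
                demoted, dval = self.hot.popitem(last=False)
                self.add(demoted, dval)    # hot LRU demotes to cold
            return value
        return None

    def refresh_neighbors(self, key, neighbors):
        # Periodic scan step: neighbors of a hot object are moved
        # toward the safe end of the cold queue, extending their life.
        if key in self.hot:
            for n in neighbors:
                if n in self.cold:
                    self.cold.move_to_end(n)
```

Here "moving a position forward" is modeled as moving the entry away from the eviction end of the `OrderedDict`; the real sequences could equally be linked lists.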
In some optional embodiments, the first predetermined rule and the second predetermined rule are Least Recently Used (LRU).
That is, in the embodiment of the present invention, after the cache space is fully occupied, when a new object needs to be cached (for example, a data object targeted by a read or write operation that is not yet cached), and considering that points or edges connected to recently accessed points or edges are more likely to be accessed soon, an LRU algorithm based on point-edge connectivity is introduced to evict existing cached objects from the cache space, freeing room for the new object.
Common cache replacement algorithms also include first-in-first-out (FIFO), Bélády's algorithm, last-in-first-out (LIFO, also known as first-in-last-out, FILO), most recently used (MRU), random replacement (RR), the CLOCK algorithm, and so on.
Accordingly, referring to fig. 2, an embodiment of the present invention provides a graph database caching device, including:
a sampling module 201, configured to obtain a plurality of sampled sample point objects and sampled edge objects;
the first calculation module 202 is configured to obtain memory space occupation of each sampling point object and each sampling edge object;
a second calculating module 203, configured to calculate an average memory space occupation of the sampling point objects and an average memory space occupation of the sampling edge objects;
a determining module 204, configured to determine, based on the average memory space occupation of the sampling point objects, the average memory space occupation of the sampling edge objects, and the size of the cache space, the maximum number of point objects and the maximum number of edge objects that can be accommodated in the cache space;
a caching module 205, configured to perform caching based on the maximum number of point objects and the maximum number of edge objects.
In the embodiment of the invention, the point and edge data in the graph database are sampled, and the memory footprint (in bytes) of each sampled point and edge object is calculated to obtain its actual memory occupation; the average memory occupation required by the point and edge objects of the whole graph can then be estimated from the footprints of the sampled objects. This provides a quantifiable mechanism for managing the size of the cache space used by point and edge data in the graph database: by quantitatively computing object memory footprints, the cache space is managed at a fine granularity, its utilization is improved, and subsequent data access is further accelerated.
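Turning the sampled footprints into the maximum point and edge object counts can be sketched as below. The 50/50 split of the cache space between points and edges is an assumption for the sketch; the method itself does not fix a particular split:

```python
def cache_capacity(point_sizes, edge_sizes, cache_bytes, point_ratio=0.5):
    """Estimate how many point and edge objects fit in the cache from
    sampled per-object memory footprints (in bytes).

    point_sizes / edge_sizes: footprints of the sampled objects.
    point_ratio: assumed fraction of the cache reserved for points.
    Returns (max_point_objects, max_edge_objects).
    """
    avg_point = sum(point_sizes) / len(point_sizes)
    avg_edge = sum(edge_sizes) / len(edge_sizes)
    max_points = int(cache_bytes * point_ratio // avg_point)
    max_edges = int(cache_bytes * (1 - point_ratio) // avg_edge)
    return max_points, max_edges
```

For a 10 KB cache with sampled averages of 100 bytes per point and 50 bytes per edge, the split above yields room for 50 points and 100 edges.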
In some embodiments, the first calculation module 202 includes:
a statistical unit, configured to acquire, for each sampling object, the memory space occupation of at least one of the following items, and sum them to obtain the memory space occupation of the sampling object:
the object header of the sampling object in the memory;
the identification of the sampling object in the memory;
references of the sampling object;
an alignment space of the sampling object;
a property of the sampling object;
wherein the sampling object comprises the sampling point object and the sampling edge object.
In some embodiments, the caching module 205 includes:
the first selecting unit is used for selecting a first point object with the largest degree;
the first cache unit is used for acquiring and caching the edge object of the first point object, N layers of neighbor nodes of the first point object and the edge object of the N layers of neighbor nodes, wherein N is a positive integer;
a judging unit, configured to judge whether the cached point objects and edge objects reach the maximum point object number and the maximum edge object number;
a second cache unit, configured to, when the cached point objects and edge objects have not reached the maximum point object number or the maximum edge object number, obtain and cache a second point object, an edge object of the second point object, M-layer neighbor nodes of the second point object, and edge objects of the M-layer neighbor nodes; wherein M is a positive integer, and the second point object is a point object with the largest degree among point objects that are not yet cached, or a point object with the largest degree among all point objects except the first point object.
In some embodiments, the apparatus further comprises:
the adjusting module is used for, after a point object and/or an edge object is changed, adjusting the degree index of each point object affected by the change according to its changed degree;
wherein the change includes addition, modification, and deletion.
In some embodiments, the apparatus further comprises at least one of:
the cache updating module is used for caching the first data object after a read or write operation in the case that the first data object targeted by the operation is not cached;
the identification module is used for marking, in the cache space, that the second data object does not exist, in the case that the second data object targeted by a read or write operation does not exist;
wherein the first data object and the second data object comprise a point object and an edge object.
In some embodiments, the point object and the edge object correspond to a hot data cache sequence and a cold data cache sequence, respectively;
the apparatus further comprises at least one of:
the first updating module is used for adding the point object or the edge object which is just cached into the corresponding cold data caching sequence;
the second updating module is used for moving the accessed point object or edge object in the cold data cache sequence into the corresponding hot data cache sequence;
a third updating module, configured to move a point object or an edge object in the hot data cache sequence into the corresponding cold data cache sequence after it is dequeued according to a first preset rule;
a fourth updating module, configured to remove a point object or an edge object in the cold data cache sequence from the cache after it is dequeued according to a second preset rule;
a fifth updating module, configured to move forward the position, in the corresponding cold data cache sequence, of a point object in the cold data cache sequence that has an adjacent edge object in the hot data cache sequence;
and a sixth updating module, configured to move forward the position, in the corresponding cold data cache sequence, of an edge object in the cold data cache sequence that has an adjacent point object in the hot data cache sequence.
In some specific embodiments, the first preset rule and the second preset rule are the least recently used rules.
The embodiment of the present invention and the embodiment of the method are device embodiments based on the same inventive concept, and specific technical details and corresponding advantageous effects may refer to the embodiment of the method.
An embodiment of the present invention further provides an electronic device, as shown in fig. 3, the electronic device may include a processor 31 and a memory 32, where the processor 31 and the memory 32 may be communicatively connected to each other through a bus or in another manner, and fig. 3 illustrates an example of a connection through a bus.
The processor 31 may be a Central Processing Unit (CPU). The Processor 31 may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, or combinations thereof.
The memory 32, which is a non-transitory computer-readable storage medium, may be used for storing non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the graph database caching method in the embodiment of the present invention (for example, the sampling module 201, the first calculation module 202, the second calculation module 203, the determination module 204, and the caching module 205 shown in fig. 2). The processor 31 executes various functional applications and data processing of the processor by executing non-transitory software programs, instructions and modules stored in the memory 32, that is, implements the method in the above-described method embodiments.
The memory 32 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 31, and the like. Further, the memory 32 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 32 may optionally include memory located remotely from the processor 31, and these remote memories may be connected to the processor 31 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 32 and, when executed by the processor 31, perform the method of the embodiment shown in fig. 1.
The details of the electronic device may be understood with reference to the corresponding related description and effects in the embodiment shown in fig. 1, and are not described herein again.
Accordingly, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium is used to store a computer program, and when the computer program is executed by a processor, the computer program implements each process of the above-mentioned embodiment of the graph database caching method, and can achieve the same technical effect, and is not described herein again to avoid repetition.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the system embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and reference may be made to the partial description of the method embodiment for relevant points.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (9)

1. A method for caching a graph database, the method comprising:
acquiring a plurality of sampling point objects and sampling edge objects obtained by sampling;
acquiring memory space occupation of each sampling point object and each sampling edge object;
calculating the average memory space occupation of the sampling point objects and the average memory space occupation of the sampling edge objects;
determining the maximum point object number and the maximum edge object number which can be accommodated in the cache space based on the average memory space occupation of the sampling point objects, the average memory space occupation of the sampling edge objects and the size of the cache space;
caching based on the maximum point object number and the maximum edge object number;
wherein the caching based on the maximum number of point objects and the maximum number of edge objects comprises:
selecting a first point object with the largest degree;
acquiring and caching an edge object of the first point object, N layers of neighbor nodes of the first point object and edge objects of the N layers of neighbor nodes, wherein N is a positive integer;
judging whether the cached point objects and edge objects reach the maximum point object number and the maximum edge object number;
if not, acquiring a second point object, an edge object of the second point object, M-layer neighbor nodes of the second point object and edge objects of the M-layer neighbor nodes, and caching; wherein M is a positive integer, and the second point object is a point object with the largest degree among point objects that are not yet cached, or a point object with the largest degree among all point objects except the first point object.
2. The method of claim 1, wherein the obtaining memory space usage of each of the sample point objects and the sample edge objects comprises:
for each sampling object, acquiring the memory space occupation of at least one of the following items, and summing up to obtain the memory space occupation of the sampling object:
the object header of the sampling object in the memory;
the identification of the sampling object in the memory;
references of the sampling object;
an alignment space of the sampling object;
a property of the sampling object;
wherein the sampling object comprises the sampling point object and the sampling edge object.
3. The method of claim 1, wherein prior to caching based on the maximum number of point objects and the maximum number of edge objects, further comprising:
when a point object and/or an edge object is changed, adjusting the degree index of each point object affected by the change according to its changed degree;
wherein the change includes addition, modification, and deletion.
4. The method of claim 1, wherein after the caching based on the maximum number of point objects and the maximum number of edge objects, further comprising at least one of:
if the first data object targeted by a read or write operation is not cached, caching the first data object after the read or write operation;
if the second data object targeted by a read or write operation does not exist, marking, in the cache space, that the second data object does not exist;
wherein the first data object and the second data object comprise a point object and an edge object.
5. The method of claim 1, wherein the point object and the edge object correspond to a hot data cache sequence and a cold data cache sequence, respectively;
the method further comprises at least one of:
adding a point object or an edge object that has just been cached to the corresponding cold data cache sequence;
if a point object or an edge object in the cold data cache sequence is accessed, moving it into the corresponding hot data cache sequence;
after a point object or an edge object in the hot data cache sequence is dequeued according to a first preset rule, moving it into the corresponding cold data cache sequence;
after a point object or an edge object in the cold data cache sequence is dequeued according to a second preset rule, removing it from the cache;
for a point object in the cold data cache sequence, if it has an adjacent edge object in the hot data cache sequence, moving its position forward in the corresponding cold data cache sequence;
and for an edge object in the cold data cache sequence, if it has an adjacent point object in the hot data cache sequence, moving its position forward in the corresponding cold data cache sequence.
6. The method of claim 5, wherein the first predetermined rule and the second predetermined rule are least recently used.
7. A graph database caching apparatus, comprising:
the sampling module is used for acquiring a plurality of sampling point objects and sampling edge objects obtained by sampling;
the first calculation module is used for acquiring the memory space occupation of each sampling point object and each sampling edge object;
the second calculation module is used for calculating the average memory space occupation of the sampling point objects and the average memory space occupation of the sampling edge objects;
the determining module is used for determining the maximum point object number and the maximum edge object number which can be accommodated in the cache space based on the average memory space occupation of the sampling point objects, the average memory space occupation of the sampling edge objects and the size of the cache space;
the caching module is used for caching based on the maximum point object number and the maximum edge object number;
wherein, the cache module comprises:
the first selecting unit is used for selecting a first point object with the largest degree;
the first cache unit is used for acquiring and caching the edge object of the first point object, N layers of neighbor nodes of the first point object and the edge object of the N layers of neighbor nodes, wherein N is a positive integer;
a judging unit, configured to judge whether the cached point objects and edge objects reach the maximum point object number and the maximum edge object number;
a second cache unit, configured to, when the cached point objects and edge objects have not reached the maximum point object number or the maximum edge object number, obtain and cache a second point object, an edge object of the second point object, M-layer neighbor nodes of the second point object, and edge objects of the M-layer neighbor nodes; wherein M is a positive integer, and the second point object is a point object with the largest degree among point objects that are not yet cached, or a point object with the largest degree among all point objects except the first point object.
8. An electronic device, comprising:
a memory and a processor communicatively coupled to each other, the memory for storing a computer program that, when executed by the processor, implements the graph database caching method of any one of claims 1 to 6.
9. A computer-readable storage medium for storing a computer program which, when executed by a processor, implements the graph database caching method according to any one of claims 1 to 6.
CN202210781029.2A 2022-07-05 2022-07-05 Graph database caching method and device, electronic equipment and readable storage medium Active CN114860783B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210781029.2A CN114860783B (en) 2022-07-05 2022-07-05 Graph database caching method and device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN114860783A CN114860783A (en) 2022-08-05
CN114860783B true CN114860783B (en) 2022-09-27

Family

ID=82625718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210781029.2A Active CN114860783B (en) 2022-07-05 2022-07-05 Graph database caching method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN114860783B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103778071A (en) * 2014-01-20 2014-05-07 华为技术有限公司 Cache space distribution method and device
CN112988078A (en) * 2021-04-27 2021-06-18 山东英信计算机技术有限公司 Management method and device for cache memory occupation in distributed storage application
CN114116952A (en) * 2021-09-29 2022-03-01 浙江大华技术股份有限公司 Graph data access method, electronic device and storage medium
WO2022062537A1 (en) * 2020-09-27 2022-03-31 苏州浪潮智能科技有限公司 Data compression method and apparatus, and computer-readable storage medium
CN114282071A (en) * 2021-12-29 2022-04-05 北京同心尚科技发展有限公司 Request processing method, device and equipment based on graph database and storage medium

Also Published As

Publication number Publication date
CN114860783A (en) 2022-08-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant