CN106921713A - A kind of resource caching method and device - Google Patents
- Publication number
- CN106921713A, CN201510999566.4A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/568—Storing data temporarily at an intermediate stage, e.g. caching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9566—URL specific, e.g. using aliases, detecting broken or misspelled links
Abstract
The invention discloses a resource caching method and device for automatically generating caching rules from cache log resources. The method includes: obtaining domain names to be analyzed; for any domain name to be analyzed, obtaining the cache logs corresponding to that domain name within a specified time period; extracting first-type key information from any cache log corresponding to the domain name; determining, at least according to the first-type key information, whether the cache log is an optimizable cache log; and, if it is, inputting the domain name field information of the URL in the cache log into the regular expression corresponding to the resource depth level of the URL to generate a caching rule for the cache log. Because the method generates caching rules automatically from the resource depth of the URL in an optimizable cache log and from the specified domain name, the generated regular expressions are more targeted, the caching parameters of each new caching rule are set more reasonably, and caching efficiency can be effectively improved.
Description
Technical Field
The embodiment of the invention relates to the technical field of communication, in particular to a resource caching method and device.
Background
In the existing Cache straight-path system, resources are cached and updated by manually adding new domain names and setting rules such as hot-spot resource caching.
The existing resource caching and updating method works from newly added domain names and the corresponding caching rules of the straight-path system. Operators periodically provide the Top N domain names, and each domain name is tested and cached one by one by hand. The work focuses mainly on network elements and coarse-grained traffic and relies mainly on manual analysis, so it is inefficient, labor-intensive, and slow, and cannot keep up with the update frequency of Internet hot-spot resources.
In the existing resource caching method, a caching rule can only be written for a single resource URL (Uniform Resource Locator), with each parameter of the rule adjusted by hand, so caching rules cannot be unified. Specifically, the Cache straight-path system sets a general caching rule for all resources; if a resource has no special rule, it is automatically matched against the general rule for caching. All parameters of the general rule are empirical values, and the rule suffers from problems such as an overly broad regular expression and too many suffixes.
In summary, resource caching and updating in the prior art depends on manual work and therefore suffers from low efficiency and low update frequency; a method for automatically generating caching rules from statistical data is urgently needed.
Disclosure of Invention
The embodiment of the invention provides a resource caching method and device, which are used for automatically generating caching rules according to caching log resources.
The embodiment of the invention provides a resource caching method, which comprises the following steps:
acquiring a domain name to be analyzed;
for any domain name to be analyzed, obtaining a cache log corresponding to the domain name to be analyzed in a specified time period;
extracting first-type key information from any cache log corresponding to the domain name to be analyzed;
determining whether the cache log is an optimized cache log or not according to at least the first type of key information;
if the cache log is determined to be the optimized cache log, inputting the domain name field information of the URL in the cache log into a regular expression corresponding to the resource depth level of the URL to generate a cache rule of the cache log;
the regular expression corresponding to the resource depth level of the URL is a regular expression which is written in advance at least according to the resource depth level of the URL and the cache parameter corresponding to the resource depth level.
An embodiment of the present invention provides a resource caching apparatus, including:
the first acquisition unit is used for acquiring the domain name to be analyzed;
the second acquisition unit is used for acquiring a cache log corresponding to any domain name to be analyzed in a specified time period;
the resource analysis unit is used for extracting first-class key information from any cache log corresponding to the domain name to be analyzed; determining whether the cache log is an optimized cache log or not at least according to the first type of key information;
the rule generating unit is used for inputting the domain name field information of the URL in the cache log into a regular expression corresponding to the resource depth level of the URL to generate a cache rule of the cache log if the cache log is determined to be the optimized cache log;
the regular expression corresponding to the resource depth level of the URL is a regular expression which is written in advance at least according to the resource depth level of the URL and the cache parameter corresponding to the resource depth level.
In the above embodiment, the cache logs corresponding to the domain name to be analyzed in the specified time period are obtained; first-type key information, such as the hit identifier, HTTP status code, resource size, resource URL, and whether the log matches the general caching rule, is extracted from any cache log corresponding to the domain name; and whether the cache log is an optimizable cache log is determined at least according to the first-type key information. A corresponding caching rule is then generated automatically from the resource depth of the URL in the optimizable cache log and the specified domain name. Because the specific domain name is analyzed and the regular expression templates are generated per resource URL depth, the rules are more targeted and the caching parameters are set more reasonably; when caching is later performed according to the optimized caching rule, the time needed to match a domain name against the rule is shortened and caching efficiency is effectively improved.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of a resource caching method according to an embodiment of the present invention;
Fig. 2 is a flowchart of automatically generating an optimized cache rule according to an embodiment of the present invention;
Fig. 3 is a flowchart of a resource caching method according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a resource caching apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to solve the technical problems of low efficiency and low update frequency in resource caching and updating depending on manpower in the prior art, an embodiment of the present invention provides a resource caching method as shown in fig. 1, which is used for automatically generating a caching rule according to a caching log resource, and the specific process includes:
Step 101, acquiring a domain name to be analyzed;
Step 102, for any domain name to be analyzed, acquiring a cache log corresponding to the domain name to be analyzed in a specified time period;
Step 103, extracting first-type key information from any cache log corresponding to the domain name to be analyzed;
Step 104, determining whether the cache log is an optimizable cache log at least according to the first-type key information;
Step 105, if the cache log is determined to be an optimizable cache log, inputting domain name field information of the URL in the cache log into a regular expression corresponding to the resource depth level of the URL, and generating a cache rule of the cache log; the regular expression corresponding to the resource depth level of the URL is a regular expression which is written in advance at least according to the resource depth level of the URL and the cache parameter corresponding to the resource depth level.
In the embodiment of the invention, a Cache optimization tool with a resource analysis functional module executes the above method flow, performing statistics and analysis on the acquired cache log resources and automatically generating caching rules. The Cache optimization tool interfaces with the existing network Cache server, so that the cache logs to be analyzed can be obtained and the utilization of the cache logs in the system is improved. With the Cache optimization tool, an operator can operate the Cache system independently, which greatly improves the efficiency of Cache optimization work.
Specifically, in the above method flow step 101, the domain name to be analyzed may be obtained through a domain name import interface, and the domain name import interface may support a Top100 resource list of the cache system and a manually imported resource list to be analyzed, so that the imported domain name data source may be a CSS-WEB log (Cascading style sheet), or may be a manually imported domain name resource list. The domain name data source in the embodiment of the invention is a cacheable domain name which is optimized by a Cache optimization tool and is online.
In order to judge whether links of various resource types corresponding to the domain name to be analyzed need to be optimized and the optimization space of the links, firstly, a cache log corresponding to any domain name to be analyzed needs to be acquired aiming at any acquired domain name to be analyzed, and secondly, the cache log corresponding to the same acquired domain name to be analyzed is analyzed; thirdly, judging whether the cache resources corresponding to the domain name to be analyzed need to be optimized or not according to the analysis result.
Preferably, the cache log is read through a specified interface of the Cache server, namely the interface created when the Cache optimization tool is connected to the existing network Cache server, which reads the cache logs corresponding to the domain name to be analyzed in a specified time period. The specified interface is wrapped to provide a function for reading logs within a given time window: it reads the access logs recorded by the cache server, or stored under its log directory, between the input start time and the input end time, and copies them to the local machine. The cache log records information about all users who pass through the straight path and make uplink requests.
According to the embodiment of the invention, the obtained cache logs are analyzed automatically by the functional module with the resource analysis function. Specifically, in step 103, the cache logs corresponding to the same domain name to be analyzed are parsed, and first-type key information is extracted from any cache log corresponding to the domain name. The extracted first-type key information includes at least an HTTP status code, and may further include a hit identifier, resource size, resource URL, and so on.
The hit identifier is either TCP_HIT or TCP_MISS: TCP_HIT means the request hit a resource cached on the Cache server, and TCP_MISS means it did not.
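As an illustration only, the extraction of part of the first-type key information could be sketched in Python as follows. The sketch assumes a whitespace-separated log line whose hit identifier starts with "TCP_" and whose URL field starts with "http://" or "https://"; the function name extract_key_info and the shortened sample line are hypothetical, and the field layout of a real cache log may differ.

```python
import posixpath
from urllib.parse import urlsplit


def extract_key_info(log_line: str) -> dict:
    """Pull the hit identifier, URL, domain field and suffix out of one log line.

    Assumption: fields are separated by whitespace and the URL is the last
    field that starts with an HTTP scheme. Other fields are ignored here.
    """
    fields = log_line.split()
    hit_flag = next((f for f in fields if f.startswith("TCP_")), None)
    url = next(
        (f for f in reversed(fields) if f.startswith(("http://", "https://"))),
        None,
    )
    if url is None:
        raise ValueError("no URL found in cache log line")
    parts = urlsplit(url)
    return {
        "hit_flag": hit_flag,
        "url": url,
        "domain": parts.hostname,
        "suffix": posixpath.splitext(parts.path)[1],
    }


# Hypothetical, shortened log line (only the fields this sketch relies on).
sample = (
    "TCP_HIT GET "
    "http://i3.itc.cn/20150909/340e_4227c137_0f65_bd6b_f501_e2cc10220cab_1.jpg"
)
info = extract_key_info(sample)
print(info["hit_flag"], info["domain"], info["suffix"])
```

The HTTP status code and resource size are left out on purpose, because their positions in the log line are not specified in the text.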
If the fragment field of the analyzed cache log contains matching identification information matched with the universal cache rule, determining that the structure of the cache log is matched with the universal cache rule; and if the analyzed cache log does not contain the matching identification information matched with the general cache rule, determining that the structure of the cache log is not matched with the general cache rule.
The embodiment of the present invention provides an optional manner of determining whether the cache log is an optimizable cache log according to the first-type key information: if the HTTP (Hypertext Transfer Protocol) status code in the first-type key information is a specified HTTP status code, and the first-type key information does not include matching identification information indicating that the cache log matches the general caching rule, the cache log is determined to be an optimizable cache log.
For example, Table 1 presents the parsing results of several cache logs. HTTP status codes 200 and 206 are the specified HTTP status codes. Only the cache logs numbered 4 and 5 both carry a specified status code and do not match the general caching rule, so the cache logs numbered 4 and 5 are determined to be optimizable cache logs.
TABLE 1
ID | HTTP status | Hit identifier | Matches general rule |
1 | 200 or 206 | HIT | Matched |
2 | 200 or 206 | MISS | Matched |
3 | Others | Others | Matched |
4 | 200 or 206 | HIT | Unmatched |
5 | 200 or 206 | MISS | Unmatched |
6 | Others | Others | Unmatched |
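The decision summarized in Table 1 can be sketched as a small predicate. This is a minimal illustration, assuming that 200 and 206 are the specified status codes as in the table; the ParsedLog container and the stand-in values used for the "Others" rows (status 404, TCP_MISS) are hypothetical.

```python
from dataclasses import dataclass

# Assumption: 200 and 206 are the "specified" HTTP status codes, as in Table 1.
SPECIFIED_STATUS_CODES = {200, 206}


@dataclass(frozen=True)
class ParsedLog:
    """Hypothetical container for the first-type key information of one log."""
    http_status: int
    hit_flag: str               # "TCP_HIT" or "TCP_MISS"
    matches_general_rule: bool  # True if the log matched the general rule


def is_optimizable(log: ParsedLog) -> bool:
    """True when the status code is specified and the general rule did not match."""
    return (
        log.http_status in SPECIFIED_STATUS_CODES
        and not log.matches_general_rule
    )


# One row per ID in Table 1; 404 stands in for "Others".
rows = [
    ParsedLog(200, "TCP_HIT", True),    # ID 1
    ParsedLog(206, "TCP_MISS", True),   # ID 2
    ParsedLog(404, "TCP_MISS", True),   # ID 3
    ParsedLog(200, "TCP_HIT", False),   # ID 4
    ParsedLog(206, "TCP_MISS", False),  # ID 5
    ParsedLog(404, "TCP_MISS", False),  # ID 6
]
optimizable_ids = [i for i, row in enumerate(rows, start=1) if is_optimizable(row)]
print(optimizable_ids)
```

The hit identifier is carried along but not used by the predicate, which mirrors the table: rows 4 and 5 differ only in HIT versus MISS and are both optimizable.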
In the embodiment of the present invention, it is certainly not limited to determining whether the cache log is the optimizable cache log only according to the HTTP status code and the information whether the cache log matches the general cache rule, and whether the cache log is the optimizable cache log may also be determined by combining the information such as the hit identifier, the resource size, the resource suffix identifier, and the URL domain name information with the HTTP status code and the information whether the cache log matches the general cache rule as needed.
And according to the method flow, taking the optimizable cache logs in all the cache logs corresponding to the domain name to be analyzed as optimizable cache log resources, and generating an optimized cache rule for each optimizable cache log resource. Specifically, the embodiment of the present invention is implemented by a Cache rule automatic generation component in a Cache optimization tool, and the step of automatically generating an optimized Cache rule by the Cache rule automatic generation component according to input data includes, as shown in fig. 2:
step 201, obtaining a URL contained in an optimizable cache log; for example, the URL is:
"http://p1.meituan.net/200.120/deal/69fd3838a512e78e1b5bd30774d3efcd195473.jpg".
step 202, analyzing the structure of the URL to obtain the domain name field content of the URL; the specific implementation can intercept the domain name field content recorded in the cache log through a domain name analysis tool.
For the URL in step 201, the domain name field content is "p1.meituan.net". Only "p1.meituan.net" is used as the specific domain name for generating the caching rule, so the cache logs of other URLs that contain "p1.meituan.net", such as "http://p1.meituan.net/230.126/utop/1321jjasdfasdfasdfasdfasdfasdf.png", can also match the new caching rule. Compared with the prior art, inputting only the specific domain name part of the URL into the regular expression to generate the new caching rule improves the efficiency of matching any domain name against the new rule when the rule is in use.
Step 203, acquiring the resource depth of the URL;
Generally, the value of the resource depth is the number inside the braces { } of the regular expression, and its range is 0 to 15. In a specific implementation, a tool with a resource depth capture function can be used to obtain the resource depth of the URL.
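Steps 202 and 203 could be sketched as below. The text does not define how the resource depth is computed, so this sketch assumes it is the number of directory segments in the URL path before the file name, capped at the stated maximum of 15; the function names domain_field and resource_depth are hypothetical.

```python
from urllib.parse import urlsplit

MAX_DEPTH = 15  # upper bound of the depth range given in the text


def domain_field(url: str) -> str:
    """Return the host part of the URL, e.g. 'p1.meituan.net'."""
    return urlsplit(url).hostname or ""


def resource_depth(url: str) -> int:
    """Count directory segments before the file name (assumed definition)."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    directories = segments[:-1]  # drop the last segment, taken as the file name
    return min(len(directories), MAX_DEPTH)


url = "http://p1.meituan.net/200.120/deal/69fd3838a512e78e1b5bd30774d3efcd195473.jpg"
print(domain_field(url), resource_depth(url))
```

If the tool counts depth differently, for example by including the file name, only resource_depth would need to change.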
Step 204, calling a regular expression corresponding to the resource depth level of the URL, inputting the domain name field content of the URL and the value of the resource depth of the URL, and outputting the caching rule of the caching log.
Regular expressions follow established rules. An automation program is written according to these basic rules; given the domain name field content and the resource depth value of a target URL, it automatically generates a caching rule template that matches the URL. Because the automatically generated rules analyze the specific domain name according to the structure of the URL, extract the domain name field, and use a regular expression chosen by resource URL depth, they are more targeted, match more efficiently, and set the caching parameters more reasonably, which effectively improves caching efficiency.
For example, the domain name field content of the cache log URL is: http://res.kfc.com.cn;
if the resource depth value of the URL is 3, the output caching rule is:
[policy-res]
matchurl regex http://[^/]*res\.kfc.com.cn(?:/[^/\?]+){3}(?<file>.+)/?
cache_always=yes
cache_delay=1
cache_index=res.kfc.com.cn/$file
cache_never=no
cache_ttl=1209600
if the resource depth value of the URL is 4, the output caching rule is:
[policy-res]
matchurl regex http://[^/]*res\.kfc.com.cn(?:/[^/\?]+){4}(?<file>.+)/?
cache_always=yes
cache_delay=1
cache_index=res.kfc.com.cn/$file
cache_never=no
cache_ttl=1209600
wherein, the caching parameters in the caching rule include:
cache_always: forced caching; when set, the resource is cached regardless of whether the file header allows caching;
cache_delay: the hot-spot threshold; once the number of user requests exceeds this threshold, the requested content is treated as hot-spot content and cached by the Cache server;
cache_index: the cache index, which improves the readability and identifiability of the caching rule; its content is derived from the domain name field of the URL. In the above example, the domain name field of the URL is "http://res.kfc.com.cn" and "res.kfc.com.cn/$file" is used as the cache index, which makes the caching rule clearer;
cache_never: caching prohibited; when set to yes, the resource is never cached;
cache_ttl: file expiration time (unit: s); once it is exceeded, the file is no longer served.
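A minimal sketch of step 204, filling the domain name field and the resource depth into a rule template, might look like the following. The template text and parameter values are copied from the sample rules above; the function name generate_rule and the choice of the first domain label as the policy name are assumptions. Unlike the sample rules, this sketch escapes every dot in the domain name via re.escape.

```python
import re

# Template modeled on the sample rules; parameter values are the sample values.
RULE_TEMPLATE = """[policy-{name}]
matchurl regex http://[^/]*{domain}(?:/[^/\\?]+){{{depth}}}(?<file>.+)/?
cache_always=yes
cache_delay=1
cache_index={raw_domain}/$file
cache_never=no
cache_ttl=1209600"""


def generate_rule(domain: str, depth: int) -> str:
    """Build the rule text for one domain name field and one resource depth."""
    if not 0 <= depth <= 15:
        raise ValueError("resource depth must be in the range 0 to 15")
    name = domain.split(".")[0]  # assumed naming: first label of the domain
    return RULE_TEMPLATE.format(
        name=name,
        domain=re.escape(domain),  # escapes all dots, not only the first
        depth=depth,
        raw_domain=domain,
    )


print(generate_rule("res.kfc.com.cn", 3))
```

Note that the (?<file>...) group syntax belongs to the cache server's regular expression dialect; Python's own re module would need (?P<file>...) to compile the same pattern.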
In the above embodiment, the resource analysis module can perform statistical analysis on data such as the hit identifier, HTTP status code, resource size, resource URL, and hit rule according to the operating condition of an online domain name. The system automatically generates a corresponding caching rule according to the resource depth and the resource type, and "clone analysis" can be used to compare the domain name hit conditions before and after the new caching rule is applied. Because the specific domain name is analyzed and a regular expression is generated per resource URL depth, the rules are more targeted, match more efficiently, and set the caching parameters more reasonably, which effectively improves caching efficiency. The Cache optimization tool with the resource analysis functional module interfaces with the existing network Cache server, which improves the utilization of cache logs in the system and makes it possible to use cache log resources to collect statistics on and analyze specified URLs, to generate caching rules automatically, and to compare caching optimization effects.
In order to make the presentation of the output Cache rule clearer and provide a selectable optimization scheme for an operator, the Cache optimization tool with the resource analysis functional module may further classify the optimizable Cache log, that is, after determining that the Cache log is the optimizable Cache log, the method further includes:
and extracting second type key information from the cache log, and dividing the category of the cache log according to the second type key information.
Specifically, the second type of key information includes resource suffix identification information and resource size information, and the category to which the cache log belongs may be determined according to the resource suffix identification information and/or the resource size information recorded in the cache log.
For example, when the optimizable cache log is divided by the size of the resource suffix of the cache log, the identification type of the resource suffix large file may be set to wmv, asf, asx, mpg, mpeg, mlv, m2v, etc., and the identification type of the resource suffix small file may be set to wmp, cif, gif, jpg, jpeg, bmp, pcx, etc. And comparing the resource suffix identification information of the cache log with the setting, classifying the cache log into the category of the resource suffix large file if the resource suffix identification information of the cache log belongs to the identification type of the resource suffix large file, and classifying the cache log into the category of the resource suffix small file if the resource suffix identification information of the cache log belongs to the identification type of the resource suffix small file.
For example, when partitioning the optimizable cache log by the resource size of the cache log, the threshold range for large resource files may be set to (1024, 3145728) KB, and the threshold range for small resource files may be set to (0, 1024) KB. If the resource size information of the cache log meets the threshold range of the large resource file, classifying the cache log into the category of the large resource file; if the resource size information of the cache log meets the threshold range of the small resource file, classifying the cache log into the category of the small resource file.
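The two classification criteria above could be sketched as follows. The suffix sets and the size thresholds are taken from the examples in the text; the function name classify and the category labels are hypothetical. The intervals are kept open as written, so a resource of exactly 1024 KB falls into neither size category, and a real tool would need to pick a side.

```python
from typing import List

# Suffix sets and thresholds taken from the examples in the text.
LARGE_SUFFIXES = {"wmv", "asf", "asx", "mpg", "mpeg", "mlv", "m2v"}
SMALL_SUFFIXES = {"wmp", "cif", "gif", "jpg", "jpeg", "bmp", "pcx"}
SMALL_MAX_KB = 1024
LARGE_MAX_KB = 3145728


def classify(suffix: str, size_kb: float) -> List[str]:
    """Return every category the cache log belongs to (possibly none)."""
    categories = []
    ext = suffix.lower().lstrip(".")
    if ext in LARGE_SUFFIXES:
        categories.append("resource suffix large file")
    elif ext in SMALL_SUFFIXES:
        categories.append("resource suffix small file")
    # Open intervals, as written in the text: (1024, 3145728) and (0, 1024).
    if SMALL_MAX_KB < size_kb < LARGE_MAX_KB:
        categories.append("large resource file")
    elif 0 < size_kb < SMALL_MAX_KB:
        categories.append("small resource file")
    return categories


print(classify(".jpg", 14))
print(classify("wmv", 2048))
```

Returning a list lets one log fall under both a suffix category and a size category, which matches the "and/or" wording used for the second-type key information.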
For example, an optimizable cache log may be:
"23Sep2015:085211.418611 0.000366 0.000413 TCP_HIT 200 14415 0 GET 183.*.*.181
http://i3.itc.cn/20150909/340e_4227c137_0f65_bd6b_f501_e2cc10220cab_1.jpg"
The URL corresponding to the cache log is:
"http://i3.itc.cn/20150909/340e_4227c137_0f65_bd6b_f501_e2cc10220cab_1.jpg"; the domain name field information of the URL is "i3.itc.cn"; the resource size information of the cache log is "14415"; the resource suffix identification information of the cache log is ".jpg".
In an optional implementation manner of the embodiment of the present invention, according to the classification method, all the optimizable cache logs of the domain name to be analyzed may be classified, and for the cache logs in each category, an optimized cache rule is generated for the cache logs according to domain name field information of a URL of any cache log in each category and a regular expression corresponding to a resource depth level of the URL, and a correspondence between the cache logs and the optimized cache rule is stored in the category.
In an optional implementation manner of the embodiment of the present invention, according to the classification method, domain name field information of a URL in any cache log in the category of resource suffix large file and/or large resource file may be input into a regular expression corresponding to a resource depth level of the URL, a cache rule of the cache log is generated, and the generated cache rule is stored in the category.
For example, an embodiment of the present invention provides a Cache optimization tool, which at least includes an input interface module, a resource analysis processing module, and a Cache rule automatic generation component, where steps of the Cache optimization tool for executing the above method flow are shown in fig. 3, and include:
Step 301, calling the input interface module to acquire analyzable domain names from the CSS-WEB Top100 domain name resource list or a manually imported domain name resource list;
Step 302, calling the interface between the Cache optimization tool and the Cache server to obtain the cache logs of the analyzable domain names in a selected time period;
Step 303, calling the resource analysis processing module to analyze the cache logs of the analyzable domain names;
wherein analyzing the cache log of any analyzable domain name comprises:
determining an optimizable cache log based on resolution information of any cache log of the analyzable domain name; for example, the cache log in which the parsed HTTP status code is the designated HTTP status code and the matching identifier matching the general cache rule is not parsed is determined as the optimizable cache log.
Classifying the optimizable cache log based on other resolution information of any cache log of the analyzable domain name; for example, the optimizable cache logs are classified according to the analyzed resource suffix identification information and the analyzed resource size information, and the cache logs are divided into four categories of resource suffix large files, resource suffix small files, resource large files and resource small files;
Step 304, invoking the cache rule automatic generation component, executing steps 201 to 204 for each cache log under the selected resource category, and outputting an optimized caching rule for each cache log under the selected resource category;
in this step, the cache rule can be optimized only for the resource category concerned by the operator according to the requirement, for example, the cache rule is optimized for the cache log under the category of the resource suffix large file or the resource large file;
Step 305, applying the optimized caching rules output by the cache rule automatic generation component to the existing network;
specifically, the cache server stores a corresponding relationship in which a URL domain name is used as an index and an optimized cache rule model corresponding to the URL is used as index content.
Further, in an optional implementation manner of the present invention, in order to present a result of analyzing the cache resource, the analyzing, by the step 303, the cache log of any analyzable domain name further includes:
resource analysis and data statistics are performed on the optimizable cache logs in each category, such as total traffic for domain name hits, total analyzable traffic, domain name hit rates, and analyzable traffic fraction.
The number of optimizable cache logs with the hit identifier TCP_HIT in each category is counted to obtain the total hit traffic of each category. The hit rate of each category is calculated from the total traffic of the domain name to be analyzed, counted before classification, and the total hit traffic of each category. The total traffic of the domain name to be analyzed counted before classification is the sum of the resource size values of all cache logs corresponding to the domain name.
The resource size values of the optimizable cache logs in each category are summed to obtain the total analyzable traffic of that category. The analyzable traffic fraction of each category is calculated from the total traffic of the domain name to be analyzed, counted before classification, and the total analyzable traffic of the category.
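The per-category statistics could be computed roughly as below. This is a sketch under assumptions: the total hit traffic of a category is taken to be the sum of the resource sizes of its TCP_HIT logs, and both ratios are taken relative to the total traffic of the domain name. The LogEntry type and the function name category_stats are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical record for one optimizable cache log."""
    hit_flag: str  # "TCP_HIT" or "TCP_MISS"
    size: int      # resource size, in whatever unit the logs record


def category_stats(
    category_logs: Iterable[LogEntry], domain_total_traffic: int
) -> Dict[str, float]:
    """Compute hit traffic, analyzable traffic and their shares for one category."""
    if domain_total_traffic <= 0:
        raise ValueError("domain total traffic must be positive")
    logs = list(category_logs)
    # Assumption: hit traffic is the summed size of the logs flagged TCP_HIT.
    hit_traffic = sum(e.size for e in logs if e.hit_flag == "TCP_HIT")
    analyzable_traffic = sum(e.size for e in logs)
    return {
        "hit_traffic": hit_traffic,
        "analyzable_traffic": analyzable_traffic,
        "hit_rate": hit_traffic / domain_total_traffic,
        "analyzable_fraction": analyzable_traffic / domain_total_traffic,
    }


logs = [LogEntry("TCP_HIT", 300), LogEntry("TCP_MISS", 100), LogEntry("TCP_HIT", 100)]
stats = category_stats(logs, domain_total_traffic=1000)
print(stats)
```

Running the same function on logs collected before and after a rule goes online gives the two hit rates that the comparison step presents side by side.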
Further, in an optional embodiment of the present invention, in order to clearly present the effect of optimizing the caching rules for optimizable cache logs, the Cache optimization tool further includes a clone acquisition module, configured to execute step 302 and acquire the cache logs of the same domain name for a period of time after the optimized caching rule goes online.
According to step 303 and step 304, resource analysis and data statistics are performed on the newly acquired cache logs, and the statistics for each category are compared with the statistics of the same category before the rule went online and presented. For example, comparing the domain name hit rate of the resource suffix large file category for the domain name "ykimg.com" shows a hit rate of 10% before caching rule optimization and up to 70% after optimization.
In the above embodiment, the resource analysis processing module can perform statistical analysis on data such as the hit identifier, HTTP status code, resource size, resource URL, and hit rule according to the operating condition of an online domain name. The cache rule automatic generation component automatically generates a corresponding cache rule according to the resource depth of the URL in the optimizable cache log and the specified domain name. Because a specific domain name is analyzed, the regular expressions generated for different resource URL depths are more targeted and each cache parameter is set more reasonably. When caching is subsequently performed according to the optimized cache rule, the time needed to match the domain name against the rule can be shortened, and cache efficiency can be effectively improved.
By comparing the domain name hit conditions before and after the application of the new caching rule, more resource statistical information is provided for the operator.
The cache optimization tool of the embodiment of the invention interfaces with the cache servers of the existing network and improves the utilization of cache logs in the system. By using cache log resources, it can also count and analyze a specified URL, automatically generate cache rules, and compare cache optimization effects.
For the above method flow, the embodiments of the present invention further provide a resource caching apparatus. For the specific contents of the apparatus, refer to the above method flow; they are not described again here.
A resource caching apparatus, as shown in Fig. 4, includes:
a first obtaining unit 401, configured to obtain a domain name to be analyzed;
a second obtaining unit 402, configured to obtain, for any domain name to be analyzed, a cache log corresponding to the domain name to be analyzed in a specified time period;
a resource analysis unit 403, configured to extract first-type key information from any cache log corresponding to the domain name to be analyzed, and to determine, at least according to the first-type key information, whether the cache log is an optimizable cache log;
a rule generating unit 404, configured to, if it is determined that the cache log is an optimizable cache log, input domain name field information of a URL in the cache log into a regular expression corresponding to the resource depth level of the URL, and generate a cache rule of the cache log;
wherein the regular expression corresponding to the resource depth level of the URL is a regular expression written in advance at least according to the resource depth level of the URL and the cache parameter corresponding to that resource depth level.
The first obtaining unit 401 has the same function as the input interface in the above embodiment; the second obtaining unit 402 has the same function as the designated interface of the cache server in the above embodiment; the resource analysis unit 403 has the same function as the resource analysis processing module in the above embodiment; the rule generating unit 404 has the same function as the cache rule automatic generating component in the above-described embodiment.
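The rule generation performed by the rule generating unit 404 can be illustrated with a minimal sketch. The source does not disclose the actual regular expressions or cache parameters, so everything concrete here is an assumption: the templates in `TEMPLATES`, the TTL values, the use of the URL host as the "domain name field information", and the helper names `resource_depth` and `generate_cache_rule` are all hypothetical.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pre-written templates: resource depth -> (regex template, TTL in seconds).
# The real patterns and cache parameters are not given in the source.
TEMPLATES = {
    1: (r"^https?://{domain}/[^/]+$", 3600),
    2: (r"^https?://{domain}/[^/]+/[^/]+$", 86400),
}
DEFAULT_TEMPLATE = (r"^https?://{domain}/.+$", 600)


def resource_depth(url):
    """Count the non-empty path segments of the URL (assumed meaning of depth)."""
    return len([seg for seg in urlsplit(url).path.split("/") if seg])


def generate_cache_rule(url):
    """Fill the domain name into the template chosen by the URL's resource depth."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"URL has no host: {url!r}")
    depth = resource_depth(url)
    template, ttl = TEMPLATES.get(depth, DEFAULT_TEMPLATE)
    # Escape the host so that "." matches only a literal dot.
    pattern = template.format(domain=re.escape(host))
    return {"pattern": pattern, "ttl": ttl, "depth": depth}
```

For example, `generate_cache_rule("http://ykimg.com/img/a.jpg")` selects the depth-2 template, and the resulting pattern matches other two-segment URLs on the same host.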
Further, the second obtaining unit 402 is specifically configured to:
read the cache log corresponding to the domain name to be analyzed in a specified time period through a specified interface of the cache server.
Further, the first-type key information includes at least an HTTP status code;
the resource analysis unit 403 is specifically configured to:
determine that the cache log is an optimizable cache log if the HTTP status code in the first-type key information is the designated HTTP status code and the first-type key information does not include matching identification information indicating that the cache log matches a general cache rule.
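The check performed by the resource analysis unit 403 can be sketched as a small predicate. The source does not say which status code is the designated one or how the general-rule match is recorded, so the value `200`, the dictionary keys `http_status` and `generic_rule_match`, and the function name are assumptions made for illustration only.

```python
# Assumption: 200 is the designated HTTP status code; the source does not name it.
DESIGNATED_STATUS_CODES = {200}


def is_optimizable(first_type_info):
    """Return True when the log has a designated status code and no general-rule match.

    first_type_info is a hypothetical dict of first-type key information, e.g.
    {"http_status": 200, "generic_rule_match": None}.
    """
    status_ok = first_type_info.get("http_status") in DESIGNATED_STATUS_CODES
    # Any non-empty match identifier means a general cache rule already applies.
    matched_generic = bool(first_type_info.get("generic_rule_match"))
    return status_ok and not matched_generic
```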
Further, the resource analysis unit 403 is further configured to:
if the cache log is determined to be an optimizable cache log, extract second-type key information from the cache log, wherein the second-type key information includes at least resource suffix identification information and resource size information;
if the resource suffix identification information of the cache log belongs to the identification type of the resource suffix large file, classifying the cache log into the category of the resource suffix large file; or,
if the resource suffix identification information of the cache log belongs to the identification type of the resource suffix small file, classifying the cache log into the category of the resource suffix small file; or,
if the resource size information of the cache log meets the threshold range of the large resource file, classifying the cache log into the category of the large resource file; or,
if the resource size information of the cache log meets the threshold range of the small resource file, classifying the cache log into the category of the small resource file;
the rule generating unit 404 is specifically configured to:
inputting the domain name field information of the URL in any cache log in each category into the regular expression corresponding to the resource depth level of the URL, generating the cache rule of the cache log, and storing the cache rule in the category.
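The four-way classification above can be sketched as follows. The suffix sets, the size threshold, the category labels, and the order in which the conditions are tried (suffix first, then size) are all assumptions; the source only states the four alternatives joined by "or".

```python
import os
from urllib.parse import urlsplit

# Hypothetical suffix sets and size threshold; the source gives no concrete values.
LARGE_SUFFIXES = {".mp4", ".flv", ".zip"}
SMALL_SUFFIXES = {".jpg", ".png", ".css", ".js"}
LARGE_SIZE_THRESHOLD = 1024 * 1024  # bytes


def classify(url, size):
    """Assign a cache log to one of the four categories described in the text."""
    # The suffix is taken from the URL path only, ignoring any query string.
    suffix = os.path.splitext(urlsplit(url).path)[1].lower()
    if suffix in LARGE_SUFFIXES:
        return "suffix_large_file"
    if suffix in SMALL_SUFFIXES:
        return "suffix_small_file"
    # Assumption: logs without a recognized suffix fall back to size thresholds.
    if size >= LARGE_SIZE_THRESHOLD:
        return "large_resource_file"
    return "small_resource_file"
```

Grouping logs by the returned label gives the per-category buckets in which generated cache rules would then be stored.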
Further, the rule generating unit 404 is specifically configured to:
input the domain name field information of the URL in any cache log in the category of the resource suffix large file and/or the category of the large resource file into the regular expression corresponding to the resource depth level of the URL, generate the cache rule of the cache log, and store the cache rule in the category.
In the above embodiment, the cache log corresponding to the domain name to be analyzed in the specified time period is obtained; first-type key information, such as the hit identifier, HTTP status code, resource size, resource URL, and whether the log matches a general cache rule, is extracted from any cache log corresponding to the domain name to be analyzed; and whether the cache log is an optimizable cache log is determined at least according to the first-type key information. A corresponding cache rule is then generated automatically according to the resource depth of the URL in the optimizable cache log and the specified domain name. Because a specific domain name is analyzed, the regular expressions generated for different resource URL depths are more targeted and each cache parameter is set more reasonably. When caching is subsequently performed according to the optimized cache rule, the time needed to match the domain name against the rule can be shortened, and cache efficiency can be effectively improved.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (10)
1. A resource caching method, comprising:
acquiring a domain name to be analyzed;
for any domain name to be analyzed, obtaining a cache log corresponding to the domain name to be analyzed in a specified time period;
extracting first-type key information from any cache log corresponding to the domain name to be analyzed;
determining, at least according to the first-type key information, whether the cache log is an optimizable cache log;
if the cache log is determined to be an optimizable cache log, inputting domain name field information of a Uniform Resource Locator (URL) in the cache log into a regular expression corresponding to the resource depth level of the URL to generate a cache rule of the cache log;
the regular expression corresponding to the resource depth level of the URL is a regular expression which is written in advance at least according to the resource depth level of the URL and the cache parameter corresponding to the resource depth level.
2. The method of claim 1,
wherein obtaining, for any domain name to be analyzed, the cache log corresponding to the domain name to be analyzed in the specified time period comprises:
reading the cache log corresponding to the domain name to be analyzed in the specified time period through a specified interface of the cache server.
3. The method of claim 1, wherein the first-type key information comprises at least a Hypertext Transfer Protocol (HTTP) status code;
the determining, at least according to the first-type key information, whether the cache log is an optimizable cache log comprises:
if the HTTP status code in the first-type key information is the designated HTTP status code and the first-type key information does not include matching identification information indicating that the cache log matches a general cache rule, determining that the cache log is an optimizable cache log.
4. The method of claim 1, wherein if the cache log is determined to be an optimizable cache log, further comprising:
extracting second-type key information from the cache log, wherein the second-type key information at least comprises resource suffix identification information and resource size information;
if the resource suffix identification information of the cache log belongs to the identification type of the resource suffix large file, classifying the cache log into the category of the resource suffix large file; or,
if the resource suffix identification information of the cache log belongs to the identification type of the resource suffix small file, classifying the cache log into the category of the resource suffix small file; or,
if the resource size information of the cache log meets the threshold range of the large resource file, classifying the cache log into the category of the large resource file; or,
if the resource size information of the cache log meets the threshold range of the small resource file, classifying the cache log into the category of the small resource file;
then, the inputting the domain name field information of the URL in the cache log into the regular expression corresponding to the resource depth level of the URL, and generating the cache rule of the cache log includes:
inputting the domain name field information of the URL in any cache log in each category into the regular expression corresponding to the resource depth level of the URL, generating the cache rule of the cache log, and storing the cache rule in the category.
5. The method of claim 4, wherein the inputting the domain name field information of the URL in the cache log into the regular expression corresponding to the resource depth level of the URL to generate the cache rule of the cache log comprises:
inputting the domain name field information of the URL in any cache log in the category of the resource suffix large file and/or the category of the large resource file into the regular expression corresponding to the resource depth level of the URL, generating the cache rule of the cache log, and storing the cache rule in the category.
6. A resource caching apparatus, comprising:
a first obtaining unit, configured to obtain a domain name to be analyzed;
a second obtaining unit, configured to obtain, for any domain name to be analyzed, a cache log corresponding to the domain name to be analyzed in a specified time period;
a resource analysis unit, configured to extract first-type key information from any cache log corresponding to the domain name to be analyzed, and to determine, at least according to the first-type key information, whether the cache log is an optimizable cache log;
a rule generating unit, configured to, if the cache log is determined to be an optimizable cache log, input the domain name field information of the Uniform Resource Locator (URL) in the cache log into a regular expression corresponding to the resource depth level of the URL to generate a cache rule of the cache log;
the regular expression corresponding to the resource depth level of the URL is a regular expression which is written in advance at least according to the resource depth level of the URL and the cache parameter corresponding to the resource depth level.
7. The apparatus of claim 6,
wherein the second obtaining unit is specifically configured to:
read the cache log corresponding to the domain name to be analyzed in the specified time period through a specified interface of the cache server.
8. The apparatus of claim 6, wherein the first-type key information comprises at least a Hypertext Transfer Protocol (HTTP) status code;
the resource analysis unit is specifically configured to:
determine that the cache log is an optimizable cache log if the HTTP status code in the first-type key information is the designated HTTP status code and the first-type key information does not include matching identification information indicating that the cache log matches a general cache rule.
9. The apparatus of claim 6, wherein the resource analysis unit is further to:
if the cache log is determined to be the optimized cache log, extracting second-type key information from the cache log, wherein the second-type key information at least comprises resource suffix identification information and resource size information;
if the resource suffix identification information of the cache log belongs to the identification type of the resource suffix large file, classifying the cache log into the category of the resource suffix large file; or,
if the resource suffix identification information of the cache log belongs to the identification type of the resource suffix small file, classifying the cache log into the category of the resource suffix small file; or,
if the resource size information of the cache log meets the threshold range of the large resource file, classifying the cache log into the category of the large resource file; or,
if the resource size information of the cache log meets the threshold range of the small resource file, classifying the cache log into the category of the small resource file;
the rule generating unit is specifically configured to:
inputting the domain name field information of the URL in any cache log in each category into the regular expression corresponding to the resource depth level of the URL, generating the cache rule of the cache log, and storing the cache rule in the category.
10. The apparatus of claim 9, wherein the rule generation unit is specifically configured to:
input the domain name field information of the URL in any cache log in the category of the resource suffix large file and/or the category of the large resource file into the regular expression corresponding to the resource depth level of the URL, generate the cache rule of the cache log, and store the cache rule in the category.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510999566.4A CN106921713B (en) | 2015-12-25 | 2015-12-25 | Resource caching method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510999566.4A CN106921713B (en) | 2015-12-25 | 2015-12-25 | Resource caching method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106921713A (en) | 2017-07-04 |
CN106921713B (en) | 2019-12-06 |
Family
ID=59456083
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510999566.4A Active CN106921713B (en) | 2015-12-25 | 2015-12-25 | Resource caching method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106921713B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107819837A (en) * | 2017-10-31 | 2018-03-20 | 南京优速网络科技有限公司 | A kind of method and log cache analysis system for lifting buffer service quality |
CN108704312A (en) * | 2018-04-26 | 2018-10-26 | 网易(杭州)网络有限公司 | The test method and device of fine arts resource |
CN109145220A (en) * | 2018-09-10 | 2019-01-04 | 北京知道创宇信息技术有限公司 | Data processing method, device and electronic equipment |
CN109586937A (en) * | 2017-09-28 | 2019-04-05 | 中兴通讯股份有限公司 | A kind of O&M method, equipment and the storage medium of caching system |
CN110020249A (en) * | 2017-12-28 | 2019-07-16 | 中国移动通信集团山东有限公司 | A kind of caching method, device and the electronic equipment of URL resource |
CN110401553A (en) * | 2018-04-25 | 2019-11-01 | 阿里巴巴集团控股有限公司 | The method and apparatus of server configuration |
CN110677270A (en) * | 2018-07-03 | 2020-01-10 | 长春亿阳计算机开发有限公司 | Domain name cacheability analysis method and system |
WO2022152086A1 (en) * | 2021-01-15 | 2022-07-21 | 华为云计算技术有限公司 | Data caching method and apparatus, and device and computer-readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103825919A (en) * | 2012-11-16 | 2014-05-28 | 中国移动通信集团北京有限公司 | Method, device and system for data resource caching |
CN104010010A (en) * | 2013-02-25 | 2014-08-27 | 中国移动通信集团北京有限公司 | Internet resource acquisition method, device and cache system |
CN104079534A (en) * | 2013-03-27 | 2014-10-01 | 中国移动通信集团北京有限公司 | Method and system of implementing HTTP (Hyper Text Transport Protocol) cache |
CN104111900A (en) * | 2013-04-22 | 2014-10-22 | 中国移动通信集团公司 | Method and device for replacing data in cache |
CN104426838A (en) * | 2013-08-20 | 2015-03-18 | 中国移动通信集团北京有限公司 | Internet cache scheduling method and system |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109586937A (en) * | 2017-09-28 | 2019-04-05 | 中兴通讯股份有限公司 | A kind of O&M method, equipment and the storage medium of caching system |
CN107819837A (en) * | 2017-10-31 | 2018-03-20 | 南京优速网络科技有限公司 | A kind of method and log cache analysis system for lifting buffer service quality |
CN110020249A (en) * | 2017-12-28 | 2019-07-16 | 中国移动通信集团山东有限公司 | A kind of caching method, device and the electronic equipment of URL resource |
CN110020249B (en) * | 2017-12-28 | 2021-11-30 | 中国移动通信集团山东有限公司 | URL resource caching method and device and electronic equipment |
CN110401553B (en) * | 2018-04-25 | 2022-06-03 | 阿里巴巴集团控股有限公司 | Server configuration method and device |
CN110401553A (en) * | 2018-04-25 | 2019-11-01 | 阿里巴巴集团控股有限公司 | The method and apparatus of server configuration |
US11431669B2 (en) | 2018-04-25 | 2022-08-30 | Alibaba Group Holding Limited | Server configuration method and apparatus |
CN108704312A (en) * | 2018-04-26 | 2018-10-26 | 网易(杭州)网络有限公司 | The test method and device of fine arts resource |
CN110677270A (en) * | 2018-07-03 | 2020-01-10 | 长春亿阳计算机开发有限公司 | Domain name cacheability analysis method and system |
CN110677270B (en) * | 2018-07-03 | 2023-02-28 | 长春亿阳计算机开发有限公司 | Domain name cacheability analysis method and system |
CN109145220A (en) * | 2018-09-10 | 2019-01-04 | 北京知道创宇信息技术有限公司 | Data processing method, device and electronic equipment |
CN109145220B (en) * | 2018-09-10 | 2022-03-29 | 北京知道创宇信息技术股份有限公司 | Data processing method and device and electronic equipment |
WO2022152086A1 (en) * | 2021-01-15 | 2022-07-21 | 华为云计算技术有限公司 | Data caching method and apparatus, and device and computer-readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN106921713B (en) | 2019-12-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106921713B (en) | Resource caching method and device | |
US10812358B2 (en) | Performance-based content delivery | |
US11196820B2 (en) | System and method for main page identification in web decoding | |
US10362050B2 (en) | System and methods for scalably identifying and characterizing structural differences between document object models | |
CN102819591B (en) | A kind of content-based Web page classification method and system | |
US8255273B2 (en) | Evaluating online marketing efficiency | |
CN104283933B (en) | Method, client and the system of downloading data | |
CN105049287A (en) | Log processing method and log processing devices | |
CN112260990B (en) | Method and device for safely accessing intranet application | |
CN111131070B (en) | Port time sequence-based network traffic classification method and device and storage medium | |
CN102752154A (en) | Detecting method of dead link of Web site | |
CN109241733A (en) | Crawler Activity recognition method and device based on web access log | |
CN104298782B (en) | Internet user actively accesses the analysis method of action trail | |
CN105959358A (en) | CDN server and method of CDN server of caching data | |
CN104462320A (en) | Method and device for realizing classification of network users | |
CN112131507A (en) | Website content processing method, device, server and computer-readable storage medium | |
CN104199945A (en) | Data storing method and device | |
US11120176B2 (en) | Object count estimation by live object simulation | |
CN114024904B (en) | Access control method, device, equipment and storage medium | |
CN103399968A (en) | Microblog information acquisition method and microblog information acquisition system | |
JP6872853B2 (en) | Detection device, detection method and detection program | |
Kapusta et al. | User Identification in the Process of Web Usage Data Preprocessing. | |
CN111078975A (en) | Multi-node incremental data acquisition system and acquisition method | |
CN106933840A (en) | Forum's catalogue page content crawling method and device | |
US10868872B2 (en) | Method and system for determining a source link to a source object |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||