WO2018121430A1

WO2018121430A1 - File storage and indexing method, apparatus, media, device and method for reading files

Info

Publication number: WO2018121430A1
Application number: PCT/CN2017/117967
Authority: WO
Inventors: 陈闯 (Chen Chuang); 张炎泼 (Zhang Yanpo)
Original assignee: 贵州白山云科技有限公司 (Guizhou Baishan Cloud Technology Co., Ltd.)
Priority date: 2016-12-26
Filing date: 2017-12-22
Publication date: 2018-07-05
Also published as: CN106874348A; CN106874348B

Abstract

Provided herein are a file storage and indexing method, apparatus, medium, device, and a method for reading files. The file storage and indexing method comprises: storing the files in alphabetical order of their actual key values to obtain a data file; and generating an index file for indexing each file in the data file, wherein each index in the index file uses the first N bytes of a file's actual key value as its key value and points to one or more files in the data file, the offset value corresponding to the key value is the offset of the first of the one or more files to which the key value points, and the size value corresponding to the key value is the size of that first file. This solves the problem that the indexing scheme used by the Haystack system consumes a large amount of memory, and thereby reduces the memory consumed by the indexing system.

Description

File storage and indexing method, apparatus, medium, device, and method of reading a file

[Corrected under Rule 26, 19.01.2018]
This application claims the priority of Chinese Patent Application No. 2016112212151.1, filed on Dec. 26, 2016 and entitled "File Storage and Indexing Method, Apparatus, and Method of Reading Documents", the entire contents of which are incorporated herein by reference.

Technical field

Embodiments of the present invention relate to, but are not limited to, the field of file storage and indexing, and in particular to a file storage and indexing method, apparatus, medium, device, and method for reading a file.

Background art

Internet data is growing explosively, and applications such as social networks, mobile communications, online video, and e-commerce often generate billions or even tens of billions of files. Because of the huge challenges in metadata management, access performance, storage efficiency, and so on, handling massive numbers of files is a recognized hard problem in the industry.

Some well-known Internet companies have proposed solutions for storing massive numbers of small files. For example, the social networking site Facebook, which had stored more than 60 billion images, built the Haystack system to customize and optimize the storage of large numbers of images. Other small-file schemes include Taobao's TFS. The core idea of these systems is to append small files to a data file and, at the same time, generate an index file through which each small file can be located.

The Haystack solution used by Facebook is described below.

Haystack handles small files by packing them together: the data of many small files is appended to a data file, an index file is generated, and the index is used to find a small file's offset and size in the data file so that the file can be read.

(1) Haystack's data file: each small file is encapsulated into a needle containing the file's key value, size, data, and other fields. All needles are appended to the data file in the order in which they were written.

(2) Haystack's index file: the index file stores the key value of each needle together with the needle's offset, size, and other information in the data file. The program loads the index into memory at startup and locates a needle's offset and size in the data file by looking up the in-memory index.

(3) Using the index for reads: the index file is loaded into memory, the index entry is located, and from it the offset and size of the file to be read are obtained.

(4) Using the index for writes: each write adds one file; the file's data is appended to the end of the data file as a new needle after the last needle n, and an index record for the new needle is appended to the index.

As the above description shows, Haystack locates files by loading every file's full key value into memory. When the machine has enough memory, Facebook's full 8-byte key values can all be loaded into memory, but in practice there are two problems:

(1) Storage servers do not have very large memory; it is typically 32 GB to 64 GB;

(2) The key values of small files are hard to control. Typically the MD5 or SHA-1 digest of the file content is used as the key value, which is 16 or 20 bytes long rather than 8.

Suppose a storage server has twelve 4 TB disks and about 32 GB of memory, and needs to store about 1 billion files of about 4 KB each, such as avatars and thumbnails. If the key value is an MD5 digest, then together with the offset and size fields the index entry for each small file occupies 28 bytes on average. In this case the index occupies nearly 30 GB of memory while the data occupies only 4 TB of disk: memory utilization is nearly 100%, while disk utilization is only about 8%.
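The sizing in this example is simple arithmetic. The sketch below reproduces it from the figures stated above (1 billion files, about 4 KB per file, 28-byte index entries, twelve 4 TB disks, about 32 GB of memory), using decimal units; it is only a back-of-envelope check, and `haystack_sizing` is a hypothetical helper name.

```python
def haystack_sizing(num_files, file_size, entry_size, disks, disk_size, memory):
    """Return (index bytes, fraction of memory used by the index,
    fraction of total disk capacity used by the file data)."""
    index_bytes = num_files * entry_size          # one full-key entry per file
    data_bytes = num_files * file_size
    return index_bytes, index_bytes / memory, data_bytes / (disks * disk_size)


index_bytes, mem_frac, disk_frac = haystack_sizing(
    num_files=10**9,          # ~1 billion small files
    file_size=4 * 10**3,      # ~4 KB each
    entry_size=28,            # 16-byte MD5 key plus offset and size fields
    disks=12,
    disk_size=4 * 10**12,     # 4 TB per disk
    memory=32 * 10**9,        # ~32 GB of RAM
)
print(f"index: {index_bytes / 10**9:.1f} GB = {index_bytes / 2**30:.1f} GiB")
print(f"memory used by index: {mem_frac:.1%}, disk used: {disk_frac:.1%}")
```

The index comes to 28 GB (about 26 GiB), consistent with the figure above, while the data fills 4 TB of the 48 TB of disk, about 8%.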

Thus the indexing scheme adopted by the Haystack system consumes a large amount of memory, and memory becomes the factor that limits disk utilization. To use more of the disk capacity, memory would have to be increased disproportionately.

Summary of the invention

The following is an overview of the topics detailed in this document. This Summary is not intended to limit the scope of the claims.

The embodiments of the present invention provide a file storage and indexing method, apparatus, medium, device, and method for reading a file, so as to at least solve the problem that the indexing scheme adopted by the Haystack system consumes a large amount of memory.

The file storage and indexing method provided by the embodiment of the invention includes:

Store each file in alphabetical order according to the actual key value of the file to obtain a data file;

Generating an index file for indexing each file in the data file, wherein each index in the index file uses the first N bytes of a file's actual key value as its key value and points to one or more files in the data file; the offset corresponding to the key value is the offset of the first of the one or more files pointed to by the key value, the size value corresponding to the key value is the size of that first file, and N is a positive integer.

The above method also has the following characteristics:

The offset and size fields in the index file are aligned to 512 bytes.

The above method also has the following characteristics:

Generating the index file for indexing each file in the data file further includes:

Storing the indexes of the index file in layers according to a key value prefix, wherein the key value of an index stored in the layer corresponding to a key value prefix is a short key value obtained by truncating that prefix, and the byte length of the key value prefix is less than N.

The above method also has the following characteristics:

The offset of each index in the index file is an intra-layer offset, that is, an offset relative to the start of the layer containing the index, and the number of bytes of the intra-layer offset is determined by the largest layer address space.

The above method also has the following characteristics:

The method further includes mapping all files in the data file into a Bloom filter, so that when a file in the data file is read, the Bloom filter is quickly searched to determine whether the file to be read may exist.

The computer readable storage medium provided by the embodiment of the present invention stores a computer program, and when the program is executed by the processor, the steps of the foregoing method are implemented.

A computer device provided by an embodiment of the present invention includes a memory, a processor, and a computer program stored on the memory and operable on the processor, and the processor implements the steps of the foregoing method when the program is executed.

The file storage and indexing device provided by the embodiment of the invention includes:

a data file storage module, configured to store a data file, wherein the data file is obtained by storing each file in alphabetical order according to an actual key value of the file;

An index file generating module, configured to generate an index file for indexing each file in the data file, wherein each index in the index file uses the first N bytes of a file's actual key value as its key value and points to one or more files in the data file; the offset corresponding to the key value is the offset of the first of the one or more files pointed to by the key value, the size value corresponding to the key value is the size of that first file, and N is a positive integer.

The above device also has the following features:

The index file generating module is further configured to store the indexes of the index file in layers according to a key value prefix, wherein the key value of an index stored in the layer corresponding to a key value prefix is a short key value obtained by truncating that prefix, and the byte length of the key value prefix is less than N.

The above device also has the following features:

The device also includes:

a mapping module, configured to map all files in the data file into a Bloom filter, so that when a file in the data file is read, the Bloom filter is searched to determine whether the file to be read may exist.

The method for reading a file in a file storage and indexing device provided by the present invention includes:

Querying an index corresponding to the first N bytes of the actual key value in the index file according to the first N bytes of the actual key value of the file to be read;

Matching, according to the actual key value, a file in one or more files pointed to by an index corresponding to a first N bytes of the actual key value;

When a file whose key value equals the actual key value is matched, the file is read.

The above method also has the following characteristics:

Querying, according to the first N bytes of the actual key value of the file to be read, the index corresponding to those first N bytes in the index file includes:

Determining, according to the Bloom filter, whether the file to be read may exist; if so, querying the index file for the index corresponding to the first N bytes of the actual key value of the file to be read; otherwise, terminating the read.

According to the embodiments of the present invention, the files are stored in alphabetical order of their actual key values to obtain a data file, and an index file for indexing each file in the data file is generated, wherein each index uses the first N bytes of a file's actual key value as its key value and points to one or more files in the data file, the offset corresponding to the key value is the offset of the first of the one or more files pointed to by the key value, and the size corresponding to the key value is the size of that first file. This solves the problem that the indexing scheme adopted by the Haystack system consumes a large amount of memory, and reduces the memory consumed by the indexing system.

DRAWINGS

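The accompanying drawings are intended to provide a further understanding of the embodiments of the present invention and constitute a part of the specification; together with the embodiments, they serve to explain the present invention and do not constitute an improper limitation of it. In the drawings: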

FIG. 1 is a flow chart of a file storage and indexing method according to an embodiment of the present invention;

FIG. 2 is a structural block diagram of a file storage and indexing apparatus according to an embodiment of the present invention;

FIG. 3 is a flowchart of a method of reading a file in a file storage and indexing device according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a file storage and index structure according to a preferred embodiment of the present invention;

FIG. 5 is a flow chart of a method of reading a file according to a preferred embodiment of the present invention;

FIG. 6, FIG. 7, and FIG. 8 are schematic diagrams of index layering according to a preferred embodiment of the present invention;

FIG. 9 and FIG. 10 are diagrams comparing the memory consumption of indexing schemes according to a preferred embodiment of the present invention.

Detailed description

The embodiments of the present invention will be further described with reference to the drawings and specific embodiments.

Example 1

This embodiment provides a file storage and indexing method. FIG. 1 is a flowchart of the file storage and indexing method according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:

Step S101, storing each file in alphabetical order according to the actual key value of the file, to obtain a data file;

Step S102, generating an index file for indexing each file in the data file, wherein each index in the index file uses the first N bytes of a file's actual key value as its key value and points to one or more files in the data file; the offset value corresponding to the key value is the offset of the first of the one or more files pointed to by the key value, the size value corresponding to the key value is the size of that first file, and N is a positive integer.
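A minimal sketch of steps S101 and S102 follows, assuming 16-byte keys (for example MD5 digests) and a hypothetical needle layout of a 16-byte key and a 4-byte big-endian data length followed by the data. The `build` function and the `HEADER` format are illustrative names, not taken from the source; the index is kept as a Python dict for clarity, whereas a memory-conscious implementation would pack entries into a compact sorted array.

```python
import struct

N = 4                              # index key length in bytes (N = 4 is preferred)
HEADER = struct.Struct(">16sI")    # hypothetical needle header: 16-byte key, data length


def build(files):
    """files maps full key (16 bytes) -> file content (bytes).

    Returns (data, index): `data` is the data file with needles written in
    ascending key order; `index` maps each N-byte key prefix to
    (offset, size) of the FIRST needle carrying that prefix."""
    data = bytearray()
    index = {}
    for key in sorted(files):                    # byte-wise order = alphabetical order
        content = files[key]
        offset = len(data)
        data += HEADER.pack(key, len(content)) + content
        prefix = key[:N]
        if prefix not in index:                  # later files in the group share this entry
            index[prefix] = (offset, HEADER.size + len(content))
    return bytes(data), index
```

Because the needles are sorted, every needle sharing a prefix sits directly after the one recorded in the index, so a single (offset, size) pair per prefix is enough to find the whole group.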

In the above steps, the index no longer stores a file's actual key value but only its first N bytes, which reduces the size of the index file. At the same time, each index no longer points to a single file but to the one or more files that share the same first N bytes of their actual key values. So that such files can be located from the single offset in the index, files are stored in the data file in alphabetical order of their actual key values; the one or more files sharing the same first N bytes are therefore stored contiguously, and one offset suffices to indicate where they are. Consequently, once the index file generated in step S102 is loaded into memory, a Haystack-style system of the related art occupies less memory, which solves the problem that the indexing scheme adopted by the Haystack system consumes a large amount of memory and reduces the memory consumed by the indexing system.

When a file is looked up with the index file generated in step S102, the index no longer leads directly to that file but to a contiguous set of files. To read a specific file exactly, the files in that set are matched one by one against the file's actual key value until the desired file is found.
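The lookup just described can be sketched as follows, assuming the same hypothetical needle layout as before (16-byte key, 4-byte big-endian data length, then the data). `read` starts at the offset recorded for the N-byte prefix and compares full keys needle by needle; because the data file is sorted, it can stop as soon as it passes the requested key or leaves the prefix group. All names are illustrative.

```python
import struct

N = 4                              # index key length in bytes
HEADER = struct.Struct(">16sI")    # hypothetical needle header: 16-byte key, data length


def read(key, data, index):
    """Return the content stored under the full `key`, or None if absent.

    `index` maps an N-byte prefix to (offset, size) of the first needle in its group."""
    entry = index.get(key[:N])
    if entry is None:
        return None                          # no file shares this prefix
    offset = entry[0]
    while offset < len(data):
        stored_key, size = HEADER.unpack_from(data, offset)
        if stored_key[:N] != key[:N]:
            return None                      # walked past the prefix group
        body = offset + HEADER.size
        if stored_key == key:
            return data[body:body + size]    # exact match: read the file
        if stored_key > key:
            return None                      # sorted order: key cannot appear later
        offset = body + size                 # header size locates the next needle
    return None
```

The size kept in the index is not needed by this sketch, because each needle header carries its own length; an implementation could use it to fetch the first needle in a single read.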

The offset field and the size field in the index file are aligned to 512 bytes. For example, a file of 1024 bytes with 512-byte alignment gives 1024/512 = 2, so its size can be recorded as 2; when the index holds a size of 2, the file size is recovered as 2 × 512 = 1024 bytes. Previously the value 1024 had to be stored; now only 2 is stored, saving at least one byte. In addition, the number of bytes required for the offset and size fields can be calculated from the actual size of the entire data file, further reducing the number of bytes occupied by each index.
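The 512-byte alignment and the field-width calculation can be illustrated in a few lines. This sketch assumes, as the example implies, that needles are padded so that every offset and size is a multiple of 512 bytes; `to_units`, `from_units`, and `width` are hypothetical helper names.

```python
ALIGN = 512


def to_units(nbytes):
    """Encode an aligned byte offset or size as a count of 512-byte units."""
    if nbytes % ALIGN:
        raise ValueError("value is not 512-byte aligned")
    return nbytes // ALIGN


def from_units(units):
    """Decode a stored unit count back into bytes."""
    return units * ALIGN


def width(value):
    """Smallest whole number of bytes that can hold the non-negative integer value."""
    return max(1, (value.bit_length() + 7) // 8)


# Example from the text: a 1024-byte file is stored as 2 units.
print(to_units(1024), from_units(2))                              # 2 1024

data_file_size = 4 * 10**12                                       # e.g. a 4 TB data file
print(width(data_file_size), width(to_units(data_file_size)))     # 6 5
```

Dividing by 512 removes 9 bits from every offset and size value; for a 4 TB data file this shrinks the offset field from 6 bytes to 5.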

To further reduce the number of bytes occupied by the index, note that the key values stored in the index file may still share common key value prefixes. The indexes in the index file can therefore be stored in layers according to key value prefix, where the key value of an index stored in the layer for a given prefix is a short key value with that prefix truncated, and the byte length of the key value prefix is less than N. When each layer holds enough indexes, the layered index file occupies fewer bytes than the original index file.
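A minimal sketch of prefix layering over N-byte index keys, assuming a fixed layering prefix length and ignoring the small per-layer directory that a real layout would also need (an assumption of this sketch). `layer_index` and `key_bytes` are hypothetical helpers.

```python
from collections import defaultdict


def layer_index(index_keys, prefix_len):
    """Group N-byte index keys by their first `prefix_len` bytes.

    Each layer stores the shared prefix once; each entry keeps only its
    short key (the N-byte key with the prefix truncated)."""
    layers = defaultdict(list)
    for key in sorted(index_keys):
        layers[key[:prefix_len]].append(key[prefix_len:])
    return dict(layers)


def key_bytes(layers, prefix_len):
    """Bytes spent on keys after layering: one prefix per layer plus all short keys."""
    return sum(prefix_len + sum(len(k) for k in shorts) for shorts in layers.values())


keys = [bytes([0xAB, i, 0, 0]) for i in range(4)]   # four 4-byte keys sharing 0xab
layers = layer_index(keys, 1)
print(sum(len(k) for k in keys), "->", key_bytes(layers, 1))   # 16 -> 13
```

Each layer spends `prefix_len` bytes once and saves `prefix_len` bytes per entry, so in this simplified accounting layering pays off whenever layers hold more than one entry on average.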

After the index file is layered, the offsets of the indexes within each layer can be further optimized to reduce their byte count. Optionally, the offset of each index in the index file is an intra-layer offset, that is, an offset relative to the start of the layer containing the index, and the number of bytes of the intra-layer offset is determined by the largest layer address space. Since the largest layer address space is smaller than the size of the entire data file, an intra-layer offset needs no more bytes, and usually fewer, than an offset spanning the entire data file.
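Because the data file is sorted by key, all needles whose keys share a layer prefix occupy one contiguous byte range, so each layer can be described by a start and an end offset. The sketch below, using a hypothetical `offset_widths` helper, compares the bytes needed for an absolute offset with those needed for an intra-layer offset; the absolute position is recovered as the layer start plus the intra-layer offset.

```python
def width(value):
    """Smallest whole number of bytes that can hold the non-negative integer value."""
    return max(1, (value.bit_length() + 7) // 8)


def offset_widths(layer_spans):
    """layer_spans: list of (start, end) byte ranges, one per layer, in the data file.

    Returns (bytes for an offset anywhere in the data file,
             bytes for an offset relative to the start of its layer)."""
    file_size = max(end for _, end in layer_spans)
    largest_layer = max(end - start for start, end in layer_spans)
    return width(file_size - 1), width(largest_layer - 1)


def absolute_offset(layer_start, intra_offset):
    """Resolve an intra-layer offset back to a position in the data file."""
    return layer_start + intra_offset


# 256 equal layers (1-byte prefix) over a 4 TiB data file:
spans = [(i * 2**34, (i + 1) * 2**34) for i in range(256)]
print(offset_widths(spans))   # (6, 5)
```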

The Bloom filter is a bit-vector data structure with good space and time efficiency that is used to test whether an element is a member of a set. If the test result is yes, the element is not necessarily in the set; if the test result is no, the element is definitely not in the set. The advantages of the Bloom filter are that insertion and query take constant time and that it does not store the elements themselves, which also gives it good security. In the embodiments of the present invention, because one index points to multiple files, a Bloom filter is used to quickly check whether a file may exist, avoiding the waste of resources and time caused by queries for nonexistent files. Optionally, in this embodiment, all files in the data file are also mapped into the Bloom filter, so that when a file in the data file is read, a quick search of the Bloom filter determines whether the file to be read may exist.
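A compact Bloom filter sketch in Python follows. It derives the bit positions from one SHA-256 digest by double hashing, which is a common construction but an assumption here; the class name, the `bits_per_item` parameter, and its default of 9.6 bits per item are illustrative choices rather than parameters from the source.

```python
import hashlib
import math


class BloomFilter:
    """m-bit array; h bit positions per key derived from one SHA-256 digest."""

    def __init__(self, expected_items, bits_per_item=9.6):
        self.m = max(8, math.ceil(expected_items * bits_per_item))
        self.h = max(1, round(bits_per_item * math.log(2)))   # optimal hash count
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key):
        d = hashlib.sha256(key).digest()
        a = int.from_bytes(d[:8], "big")
        b = int.from_bytes(d[8:16], "big") | 1                # odd step for double hashing
        return [(a + i * b) % self.m for i in range(self.h)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def may_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(key))
```

Inserting every stored key at write time and calling `may_contain` before consulting the index lets most reads of nonexistent files return without touching the index or the disk.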

In the embodiments of the present invention, the value of N is preferably 4. From the description of the above embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, though in many cases the former is the better implementation. Based on such an understanding, the technical solution of the present invention, or the part of it that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) that includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.

Example 2

This embodiment also provides a file storage and indexing apparatus, which is used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

FIG. 2 is a structural block diagram of a file storage and indexing apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes a data file storage module 21 and an index file generating module 22, wherein:

a data file storage module 21, configured to store data files, wherein the data files are obtained by storing the files in alphabetical order according to actual key values of the files;

The index file generating module 22 is coupled to the data file storage module 21 and is configured to generate an index file for indexing each file in the data file, wherein each index in the index file uses the first N bytes of a file's actual key value as its key value and points to one or more files in the data file; the offset corresponding to the key value is the offset of the first of the one or more files pointed to by the key value, the size value corresponding to the key value is the size of that first file, and N is a positive integer.

The index file generating module is further configured to store the indexes of the index file in layers according to a key value prefix, wherein the key value of an index stored in the layer corresponding to a key value prefix is a short key value obtained by truncating that prefix, and the byte length of the key value prefix is less than N.

The offset of each index in the index file is an intra-layer offset relative to the start of the layer containing the index, and the number of bytes of the intra-layer offset is determined by the largest layer address space.

The file storage and indexing device further includes a mapping module, configured to map all files in the data file into a Bloom filter, so that when a file in the data file is read, the Bloom filter is searched to determine whether the file to be read may exist.

It should be noted that each of the above modules may be implemented in software or hardware. In the latter case, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor, or the modules are distributed across multiple processors.

In the embodiment of the present invention, the value of N is preferably 4.

Example 3

This embodiment provides a method of reading a file in the above file storage and indexing device. FIG. 3 is a flow chart of the method of reading a file in a file storage and indexing device according to an embodiment of the present invention. As shown in FIG. 3, the process includes the following steps:

Step S301, querying an index corresponding to the first N bytes of the actual key value in the index file according to the first N bytes of the actual key value of the file to be read;

Step S302, according to the actual key value, matching the file in one or more files pointed to by the index corresponding to the first N bytes of the actual key value;

Step S303, when a file whose key value matches the actual key value is found, reading the file.

Optionally, in step S301, before the index is queried, it is first determined according to the Bloom filter whether the file to be read may exist; if it may, the index corresponding to the first N bytes of the actual key value is queried in the index file; otherwise, the read is terminated.
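The read flow of steps S301 to S303, with the optional Bloom filter check in front, can be sketched end to end. To stay short, this model keeps the data file as a sorted Python list of (key, content) needles and records list positions instead of byte offsets; the `Store` and `TinyBloom` classes, the `index_lookups` counter, and the filter sizing are all hypothetical.

```python
import hashlib

N = 4  # index key length in bytes


class TinyBloom:
    """Small Bloom filter: m bits, h positions derived from one SHA-256 digest."""

    def __init__(self, m=1 << 16, h=7):
        self.m, self.h, self.bits = m, h, bytearray(m // 8)

    def _positions(self, key):
        d = hashlib.sha256(key).digest()
        a = int.from_bytes(d[:8], "big")
        b = int.from_bytes(d[8:16], "big") | 1
        return [(a + i * b) % self.m for i in range(self.h)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def may_contain(self, key):
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(key))


class Store:
    def __init__(self, files):
        self.needles = sorted(files.items())            # "data file", sorted by key
        self.index = {}                                  # N-byte prefix -> first position
        self.bloom = TinyBloom()
        for pos, (key, _) in enumerate(self.needles):
            self.index.setdefault(key[:N], pos)
            self.bloom.add(key)
        self.index_lookups = 0                           # reads that got past the filter

    def read(self, key):
        if not self.bloom.may_contain(key):             # definitely absent: stop early
            return None
        self.index_lookups += 1
        pos = self.index.get(key[:N])                    # S301: query by prefix
        if pos is None:
            return None
        while pos < len(self.needles):                   # S302: match one by one
            stored, content = self.needles[pos]
            if stored[:N] != key[:N] or stored > key:
                return None                              # left the group or passed the key
            if stored == key:
                return content                           # S303: read the file
            pos += 1
        return None
```

Reads of keys that were never stored are, with high probability, rejected by the filter before `index_lookups` is incremented, which is the saving the optional check is meant to provide.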

In the embodiment of the present invention, the value of N is preferably 4.

Example 4

To describe the embodiments of the present invention more clearly, a preferred embodiment is described below.

This preferred embodiment provides a file storage and index structure and method. FIG. 4 is a schematic diagram of the file storage and index structure according to a preferred embodiment of the present invention. As shown in FIG. 4, the layered index file is held in memory, with indexes sharing the same key-value prefix grouped into one layer; the index file is used to locate small files. The data files are stored on disk, and each needle is one small file.

FIG. 5 is a flow chart of a method of reading a file according to a preferred embodiment of the present invention. It shows the detailed flow: the position of a small file is located by matching the index prefix, the complete key value is then read to check whether it matches, and if it does not, the search continues with the next needle.

The file storage and indexing scheme provided by this preferred embodiment includes the following steps. Step 1: prefix compression, which reduces the space occupied by the key value, offset, and size;

(1) Data file organization:

Similar to Facebook's Haystack, the system writes multiple small files into a single data file, with each needle holding the key value, size, data, and so on.

(2) Index file organization:

1) The index file only stores the first four bytes of the key value, not the full key value;

2) The offset and size fields in the index file are stored in units of 512 bytes, saving one byte, and the number of bytes used for the offset and size is calculated from the actual size of the entire data file.
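Putting the Step 1 rules together, the size of one index entry can be estimated from the data file size and the largest needle. The sketch below assumes a 4-byte key prefix, 512-byte units for both fields, and a hypothetical 1 MiB upper bound on needle size; it ignores layering, which Step 3 treats separately. `entry_size` is an illustrative name.

```python
def width(value):
    """Smallest whole number of bytes that can hold the non-negative integer value."""
    return max(1, (value.bit_length() + 7) // 8)


def entry_size(data_file_bytes, max_needle_bytes, prefix_bytes=4, align=512):
    """Bytes per index entry: key prefix + offset field + size field,
    with offset and size counted in `align`-byte units (rounded up)."""
    offset_w = width(-(-data_file_bytes // align))   # ceil(data_file_bytes / align)
    size_w = width(-(-max_needle_bytes // align))    # ceil(max_needle_bytes / align)
    return prefix_bytes + offset_w + size_w


print(entry_size(4 * 10**12, 2**20))   # 4 + 5 + 2 = 11 bytes, versus 28 in the Haystack example
```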

Step 2: store needles in order so that small files can be located. The needles in the data file are stored in alphabetical order of their key values.

Because the index file saves only the first four bytes of each key value, if several small files share the same first four bytes but their needles are not stored in order, the needles are scattered and cannot all be found from one offset. For example, the user reads the file with key value 0xabcdefacee, but since the index stores only the first four bytes, only the prefix 0xabcdefac can be matched, and the offset to read from cannot be determined.

In this preferred embodiment, the above problem is solved by storing needles in order. For example, the user reads the file with key value 0xabcdefacbb; its prefix is 0xabcdefac, and the index offset points to the needle 0xabcdefacaa, so the first match misses.

Because each needle stores its size in its header, the position of the next needle can be computed; this leads to the needle 0xabcdefacbb, which matches, and its data is read and returned to the user.
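The 0xabcdefacaa / 0xabcdefacbb example can be traced with a short sketch. It uses 5-byte keys as in the example and a hypothetical needle header of the key plus a 4-byte data length; `trace_read` records each needle it visits, showing how the size in each header leads to the next needle.

```python
import struct

N = 4                              # the index keeps the first four key bytes
HEADER = struct.Struct(">5sI")     # hypothetical header: 5-byte key, 4-byte data length


def pack(needles):
    """Serialize (key, data) pairs, already in key order, into a data file."""
    return b"".join(HEADER.pack(k, len(d)) + d for k, d in needles)


def trace_read(data, start, key):
    """Walk needles from the index offset `start`; return (data or None, keys visited)."""
    visited, off = [], start
    while off < len(data):
        k, size = HEADER.unpack_from(data, off)
        visited.append(k.hex())
        if k == key:
            return data[off + HEADER.size:off + HEADER.size + size], visited
        if k[:N] != key[:N] or k > key:
            break                                  # no match possible further on
        off += HEADER.size + size                  # header size gives the next needle
    return None, visited


data = pack([(bytes.fromhex("abcdefacaa"), b"first"),
             (bytes.fromhex("abcdefacbb"), b"second")])
print(trace_read(data, 0, bytes.fromhex("abcdefacbb")))
```

The first needle visited is 0xabcdefacaa (the miss described above); its header size moves the reader to 0xabcdefacbb, which matches.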

Step 3: Index layering optimization;

(1) Layering scheme

Referring to FIG. 6, indexes that share the same key-value prefix can be grouped into one layer. The layering principle is to keep the number of files in each layer at about 64 where possible, choosing the layering level according to the number of needles to be stored. The layering level can be determined as needed; for example:

Level 0: no layering;

Level 1: layer by the first byte of the needle key value;

Level 2: layer by the first two bytes of the needle key value;

The key-value prefix used for layering is shorter, in bytes, than the key stored in the index.
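The source does not give a formula for choosing the level, so the following is only one hypothetical rule. Assuming keys are uniformly distributed (as MD5 or SHA-1 digests are), an L-byte prefix splits n entries into about n / 256^L per layer; `choose_level` picks the L below N whose expected layer size is closest to 64 on a logarithmic scale.

```python
import math


def choose_level(num_entries, n_bytes=4, target=64):
    """Return the layering prefix length in bytes (0 = no layering).

    Hypothetical rule: with uniformly distributed keys, an L-byte prefix gives
    about num_entries / 256**L entries per layer; pick the L < n_bytes whose
    expected layer size is closest to `target` on a log2 scale."""
    def distance(level):
        return abs(math.log2(num_entries / 256**level / target))

    return min(range(n_bytes), key=distance)


for n in (50, 10_000, 10**9):
    level = choose_level(n)
    print(n, "files -> level", level, f"(~{n / 256**level:.1f} per layer)")
```

For 1 billion files this rule picks a 3-byte prefix (about 60 entries per layer), leaving a 1-byte short key per entry.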

(2) Layering reduces the number of bytes occupied by key values

Referring to FIG. 7, with layering, a repeated prefix is stored only once, reducing the number of bytes occupied by key values.

(3) Layering reduces the number of bytes occupied by offsets

Referring to FIG. 8, before optimization an offset spans the address space of the entire data file. After optimization, only the offset of each layer is expressed within the entire data file, while the offset of an index under that layer only needs to be an offset within the layer, so its number of bytes can be calculated from the largest layer address space.

Moreover, in the preferred embodiment, the Bloom filter is used to avoid accessing files that do not exist. Existing files are mapped into a Bloom filter in memory, so that nonexistent files can be ruled out with a quick search alone. The time complexity is O(k), where k is the number of bits per element (the optimal number of hash functions is about 0.7k). Experience shows that with k = 9.6 the false positive rate is about 1%, and increasing k by 4.8 reduces it to about 0.1%.
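The 9.6-bit and 4.8-bit figures follow from the standard Bloom filter estimate p = (1 - e^(-h/b))^h, where b is the number of bits per element (k in the text above) and h is the number of hash functions, with the optimum at h = b ln 2. A quick numerical check, with `false_positive_rate` as an illustrative name:

```python
import math


def false_positive_rate(bits_per_element, hashes=None):
    """Standard Bloom filter estimate (1 - e^(-h/b))^h for b bits per element
    and h hash functions; h defaults to the optimum b * ln 2."""
    b = bits_per_element
    h = b * math.log(2) if hashes is None else hashes
    return (1 - math.exp(-h / b)) ** h


for b in (9.6, 9.6 + 4.8):
    print(f"{b:.1f} bits/element -> {false_positive_rate(b):.4%}")
```

At the optimum, each extra bit per element multiplies the false positive rate by about 0.62, so 4.8 extra bits divide it by roughly ten.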

The advantageous effects of the preferred embodiment of the present invention are described below in comparison with Haystack.

(1) Comparison of memory savings brought by prefix compression

Referring to FIG. 9, the horizontal axis represents the number of files and the vertical axis the memory required for the index file; the short dashed line represents the memory consumption of conventional Haystack, and the long dashed line represents the memory consumption after prefix compression according to the embodiment of the present invention. As can be seen from FIG. 9, with 1 billion files, Facebook's Haystack uses more than 26 GB of memory, while the prefix-compression indexing scheme of this preferred embodiment consumes a little over 9 GB, a reduction of about two-thirds.

(2) Comparison of further memory savings from index layering

Referring to FIG. 10, the horizontal axis represents the number of files and the vertical axis the memory required for the index file; the short dashed line represents the memory consumption of conventional Haystack, the long dashed line the memory consumption after prefix compression according to the embodiment of the present invention, and the solid line the memory consumption after both prefix compression and index layering. As can be seen from FIG. 10, index layering further reduces memory consumption from a little over 9 GB to a little over 4 GB, saving about half.

Testing of the file storage and indexing scheme of this preferred embodiment shows that overall small-file performance improves significantly: requests per second (RPS) more than doubled, and the machine's input/output (IO) utilization nearly doubled. At the same time, because the minimum memory unit is optimized, fragmentation is reduced by 80%. With this system, faster read and write services can be provided to users while reducing the resource consumption of the cluster.

Example 5

This embodiment provides software for executing the technical solutions described in the above embodiments and preferred implementations.

Example 6

This embodiment provides a storage medium. In this embodiment, the above storage medium may be configured to store program code for performing the following steps:

Step S101, storing each file in alphabetical order according to the actual key value of the file, to obtain a data file;

Step S102, generating an index file for indexing each file in the data file, wherein each index in the index file uses the first N bytes of a file's actual key value as its key value and points to one or more files in the data file; the offset corresponding to the key value is the offset of the first of the one or more files pointed to by the key value, the size value corresponding to the key value is the size of that first file, and N is a positive integer.

Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.

For specific examples in this embodiment, refer to the examples described in the above embodiments and optional implementations; details are not repeated here.

Example 7

Embodiments of the present invention also provide a storage medium. In this embodiment, the above storage medium may be configured to store program code for performing the following steps:

Those skilled in the art should understand that the technical solutions of the present invention may be modified or equivalently replaced without departing from the spirit and scope of those technical solutions, and all such modifications and replacements shall fall within the scope of the claims of the present invention.

Those of ordinary skill in the art will appreciate that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and devices, may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division into functional modules/units mentioned in the above description does not necessarily correspond to the division into physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage medium includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information, such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Moreover, as is well known to those skilled in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media.

Industrial applicability

The present application solves the problem that the indexing scheme used by the Haystack system consumes a large amount of memory, thereby reducing the memory consumed by the indexing system.

Claims

A file storage and indexing method, comprising:

Storing each file in alphabetical order of the actual key values of the files to obtain a data file;

Generating an index file for indexing each file in the data file, wherein each index in the index file uses the first N bytes of a file's actual key value as its key value and points to one or more files in the data file, the offset corresponding to the key value is the offset of the first file among the one or more files pointed to by the key value, the size value corresponding to the key value is the size of that first file, and N is a positive integer.
The method of claim 1, wherein the offset field and the size field in the index file are 512-byte aligned.
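The claim does not say how the alignment is exploited. One plausible reading, offered here only as an assumption, is that 512-byte-aligned offsets and sizes can be stored as counts of 512-byte units, which saves 9 bits per field. The helper names `align_up`, `encode` and `decode` are hypothetical.

```python
ALIGN = 512  # alignment unit named in the claim


def align_up(n: int, align: int = ALIGN) -> int:
    """Round n up to the next multiple of align (align must be a power of two)."""
    return (n + align - 1) & ~(align - 1)


def encode(offset: int, size: int) -> tuple[int, int]:
    """Hypothetical encoding: store offset and padded size as 512-byte unit counts."""
    if offset % ALIGN:
        raise ValueError("offset must be 512-byte aligned")
    return offset // ALIGN, align_up(size) // ALIGN


def decode(offset_units: int, size_units: int) -> tuple[int, int]:
    """Recover byte offset and padded byte size from unit counts."""
    return offset_units * ALIGN, size_units * ALIGN
```

Under this encoding, a 4-byte field counting 512-byte units can address 2^32 × 512 bytes = 2 TiB. The decoded size is the padded size, so the exact file length would have to be kept elsewhere, for instance in a record header.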
The method of claim 1, wherein generating the index file for indexing each file in the data file further comprises:

Hierarchically storing the indexes of the index file by key value prefix, wherein the key value of an index stored in the layer corresponding to a key value prefix is a short key value obtained by truncating the key value prefix, and the byte length of the key value prefix is less than N.
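A minimal sketch of the layered layout, assuming the flat index is a Python dict from N-byte keys to (offset, size) pairs and using a hypothetical prefix length `p` (with p < N): entries are grouped by their first p bytes, and inside each layer only the remaining N - p bytes are kept as the short key. The function names `layer_index` and `lookup` are illustrative.

```python
from collections import defaultdict
from typing import Optional


def layer_index(index: dict[bytes, tuple[int, int]], p: int) -> dict[bytes, dict[bytes, tuple[int, int]]]:
    """Group N-byte index keys into layers by their first p bytes (p < N).

    Inside each layer only the short key (the bytes after the prefix) is kept,
    so the p prefix bytes are stored once per layer instead of once per entry.
    """
    layers: dict[bytes, dict[bytes, tuple[int, int]]] = defaultdict(dict)
    for key, entry in index.items():
        layers[key[:p]][key[p:]] = entry
    return dict(layers)


def lookup(layers: dict[bytes, dict[bytes, tuple[int, int]]], key: bytes, p: int) -> Optional[tuple[int, int]]:
    """Find the layer by prefix, then the entry by short key."""
    layer = layers.get(key[:p])
    return None if layer is None else layer.get(key[p:])
```

With N = 4 and p = 2, for example, each entry stores a 2-byte short key instead of the full 4-byte key, and the shared 2-byte prefix is stored once per layer.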
The method of claim 3, wherein

The offset of each index in the index file is an intra-layer offset, and the number of bytes of the intra-layer offset is determined according to the maximum address space of a layer after layering.
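As an illustration of how the offset width might follow from the layer address space, the sketch below computes the smallest byte count that can represent every offset in a layer of a given maximum size, and encodes an absolute offset relative to the layer's base. The function names, the little-endian encoding and the notion of a per-layer base address are assumptions made for the example.

```python
def offset_bytes(max_layer_space: int) -> int:
    """Smallest number of bytes that can hold any offset in [0, max_layer_space)."""
    if max_layer_space <= 1:
        return 1
    bits = (max_layer_space - 1).bit_length()
    return (bits + 7) // 8


def to_intra_layer(abs_offset: int, layer_base: int, width: int) -> bytes:
    """Encode an absolute offset as a fixed-width offset relative to its layer.

    int.to_bytes raises OverflowError if the relative offset does not fit.
    """
    rel = abs_offset - layer_base
    return rel.to_bytes(width, "little")
```

For example, a layer whose address space is at most 64 KiB needs only 2-byte intra-layer offsets, compared with 8 bytes for a full 64-bit absolute offset.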
The method according to any one of claims 1 to 4, wherein the method further comprises:

All files in the data file are mapped into a Bloom filter, so that when a file in the data file is read, whether the file to be read may exist is determined by quickly searching the Bloom filter.
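The claims do not specify the Bloom filter's size or hash functions. The sketch below is a generic Bloom filter that derives its bit positions by double hashing over a SHA-256 digest; the bit-array size, the number of positions `k` and the class name are arbitrary choices for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, m_bits: int = 8192, k: int = 4):
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: bytes) -> list[int]:
        # Derive k bit positions via double hashing over one SHA-256 digest.
        d = hashlib.sha256(key).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:16], "little") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))
```

A negative answer from `might_contain` is definitive, while a positive answer only means the key may be present, which is why the read path still has to consult the index afterwards.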
A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of the method of any one of claims 1 to 5.
A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method of any one of claims 1 to 5.
A file storage and indexing device comprising:

a data file storage module, configured to store a data file, wherein the data file is obtained by storing each file in alphabetical order according to an actual key value of the file;

an index file generating module, configured to generate an index file for indexing each file in the data file, wherein each index in the index file uses the first N bytes of a file's actual key value as its key value and points to one or more files in the data file, the offset corresponding to the key value is the offset of the first file among the one or more files pointed to by the key value, the size value corresponding to the key value is the size of that first file, and N is a positive integer.
The apparatus according to claim 8, wherein the index file generating module is further configured to hierarchically store the indexes of the index file by key value prefix, wherein the key value of an index stored in the layer corresponding to a key value prefix is a short key value obtained by truncating the key value prefix, and the byte length of the key value prefix is less than N.
The apparatus according to claim 9, wherein

The offset of each index in the index file is an intra-layer offset, and the number of bytes of the intra-layer offset is determined according to the maximum address space of a layer after layering.
The device according to any one of claims 8 to 10, wherein the device further comprises:

a mapping module, configured to map all files in the data file into a Bloom filter, so that when a file in the data file is read, whether the file to be read may exist is determined by searching the Bloom filter.
[Correct according to Rule 26 01.02.2018]

A method of reading a file in the file storage and indexing device according to any one of claims 8 to 11, comprising:

Querying, according to the first N bytes of the actual key value of the file to be read, the index in the index file that corresponds to those first N bytes;

Matching, according to the actual key value, a file among the one or more files pointed to by the index corresponding to the first N bytes of the actual key value; and

Reading the file when a file whose key value is consistent with the actual key value is matched.
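The three reading steps can be sketched as follows, assuming a hypothetical sorted data file in which each record is a 2-byte key length, a 4-byte body length, the key and the body, and an index that is a dict from N-byte prefixes to (offset, size). The function name `read_file` is hypothetical. Because records are sorted, the scan can stop as soon as it leaves the prefix group or passes the requested key.

```python
import struct
from typing import Optional

# Hypothetical record layout: 2-byte key length, 4-byte body length, then key and body.
HEADER = struct.Struct("<HI")


def read_file(data: bytes, index: dict[bytes, tuple[int, int]], key: bytes, n: int) -> Optional[bytes]:
    """Return the body stored under `key`, or None if it is absent."""
    entry = index.get(key[:n])  # step 1: query the index by the first n bytes
    if entry is None:
        return None
    pos = entry[0]  # offset of the first file in the prefix group
    while pos < len(data):  # step 2: match candidates in sorted order
        klen, blen = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        cur = data[start:start + klen]
        if cur[:n] != key[:n] or cur > key:  # left the group or passed the key
            return None
        if cur == key:  # step 3: exact key match, read the file body
            return data[start + klen:start + klen + blen]
        pos = start + klen + blen
    return None
```

In the worst case the scan visits every file in one prefix group, so the choice of N trades index size against the number of records examined per lookup.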
[Correct according to Rule 26 01.02.2018]

The method according to claim 11, wherein querying, according to the first N bytes of the actual key value of the file to be read, the index in the index file that corresponds to those first N bytes comprises:

Determining, according to the Bloom filter, whether the file to be read may exist; if so, querying, according to the first N bytes of the actual key value of the file to be read, the index in the index file that corresponds to those first N bytes; otherwise, terminating the reading of the file.
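A minimal sketch of the Bloom-filter gate in this claim, with the filter test and the index lookup passed in as callables so that the example does not depend on any particular filter or index implementation; `gated_read`, `might_contain` and `lookup` are hypothetical names.

```python
from typing import Callable, Optional


def gated_read(key: bytes,
               might_contain: Callable[[bytes], bool],
               lookup: Callable[[bytes], Optional[bytes]]) -> Optional[bytes]:
    """Consult the Bloom filter first; query the index only if the key may exist."""
    if not might_contain(key):
        return None  # definitely absent: terminate without touching the index
    return lookup(key)  # possibly present: fall through to the prefix-index read
```

If the filter answers that the key is absent, the index is never queried, so requests for non-existent files cost only a few bit tests.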