CN115292247A - File reading method and device, electronic equipment and storage medium - Google Patents
File reading method and device, electronic equipment and storage medium
- Publication number
- CN115292247A (application CN202211186653.4A)
- Authority
- CN
- China
- Prior art keywords
- file
- read
- group
- target storage
- files
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Human Computer Interaction (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application provides a file reading method, a file reading device, an electronic device and a storage medium. The redundant space in the last storage block of each group is first removed by means of an index table, and the redundant space between continuous files to be read is then removed in memory. Because continuous files to be read are copied into memory together, the method reduces, compared with the prior art, the number of state switches and the frequency of hard disk head jumps during reading, and therefore improves reading efficiency.
Description
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a file reading method and apparatus, an electronic device, and a storage medium.
Background
As digitization advances, a business system must store more and more files, and the stored files need to be copied and exchanged in bulk, for example by transferring files stored on the hard disk of the business system to other storage locations or other devices for further processing. Transferring files from the hard disk of a service system requires reading them. Two files that are to be read one after the other may be stored far apart on the hard disk, in which case the hard disk head must jump to complete addressing; when many files have to be read in succession, the head jumps frequently and the total addressing time becomes long. In addition, because the storage positions of two such files are not continuous, they cannot be read in a single operation, so each file read requires a switch from user mode to kernel mode. Each switch takes a certain amount of time, and when many files have to be read in succession the total time spent on state switching is long. Either effect lowers file reading efficiency, so transferring the files takes a long time.
Disclosure of Invention
In view of this, embodiments of the present application provide a file reading method and apparatus, an electronic device, and a storage medium, so as to improve reading efficiency.
In a first aspect, an embodiment of the present application provides a file reading method, where the method is applied in a service system, a hard disk of the service system stores multiple files to be read, and when the service system is in a kernel state, the method includes:
traversing a plurality of files to be read, and storing an obtained first index table containing file information corresponding to each file to be read into a first memory area of a memory of the service system, wherein for each file information, the file information comprises a storage position of a target storage block occupied by the file to be read corresponding to the file information, a file size of the file to be read corresponding to the file information, a tail block position of a last target storage block occupied by the file to be read corresponding to the file information, and continuity between the target storage blocks occupied by the files to be read corresponding to the file information;
according to the storage position of a first target storage block occupied by a file to be read corresponding to each file information, sequencing each file information in the first index table according to the sequence of the storage positions to obtain a second index table;
performing first group division on the file information, in the order of the file information in the second index table, according to the continuity between the target storage blocks occupied by the file to be read corresponding to each piece of file information, the storage position of those target storage blocks, and the tail block position of the last target storage block occupied by that file, to obtain a plurality of first file groups, wherein: for each file to be read whose occupied target storage blocks are discontinuous, the file information corresponding to that file is taken as a first discontinuous file group; for each file to be read whose occupied target storage blocks are continuous, when the tail block position of the last target storage block occupied by that file is discontinuous with the storage position of the target storage block occupied by the next file to be read, the file information corresponding to that file is taken as a second discontinuous file group; and for each file to be read whose occupied target storage blocks are continuous, when the tail block position of the last target storage block occupied by that file is continuous with the storage position of the target storage block occupied by the next file to be read, the file information corresponding to that file and to the next file to be read is taken as a continuous file group;
performing second group division on the continuous file groups among the plurality of first file groups, in the order of the file information in the second index table, according to the file size of the file to be read corresponding to each piece of file information, to obtain a plurality of second file groups, wherein, for each continuous file group: if the sum of the file sizes of the files to be read corresponding to the continuous file group is less than or equal to a preset threshold, the continuous file group is taken as one second file group; if that sum is greater than the preset threshold, the continuous file group is divided into N second file groups, where in each of the first N-1 second file groups the sum of the file sizes of the files to be read corresponding to the file information in the group is greater than or equal to the preset threshold, and that sum minus the file size of the file to be read corresponding to the last file information in the group is less than the preset threshold, N being a positive integer greater than 1;
performing first redundancy removal on the first discontinuous file groups, the second discontinuous file groups and the second file groups, according to the file size of the file to be read corresponding to each piece of file information and the storage position of the target storage blocks occupied by that file, to obtain third file groups, wherein: for each first discontinuous file group and each second discontinuous file group, a first reading range of the file information in the discontinuous file group is determined according to the starting position and a first ending position of the target storage blocks occupied by the file to be read corresponding to the file information in the group, and the file information corresponding to the first reading range is taken as a third file group, the first ending position being determined according to the difference between the file size of that file and the sum of the storage sizes of its occupied target storage blocks other than the last target storage block; and for each second file group, a second reading range of the file information in the second file group is determined according to the starting position of the target storage blocks occupied by the files to be read corresponding to the file information in the group and a second ending position of the target storage blocks occupied by the file to be read corresponding to the last file information in the group, and the file information corresponding to the second reading range is taken as a third file group, the second ending position being determined according to the difference between the file size of the file to be read corresponding to the last file information in the second file group and the sum of the storage sizes of that file's occupied target storage blocks other than the last target storage block;
copying files to be read corresponding to the third file group in the hard disk to a second memory area of a memory of the service system according to the reading range corresponding to each third file group, so as to take the copy content corresponding to the third file group in the second memory area as a fourth file group;
for the files to be read corresponding to each second file group in the fourth file group, performing second redundancy removal on a redundancy space between the files to be read corresponding to the second file group according to the starting position of the target storage block occupied by each file to be read corresponding to the second file group and the third ending position of the target storage block occupied by each file to be read corresponding to the second file group to obtain a fifth file group, wherein the third ending position is determined according to the difference between the sum of the storage sizes of the target storage blocks except the last target storage block in the target storage blocks occupied by each file to be read corresponding to the second file group and the file size of each file to be read corresponding to the second file group;
and reading the fifth file group and the file groups in the fourth file group other than those of the files to be read corresponding to the second file groups.
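The second group division can be illustrated with a minimal sketch. It assumes the splitting rule reads as follows: files are accumulated in index order, and a group is closed as soon as its total size reaches the preset threshold, so that each closed group's total is at least the threshold and would fall below it without its last file. The function name, the list-of-sizes input and the handling of a group that cannot be split further are assumptions made for illustration, not details taken from the application.

```python
def second_group_division(sizes, threshold):
    """Split the file sizes of one continuous file group, kept in index order.

    Assumed rule: if the total is within the threshold the group stays whole;
    otherwise files are accumulated and a group is closed once its running
    total reaches the threshold.
    """
    if sum(sizes) <= threshold:
        return [list(sizes)]

    groups, current, total = [], [], 0
    for size in sizes:
        current.append(size)
        total += size
        if total >= threshold:
            # Closing here guarantees: total >= threshold, and
            # total minus the last file's size < threshold.
            groups.append(current)
            current, total = [], 0
    if current:
        # Remaining files form the last (N-th) group; it may be below the threshold.
        groups.append(current)
    return groups


print(second_group_division([4, 3, 2, 6, 1], threshold=6))
# [[4, 3], [2, 6], [1]]
print(second_group_division([1, 2, 2], threshold=6))
# [[1, 2, 2]]
```

In the first call, the two closed groups total 7 and 8, and each would fall below the threshold of 6 without its last file, which matches the assumed rule.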
Optionally, the method further comprises:
and responding to the selection operation of the user on the file type of the file stored in the hard disk, and determining the file corresponding to the file type selected by the user as the file to be read.
Optionally, the method further comprises:
and sending the read files in parallel in the form of groups, wherein the total size of the groups sent in parallel is less than the sending bandwidth.
Optionally, the method further comprises:
storing the files sent in parallel by asynchronous flushing to disk.
In a second aspect, an embodiment of the present application provides a file reading apparatus, where the apparatus is in a service system, a hard disk of the service system stores multiple files to be read, and when the service system is in a kernel state, the apparatus includes:
a traversing unit, configured to traverse the plurality of files to be read and store an obtained first index table, containing file information corresponding to each file to be read, into a first memory area of a memory of the service system, wherein each piece of file information comprises the storage position of the target storage blocks occupied by the corresponding file to be read, the file size of that file, the tail block position of the last target storage block occupied by that file, and the continuity between the target storage blocks occupied by that file;
a sorting unit, configured to sort the file information in the first index table in order of the storage position of the first target storage block occupied by the file to be read corresponding to each piece of file information, to obtain a second index table;
a first grouping unit, configured to perform first group division on the file information, in the order of the file information in the second index table, according to the continuity between the target storage blocks occupied by the file to be read corresponding to each piece of file information, the storage position of those target storage blocks, and the tail block position of the last target storage block occupied by that file, so as to obtain a plurality of first file groups, wherein: for each file to be read whose occupied target storage blocks are discontinuous, the file information corresponding to that file is taken as a first discontinuous file group; for each file to be read whose occupied target storage blocks are continuous, when the tail block position of the last target storage block occupied by that file is discontinuous with the storage position of the target storage block occupied by the next file to be read, the file information corresponding to that file is taken as a second discontinuous file group; and for each file to be read whose occupied target storage blocks are continuous, when the tail block position of the last target storage block occupied by that file is continuous with the storage position of the target storage block occupied by the next file to be read, the file information corresponding to that file and to the next file to be read is taken as a continuous file group;
a second grouping unit, configured to perform second group division on the continuous file groups among the plurality of first file groups, in the order of the file information in the second index table, according to the file size of the file to be read corresponding to each piece of file information, so as to obtain a plurality of second file groups, wherein, for each continuous file group: if the sum of the file sizes of the files to be read corresponding to the continuous file group is less than or equal to a preset threshold, the continuous file group is taken as one second file group; if that sum is greater than the preset threshold, the continuous file group is divided into N second file groups, where in each of the first N-1 second file groups the sum of the file sizes of the files to be read corresponding to the file information in the group is greater than or equal to the preset threshold, and that sum minus the file size of the file to be read corresponding to the last file information in the group is less than the preset threshold, N being a positive integer greater than 1;
a first processing unit, configured to perform first redundancy removal on the first discontinuous file groups, the second discontinuous file groups and the second file groups, according to the file size of the file to be read corresponding to each piece of file information and the storage position of the target storage blocks occupied by that file, so as to obtain third file groups, wherein: for each first discontinuous file group and each second discontinuous file group, a first reading range of the file information in the discontinuous file group is determined according to the starting position and a first ending position of the target storage blocks occupied by the file to be read corresponding to the file information in the group, and the file information corresponding to the first reading range is taken as a third file group, the first ending position being determined according to the difference between the file size of that file and the sum of the storage sizes of its occupied target storage blocks other than the last target storage block; and for each second file group, a second reading range of the file information in the second file group is determined according to the starting position of the target storage blocks occupied by the files to be read corresponding to the file information in the group and a second ending position of the target storage blocks occupied by the file to be read corresponding to the last file information in the group, and the file information corresponding to the second reading range is taken as a third file group, the second ending position being determined according to the difference between the file size of the file to be read corresponding to the last file information in the second file group and the sum of the storage sizes of that file's occupied target storage blocks other than the last target storage block;
a copying unit, configured to copy, according to a reading range corresponding to each third file group, a file to be read, corresponding to the third file group in the hard disk, to a second memory area of a memory of the service system, so as to use a copy content corresponding to the third file group in the second memory area as a fourth file group;
a second processing unit, configured to, for a to-be-read file corresponding to each second file group in the fourth file group, perform second redundancy removal on a redundancy space between the to-be-read files corresponding to the second file group according to a starting position of a target storage block occupied by each to-be-read file corresponding to the second file group and a third ending position of the target storage block occupied by each to-be-read file corresponding to the second file group, so as to obtain a fifth file group, where the third ending position is determined according to a difference between a sum of storage sizes of target storage blocks, excluding a last target storage block, in the target storage blocks occupied by each to-be-read file corresponding to the second file group and a file size of each to-be-read file corresponding to the second file group;
and a reading unit, configured to read the fifth file group and the file groups in the fourth file group other than those of the files to be read corresponding to the second file groups.
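The second redundancy removal, which operates on content already copied into memory, can be sketched as a simple compaction. The sketch assumes that the copied buffer begins at the first block of the group's first file, that every file starts on a block boundary, and that the unused tail of each file's last block is the redundant space to drop. The function name, the `(first_block, size)` tuples and the 4-byte block size are hypothetical and chosen only to keep the example small.

```python
def compact_group(buffer: bytes, files, block_size: int) -> bytes:
    """Remove the padding between files of one group copied into memory.

    files: list of (first_block, file_size) in on-disk order (illustrative layout).
    buffer: bytes copied from disk, starting at the first file's first block.
    """
    base_block = files[0][0]
    compacted = bytearray()
    for first_block, size in files:
        # Offset of this file inside the copied buffer.
        start = (first_block - base_block) * block_size
        # Keep only the file's own bytes; the rest of its last block is skipped.
        compacted += buffer[start:start + size]
    return bytes(compacted)


# File A: blocks 1-2, 6 bytes (2 bytes of padding in block 2).
# File B: block 3, 3 bytes; the copy already stops at the end of B's data.
copied = b"AAAAAA\x00\x00BBB"
print(compact_group(copied, [(1, 6), (3, 3)], block_size=4))
# b'AAAAAABBB'
```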
Optionally, the apparatus further comprises:
a selection unit, configured to respond to a selection operation by the user on the file type of the files stored in the hard disk, and to determine the files corresponding to the file type selected by the user as the files to be read.
Optionally, the apparatus further comprises:
a sending unit, configured to send the read files in parallel in the form of groups, wherein the total size of the groups sent in parallel is less than the sending bandwidth.
Optionally, the files sent in parallel are stored by asynchronous flushing to disk.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the file reading method according to any one of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the file reading method according to any one of the first aspect.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
In the method, the files to be read are divided by means of an index table into continuous files to be read and discontinuous files to be read; the larger continuous file groups are then divided into a plurality of groups according to a preset division rule; after the division, the redundant space in the last storage block of each group is removed to obtain the reading range of each group; and the files on the hard disk are then read into memory according to the reading range of each group. Because the redundant space of the last storage block of each group is not read, reading efficiency is improved.
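As a rough illustration of how a reading range might be derived, the sketch below computes the start and end byte offsets of a group. It assumes that storage blocks are numbered from 1, that block 1 starts at byte offset 0, and that the end position is the start of the group's last block plus the bytes the last file actually uses in it (its file size minus the size of its other blocks). The function name and the `(blocks, size)` tuples are illustrative, and how a range is traversed for a file whose blocks are discontinuous is not covered.

```python
def group_read_range(files, block_size: int):
    """Return (start, end) byte offsets of a group's reading range, end exclusive.

    files: list of (blocks, file_size) in on-disk order, where blocks is the
    ordered tuple of block numbers occupied by one file (illustrative layout).
    """
    first_blocks, _ = files[0]
    last_blocks, last_size = files[-1]

    start = (first_blocks[0] - 1) * block_size
    # Bytes of the last file that spill into its last block.
    used_in_last_block = last_size - (len(last_blocks) - 1) * block_size
    end = (last_blocks[-1] - 1) * block_size + used_in_last_block
    return start, end


# A 2.5M file stored in three 1M blocks (sizes expressed in KB).
print(group_read_range([((1, 2, 3), 2560)], block_size=1024))
# (0, 2560)

# Two adjacent files: blocks 1-2 (1536 KB) followed by block 3 (100 KB).
print(group_read_range([((1, 2), 1536), ((3,), 100)], block_size=1024))
# (0, 2148)
```

In the first case the range stops 512 KB short of the end of block 3, which is the redundant space that is not read.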
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic flowchart of a file reading method according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a file reading apparatus according to a second embodiment of the present application;
Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted in advance that the storage block is the basic storage unit of a hard disk, and one hard disk can be divided into 2^n storage blocks. A file usually occupies one or more storage blocks, and a storage block stores only one file. For example, if a file has a size of 2.5M and a storage block has a size of 1M, the file occupies 3 storage blocks; the third storage block is not full, but its remaining space is not used to store other files. That is, two or more files cannot be stored in the same storage block.
The storage positions of the storage blocks occupied by a file may be continuous or discontinuous. For example, suppose the hard disk comprises 5 storage blocks whose storage positions are, in sequence, storage block 1, storage block 2, storage block 3, storage block 4 and storage block 5. If the file occupies storage block 1, storage block 2 and storage block 3, the storage positions of its storage blocks are continuous, and the file is called a continuous file. If the file occupies storage block 1, storage block 2 and storage block 4, or storage block 1, storage block 3 and storage block 5, the storage positions are discontinuous, and the file is called a discontinuous file. In other words, when a file occupies a plurality of storage blocks, the file is discontinuous as soon as any two storage blocks occupied in sequence are not adjacent. "Occupied in sequence" means that when the file occupies storage block 1, storage block 2 and storage block 4, the blocks are occupied in the order storage block 1, storage block 2, storage block 4; storage block 2 and storage block 4 are not adjacent, so the file is discontinuous.
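The two notions above, the number of storage blocks a file occupies and whether those blocks are continuous, can be checked with a few lines of code. This is only a sketch; the function names are made up for illustration and sizes are given in M as in the example.

```python
import math


def blocks_needed(file_size: float, block_size: float) -> int:
    """Number of storage blocks a file occupies; a block never holds two files."""
    return math.ceil(file_size / block_size)


def is_continuous(blocks) -> bool:
    """True when every block occupied in sequence directly follows the previous one."""
    return all(b - a == 1 for a, b in zip(blocks, blocks[1:]))


print(blocks_needed(2.5, 1.0))     # 3
print(is_continuous([1, 2, 3]))    # True
print(is_continuous([1, 2, 4]))    # False
print(is_continuous([1, 3, 5]))    # False
```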
When files are stored on a hard disk, the storage blocks occupied by one file may be discontinuous, and the storage blocks occupied by two different files may also be discontinuous with each other. As a result, when a plurality of files are read, the hard disk head jumps frequently between storage positions, and, because each file read next is not continuous with the one just read, frequent switches between user mode and kernel mode are required, so the overall file reading efficiency is low.
In order to solve the above problems, the present application provides a file reading method, a file reading apparatus, an electronic device, and a storage medium, so as to improve the overall file reading efficiency.
Example one
Fig. 1 is a schematic flowchart of a file reading method provided in an embodiment of the present application, where the method is applied to a service system, a hard disk of the service system stores multiple files to be read, and when the service system is in a kernel mode, as shown in fig. 1, the method includes the following steps:
Specifically, after a business system generates a file, the file is stored on the hard disk, where it occupies a certain number of storage blocks. Once the files to be read have been determined, the relevant file information of each file to be read must be determined in order to improve reading efficiency. This information comprises: the storage position of the target storage blocks occupied by the file to be read, from which it can be determined which storage blocks the file occupies; the file size of the file to be read; the tail block position of the last target storage block occupied by the file to be read; and the continuity between the target storage blocks occupied by the file to be read, that is, if the file occupies a plurality of target storage blocks, whether those blocks are continuous. As an example of the tail block position, suppose the target storage blocks occupied by a file to be read are storage block 1, storage block 2 and storage block 3, and the file is stored in that order. The storage space in storage block 1 and storage block 2 is then fully occupied by the file. If the file size is smaller than the total size of the three storage blocks, storage block 3 is not fully occupied; if the file size is equal to the total size, storage block 3 is fully occupied. In either case storage block 3 is the last target storage block occupied by the file, and the tail block position refers to the position on the hard disk of the end of storage block 3, which can be understood as the end boundary of the storage space range of storage block 3.
After these four kinds of information are obtained, they describe a file to be read in detail, that is, they form the file information of the file to be read. The file information of all the files to be read is then stored in a first memory area of the memory of the service system in the form of a first index table, so that the index table leaves out the information about the files to be read that is not needed.
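One way to picture an entry of the first index table is as a small record holding the four kinds of information. The sketch below is an assumption-laden illustration: the class name, field names, file names and the 1024-byte block size are hypothetical, blocks are numbered from 1, and the tail block position is modeled as the byte offset of the end boundary of the last occupied block.

```python
from dataclasses import dataclass

BLOCK_SIZE = 1024  # assumed storage block size in bytes (hypothetical value)


@dataclass(frozen=True)
class FileInfo:
    """One entry of the first index table (field names are illustrative)."""
    name: str                # hypothetical identifier of the file to be read
    blocks: tuple[int, ...]  # storage positions of the target blocks, in order
    size: int                # file size in bytes

    @property
    def tail_block_position(self) -> int:
        """End boundary of the last target block, with block 1 starting at offset 0."""
        return self.blocks[-1] * BLOCK_SIZE

    @property
    def continuous(self) -> bool:
        """True when each occupied block directly follows the previous one."""
        return all(b - a == 1 for a, b in zip(self.blocks, self.blocks[1:]))


first_index = [
    FileInfo("a.dat", (1, 2, 3), 2560),
    FileInfo("b.dat", (4, 6), 1500),
]
for info in first_index:
    print(info.name, info.tail_block_position, info.continuous)
# a.dat 3072 True
# b.dat 6144 False
```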
Step 102, sorting the pieces of file information in the first index table in order of the storage position of the first target storage block occupied by the file to be read corresponding to each piece of file information, to obtain a second index table.
Specifically, when files to be read are read from a hard disk, reading in the order of the storage blocks on the hard disk is fastest. For example, if a hard disk is divided in the order storage block 1, storage block 2, storage block 3, storage block 4 and storage block 5, the files in the five storage blocks are read fastest in the order storage block 1, storage block 2, storage block 3, storage block 4, storage block 5. Because one file may occupy a plurality of storage blocks, the file information can be sorted according to the storage position of the first target storage block occupied by each file to be read. For example, when a first file occupies storage block 1, storage block 3 and storage block 5, and a second file occupies storage block 2 and storage block 4, the first target storage block occupied by the first file is storage block 1 and the first target storage block occupied by the second file is storage block 2, so the file information is arranged in the order first file, second file.
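The sorting step can be sketched in a few lines. The dictionary layout and the function name are illustrative assumptions; the only thing taken from the text is that entries are ordered by the storage position of the first target storage block of each file.

```python
def build_second_index(first_index):
    """Sort index entries by the first target storage block of each file."""
    return sorted(first_index, key=lambda info: info["blocks"][0])


first_index = [
    {"name": "second file", "blocks": [2, 4]},
    {"name": "first file", "blocks": [1, 3, 5]},
]
second_index = build_second_index(first_index)
print([info["name"] for info in second_index])
# ['first file', 'second file']
```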
Specifically, after the second index table is obtained, the order of the files to be read is determined. To facilitate subsequent operations, it is necessary to distinguish whether each file to be read is continuous with other files. If the target storage blocks occupied by one file to be read are discontinuous, the file is taken as a first discontinuous file group, which includes only that file. If the target storage blocks occupied by one file to be read are continuous, but the storage position of the file is discontinuous with the storage positions of the other files to be read, that is, the storage blocks it occupies in the hard disk are discontinuous with the storage blocks occupied by other files to be read, the file is taken as a second discontinuous file group, which likewise includes only that file. For example: the hard disk is divided in the order of storage block 1, storage block 2, storage block 3, storage block 4 and storage block 5. If the first file to be read occupies storage block 1 and storage block 2 and the second file to be read occupies storage block 4 and storage block 5, the first file to be read is taken as a second discontinuous file group. If the first file to be read occupies storage block 1 and storage block 2 and the second file to be read occupies storage block 3 and storage block 4, the two files are taken as a continuous file group.
Another example is: the hard disk is divided according to the sequence of a storage block 1, a storage block 2, a storage block 3, a storage block 4 and a storage block 5, a first file to be read occupies the storage block 1, the storage block 2 and the storage block 4, a second file to be read occupies the storage block 5, the first file to be read is used as a first discontinuous file group, and the second file to be read is used as a second discontinuous file group.
Further, when determining a file group, it is first determined whether the storage blocks occupied by the file to be read are continuous. If not, the file is determined as a first discontinuous file group. If so, it is determined whether the storage blocks occupied by the next adjacent file to be read are continuous with the storage blocks occupied by this file. If not, this file is determined as a second discontinuous file group; if so, the two files to be read are determined as a continuous file group. After a continuous file group is determined, it is further determined whether the next file to be read is continuous with the continuous file group. If so, the three files to be read are determined as one continuous file group; if not, the previous two continuous files to be read are kept as the continuous file group, and the same judgment is then repeated starting from the next file to be read and the file adjacent to it. By analogy, all the file information in the second index table is divided, in order, into first discontinuous file groups, second discontinuous file groups and continuous file groups.
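The first group division can be sketched as a single pass over the sorted index. This is an illustrative reading of the description, not the patented implementation: the function name, the tuple layout `(name, blocks)` and the group labels are assumptions. A file whose own blocks are discontinuous always forms its own group and interrupts any run of continuous files.

```python
def first_group_division(files):
    """files: list of (name, blocks), already sorted by first block."""
    groups = []
    run = []  # current run of files that are continuous with each other

    def flush():
        # A run of one file is a second discontinuous file group,
        # a run of two or more files is a continuous file group.
        if run:
            kind = "continuous" if len(run) > 1 else "second_discontinuous"
            groups.append((kind, [name for name, _ in run]))
            run.clear()

    for name, blocks in files:
        internally_continuous = all(
            a + 1 == b for a, b in zip(blocks, blocks[1:]))
        if not internally_continuous:
            flush()
            groups.append(("first_discontinuous", [name]))
            continue
        # Start a new run if this file does not directly follow the last
        # block of the previous file in the run.
        if run and run[-1][1][-1] + 1 != blocks[0]:
            flush()
        run.append((name, blocks))
    flush()
    return groups
```

Applied to the examples in the text, files on blocks 1, 2 and 4, 5 yield two second discontinuous file groups, files on blocks 1, 2 and 3, 4 yield one continuous file group, and files on blocks 1, 2, 4 and 5 yield a first discontinuous file group followed by a second discontinuous file group.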
Step 104, according to the file size of the file to be read corresponding to each piece of file information, performing second group division on the continuous file groups among the plurality of first file groups according to the sequence of the file information included in the second index table to obtain a plurality of second file groups, wherein, for each continuous file group, if the sum of the file sizes of the files to be read corresponding to the continuous file group is less than or equal to a preset threshold, the continuous file group is used as one second file group; if the sum of the file sizes of the files to be read corresponding to the continuous file group is greater than the preset threshold, the continuous file group is divided into N second file groups, where, in each of the first N-1 second file groups, the sum of the file sizes of the files to be read corresponding to the file information included in the second file group is greater than or equal to the preset threshold, and that sum minus the file size of the file to be read corresponding to the last file information included in the second file group is less than the preset threshold, N being a positive integer greater than 1.
Specifically, after the first file groups are determined, some continuous file groups may be so large that they affect data transmission after reading, so the continuous file groups need to be divided. Because the purpose of this division is to avoid overly large continuous file groups, a continuous file group whose size is smaller than or equal to the preset threshold is not divided, while a continuous file group whose size is larger than the preset threshold is divided, with each resulting group cut off as soon as its size becomes greater than or equal to the preset threshold. For example, suppose the preset threshold is 1M and the continuous file group includes 3 files to be read. If the first file to be read is 0.9M, the second is 0.8M and the third is 0.7M, the first and second files to be read are divided into one second file group and the third file to be read into another second file group. If the first file to be read is 1.1M, the second is 0.8M and the third is 0.7M, the first file to be read is divided into one second file group and the second and third files to be read into another second file group. If the first file to be read is 0.3M, the second is 0.5M and the third is 0.6M, the three files to be read are taken as one second file group. In every case the files are divided according to their order in the second index table.
Note that only continuous file groups are divided in this step, so a discontinuous file group may still be larger than the preset threshold.
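The three worked examples are consistent with a greedy split: files are added to the current group in index order, and the group is closed as soon as its total reaches the preset threshold. The sketch below is one way to express that reading; the function name is hypothetical, and sizes are given in tenths of a megabyte (so the 1M threshold is 10) purely to keep the arithmetic in integers.

```python
def second_group_division(sizes, threshold):
    """Split one continuous file group, given as file sizes in index order."""
    groups, current, total = [], [], 0
    for size in sizes:
        current.append(size)
        total += size
        # Close the group once its total is at least the threshold; the
        # total without the last file was still below the threshold.
        if total >= threshold:
            groups.append(current)
            current, total = [], 0
    if current:
        groups.append(current)  # remaining files form the last group
    return groups


example_1 = second_group_division([9, 8, 7], 10)   # 0.9M, 0.8M, 0.7M
example_2 = second_group_division([11, 8, 7], 10)  # 1.1M, 0.8M, 0.7M
example_3 = second_group_division([3, 5, 6], 10)   # 0.3M, 0.5M, 0.6M
```

A continuous file group whose total does not exceed the threshold passes through unchanged as a single second file group.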
Step 105, according to the file size of the file to be read corresponding to each piece of file information and the storage position of the target storage blocks occupied by the file to be read corresponding to each piece of file information, performing first redundancy removal on the first discontinuous file group, the second discontinuous file group and the second file group to obtain a third file group. For each first discontinuous file group and each second discontinuous file group, a first reading range of the file information in the discontinuous file group is determined according to the starting position and a first ending position of the target storage blocks occupied by the file to be read corresponding to the file information included in the discontinuous file group, so that the file information corresponding to the first reading range is used as the third file group; the first ending position is determined according to the difference between the sum of the storage sizes of the target storage blocks other than the last target storage block among the target storage blocks occupied by that file and the file size of that file. For each second file group, a second reading range of the file information of the second file group is determined according to the starting position of the target storage blocks occupied by the files to be read corresponding to the file information included in the second file group and a second ending position of the target storage blocks occupied by the file to be read corresponding to the last file information included in the second file group, so that the file information corresponding to the second reading range is used as the third file group; the second ending position is determined according to the difference between the sum of the storage sizes of the target storage blocks other than the last target storage block among the target storage blocks occupied by the file to be read corresponding to the last file information included in the second file group and the file size of that file.
Specifically, after the division in step 104, the obtained file groups include first discontinuous file groups, second discontinuous file groups and second file groups. A first discontinuous file group or a second discontinuous file group includes only one file to be read, whereas a second file group may consist of a plurality of continuous files to be read. For the first and second discontinuous file groups, redundant space exists only in the last target storage block occupied by the file to be read, so only the storage position of the file within its target storage blocks needs to be determined. That is, when the first reading range is determined, the first ending position of the file in its last target storage block is determined from the difference between the sum of the storage sizes of the target storage blocks other than the last one and the file size of the file, and the first reading range is then determined from the starting position of the target storage blocks occupied by the file and the first ending position. For each second file group, the files to be read included in the group are continuous: if the second file group includes only one file to be read, the target storage blocks occupied by that file are continuous, and if it includes a plurality of files to be read, the target storage blocks occupied by those files are continuous. All files to be read included in the second file group can therefore be regarded as a whole, and the storage position of the whole in the target storage blocks is determined. That is, when the second reading range is determined, the second ending position of the last file in the whole within its last target storage block is determined from the difference between the sum of the storage sizes of the target storage blocks, other than the last one, occupied by that last file and the file size of that last file, and the second reading range is then determined from the starting position of the target storage blocks occupied by the whole and the second ending position.
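The ending position arithmetic can be illustrated as follows. This is a sketch under stated assumptions: a fixed block size of 4096 bytes, block i starting at byte offset i times the block size, and the hypothetical helper names `end_position` and `read_range`; none of these appear in the source. The number of valid bytes in the last block is the file size minus the total size of all blocks before the last one.

```python
BLOCK_SIZE = 4096  # assumed fixed block size in bytes


def end_position(blocks, file_size, block_size=BLOCK_SIZE):
    """Byte offset on disk where the valid data of a file ends."""
    # Bytes of the file that fall into its last target storage block.
    used_in_last = file_size - (len(blocks) - 1) * block_size
    return blocks[-1] * block_size + used_in_last


def read_range(group, block_size=BLOCK_SIZE):
    """group: list of (blocks, file_size) in index order.

    Returns (start, end): the start of the first block of the first file
    and the ending position of the last file in the group.
    """
    first_blocks, _ = group[0]
    last_blocks, last_size = group[-1]
    start = first_blocks[0] * block_size
    return start, end_position(last_blocks, last_size, block_size)


# One file of 10000 bytes on blocks 1, 2, 3.
single = read_range([([1, 2, 3], 10000)])
# A second file group of two continuous files (5000 and 6000 bytes).
pair = read_range([([1, 2], 5000), ([3, 4], 6000)])
```

In the second case the range still contains the unused tail of the first file's last block, which is why a further redundancy removal is needed for groups with several files.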
Specifically, after the first reading range and the second reading range are determined, when the file to be read stored in the hard disk is copied, the redundant space in the last target storage block in the file group is not read, and when the file to be read is read, due to the fact that continuous files to be read exist, compared with the prior art, the number of state switching times and the frequency of hard disk magnetic head switching can be reduced during reading, so that reading efficiency is improved.
Specifically, after the redundant space in the last target storage block is removed, no redundant space remains in the first discontinuous file group or the second discontinuous file group. In a second file group, however, redundant space may still exist between two continuous files to be read. For example: when a second file group consists of a first file to be read and a second file to be read, the first redundancy removal removes only the redundant space of the second file to be read, while the redundant space of the first file to be read remains and still needs to be removed. To do so, each file to be read in the second file group is treated individually as a whole, and the space it occupies in its target storage blocks is determined from the starting position and the ending position of the target storage blocks it occupies, so that the redundant space in the last target storage block occupied by each file is removed. In this way the redundant space between the files to be read is removed.
Step 108, reading the fifth file group and the file groups in the fourth file group other than the files to be read corresponding to the second file group.
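The second redundancy removal can be pictured as compacting a copied buffer so that only the valid bytes of each file remain. The sketch below is an illustration only: the function name, the `(first_block, file_size)` layout, the tiny 4-byte block size and the byte-offset convention (block i starts at offset i times the block size) are assumptions, and every file is assumed to occupy continuous blocks, as is the case inside a second file group.

```python
def second_redundancy_removal(buffer, range_start, files, block_size):
    """Drop the padding between files inside one copied second file group.

    buffer:      bytes copied from the disk range [range_start, range_end)
    files:       list of (first_block, file_size) in index order
    """
    out = bytearray()
    for first_block, size in files:
        # Offset of the file's first byte relative to the copied range.
        offset = first_block * block_size - range_start
        out += buffer[offset:offset + size]
    return bytes(out)


# Toy disk with 4-byte blocks: file A (6 bytes) on blocks 0-1 leaves two
# padding bytes, file B (3 bytes) on block 2 leaves one padding byte.
disk = b"AAAAAA..BBB."
copied = disk[0:11]  # range after the first redundancy removal
compact = second_redundancy_removal(copied, 0, [(0, 6), (2, 3)], 4)
```

After compaction the buffer holds only the valid data of the two files, back to back.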
Specifically, all redundant spaces in one file group can be removed through the above operations, and since the second memory space includes all valid data of the file to be read, it is beneficial to improve the reading efficiency when reading the data in the second memory space.
In a possible embodiment, before performing step 101, a file corresponding to the file type selected by the user may be determined as the file to be read in response to a user selection operation on the file type of the file stored in the hard disk.
Specifically, when transferring a file, an option of a file type of the file stored in the hard disk may be displayed to the user, the user may select the option, and after the user finishes selecting, the file included in the file type selected by the user is used as the file to be read, and the processing shown in fig. 1 is performed.
In one possible embodiment, after step 108 is performed, the read files are sent in parallel in the form of groups, wherein the total size of the file groups sent in parallel is less than the sending bandwidth.
Specifically, when data is transmitted, a plurality of data can be transmitted as long as the bandwidth is not exceeded, and in order to improve the data transmission efficiency, a plurality of file groups can be read at one time and then the read file groups are transmitted in parallel, so that the data transmission efficiency is improved.
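One possible way to respect the bandwidth limit is to pack file groups into batches whose total size stays below the bandwidth and to send the members of each batch concurrently. The sketch below is an assumption about how this could be arranged, not the method of the source: both function names are hypothetical, `send` stands for any caller-supplied sending function, and a single group that is by itself as large as the bandwidth is simply sent alone.

```python
from concurrent.futures import ThreadPoolExecutor


def batches_under_bandwidth(group_sizes, bandwidth):
    """Pack group sizes, in order, into batches with total < bandwidth."""
    batches, current, total = [], [], 0
    for size in group_sizes:
        # Close the batch if adding this group would reach the bandwidth.
        if current and total + size >= bandwidth:
            batches.append(current)
            current, total = [], 0
        current.append(size)
        total += size
    if current:
        batches.append(current)
    return batches


def send_batch(batch, send):
    """Send all members of one batch in parallel and collect the results."""
    with ThreadPoolExecutor(max_workers=len(batch)) as pool:
        return list(pool.map(send, batch))


batches = batches_under_bandwidth([4, 3, 2, 6, 1], 10)
```

Each batch would then be passed to `send_batch` in turn, so that several file groups are in flight at once without the batch total reaching the bandwidth.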
In one possible embodiment, the files sent in parallel are stored by asynchronous flushing to disk.
Specifically, after receiving the transmitted file, the received file may be stored while continuing to receive other files, instead of being stored together after receiving all files, which is beneficial to improving the storage efficiency.
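Storing while still receiving can be sketched with a queue between a receiving loop and a background writer thread. This is a generic producer and consumer pattern offered as an illustration; the function name `receive_and_store` and the `store` callback are hypothetical and the source does not describe its mechanism at this level.

```python
import queue
import threading


def receive_and_store(incoming, store):
    """Store each received file in the background while receiving goes on."""
    pending = queue.Queue()

    def writer():
        # Take received files off the queue and store them one by one.
        while True:
            item = pending.get()
            if item is None:  # sentinel: nothing more will arrive
                break
            store(item)

    worker = threading.Thread(target=writer)
    worker.start()
    for item in incoming:
        pending.put(item)  # hand over immediately, keep receiving
    pending.put(None)
    worker.join()


stored = []
receive_and_store([b"file-1", b"file-2", b"file-3"], stored.append)
```

Because a single writer drains the queue in order, files are stored in the order in which they were received.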
Example two
Fig. 2 is a schematic structural diagram of a file reading apparatus provided in the second embodiment of the present application, where the apparatus is in a service system, a hard disk of the service system stores a plurality of files to be read, and when the service system is in a kernel state, as shown in fig. 2, the apparatus includes:
the traversal unit 201 is configured to traverse a plurality of files to be read, and store an obtained first index table that includes file information corresponding to each of the files to be read into a first memory area of a memory of the service system, where for each piece of file information, the piece of file information includes a storage location of a target storage block occupied by the file to be read corresponding to the piece of file information, a file size of the file to be read corresponding to the piece of file information, a tail block location of a last target storage block occupied by the file to be read corresponding to the piece of file information, and continuity between the target storage blocks occupied by the file to be read corresponding to the piece of file information;
a sorting unit 202, configured to perform, according to a storage position of a first target storage block occupied by a file to be read corresponding to each piece of file information, sorting on each piece of file information in the first index table according to a sequence of the storage positions, so as to obtain a second index table;
a first grouping unit 203, configured to perform first group division on the file information according to the sequence of the file information included in the second index table, based on the continuity between the target storage blocks occupied by the file to be read corresponding to each piece of file information, the storage position of the target storage blocks occupied by that file, and the tail block position of the last target storage block occupied by that file, so as to obtain a plurality of first file groups, where, for each file to be read whose occupied target storage blocks are discontinuous, the file information corresponding to the file to be read is used as a first discontinuous file group; for each file to be read whose occupied target storage blocks are continuous, and for which the tail block position of the last target storage block occupied by the file to be read is discontinuous with the storage position of the target storage block occupied by the next file to be read, the file information corresponding to the file to be read is used as a second discontinuous file group; and for each file to be read whose occupied target storage blocks are continuous, and for which the tail block position of the last target storage block occupied by the file to be read is continuous with the storage position of the target storage block occupied by the next file to be read, the file information corresponding to the file to be read and the file information corresponding to the next file to be read are used as a continuous file group;
a second grouping unit 204, configured to perform second group division on the continuous file groups among the plurality of first file groups according to the file sizes of the files to be read corresponding to the file information and according to the sequence of the file information included in the second index table, so as to obtain a plurality of second file groups, where, for each continuous file group, if the sum of the file sizes of the files to be read corresponding to the continuous file group is less than or equal to a preset threshold, the continuous file group is used as one second file group, and if the sum of the file sizes of the files to be read corresponding to the continuous file group is greater than the preset threshold, the continuous file group is divided into N second file groups, where, in each of the first N-1 second file groups, the sum of the file sizes of the files to be read corresponding to the file information included in the second file group is greater than or equal to the preset threshold, and that sum minus the file size of the file to be read corresponding to the last file information included in the second file group is less than the preset threshold, N being a positive integer greater than 1;
a first processing unit 205, configured to perform, according to the file size of the file to be read corresponding to each piece of file information and the storage position of the target storage blocks occupied by the file to be read corresponding to each piece of file information, first redundancy removal on the first discontinuous file group, the second discontinuous file group and the second file group to obtain a third file group, where, for each first discontinuous file group and each second discontinuous file group, a first reading range of the file information in the discontinuous file group is determined according to the starting position and a first ending position of the target storage blocks occupied by the file to be read corresponding to the file information included in the discontinuous file group, so as to use the file information corresponding to the first reading range as the third file group, the first ending position being determined according to the difference between the sum of the storage sizes of the target storage blocks other than the last target storage block among the target storage blocks occupied by that file and the file size of that file; and, for each second file group, a second reading range of the file information of the second file group is determined according to the starting position of the target storage blocks occupied by the files to be read corresponding to the file information included in the second file group and a second ending position of the target storage blocks occupied by the file to be read corresponding to the last file information included in the second file group, so as to use the file information corresponding to the second reading range as the third file group, the second ending position being determined according to the difference between the sum of the storage sizes of the target storage blocks other than the last target storage block among the target storage blocks occupied by the file to be read corresponding to the last file information included in the second file group and the file size of that file;
a copying unit 206, configured to copy, according to the reading range corresponding to each third file group, files to be read, corresponding to the third file group in the hard disk, to a second memory area of the memory of the service system, so as to use copy content corresponding to the third file group in the second memory area as a fourth file group;
a second processing unit 207, configured to, for a to-be-read file corresponding to each second file group in the fourth file group, perform second redundancy removal on a redundancy space between the to-be-read files corresponding to the second file group according to a starting position of a target storage block occupied by each to-be-read file corresponding to the second file group and a third ending position of a target storage block occupied by each to-be-read file corresponding to the second file group, so as to obtain a fifth file group, where the third ending position is determined according to a difference between a sum of storage sizes of target storage blocks, excluding a last target storage block, in the target storage blocks occupied by each to-be-read file corresponding to the second file group and a file size of each to-be-read file corresponding to the second file group;
the reading unit 208 is configured to read the fifth file group and a file group in the fourth file group except for the file to be read corresponding to the second file group.
In one possible embodiment, the apparatus further comprises:
and the selection unit is used for responding to the selection operation of the user on the file type of the file stored in the hard disk and determining the file corresponding to the file type selected by the user as the file to be read.
In one possible embodiment, the apparatus further comprises:
and the sending unit is used for sending the read files in parallel in the form of groups, wherein the total size of the file groups sent in parallel is less than the sending bandwidth.
In one possible embodiment, the files sent in parallel are stored by asynchronous flushing to disk.
For the principle description of the second embodiment, reference is made to the detailed description of the first embodiment, and the detailed description is omitted here.
Example three
Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present application, including: a processor 301, a storage medium 302 and a bus 303, wherein the storage medium 302 stores machine-readable instructions executable by the processor 301, when the electronic device executes the file reading method, the processor 301 communicates with the storage medium 302 through the bus 303, and the processor 301 executes the machine-readable instructions to perform the method according to the first embodiment.
Example four
The fourth embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method described in the first embodiment is performed.
The apparatus provided in the embodiments of the present application may be specific hardware on a device, or software or firmware installed on a device, etc. The apparatus provided by the embodiments of the present application has the same implementation principle and technical effect as the foregoing method embodiments, and for brevity, for anything not mentioned in the apparatus embodiments, reference may be made to the corresponding contents in the foregoing method embodiments. It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the foregoing systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the portion thereof that substantially contributes over the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program code.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present application, used to illustrate the technical solutions of the present application rather than to limit them, and the scope of protection of the present application is not limited thereto. Although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed in the present application, they can still modify or readily conceive of changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of the technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be covered by the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A file reading method is applied to a service system, a hard disk of the service system stores a plurality of files to be read, and when the service system is in a kernel mode, the method comprises the following steps:
traversing a plurality of files to be read, and storing an obtained first index table containing file information corresponding to each file to be read into a first memory area of a memory of the service system, wherein for each file information, the file information comprises a storage position of a target storage block occupied by the file to be read corresponding to the file information, a file size of the file to be read corresponding to the file information, a tail block position of a last target storage block occupied by the file to be read corresponding to the file information, and continuity between the target storage blocks occupied by the file to be read corresponding to the file information;
according to the storage position of a first target storage block occupied by a file to be read corresponding to each file information, sequencing each file information in the first index table according to the sequence of the storage positions to obtain a second index table;
according to the continuity between target storage blocks occupied by files to be read corresponding to the file information, the file information comprises the storage positions of the target storage blocks occupied by the files to be read corresponding to the file information and the tail block position of the last target storage block occupied by the files to be read corresponding to the file information, first group division is carried out on the file information according to the sequence of the file information included in the second index table to obtain a plurality of first file groups, wherein for each occupied target storage block which is a discontinuous file to be read, the file information corresponding to the files to be read is taken as a first discontinuous file group, for each occupied target storage block which is a continuous file to be read and the tail block position of the last target storage block occupied by the files to be read and the storage position of the target storage block occupied by the next file to be read are discontinuous, the file information corresponding to the files to be read is taken as a second discontinuous file group, for each occupied target storage block which is a continuous file to be read and the tail block position occupied by the last target storage block of the files to be read and the tail block position of the file corresponding to be read are taken as a second discontinuous file group, and the file group corresponding to which the file information corresponding to be read is taken as a continuous file to be read and the next target storage block occupied by the file to be read;
performing second group division on the continuous file groups among the plurality of first file groups, in the order of the file information in the second index table and according to the file size of the file to be read corresponding to each piece of file information, to obtain a plurality of second file groups, wherein, for each continuous file group: if the sum of the file sizes of the files to be read corresponding to the continuous file group is less than or equal to a preset threshold, the continuous file group is taken as one second file group; if that sum is greater than the preset threshold, the continuous file group is divided into N second file groups such that, in each of the first N-1 second file groups, the sum of the file sizes of the files to be read corresponding to the file information in the group is greater than or equal to the preset threshold, and that sum minus the file size of the file to be read corresponding to the last file information in the group is less than the preset threshold, N being a positive integer greater than 1;
performing first redundancy removal on the first discontinuous file groups, the second discontinuous file groups and the second file groups, according to the file size of the file to be read corresponding to each piece of file information and the storage position of the target storage blocks occupied by that file, to obtain third file groups, wherein: for each first discontinuous file group and each second discontinuous file group, a first reading range of the file information in the discontinuous file group is determined according to the starting position and a first ending position of the target storage blocks occupied by the file to be read corresponding to the file information in the discontinuous file group, and the file information corresponding to the first reading range is taken as a third file group, the first ending position being determined according to the difference between the sum of the storage sizes of the target storage blocks, other than the last target storage block, occupied by the file to be read corresponding to the file information in the discontinuous file group and the file size of that file; and for each second file group, a second reading range of the file information of the second file group is determined according to the starting position of the target storage blocks occupied by the files to be read corresponding to the file information in the second file group and a second ending position of the target storage blocks occupied by the file to be read corresponding to the last file information in the second file group, and the file information corresponding to the second reading range is taken as a third file group, the second ending position being determined according to the difference between the sum of the storage sizes of the target storage blocks, other than the last target storage block, occupied by the file to be read corresponding to the last file information in the second file group and the file size of that file;
copying files to be read corresponding to the third file group in the hard disk to a second memory area of a memory of the service system according to the reading range corresponding to each third file group, so as to take the copy content corresponding to the third file group in the second memory area as a fourth file group;
for the files to be read corresponding to each second file group in the fourth file group, performing second redundancy removal on a redundancy space between the files to be read corresponding to the second file group according to the starting position of the target storage block occupied by each file to be read corresponding to the second file group and the third ending position of the target storage block occupied by each file to be read corresponding to the second file group to obtain a fifth file group, wherein the third ending position is determined according to the difference between the sum of the storage sizes of the target storage blocks except the last target storage block in the target storage blocks occupied by each file to be read corresponding to the second file group and the file size of each file to be read corresponding to the second file group;
and reading the fifth file group and the file groups in the fourth file group other than the files to be read corresponding to the second file groups.
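The sorting and the two grouping steps of claim 1 can be illustrated with a minimal Python sketch. It is one plausible reading of the claim language, not the patented implementation: the `FileInfo` layout, the group labels, and the helper names `first_grouping` and `split_by_threshold` are all hypothetical, and block positions are modelled as plain integers. In particular, the sketch treats a run of two or more adjacent contiguous files as a continuous file group and a contiguous file with no adjacent neighbour as a second discontinuous file group.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FileInfo:
    name: str
    blocks: List[int]  # block numbers the file occupies, in file order (assumed layout)
    size: int          # file size in bytes

    @property
    def contiguous(self) -> bool:
        # True when every block directly follows the previous one.
        return all(a + 1 == b for a, b in zip(self.blocks, self.blocks[1:]))

    @property
    def tail(self) -> int:
        return self.blocks[-1]


def first_grouping(index: List[FileInfo]) -> List[Tuple[str, List[FileInfo]]]:
    """Sort by first block, then classify entries into the three group kinds."""
    table = sorted(index, key=lambda f: f.blocks[0])  # the "second index table"
    groups: List[Tuple[str, List[FileInfo]]] = []
    run: List[FileInfo] = []  # current run of adjacent contiguous files

    def flush() -> None:
        if run:
            kind = "continuous" if len(run) > 1 else "second_discontinuous"
            groups.append((kind, list(run)))
            run.clear()

    for f in table:
        if not f.contiguous:
            flush()
            groups.append(("first_discontinuous", [f]))
        elif run and f.blocks[0] == run[-1].tail + 1:
            run.append(f)  # starts right after the previous file's tail block
        else:
            flush()
            run.append(f)
    flush()
    return groups


def split_by_threshold(run: List[FileInfo], threshold: int) -> List[List[FileInfo]]:
    """Greedy split: close a group as soon as its total size reaches the threshold."""
    if sum(f.size for f in run) <= threshold:
        return [list(run)]
    out: List[List[FileInfo]] = []
    cur: List[FileInfo] = []
    total = 0
    for f in run:
        cur.append(f)
        total += f.size
        if total >= threshold:
            out.append(cur)
            cur, total = [], 0
    if cur:
        out.append(cur)  # the N-th group holds whatever remains
    return out
```

With the greedy rule, every closed group has a total of at least the threshold, and removing its last file drops the total below the threshold, which matches the condition stated for the first N-1 second file groups. If the first group happens to absorb the whole run, this sketch returns a single group; the claim does not say how that case is handled.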
2. The method of claim 1, wherein the method further comprises:
in response to a selection operation by a user on a file type of the files stored in the hard disk, determining the files of the file type selected by the user as the files to be read.
3. The method of claim 1, wherein the method further comprises:
sending the read files in parallel in groups, wherein the sum of the sizes of the file groups sent in parallel is less than the sending bandwidth.
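The bandwidth constraint of claim 3 can be sketched as a simple batching rule. This is an illustration under assumptions, not the claimed implementation: "sending bandwidth" is read here as a byte budget per parallel batch, and `plan_batches`, `send_in_batches`, and the caller-supplied `send` callable are hypothetical names. Only `ThreadPoolExecutor` from the Python standard library is used for the parallel part.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def plan_batches(group_sizes: Sequence[int], bandwidth: int) -> List[List[int]]:
    """Return batches of group indices whose total size stays below `bandwidth`."""
    batches: List[List[int]] = []
    cur: List[int] = []
    total = 0
    for i, size in enumerate(group_sizes):
        if size >= bandwidth:
            raise ValueError(f"group {i} alone does not fit under the budget")
        if total + size >= bandwidth:
            batches.append(cur)  # adding this group would reach the budget
            cur, total = [], 0
        cur.append(i)
        total += size
    if cur:
        batches.append(cur)
    return batches


def send_in_batches(
    groups: Sequence[bytes], bandwidth: int, send: Callable[[bytes], None]
) -> None:
    """Send each batch in parallel; batches themselves go out one after another."""
    for batch in plan_batches([len(g) for g in groups], bandwidth):
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            # list() forces completion and re-raises any exception from send().
            list(pool.map(send, (groups[i] for i in batch)))
```

Keeping each batch strictly under the budget mirrors the "less than the sending bandwidth" wording. How the receiving side persists the data is outside this sketch.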
4. The method of claim 3, wherein the method further comprises:
storing the files sent in parallel by asynchronous flushing to disk.
5. A file reading apparatus, where the apparatus is in a service system, a hard disk of the service system stores a plurality of files to be read, and when the service system is in a kernel state, the apparatus includes:
a traversing unit, configured to traverse the plurality of files to be read and store an obtained first index table, containing file information corresponding to each file to be read, into a first memory area of a memory of the service system, wherein each piece of file information comprises the storage position of the target storage blocks occupied by the corresponding file to be read, the file size of that file, the tail block position of the last target storage block occupied by that file, and the continuity between the target storage blocks occupied by that file;
a sorting unit, configured to sort the file information in the first index table by the storage position of the first target storage block occupied by the file to be read corresponding to each piece of file information, to obtain a second index table;
a first grouping unit, configured to perform first group division on the file information, in the order of the file information in the second index table, according to the continuity between the target storage blocks occupied by the file to be read corresponding to each piece of file information, the storage position of those target storage blocks, and the tail block position of the last target storage block occupied by that file, so as to obtain a plurality of first file groups, wherein: for each file to be read whose occupied target storage blocks are discontinuous, the file information corresponding to that file is taken as a first discontinuous file group; for each file to be read whose occupied target storage blocks are continuous and for which the tail block position of its last target storage block is discontinuous with the storage position of the target storage block occupied by the next file to be read, the file information corresponding to that file is taken as a second discontinuous file group; and for each file to be read whose occupied target storage blocks are continuous and for which the tail block position of its last target storage block is continuous with the storage position of the target storage block occupied by the next file to be read, the file information corresponding to that file is assigned to a continuous file group;
a second grouping unit, configured to perform second group division on the continuous file groups among the plurality of first file groups, in the order of the file information in the second index table and according to the file size of the file to be read corresponding to each piece of file information, so as to obtain a plurality of second file groups, wherein, for each continuous file group: if the sum of the file sizes of the files to be read corresponding to the continuous file group is less than or equal to a preset threshold, the continuous file group is taken as one second file group; if that sum is greater than the preset threshold, the continuous file group is divided into N second file groups such that, in each of the first N-1 second file groups, the sum of the file sizes of the files to be read corresponding to the file information in the group is greater than or equal to the preset threshold, and that sum minus the file size of the file to be read corresponding to the last file information in the group is less than the preset threshold, N being a positive integer greater than 1;
a first processing unit, configured to perform first redundancy removal on the first discontinuous file groups, the second discontinuous file groups and the second file groups, according to the file size of the file to be read corresponding to each piece of file information and the storage position of the target storage blocks occupied by that file, so as to obtain third file groups, wherein: for each first discontinuous file group and each second discontinuous file group, a first reading range of the file information in the discontinuous file group is determined according to the starting position and a first ending position of the target storage blocks occupied by the file to be read corresponding to the file information in the discontinuous file group, and the file information corresponding to the first reading range is taken as a third file group, the first ending position being determined according to the difference between the sum of the storage sizes of the target storage blocks, other than the last target storage block, occupied by the file to be read corresponding to the file information in the discontinuous file group and the file size of that file; and for each second file group, a second reading range of the file information of the second file group is determined according to the starting position of the target storage blocks occupied by the files to be read corresponding to the file information in the second file group and a second ending position of the target storage blocks occupied by the file to be read corresponding to the last file information in the second file group, and the file information corresponding to the second reading range is taken as a third file group, the second ending position being determined according to the difference between the sum of the storage sizes of the target storage blocks, other than the last target storage block, occupied by the file to be read corresponding to the last file information in the second file group and the file size of that file;
a copying unit, configured to copy, according to a reading range corresponding to each third file group, a file to be read, corresponding to the third file group in the hard disk, to a second memory area of a memory of the service system, so as to use a copy content corresponding to the third file group in the second memory area as a fourth file group;
a second processing unit, configured to, for a to-be-read file corresponding to each second file group in the fourth file group, perform second redundancy removal on a redundancy space between the to-be-read files corresponding to the second file group according to a starting position of a target storage block occupied by each to-be-read file corresponding to the second file group and a third ending position of the target storage block occupied by each to-be-read file corresponding to the second file group, so as to obtain a fifth file group, where the third ending position is determined according to a difference between a sum of storage sizes of target storage blocks, excluding a last target storage block, in the target storage blocks occupied by each to-be-read file corresponding to the second file group and a file size of each to-be-read file corresponding to the second file group;
and a reading unit, configured to read the fifth file group and the file groups in the fourth file group other than the files to be read corresponding to the second file groups.
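The two redundancy removal steps performed by the first and second processing units can be illustrated with a short Python sketch. It rests on assumptions the claims do not state: a uniform block size `BLOCK`, block numbers that map to byte offsets by simple multiplication, and files described as hypothetical `(first_block, n_blocks, size)` tuples occupying consecutive blocks. The ending position is computed as the start of the last block plus the bytes actually used in it, where the used bytes are the file size minus the storage size of all blocks before the last one.

```python
from typing import List, Tuple

BLOCK = 4096  # assumed uniform storage block size in bytes

# (first_block, n_blocks, size): a file occupying n_blocks consecutive blocks.
Extent = Tuple[int, int, int]


def end_position(first_block: int, n_blocks: int, size: int) -> int:
    """Byte offset just past the file's last valid byte."""
    used_in_last = size - (n_blocks - 1) * BLOCK  # bytes used in the final block
    return (first_block + n_blocks - 1) * BLOCK + used_in_last


def group_read_range(files: List[Extent]) -> Tuple[int, int]:
    """One read range for a group of adjacent files, sorted by position.

    First redundancy removal: the unused tail of the last block is not read.
    """
    start = files[0][0] * BLOCK
    return start, end_position(*files[-1])


def compact(buffer: bytes, files: List[Extent]) -> bytes:
    """Second redundancy removal on the in-memory copy.

    Drops the unused space at the end of each file's last block so that
    only valid file bytes remain, back to back.
    """
    base = files[0][0] * BLOCK
    out = bytearray()
    for first_block, _n_blocks, size in files:
        offset = first_block * BLOCK - base
        out += buffer[offset:offset + size]
    return bytes(out)
```

For example, a 5000-byte file in blocks 2 and 3 followed by a 100-byte file in block 4 yields the single read range 8192 to 16484; after `compact`, the copied buffer shrinks from 8292 bytes to the 5100 bytes of actual file content.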
6. The apparatus of claim 5, wherein the apparatus further comprises:
a selection unit, configured to determine, in response to a selection operation by a user on a file type of the files stored in the hard disk, the files of the file type selected by the user as the files to be read.
7. The apparatus of claim 5, wherein the apparatus further comprises:
a sending unit, configured to send the read files in parallel in groups, wherein the sum of the sizes of the file groups sent in parallel is less than the sending bandwidth.
8. The apparatus of claim 7, wherein the files sent in parallel are stored by asynchronous flushing to disk.
9. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions, when executed by the processor, performing the steps of the file reading method according to any one of claims 1 to 4.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, performs the steps of the file reading method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211186653.4A CN115292247B (en) | 2022-09-28 | 2022-09-28 | File reading method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211186653.4A CN115292247B (en) | 2022-09-28 | 2022-09-28 | File reading method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115292247A (en) | 2022-11-04 |
CN115292247B (en) | 2022-12-06 |
Family
ID=83833365
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211186653.4A Active CN115292247B (en) | 2022-09-28 | 2022-09-28 | File reading method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115292247B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101233479A (en) * | 2005-08-03 | 2008-07-30 | 桑迪士克股份有限公司 | Management of memory blocks that directly store data files |
CN101968791A (en) * | 2010-08-10 | 2011-02-09 | 深圳市飘移网络技术有限公司 | Data storage method and device |
CN109726177A (en) * | 2018-12-29 | 2019-05-07 | 北京赛思信安技术股份有限公司 | A kind of mass file subregion indexing means based on HBase |
CN110647497A (en) * | 2019-07-19 | 2020-01-03 | 广东工业大学 | HDFS-based high-performance file storage and management system |
WO2021073111A1 (en) * | 2019-10-15 | 2021-04-22 | 平安科技(深圳)有限公司 | Distributed storage file reading and writing method, device and platform, and readable storage medium |
WO2021169113A1 (en) * | 2020-02-26 | 2021-09-02 | 平安科技(深圳)有限公司 | Data management method and apparatus, and computer device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115292247B (en) | 2022-12-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP2810171B1 (en) | Systems and methods for data chunk deduplication | |
CN109614377A (en) | File delet method, device, equipment and the storage medium of distributed file system | |
JP2008204206A (en) | Data distribution and storage system, data distribution method, device to be used for this and its program | |
EP2770446A1 (en) | Data processing method and device | |
EP2288975A2 (en) | Method for optimizing cleaning of maps in flashcopy cascades containing incremental maps | |
CN105787037B (en) | A kind of delet method and device of repeated data | |
CN105117351A (en) | Method and apparatus for writing data into cache | |
CN103677659A (en) | Information processing apparatus and copy control method | |
CN113254270B (en) | Self-recovery method, system and storage medium for storing cache hot spot data | |
CN115292247B (en) | File reading method and device, electronic equipment and storage medium | |
CN104077241B (en) | Cache life cycle algorithm switching handling method and device | |
CN107133334B (en) | Data synchronization method based on high-bandwidth storage system | |
CN111831691A (en) | Data reading and writing method and device, electronic equipment and storage medium | |
US20040123039A1 (en) | System and method for adatipvely loading input data into a multi-dimensional clustering table | |
CN109658985B (en) | Redundancy removal optimization method and system for gene reference sequence | |
CN111538677A (en) | Data processing method and device | |
US8341376B1 (en) | System, method, and computer program for repartitioning data based on access of the data | |
CN111782590A (en) | File reading method and device | |
CN115408342A (en) | File processing method and device and electronic equipment | |
CN110362769B (en) | Data processing method and device | |
CN108170372A (en) | data processing method and device based on cloud hard disk | |
CN102354302A (en) | Method and device for erasing disk | |
CN111177091B (en) | Video pre-distribution storage method, system and storage medium based on XFS file system | |
CN112131194A (en) | File storage control method and device of read-only file system and storage medium | |
CN115016740B (en) | Data recovery method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||