Embodiment
To store the descriptor and video segments of a video file in association with each other, an embodiment of the invention provides a method for storing the descriptor and video segments of a video file. As shown in Figure 1, the method comprises the following main steps:
S1. Obtain the descriptor of a video file from a website.
The descriptor of a video file comprises at least: the link information of the video file, its content signature cid (Content Identification), the website it belongs to, and its title, format and size.
The cid of a video file is obtained as follows (not repeated hereafter): the video file is processed with an algorithm that can compute a unique identifier of file content, yielding the cid of the video file. Specifically, computing the cid of a video file involves the following factors:
Factor one: the algorithm. An algorithm that can compute a unique identifier of file content is used, for example (but not limited to) the MD5 algorithm.
Factor two: the object of the calculation. Specifically, the calculation may cover the entire binary content of the video file, or at least two arbitrarily selected sections of its binary content, each of arbitrary length. For example, the calculation may cover the start bytes and middle bytes of the video file, or the start bytes, middle bytes and end bytes.
Factor three: how the results are combined. When at least two sections of binary content are selected, the results may be combined in, but not limited to, the following ways. If the MD5 algorithm is used, the MD5 values obtained may be concatenated end to end to form the cid of the video file; alternatively, the results may be processed with the MD5 algorithm again, and the resulting MD5 value used as the cid of the video file.
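The two ways of combining results described in factor three can be illustrated with a minimal sketch. The function name `compute_cid`, the `rehash` flag and the use of hexadecimal digest strings are assumptions made for illustration; the text does not fix an encoding for the MD5 values.

```python
import hashlib


def compute_cid(segments, rehash=False):
    """Combine per-section MD5 values into one cid string.

    segments: list of bytes objects, one per selected section (hypothetical input).
    rehash:   False -> concatenate the MD5 values end to end;
              True  -> apply MD5 once more to the concatenation.
    """
    # One MD5 value per selected section of binary content.
    digests = [hashlib.md5(section).hexdigest() for section in segments]
    joined = "".join(digests)
    if rehash:
        # Assumption: the second pass hashes the ASCII hex concatenation.
        return hashlib.md5(joined.encode("ascii")).hexdigest()
    return joined
```

With concatenation the cid grows by 32 hexadecimal characters per section, whereas the second MD5 pass always yields a 32-character cid regardless of how many sections were selected.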
S2. Obtain video segments of the video file according to the link information and the content signature cid of the video file in the obtained descriptor.
Before a video segment is obtained, the cid of the video file is computed by the method described above; alternatively, the cid may be read directly from the stored descriptor.
Then, using the cid of the video file as an index, if the number of stored video segments of this video file is determined to be less than a preset threshold, a video segment of the video file is obtained according to the link information in the descriptor.
After a video segment is obtained, the image histogram of each frame in it is calculated; if the image of every frame is determined to be of a particular color, a video segment is obtained again.
S3. Store the obtained descriptor and video segments of the video file in association, keyed by the cid of the video file.
Further, the stored descriptor and video segments of the video file may be presented to the user.
This concludes the description of the main steps of the method of the embodiment of the invention.
An embodiment of the invention also provides a system for storing the descriptor and video segments of a video file. As shown in Figure 2, the system comprises a web crawler (Spider), a central processing server (Hub) and a central data storage server.
The web crawler Spider obtains the descriptor of a video file from a website, and obtains video segments of the video file according to the link information and the cid of the video file in the descriptor. In practice, a single kind of Spider may obtain both the descriptor and the video segments, or two kinds of Spider may obtain them separately.
The central processing server Hub outputs the descriptor of the video file obtained by the Spider.
The central data storage server receives the descriptor output by the Hub, receives the video segments, and stores the descriptor and video segments in association, keyed by the cid of the video file.
Further, the system may also comprise a presentation server, which presents to the user the descriptor and video segments of the video file stored on the central data storage server.
Further, the descriptor of a video file obtained by the web crawler Spider comprises: the link information of the video file, the cid of the video file, the website it belongs to, and its title, format and size.
The Spider obtains the cid of a video file as follows (not repeated hereafter): the Spider processes the video file with an algorithm that can compute a unique identifier of file content, yielding the cid of the video file. Specifically, the Spider's computation of the cid involves the following factors:
Factor one: the algorithm. The Spider uses an algorithm that can compute a unique identifier of file content, for example (but not limited to) the MD5 algorithm.
Factor two: the object of the calculation. Specifically, the Spider may process the entire binary content of the video file, or at least two arbitrarily selected sections of its binary content, each of arbitrary length. For example, the Spider may process the start bytes and middle bytes of the video file, or the start bytes, middle bytes and end bytes.
Factor three: how the Spider combines the results. When the Spider selects at least two sections of binary content, the results may be combined in, but not limited to, the following ways. If the Spider uses the MD5 algorithm, the MD5 values obtained may be concatenated end to end to form the cid of the video file; alternatively, the results may be processed with the MD5 algorithm again, and the resulting MD5 value used as the cid of the video file.
Numerous Spiders are distributed across the Internet. Each can take one or more preset video websites as seeds, obtain descriptors of video files from them, and then obtain video segments of each video file according to the link information and cid contained in its descriptor.
Specific embodiments are described in detail below.
Method embodiment: this embodiment takes one Spider as an example and assumes that the Spider uses video website A as its seed. The flow has three main parts: obtaining the descriptor, obtaining video segments, and storing the descriptor and video segments in association. Each is described below.
Obtaining the descriptor is described first.
As the number of video files on the Internet keeps growing, fetching the same video file repeatedly should be avoided. Because the value of each byte in a video file is constant, a message-digest algorithm such as (but not limited to) MD5 (Message-Digest Algorithm 5) can be applied separately to the first 32k bytes, the middle 64k bytes and the last 32k bytes of each video file to obtain three MD5 values. These three values are then combined into a new MD5 value that serves as the cid of the video file and uniquely identifies it.
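The 32k/64k/32k sampling just described might be sketched as follows. The name `sample_cid`, the choice to centre the middle sample, the whole-file fallback for small files and the way the three values are merged (a second MD5 over their hexadecimal concatenation) are all assumptions; the text only states that three MD5 values are combined into a new one.

```python
import hashlib
import io

HEAD = 32 * 1024    # first 32k bytes
MIDDLE = 64 * 1024  # middle 64k bytes
TAIL = 32 * 1024    # last 32k bytes


def sample_cid(stream, size):
    """Compute a cid from head, middle and tail samples of a seekable binary stream."""
    if size < HEAD + MIDDLE + TAIL:
        # Assumption: a file too small to sample is hashed in full.
        stream.seek(0)
        return hashlib.md5(stream.read()).hexdigest()
    # (offset, length) of each sampled region; the middle sample is centred.
    regions = [(0, HEAD), ((size - MIDDLE) // 2, MIDDLE), (size - TAIL, TAIL)]
    parts = []
    for offset, length in regions:
        stream.seek(offset)
        parts.append(hashlib.md5(stream.read(length)).hexdigest())
    # Merge the three MD5 values into one new MD5 value.
    return hashlib.md5("".join(parts).encode("ascii")).hexdigest()
```

A caller could pass an open file object together with its size, or an `io.BytesIO` wrapper around data already in memory. Only 128k bytes are read for large files, which keeps the cost independent of file size; the trade-off is that bytes outside the sampled regions do not influence the cid.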
When the descriptor is obtained from video website A, the page information of video website A is fetched first and the link information of the video files on the page is parsed; the descriptor of the video file corresponding to each link is then obtained. The descriptor comprises at least: the link information of the video file, the cid of the video file, the website it belongs to, and its title, format and size.
Obtaining a video segment is described in detail below.
First, the cid of the video file corresponding to the link information in the descriptor is computed by the method described above; alternatively, the cid may be taken directly from the obtained descriptor.
Then, using the cid of the video file as an index, if the number of stored video segments of the video file is determined to be less than a preset threshold (which can be set in advance as needed and is assumed to be 3 in this embodiment), the subsequent flow continues.
A video file comprises three parts: a file header, frame data and an index. The file header describes overall information about the video file and each of its streams, such as the file type, playback duration, maximum rate, width and height of the video image, and number of frames. The frame data are the main part of the video file and contain the data of all video frames and audio frames. The index acts as a table of contents: for each key frame it stores the mapping between the key frame's timestamp and its offset within the video file. The file header, frame data and index are each identified by a corresponding four-character code (FOURCC) or globally unique identifier (GUID).
When a video segment is obtained, the time period of the segment is first set according to the file header information; in this embodiment it is assumed to run from second a to second b. The index of the video file is then read and parsed to determine the video key frames and audio key frames whose timestamps fall at second a and second b, together with the offsets of these key frames in the video file. Because a segment with sound but no picture is hard for users to accept, and to give users a better experience, the timestamp of the first video frame obtained should be earlier than that of the first audio frame obtained.
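The index lookup for the period from second a to second b can be sketched as a binary search over the key-frame index. The in-memory index layout (a sorted list of `(timestamp, offset)` pairs) and the policy of snapping outward to the nearest key frames are assumptions; real container formats store this information in format-specific structures.

```python
from bisect import bisect_left, bisect_right


def clip_offsets(index, start_s, end_s):
    """Return (start_offset, end_offset) for the clip [start_s, end_s].

    index: list of (timestamp_seconds, byte_offset) pairs for key frames,
           sorted by timestamp (hypothetical in-memory form of the file index).
    """
    times = [t for t, _ in index]
    # Last key frame at or before start_s (clamped to the first key frame).
    i = max(bisect_right(times, start_s) - 1, 0)
    # First key frame at or after end_s (clamped to the last key frame).
    j = min(bisect_left(times, end_s), len(index) - 1)
    return index[i][1], index[j][1]
```

Snapping the start backwards to a key frame means the clip can be decoded from its first frame; the cost is that the clip may begin slightly before second a.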
Then, according to the timestamps and offsets of the starting and ending video and audio key frames determined above, the frame data of the video segment are extracted. The frame data are stored in packets, and the header of each packet indicates the size and structure of that packet.
Because frames differ in size, format and so on, the way they are extracted also differs. The extraction of frame data is described below.
Frame data are extracted in units of packets (Packet) or chunks (Chunk). A packet (or chunk) may contain one or more frames, and a single frame may also be composed of several packets (or chunks).
The frames contained in different packets (or chunks) span different lengths of time. Therefore, to begin extraction at a given timestamp, a packet (or chunk) may need to be split to form a new packet (or chunk).
Because packets (or chunks) in some formats must have a fixed size, a packet (or chunk) that has been split needs to be padded (Padding).
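Splitting a fixed-size packet and padding the resulting pieces can be sketched as below. The function names, the zero fill byte and the treatment of a packet as a flat byte string are assumptions; an actual container format also requires its packet headers to be rewritten, which this sketch does not attempt.

```python
def pad_packet(payload, packet_size, fill=b"\x00"):
    """Pad payload with fill bytes up to the fixed packet size."""
    if len(payload) > packet_size:
        raise ValueError("payload larger than fixed packet size")
    return payload + fill * (packet_size - len(payload))


def split_and_pad(packet, cut, packet_size):
    """Split a packet at byte position cut and pad both parts to packet_size."""
    head, tail = packet[:cut], packet[cut:]
    return pad_packet(head, packet_size), pad_packet(tail, packet_size)
```

For example, splitting the 8-byte packet `b"ABCDEFGH"` at position 3 with a packet size of 8 yields two 8-byte packets, the first starting with `b"ABC"` and the second with `b"DEFGH"`, each followed by fill bytes.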
After the frame data have been extracted, stream data other than video frame data and audio frame data are discarded.
After the frame data of the video segment have been extracted and stored in packets, the frame data in all packets are collected and taken as the new frame data. The playback duration and number of frames are modified accordingly, and a new file header and new index are created. The new file header, frame data and index then form a new video segment, which is the obtained video segment.
Next, the image histogram of each frame in the obtained video segment is calculated; if the image of every frame is determined to be of a particular color (such as gray-black), a video segment of this video file is obtained again.
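The histogram test can be illustrated with a minimal sketch in which each frame is a flat list of 8-bit grayscale pixel values. The bucket width, the 95% dominance threshold and the function names are assumptions; the text does not specify how a frame is judged to be of a particular color.

```python
from collections import Counter


def is_single_color(pixels, bucket=32, dominance=0.95):
    """True if one histogram bucket holds at least `dominance` of the pixels.

    pixels: non-empty list of grayscale values in 0..255 (hypothetical frame form).
    """
    if not pixels:
        raise ValueError("frame has no pixels")
    # Coarse histogram: count pixels per intensity bucket.
    hist = Counter(p // bucket for p in pixels)
    return max(hist.values()) / len(pixels) >= dominance


def segment_is_blank(frames):
    """True if the segment is non-empty and every frame is a single color."""
    return bool(frames) and all(is_single_color(f) for f in frames)
```

A segment for which `segment_is_blank` returns true would be rejected and another segment of the same video file fetched instead.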
Storing the descriptor and video segments in association is described in detail below.
The number of video files on Internet video websites keeps growing and their quality is uneven. To ensure that each video file is unique and that high-quality video segments are obtained, ranks can be assigned manually to the video websites used as seeds. The rules for assigning ranks include, but are not limited to: a higher rank for video websites with better video quality, better reputation and higher click-through rates, and a lower rank for other video websites.
The rank of video website A is assigned according to the above rules. Then, using the cid of the video file as an index, it is determined whether the descriptor of the current video file has already been stored. If not, the descriptor of the current video file is stored directly. If so, the following applies: when the rank of video website A is higher than the rank of the website to which the stored video file belongs, the stored descriptor is overwritten with the descriptor of the current video file.
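The store-or-overwrite rule can be sketched with a dictionary keyed by cid. The descriptor field names (`cid`, `site`), the numeric rank table in which a larger number means a higher rank, and the returned status strings are assumptions made for illustration.

```python
def store_descriptor(store, site_rank, descriptor):
    """Store a descriptor keyed by cid, preferring higher-ranked websites.

    store:      dict mapping cid -> descriptor (stands in for the data store).
    site_rank:  dict mapping website name -> rank (assumption: larger is higher).
    descriptor: dict with at least the keys "cid" and "site" (hypothetical names).
    """
    cid = descriptor["cid"]
    existing = store.get(cid)
    if existing is None:
        # Not stored yet: store directly.
        store[cid] = descriptor
        return "stored"
    if site_rank[descriptor["site"]] > site_rank[existing["site"]]:
        # Current website outranks the stored one: overwrite.
        store[cid] = descriptor
        return "overwritten"
    return "kept"
```

Because the cid identifies content rather than location, the same video found on two websites maps to one entry, and the entry ends up describing the copy from the higher-ranked website.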
Before an obtained video segment is stored, the image histogram of each frame in it is calculated first. If the frames are determined not to be of a particular color (such as gray-black), the video segment is processed by the cid calculation method described above, and the resulting MD5 value is used as the cid of the video segment.
Then, using the cid of the video segment as an index: if the video segment has not been stored, the video segment and its cid are stored; if it has already been stored, it is not stored again. Other video segments of the video file may then be obtained by repeating the above flow.
Storage in this embodiment is associated storage: specifically, the descriptor of the video file and its video segments, together with the cids of those segments, are stored in association, keyed by the cid of the video file.
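A minimal in-memory sketch of the associated storage follows. The class name `AssociatedStore`, its methods and the use of a plain MD5 over the whole segment as the segment cid are assumptions; a real system would use a persistent store and the same cid method as for video files.

```python
import hashlib


class AssociatedStore:
    """Keeps one record per video file cid: its descriptor and its segments."""

    def __init__(self):
        self.records = {}          # file cid -> {"descriptor", "segments"}
        self.segment_cids = set()  # index of every stored segment cid

    def _record(self, file_cid):
        return self.records.setdefault(
            file_cid, {"descriptor": None, "segments": {}}
        )

    def put_descriptor(self, file_cid, descriptor):
        self._record(file_cid)["descriptor"] = descriptor

    def put_segment(self, file_cid, data):
        """Store a segment unless one with the same cid exists; return its cid or None."""
        seg_cid = hashlib.md5(data).hexdigest()
        if seg_cid in self.segment_cids:
            return None  # already stored: do not store again
        self.segment_cids.add(seg_cid)
        self._record(file_cid)["segments"][seg_cid] = data
        return seg_cid

    def segment_count(self, file_cid):
        return len(self._record(file_cid)["segments"])
```

`segment_count` is the value that would be compared against the preset threshold (3 in this embodiment) when deciding whether more segments of a file are needed.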
This concludes the description of storing the descriptor and video segments of a video file in association.
To make it easier for the user to choose a video file, the stored descriptor and video segments of the video file may also be presented to the user.
Video files on the Internet currently come in many formats, for example flv (flash video). flv is a newer streaming video format; video files in flv format are small and load quickly, which suits online playback, so most video websites currently provide video files in flv format.
Video segments on the Internet vary in size, network speeds differ between regions, and different users may have different video playback plug-ins installed. To ensure that video segments play smoothly online, they can be further processed flexibly according to their video quality: the format of an obtained video segment is converted and its size kept reasonable. For example, high-quality video segments are compressed and converted, being automatically converted by a dedicated program to flv format at different compression ratios, while low-quality video segments may keep their original format and size.
On a video website (such as an online video preview website or a video search engine website), the descriptor and video segments of a video file may be presented in the following ways:
Presentation mode one: a screenshot of the first frame of each video segment is shown on the web page; when the user clicks the screenshot, the video segment is downloaded quickly and played online.
Presentation mode two: when a user's search finds a video segment, the segment is immediately played in a loop on the web page.
Presentation mode three: when a user's search finds a video segment, pictures of some frames of the segment are played in a loop on the web page.
In practice, the presentation mode should be chosen flexibly according to the current network state. For example, a video website can, according to the user's network conditions as reported back through its pages, select a video segment of suitable size for online playback or download.
The method embodiment has been described above; system embodiments are described below.
As described above, in the system of the embodiment of the invention, a single kind of Spider may obtain both the descriptor and the video segments of a video file, or two kinds of Spider may obtain them separately. Four specific embodiments are described below.
System embodiment 1: a system comprising one kind of Spider.
When the Spider receives a fetch instruction from the Hub, it obtains descriptors of video files from different websites and outputs them to the Hub. Each descriptor comprises at least: the link information of the video file, the cid of the video file, the website it belongs to (here, video website A), and its title, format and size. The cid of the video file is computed by the Spider using the cid calculation method described above.
The Hub takes the descriptor obtained by the Spider and, using the cid of the video file as an index, determines whether the descriptor of the current video file is already stored on the central data storage server. If not, it directly instructs the central data storage server to store the descriptor output by the Hub. If so, the following applies: when the rank of video website A is higher than the rank of the website to which the stored video file belongs, the Hub instructs the central data storage server to overwrite the stored descriptor with the descriptor of the current video file output by the Hub.
Using the cid of the video file as an index, the Hub looks up on the central data storage server the number of segments of the video file corresponding to this cid, and determines whether the number of stored video segments is less than a preset threshold (which can be set in advance as needed and is assumed to be 3 in this embodiment). If so, it returns a fetch instruction to the Spider; if not, it instructs the Spider to stop obtaining video segments of this video file.
On receiving the fetch instruction from the Hub, the Spider obtains a video segment of the video file according to the link information in the descriptor already obtained.
After obtaining a satisfactory video segment, the Spider computes the cid of the video segment using the cid calculation method described above, and outputs the video segment and its cid to the Hub.
The Hub calculates the image histogram of each frame in the video segment. If the image of every frame is determined to be of a particular color (such as gray-black), the Hub filters out the video segment and instructs the Spider to obtain a video segment from another position in the video file, until a satisfactory video segment is found.
If the Hub determines that the frames of the video segment are not of a particular color (such as gray-black), it uses the cid of the video segment as an index: if the video segment is not stored on the central data storage server, the Hub outputs the video segment and its cid to the central data storage server; if the video segment is already stored there, the Hub discards it and instructs the Spider to obtain a video segment from another position in the video file.
The central data storage server stores the descriptor, the video segment and the cid of the video segment output by the Hub in association, keyed by the cid of the video file.
Afterwards, when presentation to the user is required, the presentation server presents to the user the descriptor and video segments of the video file stored on the central data storage server.
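The Hub-side decisions of system embodiment 1 (threshold check, blank-segment filtering, duplicate rejection) can be summarized in one loop. This is only a sketch under stated assumptions: `fetch` and `is_blank` are hypothetical callables standing in for the Spider and the histogram test, `stored` stands in for the central data storage server's segments for one video file, and the `max_attempts` cap is added here so the loop always terminates, which the text does not mention.

```python
def collect_segments(fetch, is_blank, stored, threshold=3, max_attempts=10):
    """Fetch segments for one video file until `threshold` segments are stored.

    fetch(attempt) -> (segment_cid, data): hypothetical stand-in for the Spider.
    is_blank(data) -> bool: hypothetical stand-in for the histogram check.
    stored: dict mapping segment cid -> data, updated in place.
    """
    attempts = 0
    # Only issue a fetch instruction while fewer than `threshold` segments exist.
    while len(stored) < threshold and attempts < max_attempts:
        seg_cid, data = fetch(attempts)
        attempts += 1
        if is_blank(data):
            continue  # filtered out: ask for a segment from another position
        if seg_cid in stored:
            continue  # duplicate: discard and ask for another segment
        stored[seg_cid] = data
    return stored
```

In embodiments where the Spider itself performs the histogram check, the same loop applies with the `is_blank` test moved to the Spider side.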
System embodiment 2: another system comprising one kind of Spider.
When the Spider receives a fetch instruction from the Hub, it obtains descriptors of video files from different websites and outputs them to the Hub. Each descriptor comprises at least: the link information of the video file, the cid of the video file, the website it belongs to (here, video website A), and its title, format and size. The cid of the video file is computed by the Spider using the cid calculation method described above.
The Hub takes the descriptor obtained by the Spider and, using the cid of the video file as an index, determines whether the descriptor of the current video file is already stored on the central data storage server. If not, it directly instructs the central data storage server to store the descriptor output by the Hub. If so, the following applies: when the rank of video website A is higher than the rank of the website to which the stored video file belongs, the Hub instructs the central data storage server to overwrite the stored descriptor with the descriptor of the current video file output by the Hub.
Using the cid of the video file as an index, the Hub looks up on the central data storage server the number of segments of the video file corresponding to this cid, and determines whether the number of stored video segments is less than a preset threshold (which can be set in advance as needed and is assumed to be 3 in this embodiment). If so, it returns a fetch instruction to the Spider; if not, it instructs the Spider to stop obtaining video segments of this video file.
On receiving the fetch instruction from the Hub, the Spider obtains a video segment of the video file according to the link information in the descriptor already obtained.
The Spider calculates the image histogram of each frame in the obtained video segment. If the image of every frame is determined to be of a particular color (such as gray-black), the Spider filters out the video segment and obtains a video segment from another position in the video file, until a satisfactory video segment is found.
After obtaining a satisfactory video segment, the Spider computes the cid of the video segment using the cid calculation method described above, and outputs the cid of the video segment to the Hub.
The Hub then uses the cid of the video segment as an index: if the video segment is not stored on the central data storage server, the Hub instructs the Spider to output the video segment and its cid to the central data storage server; if the video segment is already stored there, the video segment is discarded and the Hub instructs the Spider to obtain a video segment from another position in the video file.
The central data storage server stores the descriptor output by the Hub, the video segment output by the Spider and the cid of the video segment in association, keyed by the cid of the video file.
Afterwards, when presentation to the user is required, the presentation server presents to the user the descriptor and video segments of the video file stored on the central data storage server.
System embodiment 3: a system comprising two kinds of Spider.
The system of this embodiment comprises a Text_spider (a spider that obtains descriptors) and a Video_spider (a spider that obtains video segments).
The Text_spider obtains descriptors of video files from different websites and outputs them to the Hub. Each descriptor comprises at least: the link information of the video file, the cid of the video file, the website it belongs to (here, video website A), and its title, format and size. The cid of the video file is computed by the Text_spider using the cid calculation method described above.
After obtaining the descriptor of the video file from the Text_spider, the Hub uses the cid of the video file as an index to determine whether the descriptor of the current video file is already stored on the central data storage server. If not, it directly instructs the central data storage server to store the descriptor output by the Hub. If so, the following applies: when the rank of video website A is higher than the rank of the website to which the stored video file belongs, the Hub instructs the central data storage server to overwrite the stored descriptor with the descriptor of the current video file output by the Hub.
According to the link information in the descriptor stored on the central data storage server, the Video_spider computes the cid of the video file corresponding to that link information, using the same method the Text_spider uses to compute a video file cid (it may also read the cid stored on the central data storage server directly), and outputs the cid to the Hub.
Using the cid of the video file as an index, the Hub looks up on the central data storage server the number of segments of the video file corresponding to this cid, and determines whether the number of stored video segments is less than a preset threshold (which can be set in advance as needed and is assumed to be 3 in this embodiment). If so, it returns a fetch instruction to the Video_spider; if not, it instructs the Video_spider to stop obtaining video segments of this video file.
On receiving the fetch instruction from the Hub, the Video_spider obtains a video segment according to the link information of the video file and outputs it to the Hub.
The Hub calculates the image histogram of each frame in the video segment obtained by the Video_spider. If the image of every frame is determined to be of a particular color (such as gray-black), the Hub filters out the video segment and instructs the Video_spider to obtain a video segment from another position in the video file, until a satisfactory video segment is found.
After obtaining a satisfactory video segment, the Video_spider computes the cid of the video segment using the cid calculation method described above, and outputs the cid of the video segment to the Hub.
The Hub uses the cid of the video segment as an index: if the video segment is not stored on the central data storage server, the Hub outputs the video segment and its cid to the central data storage server; if the video segment is already stored there, the Hub instructs the Video_spider to obtain a video segment from another position in the video file.
The central data storage server stores the descriptor, the video segment and the cid of the video segment output by the Hub in association, keyed by the cid of the video file.
Afterwards, when presentation to the user is required, the presentation server presents to the user the descriptor and video segments of the video file stored on the central data storage server.
System embodiment 4: another system comprising two kinds of Spider.
The system of this embodiment comprises a Text_spider (a spider that obtains descriptors) and a Video_spider (a spider that obtains video segments).
The Text_spider obtains descriptors of video files from different websites and outputs them to the Hub. Each descriptor comprises at least: the link information of the video file, the cid of the video file, the website it belongs to (here, video website A), and its title, format and size. The cid of the video file is computed by the Text_spider using the cid calculation method described above.
After obtaining the descriptor of the video file from the Text_spider, the Hub uses the cid of the video file as an index to determine whether the descriptor of the current video file is already stored on the central data storage server. If not, it directly instructs the central data storage server to store the descriptor output by the Hub. If so, the following applies: when the rank of video website A is higher than the rank of the website to which the stored video file belongs, the Hub instructs the central data storage server to overwrite the stored descriptor with the descriptor of the current video file output by the Hub.
According to the link information in the descriptor stored on the central data storage server, the Video_spider computes the cid of the video file corresponding to that link information, using the same method the Text_spider uses to compute a video file cid (it may also read the cid stored on the central data storage server directly), and outputs the cid to the Hub.
Using the cid of the video file as an index, the Hub looks up on the central data storage server the number of segments of the video file corresponding to this cid, and determines whether the number of stored video segments is less than a preset threshold (which can be set in advance as needed and is assumed to be 3 in this embodiment). If so, it returns a fetch instruction to the Video_spider; if not, it instructs the Video_spider to stop obtaining video segments of this video file.
On receiving the fetch instruction from the Hub, the Video_spider obtains a video segment according to the link information of the video file.
The Video_spider calculates the image histogram of each frame in the obtained video segment. If the image of every frame is determined to be of a particular color (such as gray-black), the Video_spider filters out the video segment and obtains a video segment from another position in the video file, until a satisfactory video segment is found.
After obtaining a satisfactory video segment, the Video_spider computes the cid of the video segment using the cid calculation method described above, and outputs the cid of the video segment to the Hub.
The Hub uses the cid of the video segment as an index: if the video segment is not stored on the central data storage server, the Hub instructs the Video_spider to output the video segment and its cid to the central data storage server; if the video segment is already stored there, the Hub instructs the Video_spider to obtain a video segment from another position in the video file.
The central data storage server stores the descriptor output by the Hub, the video segment output by the Video_spider and the cid of the video segment in association, keyed by the cid of the video file.
Afterwards, when presentation to the user is required, the presentation server presents to the user the descriptor and video segments of the video file stored on the central data storage server.
In summary, the embodiment of the invention obtains the descriptor of a video file from a website, the descriptor comprising at least the link information and the cid of the video file, where the cid uniquely identifies the video file; obtains video segments of the video file according to the obtained descriptor; and stores the obtained descriptor and video segments in association, keyed by the cid of the video file. The embodiment can therefore store the descriptor and video segments of a video file in association. Further, the embodiment stores the descriptor obtained from the highest-ranked website and presents the stored descriptor and video segments to the user; because the highest-ranked website provides high-quality video files, what is finally presented to the user is a high-quality descriptor and high-quality video segments. Further, the format of the video segments is converted, and the descriptor and video segments of the video file are accurately matched and presented to the user together, so that before downloading a video file the user can learn specifics about it in advance, such as whether the video content matches the descriptor and how good the video quality is. This makes it easier for the user to download high-quality video files.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.