CN107977461A - Video feature extraction method and device - Google Patents
Video feature extraction method and device
- Publication number
- CN107977461A (Application CN201711390947.8A)
- Authority
- CN
- China
- Prior art keywords
- video
- feature
- hash
- frame
- frames
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Library & Information Science (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
An embodiment of the present application provides a video feature extraction method and device. Multiple video frames are extracted from a target video to obtain a first video frame set, and the neighboring frame of each of those video frames is extracted to obtain a second video frame set. The first video frame set and the second video frame set are then combined to form a set of contiguous frame pairs, which is input into a deep convolutional network to extract the feature of each video frame. Finally, the extracted feature of each video frame is fed into a video feature hash layer composed of a sigmoid activation function, a contiguous-frame pair loss function, and a categorical cross-entropy loss function for computation, to obtain the hash feature of the target video. As a result, the feature representation of the video better reflects the video content information, and the precision and utilization efficiency of the video feature representation are effectively improved.
Description
Technical field
This application relates to the technical field of video processing, and in particular to a video feature extraction method and device.
Background
Hash coding of video can be applied in video-related fields such as video retrieval, video clustering, and video compression. Whether the hash features obtained by existing video hash feature extraction methods are effective is determined by two factors: one is whether the extracted video features effectively represent the video content, and the other is the accuracy of the hash algorithm. Therefore, how to design a video hash feature extraction method so that the feature representation of a video better reflects the video content information and the precision and utilization of the video feature representation are improved is a major problem that currently needs to be studied.
Summary of the invention
In view of this, the purpose of the present application is to provide a video feature extraction method and device, so that the feature representation of a video better reflects the video content information, and the precision and utilization efficiency of the video feature representation are effectively improved.
To achieve the above object, the embodiments of the present application adopt the following technical solutions:
In one aspect, the present application provides a video feature extraction method, including:
extracting multiple video frames from a target video to obtain a first video frame set, and extracting the neighboring frame of each video frame to obtain a second video frame set;
combining the first video frame set and the second video frame set to form a set of contiguous frame pairs, and inputting it into a deep convolutional network to extract the feature of each video frame;
feeding the extracted feature of each video frame into a video feature hash layer composed of a sigmoid activation function, a contiguous-frame pair loss function, and a categorical cross-entropy loss function for computation, to obtain the hash feature of the target video;
wherein the contiguous-frame pair loss function is expressed as follows:
wherein f1 is the feature representation of the first video frame set in the contiguous frame pair set, f2 is the feature representation of the second video frame set in the contiguous frame pair set, and m is a preset constraint factor.
In another aspect, the present application provides a video feature extraction device, including:
a neighboring frame extraction module, configured to extract multiple video frames from a target video to obtain a first video frame set, and to extract the neighboring frame of each video frame to obtain a second video frame set;
a convolutional network processing module, configured to combine the first video frame set and the second video frame set into a set of contiguous frame pairs and input it into a deep convolutional network to extract the feature of each video frame; and
a hash feature calculation module, configured to feed the extracted feature of each video frame into a video feature hash layer composed of a sigmoid activation function, a contiguous-frame pair loss function, and a categorical cross-entropy loss function for computation, to obtain the hash feature of the target video;
wherein the contiguous-frame pair loss function is expressed as follows:
wherein f1 is the feature representation of the first video frame set in the contiguous frame pair set, f2 is the feature representation of the second video frame set in the contiguous frame pair set, and m is a preset constraint factor.
Compared with the prior art, the video feature extraction method and device provided by the embodiments of the present application exploit the characteristic that images of neighboring frames in a video are similar: a contiguous-frame pair loss function is designed so that the difference between the feature representations of neighboring frames is minimized, making the feature representation of the video better reflect the video content information. Secondly, based on the category of the video, a brand-new method is proposed for finding the hash coding positions that contribute most to the feature representation of the current category, which effectively improves the precision and utilization efficiency of the video feature representation.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the application and therefore should not be construed as limiting its scope. For those of ordinary skill in the art, other relevant drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of a video feature extraction method provided by an embodiment of the present application.
Fig. 2 is a sub-flowchart of step S101 shown in Fig. 1.
Fig. 3 is a schematic diagram of the process of extracting video features through a deep hash network structure, provided by an embodiment of the present application.
Fig. 4 is a schematic diagram of the intra-pair loss function provided by an embodiment of the present application.
Fig. 5 is a schematic diagram of a similar-video retrieval task performed according to the Hamming distance of hash features, provided by an embodiment of the present application.
Fig. 6 is a block diagram of the video processing equipment used to implement the above video feature extraction method, provided by an embodiment of the present application.
Detailed description of embodiments
To make the purpose, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the application, not all of them. The components of the embodiments of the application, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of configurations.
Therefore, the following detailed description of the embodiments of the application provided in the drawings is not intended to limit the claimed scope of the application, but merely represents selected embodiments of the application. Based on the embodiments of the application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope protected by the application.
It should be noted that similar labels and letters represent similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined and explained in subsequent drawings.
Referring to Fig. 1, which is a flowchart of a video feature extraction method provided by an embodiment of the present application, each of the steps S101 to S103 of the video feature extraction method is described in detail below.
In step S101, multiple video frames are extracted from a target video to obtain a first video frame set, and the neighboring frame of each of those video frames is extracted to obtain a second video frame set.
In detail, as shown in Fig. 2, step S101 can be realized by the following sub-steps S111 and S112.
In step S111, multiple video frames are extracted from the target video at a preset frame interval to obtain the first video frame set.
In step S112, the neighboring frame of each video frame in the first video frame set is extracted from the target video at the preset frame interval to form the second video frame set.
In one example, as shown in Fig. 3, four video frames A, B, C, and D may first be extracted from the target video at the preset frame interval; these four video frames form the first video frame set. Then, the neighboring frames A', B', C', and D' of the four video frames A, B, C, and D are extracted from the target video, again at the preset frame interval, to form the second video frame set. For example, the neighboring frame A' of video frame A (such as the frame immediately before or after A) can be found first, and then the neighboring frames B', C', and D' of B, C, and D are found at the preset frame interval starting from A', so that their combination forms the second video frame set.
In step S102, the first video frame set and the second video frame set are combined to form a set of contiguous frame pairs, which is input into a deep convolutional network to extract the feature of each video frame. For example, as shown in Fig. 3, after the first video frame set and the second video frame set are obtained, they are combined to form pairs of neighboring video frames and input into the deep convolutional network, which extracts the features of the video frames.
In step S103, the extracted feature of each video frame is fed into a video feature hash layer composed of a sigmoid activation function, a contiguous-frame pair loss function, and a categorical cross-entropy loss function for computation, to obtain the hash feature of the target video.
Wherein the contiguous-frame pair loss function is expressed as follows:
wherein f1 is the feature representation of the first video frame set in the contiguous frame pair set, f2 is the feature representation of the second video frame set in the contiguous frame pair set, and m is a preset constraint factor.
In the embodiment of the present invention, the contiguous-frame pair loss function is a function specially designed for the hash feature extraction process, also referred to as the intra-pair loss function. It imposes a constraint on the feature representations of contiguous frames, so that the feature representations of contiguous frames of the same video are as similar as possible. Fig. 4 is a schematic diagram of the intra-pair loss function. Grids 1-8 represent the feature representations of frames cut from the video at regular intervals, arranged in order; neighboring frames are extracted to construct intra-pairs that are input into the network, and the features of each frame group are constrained so that the difference between each pair of features is as small as possible. Because the content information of contiguous frames in a video is essentially identical while their shallow image information differs, video features constrained by the intra-pair loss function better reflect the content-level information of the video and are insensitive to shallow-level information.
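The formula for the intra-pair loss is rendered as an image in the original filing and is not reproduced in this text. Purely as an illustration of the constraint described above — penalizing the distance between the paired feature representations f1 and f2 once it exceeds the constraint factor m — a margin-style form such as the following could be used. This exact functional form is an assumption, not the patent's equation.

```python
import numpy as np

def intra_pair_loss(f1, f2, m=0.1):
    """Hedged sketch of a contiguous-frame pair ("intra-pair") loss:
    the squared L2 distance between the paired feature vectors,
    penalized only beyond the preset constraint factor m (assumed form)."""
    d = float(np.sum((np.asarray(f1, dtype=float) - np.asarray(f2, dtype=float)) ** 2))
    return max(d - m, 0.0)
```

Under this form, identical pair features incur zero loss, and the constraint factor m tolerates the small shallow-level differences between neighboring frames.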
In detail, in step S103, the sigmoid activation function can map the extracted feature of each video frame to the interval from 0 to 1, and the output of the sigmoid activation function is mapped to a binary code according to a set threshold to form and output the hash code. The set threshold may be 0.5.
After the above extraction obtains the hash feature of the target video, the application stage of the hash feature can proceed in the following way:
First, the hash code output by the sigmoid activation function is passed through a category mask matrix to generate a target binary hash code. The category mask matrix is an M*N matrix, where M is the number of video categories obtained in advance and N is the hash code length. Each video category corresponds to a vector of weight parameters whose length equals the hash code length. These weight parameters are used to obtain the key positions corresponding to the most important hash features of the current video; these key positions are related to the category to which the video belongs, and the key positions corresponding to each category are different.
Then, the category to which the target video belongs is obtained, and the corresponding weight parameters are obtained according to that category. After the corresponding weight parameters are obtained, they can be sorted by absolute value; finally, according to the sorting result, the positions of a predetermined number of hash code bits in the hash feature of the target video are obtained as the target positions of the hash feature representation of the target video.
In detail, the above weight parameters can be sorted in descending order of absolute value, and the hash code positions of the parameters with larger absolute values are considered the more important positions (target positions) expressed by the hash feature of the current video (such as the target video). Finally, a specific ratio can be set, and the leading key positions of a specific length are taken out according to the sorting result as the target positions. For example, the hash code positions corresponding to the top 20% of weight parameters can be taken as the target positions.
The shape of the category mask matrix is the number of categories by the hash code length. For example, assuming there are 101 categories and the hash code length is 512, the shape is 101*512. Then the corresponding weight parameters taken out for a certain category (such as the category of the target video) form a 1*512 vector. After sorting these 512 parameters by absolute value, the target positions corresponding to some of the weight parameters can be taken out according to the specific ratio.
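The selection of target positions from one category's row of the category mask matrix can be sketched as follows. The function name and the tie-breaking behavior of the sort are illustrative assumptions.

```python
import numpy as np

def select_target_positions(weights, ratio=0.2):
    """Given one category's weight vector (length = hash code length),
    sort by absolute value in descending order and return the hash-bit
    positions of the top `ratio` fraction of weights."""
    weights = np.asarray(weights, dtype=float)
    k = max(1, int(len(weights) * ratio))
    order = np.argsort(-np.abs(weights))  # indices, descending by |w|
    return sorted(order[:k].tolist())
```

With a 512-bit code and ratio 0.2, this returns the roughly 102 bit positions whose weights have the largest magnitude for that category.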
Finally, based on the target positions determined above, in the application stage of the video hash feature, as shown in Fig. 5, a similar-video retrieval task can be performed according to the Hamming distance of hash features. Retrieval is carried out only over the most important hash feature positions of the query video and of the videos in the retrieval data set, which can effectively improve retrieval precision.
For example, retrieval can be carried out by computing the Hamming distance between the hash feature corresponding to the target positions in the target video (the query video) and the hash feature corresponding to the target positions in each video in the retrieval data set, and then querying the retrieval data set for videos related to the target video according to the computed Hamming distances.
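The retrieval step — Hamming distance computed only over the target positions — can be sketched as follows. This is a minimal illustration under the assumption that hash codes are stored as 0/1 lists; the function names are assumptions.

```python
def masked_hamming(query_code, candidate_code, target_positions):
    """Hamming distance restricted to the category's target hash positions."""
    return sum(query_code[i] != candidate_code[i] for i in target_positions)

def retrieve(query_code, dataset, target_positions):
    """Rank the indices of dataset hash codes by masked Hamming distance
    to the query code (closest first)."""
    return sorted(range(len(dataset)),
                  key=lambda j: masked_hamming(query_code, dataset[j], target_positions))
```

Bits outside the target positions are ignored entirely, so two codes that agree on the important positions rank as close even if their unimportant bits differ.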
The inventors verified through real cases, analyzing the precision-recall curves obtained when retrieving in the UCF101 [1] video data set with hash codes generated by category mask matrices corresponding to different weight coefficients. It was found that when the weight coefficient is 0.3 to 0.4, the retrieval effect of the hash codes is best, far better than the retrieval effect without the category mask matrix. For other values of the weight coefficient, the retrieval effect is also better than that without the category mask matrix.
Fig. 6 is a schematic diagram of the video processing equipment 100 provided by an embodiment of the present invention for implementing the video feature extraction method.
The video processing equipment 100 may be, but is not limited to, a personal computer (PC), a laptop, a server, or another computer device with video analysis and processing capabilities.
The video processing equipment 100 includes a video feature extraction device 11, a memory 12, and a processor 13. In a preferred embodiment of the present invention, the video feature extraction device 11 includes at least one software function module that is stored in the memory 12 in the form of software or firmware, or is embedded in the operating system (OS) of the video processing equipment 100. The processor 13 is used to execute the executable software modules stored in the memory 12, such as the software function modules and computer programs included in the video feature extraction device 11. In this embodiment, the video feature extraction device 11 can also be integrated into the operating system as a part of it. Specifically, the video feature extraction device 11 includes a neighboring frame extraction module 111, a convolutional network processing module 112, and a hash feature calculation module 113. It should be noted that, in other embodiments, some of the above function modules included in the video feature extraction device 11 may be omitted, or other function modules may be included. Each of the above function modules is described in detail below.
The neighboring frame extraction module 111 is configured to extract multiple video frames from the target video to obtain the first video frame set, and to extract the neighboring frame of each video frame to obtain the second video frame set.
In detail, the neighboring frame extraction module 111 can be used to perform the above step S101. The neighboring frame extraction module 111 can first extract multiple video frames from the target video at the preset frame interval to obtain the first video frame set, and then extract the neighboring frame of each video frame in the first video frame set from the target video at the preset frame interval to form the second video frame set.
The convolutional network processing module 112 is configured to combine the first video frame set and the second video frame set into a set of contiguous frame pairs and input it into the deep convolutional network to extract the feature of each video frame. In detail, the convolutional network processing module 112 can be used to perform the above step S102; for details of this module, refer to the description of step S102.
The hash feature calculation module 113 is configured to feed the extracted feature of each video frame into the video feature hash layer composed of the sigmoid activation function, the contiguous-frame pair loss function, and the categorical cross-entropy loss function for computation, to obtain the hash feature of the target video.
Wherein the contiguous-frame pair loss function is expressed as follows:
wherein f1 is the feature representation of the first video frame set in the contiguous frame pair set, f2 is the feature representation of the second video frame set in the contiguous frame pair set, and m is a preset constraint factor.
In this embodiment, the sigmoid activation function can map the extracted feature of each video frame to the interval from 0 to 1, and the output of the sigmoid activation function is mapped to a binary code according to a set threshold to form and output the hash code. The set threshold may be 0.5.
In detail, the hash feature calculation module 113 can be used to perform the above step S103; for details of this module, refer to the description of step S103.
In this embodiment, as shown in Fig. 6, the video feature extraction device 11 may further include a hash feature retrieval module 114, configured to generate a target binary hash code by passing the hash code output by the sigmoid activation function through a category mask matrix, where the category mask matrix is an M*N matrix, M is the number of video categories obtained in advance, N is the hash code length, and each category corresponds to a vector of weight parameters whose length equals the hash code length; then to obtain the corresponding weight parameters according to the category to which the target video belongs; and finally, after sorting the weight parameters by absolute value, to obtain, according to the sorting result, the positions of a predetermined number of hash code bits in the hash feature of the target video as the target positions of the hash feature representation of the target video.
In addition, in this embodiment, the video feature extraction device 11 may further include a video query module 115, configured to compute the Hamming distance between the hash feature corresponding to the target positions in the target video (the query video) and the hash feature corresponding to the target positions in each video in the retrieval data set, and then to query the retrieval data set for videos related to the target video according to the computed Hamming distances.
In summary, the video feature extraction method and device provided by the embodiments of the present application exploit the characteristic that images of contiguous frames in a video are similar: a contiguous-frame pair loss function is designed so that the difference between the feature representations of neighboring frames is minimized, making the feature representation of the video better reflect the video content information. Secondly, based on the category of the video, a brand-new method is proposed for finding the hash coding positions that contribute most to the feature representation of the current category, which effectively improves the precision and utilization efficiency of the video feature representation.
In the embodiments provided by the present application, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The embodiments described above are merely schematic; for example, the flowcharts and block diagrams in the drawings show the possible architectures, functions, and operations of the devices, methods, and computer program products according to the embodiments of the application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for realizing the specified logic function.
Furthermore, it should also be noted that in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks can actually be executed in parallel, and they can sometimes also be executed in the opposite order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be realized by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the function modules in the embodiments of the application may be integrated to form an independent part, the modules may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are realized in the form of software function modules and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the application — in essence, the part that contributes to the prior art, or a portion of the technical solution — can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
It should be noted that, herein, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprising", "including", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or equipment that includes a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article, or equipment. Without further restrictions, an element limited by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or equipment that includes that element.
Finally, it should be noted that the foregoing are merely preferred embodiments of the application and are not intended to limit it; for those skilled in the art, the application may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the application shall be included within the protection scope of the application.
Claims (10)
- 1. A video feature extraction method, characterized in that the method includes: extracting multiple video frames from a target video to obtain a first video frame set, and extracting the neighboring frame of each video frame to obtain a second video frame set; combining the first video frame set and the second video frame set to form a set of contiguous frame pairs, and inputting it into a deep convolutional network to extract the feature of each video frame; feeding the extracted feature of each video frame into a video feature hash layer composed of a sigmoid activation function, a contiguous-frame pair loss function, and a categorical cross-entropy loss function for computation, to obtain the hash feature of the target video; wherein the contiguous-frame pair loss function is expressed as follows: wherein f1 is the feature representation of the first video frame set in the contiguous frame pair set, f2 is the feature representation of the second video frame set in the contiguous frame pair set, and m is a preset constraint factor.
- 2. The video feature extraction method as claimed in claim 1, characterized in that extracting multiple video frames from a target video to obtain a first video frame set and extracting the neighboring frame of each video frame to obtain a second video frame set includes: extracting multiple video frames from the target video at a preset frame interval to obtain the first video frame set; and extracting the neighboring frame of each video frame in the first video frame set from the target video at the preset frame interval to form the second video frame set.
- 3. The video feature extraction method as claimed in claim 1, characterized in that the step of feeding the extracted feature of each video frame into a video feature hash layer composed of a sigmoid activation function, a contiguous-frame pair loss function, and a categorical cross-entropy loss function for computation, to obtain the hash feature of the target video, includes: mapping the extracted feature of each video frame to the interval from 0 to 1 through the sigmoid activation function, and mapping the output of the sigmoid activation function to a binary code according to a set threshold to form and output the hash code.
- 4. The video feature extraction method as claimed in claim 3, characterized in that the method further includes: generating a target binary hash code by passing the hash code output by the sigmoid activation function through a category mask matrix, the category mask matrix being an M*N matrix, where M is the number of video categories obtained in advance, N is the hash code length, and each category corresponds to a vector of weight parameters whose length equals the hash code length; obtaining the category to which the target video belongs, and obtaining the corresponding weight parameters according to that category; sorting the weight parameters by absolute value; and obtaining, according to the sorting result, the positions of a predetermined number of hash code bits in the hash feature of the target video as the target positions of the hash feature representation of the target video.
- 5. The video feature extraction method of any one of claims 1-4, further comprising: computing the Hamming distance between the hash feature at the target positions of the target video and the hash feature at the corresponding target positions of each video in a retrieval data set; and querying the retrieval data set for videos related to the target video according to the computed Hamming distances.
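The retrieval step of claim 5 can be sketched as below: restrict both hash codes to the target positions, compute the Hamming distance, and rank the retrieval set by that distance. This is an illustrative sketch with assumed names; a real system would index the codes rather than scan linearly.

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit sequences."""
    return sum(x != y for x, y in zip(a, b))

def query_related(target_code, positions, database):
    """Rank database entries by Hamming distance to the target's
    hash code, compared only at the selected target positions."""
    sub = [target_code[p] for p in positions]
    ranked = sorted(
        range(len(database)),
        key=lambda idx: hamming(sub, [database[idx][p] for p in positions]),
    )
    return ranked  # indices of database videos, most related first
```

The closest entries (smallest Hamming distance) are returned as the videos related to the target video.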
- 6. A video feature extraction apparatus, comprising: an adjacent-frame extraction module for extracting multiple video frames from a target video to obtain a first video frame set and extracting the adjacent frame of each extracted video frame to obtain a second video frame set; a convolutional network processing module for combining the first video frame set and the second video frame set into a set of adjacent-frame pairs and feeding the set into a deep convolutional network to extract the feature of each video frame; and a hash feature computation module for feeding the extracted feature of each video frame into a video feature hash layer composed of a sigmoid activation function, an adjacent-frame pair loss function, and a classification cross-entropy loss function to obtain the hash feature of the target video; wherein the adjacent-frame pair loss function is defined over each adjacent-frame pair, with f1 being the feature representation of the frame from the first video frame set in the pair, f2 being the feature representation of the frame from the second video frame set in the pair, and m being a preset constraint factor.
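Claim 6 defines the adjacent-frame pair loss over f1, f2, and a constraint factor m, but the formula itself is rendered as an image in the source and is not reproduced here. One common margin-based form consistent with the stated variables is max(0, ||f1 - f2||^2 - m), which penalizes adjacent-frame features that drift apart by more than m; the sketch below uses that form purely as an assumption, not as the patent's actual equation.

```python
def pair_loss(f1, f2, m):
    """Assumed margin form of an adjacent-frame pair loss:
    max(0, ||f1 - f2||^2 - m). Adjacent frames whose features
    differ by less than the constraint factor m incur zero loss."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(f1, f2))
    return max(0.0, sq_dist - m)
```

Under this form, minimizing the loss pushes the features of a frame and its neighbor to stay within distance m of each other, encoding temporal smoothness into the hash.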
- 7. The video feature extraction apparatus of claim 6, wherein the adjacent-frame extraction module obtains the first video frame set by extracting multiple video frames from the target video at a preset frame interval, and forms the second video frame set by extracting from the target video, at the preset frame interval, the adjacent frame of each video frame in the first video frame set.
- 8. The video feature extraction apparatus of claim 6, wherein the hash feature computation module maps the extracted feature of each video frame to the interval from 0 to 1 via the sigmoid activation function, and maps the output of the sigmoid activation function to a binary code according to a preset threshold to form and output the hash code.
- 9. The video feature extraction apparatus of claim 8, further comprising a hash feature retrieval module configured to: generate a target binary hash code by passing the hash code output by the sigmoid activation function through a category mask matrix, the category mask matrix being an M*N matrix, where M is the number of video categories obtained in advance and N is the hash code length, each category corresponding to a weight vector whose length equals the hash code length; obtain the category to which the target video belongs, and obtain the corresponding weight parameters according to that category; sort the weight parameters after taking their absolute values; and select, according to the sorting result, the positions corresponding to a predetermined number of hash bits in the hash feature of the target video as the target positions of the hash feature representation of the target video.
- 10. The video feature extraction apparatus of any one of claims 6-9, further comprising a video query module for computing the Hamming distance between the hash feature at the target positions of the target video and the hash feature at the corresponding target positions of each video in a retrieval data set, and for querying the retrieval data set for videos related to the target video according to the computed Hamming distances.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711390947.8A CN107977461A (en) | 2017-12-21 | 2017-12-21 | A kind of video feature extraction method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711390947.8A CN107977461A (en) | 2017-12-21 | 2017-12-21 | A kind of video feature extraction method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107977461A true CN107977461A (en) | 2018-05-01 |
Family
ID=62007123
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711390947.8A Pending CN107977461A (en) | 2017-12-21 | 2017-12-21 | A kind of video feature extraction method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107977461A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101635843A (en) * | 2008-07-23 | 2010-01-27 | 北京大学 | Method and system for extracting, seeking and comparing visual patterns based on frame-to-frame variation characteristics |
US20160358036A1 (en) * | 2011-05-18 | 2016-12-08 | Microsoft Technology Licensing, Llc | Searching for Images by Video |
CN106407352A (en) * | 2016-09-06 | 2017-02-15 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Traffic image retrieval method based on depth learning |
US20170083770A1 (en) * | 2014-12-19 | 2017-03-23 | Amazon Technologies, Inc. | Video segmentation techniques |
CN106886768A (en) * | 2017-03-02 | 2017-06-23 | 杭州当虹科技有限公司 | A kind of video fingerprinting algorithms based on deep learning |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI750498B (en) * | 2019-02-14 | 2021-12-21 | 開曼群島商創新先進技術有限公司 | Method and device for processing video stream |
CN110689023A (en) * | 2019-08-15 | 2020-01-14 | 平安科技(深圳)有限公司 | Reliable combination feature extraction method and device, computer equipment and storage medium |
CN110689023B (en) * | 2019-08-15 | 2024-01-16 | 平安科技(深圳)有限公司 | Reliable combination feature extraction method, device, computer equipment and storage medium |
CN112560552A (en) * | 2019-09-25 | 2021-03-26 | 华为技术有限公司 | Video classification method and device |
CN111737519A (en) * | 2020-06-09 | 2020-10-02 | 北京奇艺世纪科技有限公司 | Method and device for identifying robot account, electronic equipment and computer-readable storage medium |
CN111737519B (en) * | 2020-06-09 | 2023-10-03 | 北京奇艺世纪科技有限公司 | Method and device for identifying robot account, electronic equipment and computer readable storage medium |
CN113343025A (en) * | 2021-08-05 | 2021-09-03 | 中南大学 | Sparse attack resisting method based on weighted gradient Hash activation thermodynamic diagram |
CN113343025B (en) * | 2021-08-05 | 2021-11-02 | 中南大学 | Sparse attack resisting method based on weighted gradient Hash activation thermodynamic diagram |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107977461A (en) | A kind of video feature extraction method and device | |
CN112434169B (en) | Knowledge graph construction method and system and computer equipment thereof | |
CN107895038B (en) | Link prediction relation recommendation method and device | |
CN106909643A (en) | The social media big data motif discovery method of knowledge based collection of illustrative plates | |
CN109166615B (en) | Medical CT image storage and retrieval method based on random forest hash | |
Goloboff et al. | Phylogenetic morphometrics (II): algorithms for landmark optimization | |
CN111325030A (en) | Text label construction method and device, computer equipment and storage medium | |
Winter et al. | Fast indexing strategies for robust image hashes | |
CN104156464A (en) | Micro-video retrieval method and device based on micro-video feature database | |
CN111143547A (en) | Big data display method based on knowledge graph | |
Wei et al. | Semantic pixel labelling in remote sensing images using a deep convolutional encoder-decoder model | |
Rusdi et al. | Reconstruction of medical images using artificial bee colony algorithm | |
CN110674301A (en) | Emotional tendency prediction method, device and system and storage medium | |
CN115577701A (en) | Risk behavior identification method, device, equipment and medium for big data security | |
CN113239268A (en) | Commodity recommendation method, device and system | |
CN107070932B (en) | Anonymous method for preventing label neighbor attack in social network dynamic release | |
CN113094465A (en) | Method and system for checking duplicate of design product | |
CN108090117A (en) | A kind of image search method and device, electronic equipment | |
CN108805280A (en) | A kind of method and apparatus of image retrieval | |
CN104657391A (en) | Page processing method and device | |
CN115238075B (en) | Text sentiment classification method based on hypergraph pooling | |
CN114943285B (en) | Intelligent auditing system for internet news content data | |
Rui et al. | Data Reconstruction based on supervised deep auto-encoder | |
Neelima et al. | Optimal clustering based outlier detection and cluster center initialization algorithm for effective tone mapping | |
CN116720009A (en) | Social robot detection method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20180501 |