CN106708929A - Video program searching method and device - Google Patents
- Publication number
- CN106708929A (application number CN201611019485.4A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/71—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a video program searching method, which comprises the following steps: receiving, as user input, a description entry describing a video program and the video category to which the video program belongs; selecting the latent semantic indexing model corresponding to the video category, and constructing a query vector for the description entry in the same way the index matrix of the latent semantic indexing model was constructed; calculating, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector; and sorting the calculated cosine similarities from largest to smallest, and providing the user with the video programs corresponding to the column vectors whose cosine similarity ranks fall within a ranking interval. Correspondingly, the invention also discloses a video program searching device. By adopting the embodiments of the invention, the latent semantics of documents can be mined, and the accuracy and efficiency of searching for video programs are improved.
Description
Technical field
The present invention relates to the field of computers, and more particularly to a video program searching method and device.
Background technology
When recommending variety shows, content-based methods are an important strategy: they cluster and recommend programs by the similarity of their content descriptions, grouping texts with similar content together. The prevailing approach is the Rocchio algorithm based on TF-IDF. The Rocchio algorithm derives from the vector space model, whose basic idea is to represent a text as a vector so that subsequent processing becomes vector operations in a space. Training the Rocchio algorithm is in fact the process of building a feature vector for each category. Given an unknown text, a vector is generated for it, its similarity to the feature vector of each category is calculated, and the text is finally assigned to the category most similar to it.
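The category-vector procedure described above can be sketched as a nearest-centroid classifier. This is a minimal sketch, assuming each text has already been converted to a fixed-length numeric vector; the names `train_centroids` and `classify` are illustrative, and the negative-example weighting of the full Rocchio formulation is left out.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def train_centroids(samples):
    """samples: list of (vector, label). Returns {label: mean vector of that label}."""
    sums = {}
    counts = {}
    for vec, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(vec, centroids):
    """Assign vec to the label whose centroid has the highest cosine similarity."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))
```

Because every training sample contributes equally to its category mean, a mislabeled sample shifts the centroid directly, which is the lack of noise resistance the background section points out.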
However, the above algorithm has the following shortcomings: 1. The Rocchio algorithm cannot mine the latent semantics of documents. 2. It assumes that the training data are absolutely correct; because it has no mechanism for quantitatively measuring whether a sample contains noise, it has no resistance to erroneous data.
Summary of the invention
The embodiments of the present invention propose a video program searching method and device that can mine the latent semantics of documents and improve the accuracy and efficiency of searching for video programs.
An embodiment of the present invention provides a video program searching method, comprising:
Receiving, as user input, a description entry describing a video program and the video category to which the video program belongs;
Selecting the latent semantic indexing model corresponding to the video category, and constructing the query vector of the description entry in the same way the index matrix of the latent semantic indexing model was constructed; wherein the latent semantic indexing model is obtained by performing singular value decomposition on the index matrix constructed from the description documents describing video programs of the same video category;
Calculating, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector;
Sorting the calculated cosine similarities from largest to smallest, and providing the user with the video programs corresponding to the column vectors whose cosine similarity ranks fall within the ranking interval.
Further, the process of constructing the index matrix from the description documents describing video programs includes: taking the word frequency with which the i-th keyword occurs in the description document of the j-th video program as the value of the i-th element of the j-th column of the index matrix;
The process of constructing the query vector of the description entry includes: setting the keyword represented by the i-th element of the query vector to be the same as the keyword represented by the elements of the i-th row of the index matrix, and taking the word frequency with which the keyword corresponding to the i-th element occurs in the description entry as the value of the i-th element of the query vector; wherein the query vector is a column vector.
Further, the process of constructing the index matrix from the description documents describing video programs of the same video category is specifically:
For all description documents stored in the database that describe video programs of the same video category, adjusting the format of the entries contained in those description documents according to a standard entry format; wherein the database stores description documents of multiple video categories, each description document describes one video program, and different description documents describe different video programs;
Calling a word segmentation tool;
Using the word segmentation tool to segment the format-adjusted entries of all the description documents, obtaining a first word set;
Extracting keywords from the first word set according to the TF-IDF algorithm;
Constructing the index matrix according to the word frequency with which each extracted keyword occurs in each description document; wherein the rows of the index matrix are ordered from high to low by the total word frequency with which each keyword occurs across all the description documents, and the columns of the index matrix are ordered from high to low by the word frequency with which the keywords occur in each description document.
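The keyword extraction step above can be illustrated with a short sketch. The source does not state which TF-IDF variant or selection rule is used, so the formula here (relative term frequency times the natural log of inverse document frequency, keeping the terms with the highest score in any document) and the name `tfidf_keywords` are assumptions. The input is taken to be documents that have already been segmented into word lists.

```python
import math
from collections import Counter


def tfidf_keywords(tokenized_docs, top_n):
    """Return the top_n terms ranked by their best TF-IDF score in any document."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    best = {}
    for tokens in tokenized_docs:
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            score = tf * idf
            if score > best.get(term, -1.0):
                best[term] = score
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(best, key=lambda term: (-best[term], term))
    return ranked[:top_n]
```

A term that appears in every description document, such as a generic word like "show", gets an inverse document frequency of zero and therefore sinks to the bottom of the ranking.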
Further, constructing the query vector of the description entry is specifically:
Adjusting the format of the description entry according to the standard entry format;
Calling a word segmentation tool;
Using the word segmentation tool to segment the format-adjusted description entry, obtaining a second word set;
Extracting keywords from the second word set according to the TF-IDF algorithm;
Constructing the query vector of the description entry according to the word frequency with which each extracted keyword occurs in the description entry.
Further, the index matrix is H, and the latent semantic indexing model obtained by performing singular value decomposition on the index matrix is: H = T*S*D^T; wherein T is an orthogonal matrix whose columns are the left singular vectors of the index matrix H; S is a diagonal matrix whose diagonal entries are the singular values of the index matrix H; D is an orthogonal matrix whose columns are the right singular vectors of the index matrix H; and the query vector is Q;
Calculating, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector is specifically:
Selecting matrices T_K, S_K and D_K, and revising the latent semantic indexing model to H_K = T_K*S_K*D_K^T; wherein T_K is the matrix formed by the first K columns of matrix T, S_K is the diagonal matrix formed by the first K diagonal entries of matrix S, and D_K is the matrix formed by the first K columns of matrix D; the value of K is greater than the largest rank number contained in the ranking interval;
For the index matrix H_K of the revised latent semantic indexing model, calculating the cosine similarity between two row vectors, namely the row vector obtained by multiplying the transpose Q^T of the query vector by the matrix T_K, and the j-th row vector of the matrix obtained by multiplying the matrix D_K by the matrix S_K, and taking this value as the cosine similarity between the j-th column vector of the index matrix H_K and the query vector Q.
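A short derivation may help show why comparing these two K-dimensional row vectors stands in for comparing Q with the j-th column of H_K. The symbols e_j (the j-th standard basis vector), h_j (the j-th column of H_K) and d_j (the j-th row of D_K written as a column vector) are notation introduced here for illustration and do not appear in the source.

```latex
\begin{aligned}
H_K &= T_K S_K D_K^{T}, \qquad h_j = H_K e_j = T_K \,(S_K d_j), \\
Q^{T} h_j &= (Q^{T} T_K)\,(S_K d_j), \qquad
\lVert h_j \rVert = \lVert S_K d_j \rVert \quad (\text{since } T_K^{T} T_K = I_K), \\
\cos\bigl(T_K^{T} Q,\; S_K d_j\bigr)
  &= \frac{(Q^{T} T_K)(S_K d_j)}{\lVert T_K^{T} Q \rVert \,\lVert S_K d_j \rVert}
   = \frac{\lVert Q \rVert}{\lVert T_K^{T} Q \rVert}\,\cos\bigl(Q,\; h_j\bigr).
\end{aligned}
```

The factor in front of the last cosine does not depend on j, so ranking documents by the K-dimensional cosine gives the same order as ranking by the cosine between Q and the columns of H_K. The two values coincide exactly when Q lies in the column space of T_K.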
Further, the searching method also includes:
When a description document describing a new video program is added to the database, updating the latent semantic indexing model corresponding to the video category to which the new video program belongs.
Correspondingly, an embodiment of the present invention provides a video program searching device, comprising:
A user information receiving module, configured to receive, as user input, a description entry describing a video program and the video category to which the video program belongs;
A query vector construction module, configured to select the latent semantic indexing model corresponding to the video category, and to construct the query vector of the description entry in the same way the index matrix of the latent semantic indexing model was constructed; wherein the latent semantic indexing model is obtained by performing singular value decomposition on the index matrix constructed from the description documents describing video programs of the same video category;
A similarity calculation module, configured to calculate, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector;
A video program selection module, configured to sort the calculated cosine similarities from largest to smallest, and to provide the user with the video programs corresponding to the column vectors whose cosine similarity ranks fall within the ranking interval.
Further, the query vector construction module includes a unit for constructing the index matrix from the description documents describing video programs, which is specifically configured to: take the word frequency with which the i-th keyword occurs in the description document of the j-th video program as the value of the i-th element of the j-th column of the index matrix;
The query vector construction module includes a unit for constructing the query vector of the description entry, which is specifically configured to: set the keyword represented by the i-th element of the query vector to be the same as the keyword represented by the elements of the i-th row of the index matrix, and take the word frequency with which the keyword corresponding to the i-th element occurs in the description entry as the value of the i-th element of the query vector; wherein the query vector is a column vector.
Further, the query vector construction module includes a unit for constructing the index matrix from the description documents describing video programs of the same video category, which specifically comprises:
A first format adjusting unit, configured to, for all description documents stored in the database that describe video programs of the same video category, adjust the format of the entries contained in those description documents according to a standard entry format; wherein the database stores description documents of multiple video categories, each description document describes one video program, and different description documents describe different video programs;
A first tool calling unit, configured to call a word segmentation tool;
A first word segmentation unit, configured to use the word segmentation tool to segment the format-adjusted entries of all the description documents, obtaining a first word set;
A first keyword extraction unit, configured to extract keywords from the first word set according to the TF-IDF algorithm;
An index matrix construction unit, configured to construct the index matrix according to the word frequency with which each extracted keyword occurs in each description document; wherein the rows of the index matrix are ordered from high to low by the total word frequency with which each keyword occurs across all the description documents, and the columns of the index matrix are ordered from high to low by the word frequency with which the keywords occur in each description document.
Further, the query vector construction module also includes a unit for constructing the query vector of the description entry, which specifically comprises:
A second format adjusting unit, configured to adjust the format of the description entry according to the standard entry format;
A second tool calling unit, configured to call a word segmentation tool;
A second word segmentation unit, configured to use the word segmentation tool to segment the format-adjusted description entry, obtaining a second word set;
A second keyword extraction unit, configured to extract keywords from the second word set according to the TF-IDF algorithm;
A query vector construction unit, configured to construct the query vector of the description entry according to the word frequency with which each extracted keyword occurs in the description entry.
Further, the index matrix is H, and the latent semantic indexing model obtained by performing singular value decomposition on the index matrix is: H = T*S*D^T; wherein T is an orthogonal matrix whose columns are the left singular vectors of the index matrix H; S is a diagonal matrix whose diagonal entries are the singular values of the index matrix H; D is an orthogonal matrix whose columns are the right singular vectors of the index matrix H; and the query vector is Q;
The similarity calculation module specifically includes:
A model revising unit, configured to select matrices T_K, S_K and D_K and revise the latent semantic indexing model to H_K = T_K*S_K*D_K^T; wherein T_K is the matrix formed by the first K columns of matrix T, S_K is the diagonal matrix formed by the first K diagonal entries of matrix S, and D_K is the matrix formed by the first K columns of matrix D; the value of K is greater than the largest rank number contained in the ranking interval;
A computing unit, configured to, for the index matrix H_K of the revised latent semantic indexing model, calculate the cosine similarity between two row vectors, namely the row vector obtained by multiplying the transpose Q^T of the query vector by the matrix T_K, and the j-th row vector of the matrix obtained by multiplying the matrix D_K by the matrix S_K, and to take this value as the cosine similarity between the j-th column vector of the index matrix H_K and the query vector Q.
Further, the searching device also includes:
A model updating module, configured to, when a description document describing a new video program is added to the database, update the latent semantic indexing model corresponding to the video category to which the new video program belongs.
Implementing the embodiments of the present invention has the following advantages:
The video program searching method and device provided by the embodiments of the present invention calculate the cosine similarity between the query vector of the video to be searched for and each column vector of the index matrix of the latent semantic indexing model. This yields the degree of correlation between the description entry of the video to be searched for and the description document represented by each column vector of the index matrix: the higher the value, the higher the correlation. The video programs corresponding to the description documents that correlate highly with the description entry are then recommended to the user. Because the latent semantic indexing model is built (trained) from the description documents describing video programs, the latent semantics of the documents can be mined and the accuracy of searching for video programs is improved. In addition, using the video category entered by the user to select the latent semantic indexing model corresponding to that category for the calculation further improves the efficiency of searching for video programs.
Brief description of the drawings
Fig. 1 is a schematic flowchart of one embodiment of the video program searching method provided by the present invention;
Fig. 2 is a schematic structural diagram of one embodiment of the video program searching device provided by the present invention;
Fig. 3 is a schematic structural diagram of one embodiment of the query vector construction module of the video program searching device provided by the present invention;
Fig. 4 is a schematic structural diagram of one embodiment of the similarity calculation module of the video program searching device provided by the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, which is a schematic flowchart of one embodiment of the video program searching method provided by the present invention, the searching method includes steps S1 to S4, specifically:
S1, receiving, as user input, a description entry describing a video program and the video category to which the video program belongs;
S2, selecting the latent semantic indexing model corresponding to the video category, and constructing the query vector of the description entry in the same way the index matrix of the latent semantic indexing model was constructed; wherein the latent semantic indexing model is obtained by performing singular value decomposition on the index matrix constructed from the description documents describing video programs of the same video category; the value of the i-th element of the j-th column of the index matrix represents the word frequency with which the i-th keyword occurs in the description document of the j-th video program; the query vector is a column vector, the keyword represented by the i-th element of the query vector is the same as the keyword represented by the elements of the i-th row of the index matrix, and the value of the i-th element of the query vector represents the word frequency with which the keyword corresponding to the i-th element occurs in the description entry;
S3, calculating, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector;
S4, sorting the calculated cosine similarities from largest to smallest, and providing the user with the video programs corresponding to the column vectors whose cosine similarity ranks fall within the ranking interval.
It should be noted that calculating the cosine similarity between the query vector of the video to be searched for and each column vector of the index matrix of the latent semantic indexing model yields the degree of correlation between the description entry of the video to be searched for and the description document represented by each column vector of the index matrix: the higher the value, the higher the correlation. The video programs corresponding to the description documents that correlate highly with the description entry are then recommended to the user. Because the latent semantic indexing model is built (trained) from the description documents describing video programs, the latent semantics of the documents can be mined and the accuracy of searching for video programs is improved. In addition, using the video category entered by the user to select the latent semantic indexing model corresponding to that category for the calculation further improves the efficiency of searching for video programs. The ranking interval is preferably the top 10 ranks.
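Steps S3 and S4 can be sketched on the untruncated index matrix as follows. This is a minimal illustration under stated assumptions: the index matrix is a list of rows (one row per keyword, one column per program), `program_ids` and `top_programs` are hypothetical names, and the ranking interval is given as an inclusive pair of 1-based ranks that defaults to the top 10 mentioned above.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_programs(index_matrix, query, program_ids, interval=(1, 10)):
    """Rank columns by cosine similarity to the query, largest first, and
    return the program ids whose rank falls inside the inclusive interval."""
    scored = []
    for j, program_id in enumerate(program_ids):
        column = [row[j] for row in index_matrix]
        scored.append((cosine(column, query), program_id))
    # Python's sort is stable, so equal scores keep their column order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    first, last = interval
    return [program_id for _, program_id in scored[first - 1:last]]
```

For example, with two keywords and three programs, `top_programs([[2, 0, 1], [0, 3, 1]], [1, 0], ["P1", "P2", "P3"], interval=(1, 2))` keeps the two programs whose columns point most nearly in the direction of the query.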
Further, the process in step S2 of constructing the index matrix from the description documents describing video programs of the same video category is specifically:
For all description documents stored in the database that describe video programs of the same video category, adjusting the format of the entries contained in those description documents according to a standard entry format; wherein the database stores description documents of multiple video categories, each description document describes one video program, and different description documents describe different video programs. The format adjustment of an entry may include, but is not limited to, converting lowercase letters in the entry to uppercase, deleting unnecessary spaces in the entry, unifying the punctuation marks in the entry, and unifying the full-width and half-width forms in the entry into a single form.
Calling a word segmentation tool; preferably, the word segmentation tool is the jieba word segmentation tool, but it is not limited to this tool.
Using the word segmentation tool to segment the format-adjusted entries of all the description documents, obtaining a first word set. The word segmentation tool offers several segmentation modes: in addition to the normal segmentation mode, long words can be segmented further, which improves recall. For short texts in particular, this yields more words than normal segmentation and helps improve the accuracy of the video programs subsequently output.
Extracting keywords from the first word set according to the TF-IDF algorithm;
Constructing the index matrix according to the word frequency with which each extracted keyword occurs in each description document; wherein the rows of the index matrix are ordered from high to low by the total word frequency with which each keyword occurs across all the description documents, and the columns of the index matrix are ordered from high to low by the word frequency with which the keywords occur in each description document.
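The format adjustment in the first step above can be sketched with the Python standard library. This is one possible reading, not the patent's stated implementation: the name `normalize_entry` is hypothetical, Unicode NFKC normalization is used as the way to fold full-width forms to half-width, and "deleting unnecessary spaces" is interpreted as trimming the ends and collapsing runs of whitespace.

```python
import re
import unicodedata


def normalize_entry(text):
    """Bring an entry into one standard form before word segmentation."""
    # NFKC folds full-width letters, digits, spaces and punctuation
    # (for example U+FF0C) to their half-width equivalents.
    text = unicodedata.normalize("NFKC", text)
    # Unify letter case: lowercase becomes uppercase.
    text = text.upper()
    # Collapse whitespace runs to one space and trim both ends.
    text = re.sub(r"\s+", " ", text).strip()
    return text
```

Applying the same function to the description documents and to the user's description entry keeps both sides of the later comparison in the same form.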
It should be noted that the index matrix is constructed in advance from the description documents stored in the database, and its construction must follow this rule: the value of the i-th element of the j-th column of the index matrix represents the word frequency with which the i-th keyword occurs in the description document of the j-th video program. All elements of the i-th row of the index matrix represent the same keyword, and elements in different rows represent different keywords. For example, if all elements of the 1st row of the index matrix represent keyword A and the elements of the 1st column of the index matrix represent description document B, then the value of the element in the 1st row and 1st column of the index matrix represents the word frequency with which keyword A occurs in description document B.
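The construction rule above can be sketched as follows, assuming the documents are already segmented into word lists and the keywords have already been extracted. The function names are illustrative, raw occurrence counts are used as the word frequency, and only the row ordering by total frequency is shown.

```python
from collections import Counter


def order_by_total_frequency(keywords, tokenized_docs):
    """Order keywords from highest to lowest total count across all documents."""
    totals = Counter()
    for tokens in tokenized_docs:
        totals.update(tokens)
    # Ties are broken alphabetically so the row order is deterministic.
    return sorted(keywords, key=lambda kw: (-totals[kw], kw))


def build_index_matrix(keywords, tokenized_docs):
    """Entry [i][j] is the count of keywords[i] in tokenized_docs[j]."""
    counts = [Counter(tokens) for tokens in tokenized_docs]
    return [[doc_counts[kw] for doc_counts in counts] for kw in keywords]
```

The ordered keyword list returned by `order_by_total_frequency` is worth keeping alongside the matrix, since the query vector has to reuse exactly the same row order.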
Further, constructing the query vector of the description entry in step S2 is specifically:
Adjusting the format of the description entry according to the standard entry format; for example, converting lowercase letters in the entry to uppercase, deleting unnecessary spaces in the entry, unifying the punctuation marks in the entry, and unifying the full-width and half-width forms in the entry into a single form.
Calling a word segmentation tool; preferably, the word segmentation tool is the jieba word segmentation tool, but it is not limited to this tool.
Using the word segmentation tool to segment the format-adjusted description entry, obtaining a second word set. The word segmentation tool offers several segmentation modes for the description entry: in addition to the normal segmentation mode, long words can be segmented further, which improves recall. For short texts in particular, this yields more words than normal segmentation and helps improve the accuracy of the video programs subsequently output.
Extracting keywords from the second word set according to the TF-IDF algorithm;
Constructing the query vector of the description entry according to the word frequency with which each extracted keyword occurs in the description entry.
It should be noted that when constructing the query vector of the description entry, the keyword represented by the i-th element of the query vector must be the same as the keyword represented by the elements of the i-th row of the index matrix of the latent semantic indexing model, so that comparing the cosine similarity between the query vector and each column vector of the index matrix is meaningful.
In addition, the construction of the query vector must follow this principle: the value of the i-th element of the query vector represents the word frequency with which the keyword corresponding to that element occurs in the description entry. For example, if all elements of the 1st row of the index matrix represent keyword A, then the element in the 1st row of the query vector also represents keyword A, and its value represents the word frequency with which keyword A occurs in the description entry.
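The alignment requirement above amounts to counting the query's words against the same ordered keyword list that defines the rows of the index matrix. A minimal sketch, assuming the description entry has already been normalized and segmented (the name `build_query_vector` is illustrative):

```python
from collections import Counter


def build_query_vector(keywords, query_tokens):
    """Element i is the count of keywords[i] in the segmented description entry."""
    counts = Counter(query_tokens)
    return [counts[kw] for kw in keywords]
```

Words in the description entry that are not among the index matrix keywords have no row to land in and are dropped by this sketch.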
Further, the index matrix is H, and the latent semantic indexing model obtained by performing singular value decomposition on the index matrix is: H = T*S*D^T; wherein T is an orthogonal matrix whose columns are the left singular vectors of the index matrix H; S is a diagonal matrix whose diagonal entries are the singular values of the index matrix H; D is an orthogonal matrix whose columns are the right singular vectors of the index matrix H; and the query vector is Q;
Step S3 is specifically implemented as follows:
Selecting matrices T_K, S_K and D_K, and revising the latent semantic indexing model to H_K = T_K*S_K*D_K^T; wherein T_K is the matrix formed by the first K columns of matrix T, S_K is the diagonal matrix formed by the first K diagonal entries of matrix S, and D_K is the matrix formed by the first K columns of matrix D; the value of K is greater than the largest rank number contained in the ranking interval;
For the index matrix H_K of the revised latent semantic indexing model, calculating the cosine similarity between two row vectors, namely the row vector obtained by multiplying the transpose Q^T of the query vector by the matrix T_K, and the j-th row vector of the matrix obtained by multiplying the matrix D_K by the matrix S_K, and taking this value as the cosine similarity between the j-th column vector of the index matrix H_K and the query vector Q.
It should be noted that the value of K is a threshold that can be chosen according to the actual situation. The decomposition uses the rank-K approximation of H, that is, all singular values of the index matrix H after the K largest ones are set to zero. This revision of the latent semantic indexing model improves retrieval efficiency.
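The truncated similarity computation of step S3 can be sketched once the factors are available. The sketch below assumes T_K, the first K singular values and D_K have already been computed by some SVD routine (a library function such as `numpy.linalg.svd` would be one option); the function name `lsi_scores` and the list-of-rows layout are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def lsi_scores(query, t_k, s_k, d_k):
    """query: length-m list; t_k: m x K rows; s_k: K singular values;
    d_k: n x K rows. Returns n cosine similarities, one per document."""
    k = len(s_k)
    # Row vector Q^T * T_K (length K).
    q_hat = [sum(query[i] * t_k[i][c] for i in range(len(query))) for c in range(k)]
    scores = []
    for row in d_k:
        # j-th row of D_K * S_K: scale each coordinate by its singular value.
        doc_hat = [row[c] * s_k[c] for c in range(k)]
        scores.append(cosine(q_hat, doc_hat))
    return scores
```

Each comparison is made between K-dimensional vectors rather than vectors with one entry per keyword, which is where the efficiency gain of the truncation comes from when K is much smaller than the number of keywords.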
Further, the searching method also includes:
When a description document describing a new video program is added to the database, updating the latent semantic indexing model corresponding to the video category to which the new video program belongs.
It should be noted that the number of video programs keeps growing, and description documents for newly added video programs are continually added to the database, so the latent semantic indexing model needs to be updated.
In the method for searching for a video program provided by the embodiment of the present invention, the cosine similarity between the query vector of the video to be searched for and each column vector of the index matrix of the latent semantic indexing model is calculated. This yields the degree of correlation between the description entry of the video to be searched for and the description document represented by each column vector of the index matrix; the higher the value, the higher the degree of correlation. The video programs corresponding to the description documents that are highly correlated with the description entry are then recommended to the user. Because the latent semantic indexing model is built (trained) from description documents that describe video programs, the latent semantics of the documents can be mined, which improves the accuracy of searching for video programs. In addition, the video category input by the user is used to select the latent semantic indexing model corresponding to that video category for the calculation, which can further improve the efficiency of searching for video programs.
Refer to Fig. 2, which is a schematic structural diagram of one embodiment of the device for searching for a video program provided by the present invention. The search device can carry out the entire flow of the method for searching for a video program provided by the above embodiment. The search device comprises:
a user information receiving module 10, configured to receive a description entry describing a video program and the video category to which the video program belongs, both input by a user;
a query vector building module 20, configured to select the latent semantic indexing model corresponding to the video category, and to build a query vector for the description entry in the same manner in which the index matrix of the latent semantic indexing model was built; wherein the latent semantic indexing model is obtained by performing singular value decomposition on an index matrix built from description documents that describe video programs of the same video category;
a similarity calculation module 30, configured to calculate, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector;
a video program selection module 40, configured to sort the calculated cosine similarities in descending order, and to provide the user with the video programs corresponding to the column vectors whose cosine similarities have rank numbers within a ranking interval.
Further, the query vector building module comprises a unit for building the index matrix from the description documents describing video programs, which is specifically configured to: take the word frequency with which the i-th keyword occurs in the description document of the j-th video program as the value of the i-th element of the j-th column of the index matrix;
the query vector building module also comprises a unit for building the query vector of the description entry, which is specifically configured to: set the keyword represented by the i-th element of the query vector to be the same as the keyword represented by the i-th row of the index matrix, and take the word frequency with which the keyword corresponding to the i-th element occurs in the description entry as the value of the i-th element of the query vector; wherein the query vector is a column vector.
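The two construction rules above can be sketched as follows, assuming the description documents and the description entry have already been split into words. The function names and the sample keywords are hypothetical.

```python
from collections import Counter

def build_index_matrix(keywords, tokenized_docs):
    """H[i][j] = number of times keyword i occurs in description document j."""
    counts = [Counter(doc) for doc in tokenized_docs]
    return [[c[kw] for c in counts] for kw in keywords]

def build_query_vector(keywords, query_tokens):
    """Q[i] = number of times keyword i occurs in the description entry."""
    c = Counter(query_tokens)
    return [c[kw] for kw in keywords]

# Hypothetical keywords and two tokenized description documents.
keywords = ["space", "war", "love"]
docs = [["space", "war", "space"], ["love", "war"]]
H = build_index_matrix(keywords, docs)
Q = build_query_vector(keywords, ["space", "love", "space"])
```

Because both functions iterate over the same `keywords` list, the i-th element of `Q` and the i-th row of `H` always refer to the same keyword.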
Further, refer to Fig. 3, which is a schematic structural diagram of one embodiment of the query vector building module of the device for searching for a video program provided by the present invention. The query vector building module 20 comprises a unit for building the index matrix from the description documents describing video programs of the same video category, specifically:
a first format adjustment unit 21, configured to adjust, according to a standard entry format, the format of the entries contained in all description documents stored in the database that describe video programs of the same video category; wherein the database stores description documents of multiple video categories, each description document describes one video program, and different description documents describe different video programs;
a first tool calling unit 22, configured to call a word segmentation tool;
a first word segmentation unit 23, configured to use the word segmentation tool to segment the format-adjusted entries of all the description documents, obtaining a first word set;
a first keyword extraction unit 24, configured to extract keywords from the first word set according to the TF-IDF algorithm;
an index matrix building unit 25, configured to build the index matrix according to the word frequency with which each extracted keyword occurs in each description document; wherein the row order of the index matrix is arranged from high to low by the total word frequency with which each keyword occurs in all the description documents, and the column order of the index matrix is arranged from high to low by the word frequency with which the keywords occur in each description document.
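The keyword extraction and row ordering performed by these units can be sketched as below. The source names the TF-IDF algorithm but gives no formula, so the common form tf * log(N / df) is assumed here, with each word scored by its best value over all documents; the function names, `top_n` and the sample data are hypothetical, and the documents are assumed to be already segmented into words.

```python
import math
from collections import Counter

def extract_keywords(tokenized_docs, top_n):
    """Keep the top_n words ranked by their best TF-IDF score over all documents."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for doc in tokenized_docs for word in set(doc))
    best = {}
    for doc in tokenized_docs:
        if not doc:
            continue  # skip empty documents to avoid dividing by zero
        for word, count in Counter(doc).items():
            score = (count / len(doc)) * math.log(n_docs / df[word])
            best[word] = max(best.get(word, 0.0), score)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(best, key=lambda w: (-best[w], w))[:top_n]

def order_rows_by_total_frequency(keywords, tokenized_docs):
    """Order keywords (matrix rows) by total word frequency, highest first."""
    total = Counter(word for doc in tokenized_docs for word in doc)
    return sorted(keywords, key=lambda w: (-total[w], w))

docs = [["space", "war", "space", "the"], ["love", "story", "the"], ["space", "love", "the"]]
keywords = extract_keywords(docs, top_n=3)
rows = order_rows_by_total_frequency(keywords, docs)
```

A word that occurs in every document, such as "the" in this toy data, scores zero and is therefore not selected as a keyword.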
Further, the query vector building module 20 also comprises a unit for building the query vector of the description entry, specifically:
a second format adjustment unit 26, configured to adjust the format of the description entry according to the standard entry format;
a second tool calling unit 27, configured to call the word segmentation tool;
a second word segmentation unit 28, configured to use the word segmentation tool to segment the format-adjusted description entry, obtaining a second word set;
a second keyword extraction unit 29, configured to extract keywords from the second word set according to the TF-IDF algorithm;
a query vector building unit 31, configured to build the query vector of the description entry according to the word frequency with which each extracted keyword occurs in the description entry.
Further, refer to Fig. 4, which is a schematic structural diagram of one embodiment of the similarity calculation module of the device for searching for a video program provided by the present invention. The index matrix is H, and the latent semantic indexing model obtained by performing singular value decomposition on the index matrix is H = T*S*D^T; wherein T is an orthogonal matrix, each column of which is a left singular vector of the index matrix H; S is a diagonal matrix whose diagonal entries are the singular values of the index matrix H; D is an orthogonal matrix, each column of which is a right singular vector of the index matrix H; and the query vector is Q;
the similarity calculation module 30 specifically comprises:
a model revision unit 32, configured to select the matrices T_K, S_K and D_K, and to revise the latent semantic indexing model to H_K = T_K*S_K*D_K^T; wherein T_K is the matrix formed by the first K columns of matrix T, S_K is the diagonal matrix formed by the first K diagonal entries of matrix S, and D_K is the matrix formed by the first K columns of matrix D; the value of K is greater than the largest rank number contained in the ranking interval;
a calculation unit 33, configured to calculate, for the index matrix H_K of the revised latent semantic indexing model, the cosine similarity between two row vectors, namely the row vector obtained by multiplying the transpose Q^T of the query vector by the matrix T_K, and the j-th row vector of the matrix obtained by multiplying the matrix D_K by the matrix S_K; this value is taken as the cosine similarity between the j-th column vector of the index matrix H_K and the query vector Q.
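Written out, the quantity computed by the calculation unit can be expressed as below. The symbols \hat{q}, d_j and sim_j are notation introduced here for readability and do not appear in the source; the last line records why the rows of D_K S_K stand for the document columns of H_K, using only the orthonormality of the columns of T_K.

```latex
\hat{q} = Q^{T} T_K, \qquad
d_j = \text{the } j\text{-th row of } D_K S_K, \qquad
\mathrm{sim}_j = \frac{\hat{q}\, d_j^{T}}{\lVert \hat{q} \rVert \,\lVert d_j \rVert}

% Since T_K^{T} T_K = I_K:
H_K^{T} T_K = D_K S_K T_K^{T} T_K = D_K S_K
```

In other words, d_j is the j-th column of H_K expressed in the coordinates of the first K left singular vectors, and \hat{q} is the query vector expressed in the same coordinates.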
Further, the search device also comprises:
a model update module 50, configured to update, when a description document describing a new video program is added to the database, the latent semantic indexing model corresponding to the video category to which the new video program belongs.
In the device for searching for a video program provided by the embodiment of the present invention, the cosine similarity between the query vector of the video to be searched for and each column vector of the index matrix of the latent semantic indexing model is calculated. This yields the degree of correlation between the description entry of the video to be searched for and the description document represented by each column vector of the index matrix; the higher the value, the higher the degree of correlation. The video programs corresponding to the description documents that are highly correlated with the description entry are then recommended to the user. Because the latent semantic indexing model is built (trained) from description documents that describe video programs, the latent semantics of the documents can be mined, which improves the accuracy of searching for video programs. In addition, the video category input by the user is used to select the latent semantic indexing model corresponding to that video category for the calculation, which can further improve the efficiency of searching for video programs.
A person of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The above are preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and modifications without departing from the principle of the present invention, and such improvements and modifications are also regarded as falling within the protection scope of the present invention.
Claims (12)
1. A method for searching for a video program, characterized by comprising:
receiving a description entry describing a video program and the video category to which the video program belongs, both input by a user;
selecting the latent semantic indexing model corresponding to the video category, and building a query vector for the description entry in the same manner in which the index matrix of the latent semantic indexing model was built; wherein the latent semantic indexing model is obtained by performing singular value decomposition on an index matrix built from description documents that describe video programs of the same video category;
calculating, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector;
sorting the calculated cosine similarities in descending order, and providing the user with the video programs corresponding to the column vectors whose cosine similarities have rank numbers within a ranking interval.
2. The method for searching for a video program according to claim 1, characterized in that
the process of building the index matrix from the description documents describing video programs comprises: taking the word frequency with which the i-th keyword occurs in the description document of the j-th video program as the value of the i-th element of the j-th column of the index matrix;
the process of building the query vector of the description entry comprises: setting the keyword represented by the i-th element of the query vector to be the same as the keyword represented by the i-th row of the index matrix, and taking the word frequency with which the keyword corresponding to the i-th element occurs in the description entry as the value of the i-th element of the query vector; wherein the query vector is a column vector.
3. The method for searching for a video program according to claim 1 or 2, characterized in that the process of building the index matrix from the description documents describing video programs of the same video category is specifically:
adjusting, according to a standard entry format, the format of the entries contained in all description documents stored in the database that describe video programs of the same video category; wherein the database stores description documents of multiple video categories, each description document describes one video program, and different description documents describe different video programs;
calling a word segmentation tool;
using the word segmentation tool to segment the format-adjusted entries of all the description documents, obtaining a first word set;
extracting keywords from the first word set according to the TF-IDF algorithm;
building the index matrix according to the word frequency with which each extracted keyword occurs in each description document; wherein the row order of the index matrix is arranged from high to low by the total word frequency with which each keyword occurs in all the description documents, and the column order of the index matrix is arranged from high to low by the word frequency with which the keywords occur in each description document.
4. The method for searching for a video program according to claim 1 or 2, characterized in that building the query vector of the description entry is specifically:
adjusting the format of the description entry according to a standard entry format;
calling a word segmentation tool;
using the word segmentation tool to segment the format-adjusted description entry, obtaining a second word set;
extracting keywords from the second word set according to the TF-IDF algorithm;
building the query vector of the description entry according to the word frequency with which each extracted keyword occurs in the description entry.
5. The method for searching for a video program according to claim 3, characterized in that the index matrix is H, and the latent semantic indexing model obtained by performing singular value decomposition on the index matrix is H = T*S*D^T; wherein T is an orthogonal matrix, each column of which is a left singular vector of the index matrix H; S is a diagonal matrix whose diagonal entries are the singular values of the index matrix H; D is an orthogonal matrix, each column of which is a right singular vector of the index matrix H; and the query vector is Q;
calculating, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector is specifically:
selecting the matrices T_K, S_K and D_K, and revising the latent semantic indexing model to H_K = T_K*S_K*D_K^T; wherein T_K is the matrix formed by the first K columns of matrix T, S_K is the diagonal matrix formed by the first K diagonal entries of matrix S, and D_K is the matrix formed by the first K columns of matrix D; the value of K is greater than the largest rank number contained in the ranking interval;
for the index matrix H_K of the revised latent semantic indexing model, calculating the cosine similarity between two row vectors, namely the row vector obtained by multiplying the transpose Q^T of the query vector by the matrix T_K, and the j-th row vector of the matrix obtained by multiplying the matrix D_K by the matrix S_K, and taking this value as the cosine similarity between the j-th column vector of the index matrix H_K and the query vector Q.
6. The method for searching for a video program according to claim 1, characterized in that the search method also comprises:
when a description document describing a new video program is added to the database, updating the latent semantic indexing model corresponding to the video category to which the new video program belongs.
7. A device for searching for a video program, characterized by comprising:
a user information receiving module, configured to receive a description entry describing a video program and the video category to which the video program belongs, both input by a user;
a query vector building module, configured to select the latent semantic indexing model corresponding to the video category, and to build a query vector for the description entry in the same manner in which the index matrix of the latent semantic indexing model was built; wherein the latent semantic indexing model is obtained by performing singular value decomposition on an index matrix built from description documents that describe video programs of the same video category;
a similarity calculation module, configured to calculate, according to the latent semantic indexing model, the cosine similarity between each column vector of the index matrix and the query vector;
a video program selection module, configured to sort the calculated cosine similarities in descending order, and to provide the user with the video programs corresponding to the column vectors whose cosine similarities have rank numbers within a ranking interval.
8. The device for searching for a video program according to claim 7, characterized in that
the query vector building module comprises a unit for building the index matrix from the description documents describing video programs, which is specifically configured to: take the word frequency with which the i-th keyword occurs in the description document of the j-th video program as the value of the i-th element of the j-th column of the index matrix;
the query vector building module comprises a unit for building the query vector of the description entry, which is specifically configured to: set the keyword represented by the i-th element of the query vector to be the same as the keyword represented by the i-th row of the index matrix, and take the word frequency with which the keyword corresponding to the i-th element occurs in the description entry as the value of the i-th element of the query vector; wherein the query vector is a column vector.
9. The device for searching for a video program according to claim 7 or 8, characterized in that the query vector building module comprises a unit for building the index matrix from the description documents describing video programs of the same video category, specifically:
a first format adjustment unit, configured to adjust, according to a standard entry format, the format of the entries contained in all description documents stored in the database that describe video programs of the same video category; wherein the database stores description documents of multiple video categories, each description document describes one video program, and different description documents describe different video programs;
a first tool calling unit, configured to call a word segmentation tool;
a first word segmentation unit, configured to use the word segmentation tool to segment the format-adjusted entries of all the description documents, obtaining a first word set;
a first keyword extraction unit, configured to extract keywords from the first word set according to the TF-IDF algorithm;
an index matrix building unit, configured to build the index matrix according to the word frequency with which each extracted keyword occurs in each description document; wherein the row order of the index matrix is arranged from high to low by the total word frequency with which each keyword occurs in all the description documents, and the column order of the index matrix is arranged from high to low by the word frequency with which the keywords occur in each description document.
10. The device for searching for a video program according to claim 7 or 8, characterized in that the query vector building module also comprises a unit for building the query vector of the description entry, specifically:
a second format adjustment unit, configured to adjust the format of the description entry according to a standard entry format;
a second tool calling unit, configured to call a word segmentation tool;
a second word segmentation unit, configured to use the word segmentation tool to segment the format-adjusted description entry, obtaining a second word set;
a second keyword extraction unit, configured to extract keywords from the second word set according to the TF-IDF algorithm;
a query vector building unit, configured to build the query vector of the description entry according to the word frequency with which each extracted keyword occurs in the description entry.
11. The device for searching for a video program according to claim 9, characterized in that the index matrix is H, and the latent semantic indexing model obtained by performing singular value decomposition on the index matrix is H = T*S*D^T; wherein T is an orthogonal matrix, each column of which is a left singular vector of the index matrix H; S is a diagonal matrix whose diagonal entries are the singular values of the index matrix H; D is an orthogonal matrix, each column of which is a right singular vector of the index matrix H; and the query vector is Q;
the similarity calculation module specifically comprises:
a model revision unit, configured to select the matrices T_K, S_K and D_K, and to revise the latent semantic indexing model to H_K = T_K*S_K*D_K^T; wherein T_K is the matrix formed by the first K columns of matrix T, S_K is the diagonal matrix formed by the first K diagonal entries of matrix S, and D_K is the matrix formed by the first K columns of matrix D; the value of K is greater than the largest rank number contained in the ranking interval;
a calculation unit, configured to calculate, for the index matrix H_K of the revised latent semantic indexing model, the cosine similarity between two row vectors, namely the row vector obtained by multiplying the transpose Q^T of the query vector by the matrix T_K, and the j-th row vector of the matrix obtained by multiplying the matrix D_K by the matrix S_K, and to take this value as the cosine similarity between the j-th column vector of the index matrix H_K and the query vector Q.
12. The device for searching for a video program according to claim 7, characterized in that the search device also comprises:
a model update module, configured to update, when a description document describing a new video program is added to the database, the latent semantic indexing model corresponding to the video category to which the new video program belongs.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611019485.4A CN106708929B (en) | 2016-11-18 | 2016-11-18 | Video program searching method and device |
PCT/CN2016/113642 WO2018090468A1 (en) | 2016-11-18 | 2016-12-30 | Method and device for searching for video program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611019485.4A CN106708929B (en) | 2016-11-18 | 2016-11-18 | Video program searching method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106708929A (en) | 2017-05-24 |
CN106708929B (en) | 2020-06-26 |
Family
ID=58939942
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611019485.4A Active CN106708929B (en) | 2016-11-18 | 2016-11-18 | Video program searching method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN106708929B (en) |
WO (1) | WO2018090468A1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108416026A (en) * | 2018-03-09 | 2018-08-17 | 腾讯科技(深圳)有限公司 | Index generation method, content search method, device and equipment |
CN109918616A (en) * | 2019-01-23 | 2019-06-21 | 中国人民解放军军事科学院系统工程研究院 | A kind of visual media processing method based on the enhancing of semantic indexing precision |
CN110555127A (en) * | 2018-03-30 | 2019-12-10 | 优酷网络技术(北京)有限公司 | Multimedia content generation method and device |
CN111177512A (en) * | 2019-12-24 | 2020-05-19 | 绍兴市上虞区理工高等研究院 | Scientific and technological achievement missing processing method and device based on big data |
CN111651635A (en) * | 2020-05-28 | 2020-09-11 | 拾音智能科技有限公司 | Video retrieval method based on natural language description |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111984851B (en) * | 2020-09-03 | 2023-11-14 | 深圳平安智慧医健科技有限公司 | Medical data searching method, device, electronic device and storage medium |
CN113094703B (en) * | 2021-03-11 | 2024-06-21 | 北京六方云信息技术有限公司 | Output content filtering method and system for web intrusion detection |
CN114564496B (en) * | 2022-03-01 | 2023-09-19 | 北京有竹居网络技术有限公司 | Content recommendation method and device |
CN118364090B (en) * | 2024-06-19 | 2024-08-27 | 西安羚控电子科技有限公司 | Rapid generation method and device for designed scheme |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101527815A (en) * | 2008-03-06 | 2009-09-09 | 株式会社东芝 | Program recommending apparatus and method |
CN103559196A (en) * | 2013-09-23 | 2014-02-05 | 浙江大学 | Video retrieval method based on multi-core canonical correlation analysis |
CN104199933A (en) * | 2014-09-04 | 2014-12-10 | 华中科技大学 | Multi-modal information fusion football video event detection and semantic annotation method |
CN104657376A (en) * | 2013-11-20 | 2015-05-27 | 航天信息股份有限公司 | Searching method and searching device for video programs based on program relationship |
CN105653690A (en) * | 2015-12-30 | 2016-06-08 | 武汉大学 | Video big data rapid searching method and system constrained by abnormal behavior early-warning information |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6189002B1 (en) * | 1998-12-14 | 2001-02-13 | Dolphin Search | Process and system for retrieval of documents using context-relevant semantic profiles |
CN103152618B (en) * | 2011-12-07 | 2017-11-17 | 北京四达时代软件技术股份有限公司 | Value added service of digital television content recommendation method and device |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108416026A (en) * | 2018-03-09 | 2018-08-17 | 腾讯科技(深圳)有限公司 | Index generation method, content search method, device and equipment |
CN110555127A (en) * | 2018-03-30 | 2019-12-10 | 优酷网络技术(北京)有限公司 | Multimedia content generation method and device |
CN109918616A (en) * | 2019-01-23 | 2019-06-21 | 中国人民解放军军事科学院系统工程研究院 | A kind of visual media processing method based on the enhancing of semantic indexing precision |
CN111177512A (en) * | 2019-12-24 | 2020-05-19 | 绍兴市上虞区理工高等研究院 | Scientific and technological achievement missing processing method and device based on big data |
CN111651635A (en) * | 2020-05-28 | 2020-09-11 | 拾音智能科技有限公司 | Video retrieval method based on natural language description |
CN111651635B (en) * | 2020-05-28 | 2023-04-28 | 拾音智能科技有限公司 | Video retrieval method based on natural language description |
Also Published As
Publication number | Publication date |
---|---|
WO2018090468A1 (en) | 2018-05-24 |
CN106708929B (en) | 2020-06-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106708929A (en) | Video program searching method and device | |
Zhang et al. | Ad hoc table retrieval using semantic similarity | |
CN101223525B (en) | Relationship networks | |
CN105045875B (en) | Personalized search and device | |
KR101190230B1 (en) | Phrase identification in an information retrieval system | |
CN101582080B (en) | Web image clustering method based on image and text relevant mining | |
Sarawagi et al. | Open-domain quantity queries on web tables: annotation, response, and consensus models | |
CN103425687A (en) | Retrieval method and system based on queries | |
US8515684B2 (en) | System and method for identifying similar molecules | |
US20090119281A1 (en) | Granular knowledge based search engine | |
CN106547864B (en) | A kind of Personalized search based on query expansion | |
CN102456016B (en) | Method and device for sequencing search results | |
CN112988980B (en) | Target product query method and device, computer equipment and storage medium | |
CN106372073A (en) | Mathematical formula retrieval method and apparatus | |
CN110083683B (en) | Entity semantic annotation method based on random walk | |
CN104484431A (en) | Multi-source individualized news webpage recommending method based on field body | |
CN104281565B (en) | Semantic dictionary construction method and device | |
CN106570196A (en) | Video program searching method and device | |
CN113190593A (en) | Search recommendation method based on digital human knowledge graph | |
CN107436955A (en) | A kind of English word relatedness computation method and apparatus based on Wikipedia Concept Vectors | |
CN105404677A (en) | Tree structure based retrieval method | |
CN111753514A (en) | Automatic generation method and device of patent application text | |
CN105426490A (en) | Tree structure based indexing method | |
Ibrahim et al. | Exquisite: explaining quantities in text | |
Phan et al. | Automated data extraction from the web with conditional models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |