CN106708929B - Video program searching method and device

Info

Publication number: CN106708929B
Application number: CN201611019485.4A
Authority: CN
Inventors: 李贤�
Original assignee: Guangzhou Shiyuan Electronics Thecnology Co Ltd
Current assignee: Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority date: 2016-11-18
Filing date: 2016-11-18
Publication date: 2020-06-26
Anticipated expiration: 2036-11-18
Also published as: WO2018090468A1; CN106708929A

Abstract

The invention discloses a video program searching method comprising the following steps: receiving a description entry, input by a user, that describes a video program, together with the video category to which the video program belongs; selecting the latent semantic index model corresponding to the video category, and constructing a query vector of the description entry according to the construction mode of the index matrix of the latent semantic index model; calculating, according to the latent semantic index model, the cosine similarity between each column vector of the index matrix and the query vector; and sorting the calculated cosine similarities from large to small, and providing the user with the video programs corresponding to the column vectors whose cosine similarities have ranking numbers within the sorting interval. Correspondingly, the invention also discloses a video program searching device. Embodiments of the invention can mine the latent semantics of documents and improve both the accuracy and the efficiency of searching for video programs.

Description

Video program searching method and device

Technical Field

The present invention relates to the field of computers, and in particular, to a method and an apparatus for searching for a video program.

Background

When recommending variety programs, the content-based method is an important strategy. It makes clustering recommendations based on the similarity of the programs' content descriptions, grouping texts with similar content. The existing Rocchio algorithm, which is mainly based on TF-IDF, derives from vector space model theory. The basic idea of the vector space model is to represent each text as a vector, so that subsequent processing can be carried out as vector operations in that space. Training the Rocchio algorithm means establishing a feature vector for each category; to classify an unknown text, a vector is generated for it, its similarity to each category feature vector is calculated, and the text is assigned to the most similar category.

However, this algorithm has the following defects. First, the Rocchio algorithm cannot mine the latent semantics of a document. Second, it assumes that the training data is absolutely correct: it has no mechanism for quantitatively measuring whether a sample contains noise, and is therefore not robust to erroneous data.

Disclosure of Invention

The video program searching method and device provided by the embodiments of the invention can mine the latent semantics of documents and improve the accuracy and efficiency of searching for video programs.

The method for searching the video program provided by the embodiment of the invention comprises the following steps:

receiving a description entry for describing a video program and a video category to which the video program belongs, which are input by a user;

selecting a latent semantic index model corresponding to the video category, and constructing a query vector of the description entry according to the construction mode of the index matrix of the latent semantic index model; the latent semantic index model is obtained by performing singular value decomposition on an index matrix constructed from description documents describing video programs of the same video category;

calculating the cosine similarity between each column vector of the index matrix and the query vector according to the latent semantic index model;

and sorting the calculated cosine similarities from large to small, and selecting the video programs corresponding to the column vectors whose cosine similarities have ranking numbers within the sorting interval, to provide to the user.
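The four steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes NumPy, the names `MODELS` and `search` and the program identifiers are hypothetical, the query is assumed to be already segmented into tokens, and the full (untruncated) singular value decomposition is used for brevity.

```python
import numpy as np

# Hypothetical per-category store: category -> (keywords, program ids, index matrix H).
# H[i, j] is the word frequency of keywords[i] in the description document of program j.
MODELS = {
    "variety": (
        ["singing", "cooking", "travel"],
        ["P1", "P2", "P3"],
        np.array([[3.0, 0.0, 0.0],
                  [0.0, 2.0, 1.0],
                  [0.0, 1.0, 2.0]]),
    ),
}

def search(category, query_tokens, first_rank=1, last_rank=10):
    # Select the model that belongs to the category supplied by the user.
    keywords, program_ids, H = MODELS[category]
    # Query vector: same keyword order as the rows of H.
    Q = np.array([query_tokens.count(k) for k in keywords], dtype=float)
    T, s, Dt = np.linalg.svd(H, full_matrices=False)   # H = T @ diag(s) @ Dt
    q_hat = Q @ T                                      # row vector Q^T T
    docs = Dt.T * s                                    # D @ diag(s); row j <-> program j
    denom = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    sims = (docs @ q_hat) / np.where(denom == 0.0, 1.0, denom)
    order = np.argsort(-sims, kind="stable")           # largest similarity first
    return [program_ids[j] for j in order[first_rank - 1:last_rank]]

result = search("variety", ["cooking", "cooking", "travel"], last_rank=2)
print(result)  # ['P2', 'P3']
```

Here the query matches the description document of `P2` exactly, so `P2` ranks first; `P3` shares both query keywords in different proportions and ranks second.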

Further, the process of constructing the index matrix from the description documents describing the video programs includes: taking the word frequency of the ith keyword appearing in the description document of the jth video program as the numerical value of the ith element of the jth column of the index matrix;

the process of constructing the query vector describing the entry comprises: setting a keyword represented by an ith element of the query vector to be the same as a keyword represented by an ith row element of the index matrix, and taking a word frequency of the keyword corresponding to the ith element appearing in the description entry as a numerical value of the ith element of the query vector; wherein the query vector is a column vector.
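The construction rule for the index matrix and the query vector can be illustrated with a short sketch. The function names, keyword list and token lists below are hypothetical, and the documents are assumed to be already segmented into words.

```python
from collections import Counter

def build_index_matrix(keywords, tokenized_docs):
    """Return H as a list of rows: H[i][j] = count of keywords[i] in document j."""
    counts = [Counter(doc) for doc in tokenized_docs]
    return [[c[kw] for c in counts] for kw in keywords]

def build_query_vector(keywords, tokenized_query):
    """Return Q with Q[i] = count of keywords[i] in the query (same row order as H)."""
    c = Counter(tokenized_query)
    return [c[kw] for kw in keywords]

keywords = ["comedy", "music", "travel"]      # hypothetical keyword list
docs = [
    ["comedy", "music", "comedy"],            # description document of program 0
    ["travel", "music"],                      # description document of program 1
]
H = build_index_matrix(keywords, docs)
Q = build_query_vector(keywords, ["comedy", "travel", "talk"])
print(H)  # [[2, 0], [1, 1], [0, 1]]
print(Q)  # [1, 0, 1]
```

Words in the query that are not keywords of the index matrix (here `"talk"`) have no row to map to and are simply ignored in this sketch.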

Further, a process of constructing an index matrix from the description documents describing the video programs of the same video category specifically includes:

for all description documents which are stored in a database and describe video programs of the same video category, carrying out format adjustment on terms contained in all the description documents according to a standard term format; the database stores description documents of various video categories, one description document describes one video program, and the video programs described by different description documents are different from each other;

calling a word segmentation tool;

utilizing the word segmentation tool to segment the entries of all the description documents after format adjustment to obtain a first word set;

extracting keywords from the first set of words according to a TF-IDF algorithm;

constructing an index matrix according to the word frequency of each extracted keyword in each description document; the row sequence of the index matrix is arranged from high to low according to the total word frequency of the keywords appearing in all the description documents, and the column sequence of the index matrix is arranged from high to low according to the word frequency of the keywords appearing in each description document.
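The keyword extraction and row ordering described above might look like the following sketch. The source does not give the TF-IDF formula or the selection criterion, so the choices here are assumptions: TF-IDF is taken as (count / document length) * log(N / document frequency), a word is scored by its highest TF-IDF over all documents, and the `top_n` best words are kept. The function names are hypothetical, and the column ordering is not implemented.

```python
import math
from collections import Counter

def extract_keywords(tokenized_docs, top_n):
    """Score each word by its highest TF-IDF over all documents; keep the top_n."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(w for doc in tokenized_docs for w in set(doc))
    best = {}
    for doc in tokenized_docs:
        for w, count in Counter(doc).items():
            score = (count / len(doc)) * math.log(n_docs / doc_freq[w])
            best[w] = max(best.get(w, 0.0), score)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(best, key=lambda w: (-best[w], w))[:top_n]

def order_rows_by_total_frequency(keywords, tokenized_docs):
    """Order keywords from high to low total word frequency over all documents."""
    total = Counter(w for doc in tokenized_docs for w in doc)
    return sorted(keywords, key=lambda w: (-total[w], w))

docs = [
    ["the", "singer", "singer", "show"],
    ["the", "travel", "show"],
    ["the", "cooking", "cooking", "cooking"],
]
keywords = extract_keywords(docs, top_n=3)
rows = order_rows_by_total_frequency(["travel", "singer", "cooking"], docs)
print(keywords)  # ['cooking', 'singer', 'travel']
print(rows)      # ['cooking', 'singer', 'travel']
```

A word such as `"the"` that occurs in every document gets an inverse document frequency of log(1) = 0 and therefore never becomes a keyword, which is the usual motivation for TF-IDF filtering.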

Further, the constructing the query vector describing the entry specifically includes:

according to the standard entry format, carrying out format adjustment on the description entries;

calling a word segmentation tool;

utilizing the word segmentation tool to segment the description entries after the format adjustment to obtain a second word set;

extracting keywords from the second set of words according to a TF-IDF algorithm;

and constructing a query vector of the description entries according to the word frequency of each extracted keyword appearing in the description entries.

Further, if the index matrix is H, the latent semantic index model obtained by performing singular value decomposition on the index matrix is: H = T * S * D^T; wherein T is an orthogonal matrix, and each column of the matrix T is a left singular vector of the index matrix H; S is a diagonal matrix, and the diagonal elements of the matrix S are the singular values of the index matrix H; D is an orthogonal matrix, and each column of the matrix D is a right singular vector of the index matrix H; the query vector is Q;

calculating the cosine similarity between each column vector of the index matrix and the query vector according to the latent semantic index model is specifically:

selecting the matrices T_K, S_K and D_K, and revising the latent semantic index model to H_K = T_K * S_K * D_K^T; wherein T_K is the matrix formed by the first K columns of the matrix T, S_K is the diagonal matrix formed by the first K diagonal elements of the matrix S, and D_K is the matrix formed by the first K columns of the matrix D; the value of K is larger than the maximum ranking number contained in the sorting interval;

for the index matrix H_K of the revised latent semantic index model, multiplying the transpose Q^T of the query vector by the matrix T_K to obtain a row vector, multiplying the matrix D_K by the matrix S_K to obtain a matrix, and taking the cosine similarity between that row vector and the j-th row vector of the matrix D_K * S_K as the cosine similarity between the j-th column vector of the index matrix H_K and the query vector Q.
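The computation just described can be sketched with NumPy. This is an illustrative sketch only: the function name and the example matrix are hypothetical, and the guard against zero-length vectors is an added assumption.

```python
import numpy as np

def lsi_cosine_similarities(H, Q, K):
    """Cosine similarity between query Q and each column of the rank-K model of H."""
    T, s, Dt = np.linalg.svd(H, full_matrices=False)  # H = T @ diag(s) @ Dt
    T_K = T[:, :K]           # first K left singular vectors
    S_K = np.diag(s[:K])     # K largest singular values on the diagonal
    D_K = Dt[:K, :].T        # first K right singular vectors; one row per document
    q_hat = Q @ T_K          # row vector Q^T T_K, shape (K,)
    docs = D_K @ S_K         # row j stands for document j, shape (n_docs, K)
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    return (docs @ q_hat) / np.where(norms == 0.0, 1.0, norms)

H = np.array([[2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 3.0, 1.0]])
Q = np.array([2.0, 1.0, 0.0])      # identical to column 0 of H
full = lsi_cosine_similarities(H, Q, K=3)
reduced = lsi_cosine_similarities(H, Q, K=2)
```

With K equal to the full rank, the result coincides with the ordinary cosine similarity between Q and the columns of H, so `full[0]` is 1 up to rounding. With a smaller K the similarities are measured in the reduced K-dimensional space.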

Further, the search method further comprises:

when a description document describing a new video program is added to the database, updating the latent semantic index model corresponding to the video category to which the new video program belongs.

Accordingly, an embodiment of the present invention provides a video program search apparatus, including:

the user information receiving module is used for receiving a description entry which is input by a user and used for describing a video program and a video category to which the video program belongs;

the query vector construction module is used for selecting a latent semantic index model corresponding to the video category, and constructing a query vector of the description entry according to the construction mode of the index matrix of the latent semantic index model; the latent semantic index model is obtained by performing singular value decomposition on an index matrix constructed from description documents describing video programs of the same video category;

the similarity calculation module is used for calculating the cosine similarity between each column vector of the index matrix and the query vector according to the latent semantic index model;

and the video program selecting module is used for sorting the calculated cosine similarities from large to small, and selecting the video programs corresponding to the column vectors whose cosine similarities have ranking numbers within the sorting interval, to provide to the user.

Further, the query vector construction module includes a unit configured to construct an index matrix according to the description document describing the video program, and is specifically configured to: taking the word frequency of the ith keyword appearing in the description document of the jth video program as the numerical value of the ith element of the jth column of the index matrix;

the unit for constructing a query vector describing the entry, which is included in the query vector construction module, is specifically configured to: setting a keyword represented by an ith element of the query vector to be the same as a keyword represented by an ith row element of the index matrix, and taking a word frequency of the keyword corresponding to the ith element appearing in the description entry as a numerical value of the ith element of the query vector; wherein the query vector is a column vector.

Further, the query vector construction module includes a unit configured to construct an index matrix according to description documents describing video programs of the same video category, specifically:

the first format adjusting unit is used for adjusting the formats of all the entries contained in all the description documents which are stored in the database and describe the video programs of the same video category according to the standard entry formats; the database stores description documents of various video categories, one description document describes one video program, and the video programs described by different description documents are different from each other;

the first tool calling unit is used for calling the word segmentation tool;

the first word segmentation unit is used for performing word segmentation on the entries of all the description documents after format adjustment by using the word segmentation tool to obtain a first word set;

a first keyword extraction unit for extracting keywords from the first word set according to a TF-IDF algorithm;

the index matrix construction unit is used for constructing an index matrix according to the word frequency of each extracted keyword in each description document; the row sequence of the index matrix is arranged from high to low according to the total word frequency of the keywords appearing in all the description documents, and the column sequence of the index matrix is arranged from high to low according to the word frequency of the keywords appearing in each description document.

Further, the query vector construction module further includes a unit configured to construct the query vector describing the entry, specifically:

the second format adjusting unit is used for carrying out format adjustment on the description entries according to the standard entry format;

the second tool calling unit is used for calling the word segmentation tool;

the second word segmentation unit is used for segmenting the description entries with the adjusted formats by using the word segmentation tool to obtain a second word set;

a second keyword extraction unit for extracting keywords from the second word set according to a TF-IDF algorithm;

and the query vector construction unit is used for constructing the query vector of the description entries according to the word frequency of each extracted keyword appearing in the description entries.

Further, the similarity calculation module specifically includes:

a model revision unit for selecting the matrices T_K, S_K and D_K, and revising the latent semantic index model to H_K = T_K * S_K * D_K^T; wherein T_K is the matrix formed by the first K columns of the matrix T, S_K is the diagonal matrix formed by the first K diagonal elements of the matrix S, and D_K is the matrix formed by the first K columns of the matrix D; the value of K is larger than the maximum ranking number contained in the sorting interval;

a computing unit for, with respect to the index matrix H_K of the revised latent semantic index model, multiplying the transpose Q^T of the query vector by the matrix T_K to obtain a row vector, multiplying the matrix D_K by the matrix S_K to obtain a matrix, and taking the cosine similarity between that row vector and the j-th row vector of the matrix D_K * S_K as the cosine similarity between the j-th column vector of the index matrix H_K and the query vector Q.

Further, the search device further includes:

and the model updating module is used for updating the latent semantic index model corresponding to the video category to which a new video program belongs when a description document describing the new video program is added to the database.

The embodiment of the invention has the following beneficial effects:

According to the video program searching method and device provided by the embodiments of the invention, calculating the cosine similarity between the query vector of the video to be searched and each column vector of the index matrix of the latent semantic index model yields the degree of correlation between the description entry of the video to be searched and the description document represented by each column vector; the higher the value, the higher the degree of correlation. The video programs corresponding to the description documents most correlated with the description entry are then recommended to the user. In addition, because the user inputs the video category to which the video program belongs and the latent semantic index model corresponding to that category is selected for the calculation, the efficiency of searching for video programs can be further improved.

Drawings

Fig. 1 is a schematic flowchart of an embodiment of a video program searching method provided by the present invention;

Fig. 2 is a schematic structural diagram of an embodiment of a video program search apparatus provided by the present invention;

Fig. 3 is a schematic structural diagram of an embodiment of a query vector construction module of the video program search apparatus provided by the present invention;

Fig. 4 is a schematic structural diagram of an embodiment of a similarity calculation module of the video program search apparatus provided by the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a schematic flow chart of an embodiment of a video program searching method provided by the present invention; the searching method comprises steps S1-S4, and specifically comprises the following steps:

S1, receiving a description entry for describing a video program and a video category to which the video program belongs, which are input by a user;

S2, selecting a latent semantic index model corresponding to the video category, and constructing a query vector of the description entry according to the construction mode of the index matrix of the latent semantic index model; the latent semantic index model is obtained by performing singular value decomposition on an index matrix constructed from description documents describing video programs of the same video category; the value of the ith element in the jth column of the index matrix represents the word frequency of the ith keyword appearing in the description document of the jth video program; the query vector is a column vector, the keyword represented by the ith element of the query vector is the same as the keyword represented by the ith row of the index matrix, and the value of the ith element of the query vector represents the word frequency of that keyword in the description entry;

S3, calculating the cosine similarity between each column vector of the index matrix and the query vector according to the latent semantic index model;

S4, sorting the calculated cosine similarities from large to small, and selecting the video programs corresponding to the column vectors whose cosine similarities have ranking numbers within the sorting interval, to provide to the user.

It should be noted that calculating the cosine similarity between the query vector of the video to be searched and each column vector of the index matrix of the latent semantic index model yields the degree of correlation between the description entry of the video to be searched and the description document represented by each column vector; the higher the value, the higher the degree of correlation. The video programs corresponding to the description documents most correlated with the description entry are then recommended to the user. Because the latent semantic index model is constructed (trained) from the description documents describing the video programs, the latent semantics of the documents can be mined and the accuracy of searching for video programs is improved. In addition, because the user inputs the video category to which the video program belongs and the latent semantic index model corresponding to that category is selected for the calculation, the efficiency of searching for video programs can be further improved. In general, the sorting interval preferably covers the top 10 ranking numbers.
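The sorting and selection in step S4 can be illustrated as follows. The function name and parameters are hypothetical; the default interval of ranks 1 to 10 follows the preference stated above.

```python
def select_programs(similarities, program_ids, first_rank=1, last_rank=10):
    """Return the programs whose similarity rank lies in [first_rank, last_rank]."""
    # Sort column indices by similarity, largest first; ties keep their original order.
    order = sorted(range(len(similarities)),
                   key=lambda j: similarities[j], reverse=True)
    return [program_ids[j] for j in order[first_rank - 1:last_rank]]

sims = [0.2, 0.9, 0.5, 0.7]
ids = ["a", "b", "c", "d"]
top_two = select_programs(sims, ids, first_rank=1, last_rank=2)
print(top_two)  # ['b', 'd']
```

If fewer programs exist than the interval asks for, the slice simply returns all of them in ranked order.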

Further, the process of constructing the index matrix according to the description document describing the video program of the same video category in step S2 includes:

For all description documents which are stored in a database and describe video programs of the same video category, carrying out format adjustment on the entries contained in all the description documents according to a standard entry format; the database stores description documents of various video categories, one description document describes one video program, and the video programs described by different description documents are different from each other. The format adjustment of the entries includes, but is not limited to, converting lower-case letters in an entry to upper case, deleting redundant spaces in an entry, unifying the punctuation marks in an entry, and unifying full-width and half-width characters in an entry into a single format.

Calling a word segmentation tool; preferably, the word segmentation tool is the jieba word segmentation tool, but it is not limited to this tool.

Utilizing the word segmentation tool to segment the entries of all the description documents after format adjustment to obtain a first word set. The word segmentation tool offers several segmentation modes; in addition to the normal mode, it can further split long words, which improves recall. For short texts in particular, this yields more words than normal segmentation, which helps improve the accuracy of the video programs subsequently output.
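The format adjustment that precedes word segmentation could be sketched as below. The source names the kinds of adjustment but not how they are carried out, so this is one possible approach using only the Python standard library; the function name is hypothetical, and Unicode NFKC normalization is an assumed stand-in for unifying full-width and half-width characters.

```python
import re
import unicodedata

def normalize_entry(text):
    """Apply the format adjustments to one entry before word segmentation."""
    # NFKC folds full-width letters, digits, punctuation and spaces to half-width forms.
    text = unicodedata.normalize("NFKC", text)
    # Unify letter case (lower case to upper case).
    text = text.upper()
    # Collapse runs of whitespace and drop leading and trailing spaces.
    return re.sub(r"\s+", " ", text).strip()

# Full-width "hello", two ideographic spaces, "world", a full-width comma and "!".
raw = "  \uff48\uff45\uff4c\uff4c\uff4f\u3000\u3000world\uff0cok\uff01 "
clean = normalize_entry(raw)
print(clean)  # HELLO WORLD,OK!
```

NFKC only covers characters that have a compatibility mapping; punctuation such as the ideographic full stop is left unchanged and would need an explicit replacement table if it is to be unified as well.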

It should be noted that the index matrix is constructed in advance from the description documents stored in the database, and its construction follows this rule: the value of the ith element in the jth column of the index matrix represents the word frequency of the ith keyword appearing in the description document of the jth video program. All elements in the ith row of the index matrix represent the same keyword, and the keywords represented by different rows are different. For example, assuming that all elements in row 1 of the index matrix represent the keyword A and the elements in column 1 of the index matrix represent the description document B, the value of the element in row 1 and column 1 of the index matrix represents the word frequency of the keyword A appearing in the description document B.

Further, the constructing the query vector describing the entry in step S2 specifically includes:

According to the standard entry format, carrying out format adjustment on the description entry; for example, converting lower-case letters in the entry to upper case, deleting redundant spaces in the entry, unifying the punctuation marks in the entry, and unifying full-width and half-width characters in the entry into a single format.

Utilizing the word segmentation tool to segment the description entry after the format adjustment to obtain a second word set. The word segmentation tool offers several segmentation modes; in addition to the normal mode, it can further split long words, which improves recall. For short texts in particular, this yields more words than normal segmentation, which helps improve the accuracy of the video programs subsequently output.

It should be noted that, when constructing the query vector of the description entry, the keyword represented by the ith element of the query vector must be the same as the keyword represented by the ith row of the index matrix of the latent semantic index model; only then is the cosine similarity between the query vector and each column vector of the index matrix meaningful.

In addition, the construction of the query vector follows this rule: the keyword represented by the ith element of the query vector is the same as the keyword represented by the ith row of the index matrix, and the value of the ith element of the query vector represents the word frequency of that keyword in the description entry. For example, assuming that all elements in row 1 of the index matrix represent the keyword A, the keyword represented by the element in row 1 of the query vector is the keyword A, and the value of that element represents the word frequency of the keyword A appearing in the description entry.

Further, if the index matrix is H, the latent semantic index model obtained by performing singular value decomposition on the index matrix is: H = T * S * D^T; wherein T is an orthogonal matrix, and each column of the matrix T is a left singular vector of the index matrix H; S is a diagonal matrix, and the diagonal elements of the matrix S are the singular values of the index matrix H; D is an orthogonal matrix, and each column of the matrix D is a right singular vector of the index matrix H; the query vector is Q;

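Step S3 is specifically implemented as follows:

selecting the matrices T_K, S_K and D_K, and revising the latent semantic index model to H_K = T_K * S_K * D_K^T; wherein T_K is the matrix formed by the first K columns of the matrix T, S_K is the diagonal matrix formed by the first K diagonal elements of the matrix S, and D_K is the matrix formed by the first K columns of the matrix D; the value of K is larger than the maximum ranking number contained in the sorting interval;

for the index matrix H_K of the revised latent semantic index model, multiplying the transpose Q^T of the query vector by the matrix T_K to obtain a row vector, and taking the cosine similarity between that row vector and the j-th row vector of the matrix D_K * S_K as the cosine similarity between the j-th column vector of the index matrix H_K and the query vector Q.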

It should be noted that K here is a threshold that may be selected according to actual conditions. The revised model is the rank-K approximation of H: only the K largest singular values of the index matrix H are kept, and all remaining singular values are treated as zero. Revising the latent semantic index model in this way can improve retrieval efficiency.
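Why comparing the row vector Q^T T_K with the rows of D_K S_K stands in for comparing Q with the columns of H_K can be written out as a short derivation. The notation d_j (the j-th row of D_K, taken as a column vector) and h_j (the j-th column of H_K) is introduced here for illustration and does not appear in the source.

```latex
\begin{aligned}
h_j &= T_K S_K d_j, \qquad d_j^{T} = \text{$j$-th row of } D_K, \\
Q^{T} h_j &= (T_K^{T} Q)^{T} (S_K d_j), \\
\lVert h_j \rVert &= \lVert S_K d_j \rVert \quad \text{since } T_K^{T} T_K = I_K, \\
\cos\!\left(T_K^{T} Q,\; S_K d_j\right)
  &= \frac{Q^{T} h_j}{\lVert T_K^{T} Q \rVert \, \lVert h_j \rVert}
   = \cos\!\left(T_K T_K^{T} Q,\; h_j\right).
\end{aligned}
```

In other words, the computed value is the cosine between h_j and the projection of Q onto the space spanned by the first K left singular vectors. It equals the cosine between h_j and Q itself exactly when Q already lies in that space. The practical gain is that all vectors involved have only K components.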

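Further, the search method further comprises:

when a description document describing a new video program is added to the database, updating the latent semantic index model corresponding to the video category to which the new video program belongs.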

It should be noted that, as video programs are continuously added, description documents describing the newly added video programs are continuously added to the database as well, so the latent semantic index model needs to be updated.
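The source does not say how the model is updated. One simple possibility, shown here purely as an assumption, is to append the new document's column to the index matrix of its category and recompute the singular value decomposition for that category only. The class and method names are hypothetical, NumPy is assumed, and the keyword list is held fixed for brevity even though new documents could in practice change the extracted keywords.

```python
import numpy as np

class CategoryModels:
    """Keeps one index matrix and one SVD per video category."""

    def __init__(self, keywords):
        self.keywords = list(keywords)
        self.columns = {}   # category -> list of word-frequency columns
        self.models = {}    # category -> (T, s, D) with H = T @ diag(s) @ D.T

    def add_document(self, category, tokens):
        # Word-frequency column for the new description document.
        column = [tokens.count(kw) for kw in self.keywords]
        self.columns.setdefault(category, []).append(column)
        # Rebuild H for this category only and recompute its decomposition.
        H = np.array(self.columns[category], dtype=float).T
        T, s, Dt = np.linalg.svd(H, full_matrices=False)
        self.models[category] = (T, s, Dt.T)
        return H

store = CategoryModels(["singing", "cooking", "travel"])
store.add_document("variety", ["singing", "singing", "travel"])
H = store.add_document("variety", ["cooking", "travel"])
T, s, D = store.models["variety"]
```

Recomputing from scratch is the easiest approach to reason about; incremental update schemes exist but are outside what the source describes.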

According to the video program searching method provided by the embodiment of the invention, calculating the cosine similarity between the query vector of the video to be searched and each column vector of the index matrix of the latent semantic index model yields the degree of correlation between the description entry of the video to be searched and the description document represented by each column vector; the higher the value, the higher the degree of correlation. The video programs corresponding to the description documents most correlated with the description entry are then recommended to the user. Because the latent semantic index model is constructed (trained) from the description documents describing the video programs, the latent semantics of the documents can be mined and the accuracy of searching for video programs is improved. In addition, because the user inputs the video category to which the video program belongs and the latent semantic index model corresponding to that category is selected for the calculation, the efficiency of searching for video programs can be further improved.

Fig. 2 is a schematic structural diagram of an embodiment of a video program search apparatus according to the present invention. The search apparatus can execute all the processes of the video program search method provided by the above embodiment, and the search apparatus includes:

a user information receiving module 10, configured to receive a description entry describing a video program and a video category to which the video program belongs, where the description entry is input by a user;

a query vector construction module 20, configured to select a latent semantic index model corresponding to the video category, and construct a query vector of the description entry according to the construction mode of the index matrix of the latent semantic index model; the latent semantic index model is obtained by performing singular value decomposition on an index matrix constructed from description documents describing video programs of the same video category;

a similarity calculation module 30, configured to calculate a cosine similarity between each column of vectors of the index matrix and the query vector according to the latent semantic index model;

and the video program selecting module 40 is configured to sort the cosine similarity obtained through calculation from large to small, and select a video program corresponding to the column vector of the cosine similarity whose ranking number belongs to the sorting interval to provide to the user.

Further, referring to fig. 3, it is a schematic structural diagram of an embodiment of a query vector constructing module of a video program search apparatus provided in the present invention, where the query vector constructing module 20 includes a unit for constructing an index matrix according to description documents describing video programs of the same video category, specifically:

a first format adjusting unit 21, configured to perform format adjustment on entries included in all description documents describing video programs of the same video category, which are stored in a database, according to a standard entry format; the database stores description documents of various video categories, one description document describes one video program, and the video programs described by different description documents are different from each other;

a first tool calling unit 22 for calling a word segmentation tool;

the first word segmentation unit 23 is configured to perform word segmentation on the entries of all the description documents after format adjustment by using the word segmentation tool to obtain a first word set;

a first keyword extraction unit 34 for extracting keywords from the first word set according to a TF-IDF algorithm;

an index matrix constructing unit 25, configured to construct an index matrix according to the word frequency of each extracted keyword appearing in each description document; the row sequence of the index matrix is arranged from high to low according to the total word frequency of the keywords appearing in all the description documents, and the column sequence of the index matrix is arranged from high to low according to the word frequency of the keywords appearing in each description document.

Further, the query vector construction module 20 further includes a unit for constructing the query vector describing the entry, specifically:

a second format adjusting unit 26, configured to perform format adjustment on the description entries according to a standard entry format;

a second tool calling unit 27 for calling a word segmentation tool;

a second word segmentation unit 28, configured to perform word segmentation on the description entry with the adjusted format by using the word segmentation tool, so as to obtain a second word set;

a second keyword extraction unit 29 for extracting keywords from the second word set according to a TF-IDF algorithm;

a query vector construction unit 31, configured to construct a query vector of the description entries according to the word frequency of each extracted keyword appearing in the description entries.

Further, referring to fig. 4, which is a schematic structural diagram of an embodiment of a similarity calculation module of a video program search apparatus provided by the present invention, where the index matrix is H, the latent semantic index model obtained by performing singular value decomposition on the index matrix is: h ═ T ═ S ^ D^T(ii) a Wherein T is an orthogonal matrix, and each column of the matrix T is a left singular vector of the index matrix H; s is a diagonal matrix, and diagonal elements of the matrix S are singular values of the index matrix H; d is an orthogonal matrix, and each column of the matrix D is a right singular vector of the index matrix H; the query vector is Q;

the similarity calculation module 30 specifically includes:

a model revision unit 32 for selecting T_K、S_KAnd D_KMatrix, revising the latent semantic index model to H_K＝T_K*S_K*D_K ^T(ii) a Wherein, T_KIs a matrix formed by the first K columns of the matrix T, S_KFor a diagonal matrix formed by the first K diagonal elements of the matrix S, D_KIs a matrix formed by the first K columns of the matrix D; the numerical value of K is larger than the maximum sorting number contained in the sorting interval;

a computing unit 33, configured to, for the index matrix H_K of the revised latent semantic index model, compute the cosine similarity between the row vector obtained by multiplying the transpose Q^T of the query vector by the matrix T_K and the jth row vector of the matrix obtained by multiplying the matrix D_K by the matrix S_K, and to take this cosine similarity as the cosine similarity between the jth column vector of the index matrix H_K and the query vector Q.
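The model revision unit and the computing unit can be sketched together with NumPy. This is an illustrative reading of the text, not the patented code: the function name, the toy index matrix, and the choice K = 2 are assumptions, and `numpy.linalg.svd` is used as one readily available way to obtain T, S and D.

```python
import numpy as np

def lsi_cosine_similarities(H, Q, K):
    """Cosine similarity between query Q and each document column of H_K."""
    # SVD: H = T * S * D^T (NumPy returns D^T directly as the third factor).
    T, s, Dt = np.linalg.svd(H, full_matrices=False)
    T_K = T[:, :K]           # first K left singular vectors
    S_K = np.diag(s[:K])     # first K singular values on the diagonal
    D_K = Dt[:K, :].T        # first K right singular vectors, as columns
    q_proj = Q @ T_K         # row vector Q^T * T_K, length K
    docs = D_K @ S_K         # row j represents the jth description document
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_proj)
    norms = np.where(norms == 0, 1.0, norms)  # guard against division by zero
    return (docs @ q_proj) / norms

# Toy index matrix: 4 keywords (rows) x 3 description documents (columns).
H = np.array([[2.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 3.0, 0.0],
              [0.0, 1.0, 2.0]])
Q = np.array([2.0, 1.0, 0.0, 0.0])   # identical to the first column of H

sims = lsi_cosine_similarities(H, Q, K=2)
print(sims)
```

Because Q here equals the first column of H, its projection Q^T * T_K coincides with the first row of D_K * S_K, so the first similarity is 1 up to rounding. Only K-dimensional vectors are compared, which is where the saving over comparing full keyword-length vectors comes from.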

Further, the search device further includes:

a model updating module 50, configured to update the latent semantic index model corresponding to the video category to which a new video program belongs when a description document describing the new video program is added to the database.
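One simple way to realise such an update is to append the new document as a column of that category's index matrix and recompute the decomposition. This passage does not state how the update is carried out, so the full recomputation below is an assumption, as are the `CategoryModel` class, the `models` dictionary, the category name, and the fixed keyword list.

```python
import numpy as np

class CategoryModel:
    """Index matrix and its SVD factors for one video category (hypothetical container)."""

    def __init__(self, H):
        self.H = np.asarray(H, dtype=float)
        self.refresh()

    def refresh(self):
        # Recompute H = T * S * D^T from scratch (assumed update strategy).
        self.T, self.s, self.Dt = np.linalg.svd(self.H, full_matrices=False)

    def add_document(self, column):
        # The new column holds the keyword frequencies of the new description document.
        col = np.asarray(column, dtype=float).reshape(-1, 1)
        self.H = np.hstack([self.H, col])
        self.refresh()

# One model per video category; only the affected category is updated.
models = {"variety": CategoryModel([[2, 0], [1, 0], [0, 3]])}

def on_new_document(category, column):
    models[category].add_document(column)

on_new_document("variety", [1, 0, 2])
print(models["variety"].H.shape)  # (3, 3)
```

Keeping a separate model per category means that a new document triggers a decomposition of that category's matrix only, not of the whole database.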

The video program searching device provided by the embodiment of the invention calculates the cosine similarity between the query vector of the video to be searched and each column vector of the index matrix of the latent semantic index model, and thereby obtains the degree of correlation between the description entry of the video to be searched and the description document represented by each column vector; the higher the value, the higher the degree of correlation. The video programs corresponding to the description documents that are highly correlated with the description entry are then recommended to the user. In addition, because the user inputs the video category to which the video program belongs and the latent semantic index model corresponding to that video category is selected for the calculation, the efficiency of searching for the video program can be further improved.
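The final selection step, in which the similarities are sorted from large to small and the programs whose rank falls within the sorting interval are kept, might look like the following sketch. The function name, the program identifiers, and the reading of the sorting interval as 1-based inclusive ranks are assumptions.

```python
def select_programs(similarities, program_ids, first_rank, last_rank):
    """Return the programs ranked first_rank..last_rank (1-based, inclusive) by similarity."""
    # Indices of the documents, ordered from highest to lowest cosine similarity.
    order = sorted(range(len(similarities)),
                   key=lambda j: similarities[j],
                   reverse=True)
    return [program_ids[j] for j in order[first_rank - 1:last_rank]]

# Hypothetical similarities for four programs of the selected category.
sims_example = [0.12, 0.97, 0.55, 0.80]
ids = ["A", "B", "C", "D"]

top = select_programs(sims_example, ids, 1, 2)
print(top)  # ['B', 'D']
```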

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims

1. A method for searching for a video program, comprising:

sorting the calculated cosine similarities from large to small, and selecting the video programs corresponding to the column vectors whose cosine similarity has a sorting number within the sorting interval to provide to the user;

the process of constructing the index matrix by the description document describing the video program comprises the following steps: taking the word frequency of the ith keyword appearing in the description document of the jth video program as the numerical value of the ith element of the jth column of the index matrix;

2. The method for searching for video programs according to claim 1, wherein the process of constructing the index matrix from the description documents describing the video programs of the same video category comprises:

calling a word segmentation tool;

3. The method for searching for a video program according to claim 1, wherein the constructing of the query vector of the description entry specifically comprises:

calling a word segmentation tool;

4. The method for searching for video programs according to claim 2, wherein if the index matrix is H, the latent semantic index model obtained by singular value decomposition of the index matrix is: H = T * S * D^T; wherein T is an orthogonal matrix, and each column of the matrix T is a left singular vector of the index matrix H; S is a diagonal matrix, and the diagonal elements of the matrix S are the singular values of the index matrix H; D is an orthogonal matrix, and each column of the matrix D is a right singular vector of the index matrix H; the query vector is Q;

5. The method for searching for a video program according to claim 1, wherein the method for searching for a video program further comprises:

6. An apparatus for searching a video program, comprising:

the video program selecting module is used for sorting the cosine similarity obtained by calculation from large to small and selecting the video program corresponding to the column vector of the cosine similarity with the sorting number belonging to the sorting interval to provide for the user;

the query vector construction module includes a unit configured to construct an index matrix according to a description document describing a video program, and is specifically configured to: taking the word frequency of the ith keyword appearing in the description document of the jth video program as the numerical value of the ith element of the jth column of the index matrix;

7. The apparatus for searching for video programs according to claim 6, wherein the query vector construction module comprises a unit configured to construct an index matrix according to the description documents describing the video programs of the same video category, specifically:

the first tool calling unit is used for calling the word segmentation tool;

8. The apparatus for searching for a video program according to claim 6, wherein the query vector construction module further comprises a unit for constructing the query vector of the description entry, specifically:

the second tool calling unit is used for calling the word segmentation tool;

9. The apparatus for searching for video programs according to claim 7, wherein if the index matrix is H, the latent semantic index model obtained by singular value decomposition of the index matrix is: H = T * S * D^T; wherein T is an orthogonal matrix, and each column of the matrix T is a left singular vector of the index matrix H; S is a diagonal matrix, and the diagonal elements of the matrix S are the singular values of the index matrix H; D is an orthogonal matrix, and each column of the matrix D is a right singular vector of the index matrix H; the query vector is Q;

the similarity calculation module specifically includes:

a model revision unit, configured to select the matrices T_K, S_K and D_K and to revise the latent semantic index model to H_K = T_K * S_K * D_K^T; wherein T_K is the matrix formed by the first K columns of the matrix T, S_K is the diagonal matrix formed by the first K diagonal elements of the matrix S, and D_K is the matrix formed by the first K columns of the matrix D; the value of K is larger than the maximum sorting number contained in the sorting interval;

10. The apparatus for searching for a video program according to claim 6, wherein said searching means further comprises: