
CN108241874B - Video text area localization method based on BP neural network and spectrum analysis - Google Patents

Video text area localization method based on BP neural network and spectrum analysis Download PDF

Info

Publication number
CN108241874B
CN108241874B CN201810148366.1A CN201810148366A CN108241874A
Authority
CN
China
Prior art keywords
text
pixels
neural network
classified
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810148366.1A
Other languages
Chinese (zh)
Other versions
CN108241874A (en)
Inventor
霍华
吕靖
李宁波
常国沁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University of Science and Technology
Original Assignee
Henan University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University of Science and Technology filed Critical Henan University of Science and Technology
Priority to CN201810148366.1A priority Critical patent/CN108241874B/en
Publication of CN108241874A publication Critical patent/CN108241874A/en
Application granted granted Critical
Publication of CN108241874B publication Critical patent/CN108241874B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

A video text area localization method based on BP neural network and spectrum analysis: a BP neural network is built to classify the pixels of a video frame; the pixels classified as text are grouped by a distance-based clustering algorithm to obtain candidate text regions; each candidate region is transformed to the frequency domain by a fast Fourier transform; a second neural network is then built to classify the candidate regions from their spectrograms, and the regions classified as false positives are filtered out. Beneficial effects of the invention: high localization accuracy and broader practical applicability.

Figure 201810148366

Description

Video text area localization method based on BP neural network and spectrum analysis

Technical Field

The present invention relates to the technical field of image processing, and in particular to a video text area localization method based on a BP neural network and spectrum analysis.

Background Art

With the explosive growth of multimedia data, information in every form (text, images, audio and video) is moving onto the Internet, and society is rapidly becoming information-driven. Multimedia information is used ever more widely in networks and communications, with video data as the leading example: it has become a key resource through which people share information. With its rich, intuitive and concrete form of expression, video has become the most important information carrier, conveying large amounts of information and knowledge. Among video data, news video is a representative medium that accounts for a significant share of video resources; compared with text news, video news is vivid, intuitive, easy to understand and information-dense, and therefore attracts wide attention. Because of the particular structure of news video, most of its high-level semantics reside in the text captions, and the audio and image features are almost entirely subsumed by the text features, so locating and extracting the caption regions in news video is especially important.

Because the color, size, font and position of text are variable, it is difficult to find a general method to separate it from the background. Text localization methods fall into two broad categories: region-based methods and texture-based methods. Each has its own strengths and weaknesses, and good results depend on choosing the method appropriate to the situation. Most methods, however, leave many false positive regions after localization, which lowers the accuracy of text localization. The present invention therefore proposes a new method that uses a BP neural network and spectrum analysis to locate news video caption regions effectively and to filter out the false positive regions that remain after localization.

Summary of the Invention

The technical problem to be solved by the present invention is to provide a video text area localization method based on BP neural network and spectrum analysis that eliminates the false positives produced by existing methods and improves the localization accuracy of the algorithm.

The technical scheme adopted by the present invention to solve the above technical problem is a video text area localization method based on BP neural network and spectrum analysis, comprising the following steps:

Step 1. Extract news video frames and convert the extracted frames into grayscale images.

Step 2. Build a BP neural network as a classifier, classify all pixels in each image, and obtain the pixels classified as text.

Step 3. Apply distance-based clustering to the text-class pixels obtained in step 2 to obtain candidate text regions.

Step 4. Apply a fast Fourier transform to the candidate text regions obtained in step 3 to obtain their spectrograms.

Step 5. Build a second BP neural network as a classifier, classify the candidate text regions, and filter out the false positive regions.

In step 2 of the present invention, the specific method of building a BP neural network as a classifier and classifying all pixels in each image is as follows:

Step 2.1. Perform corner detection on all pixels in the image; assign feature value 1 to pixels judged to be corners and feature value 0 to non-corners.

Step 2.2. Take each pixel in turn as the center pixel and use its M*M neighborhood window as the feature window.

Step 2.3. Build the neural network, taking the gray values and corner judgment values of all pixels in the window as its inputs; the number of input-layer nodes m is set to M*M*2 and the number of output-layer nodes n is set to 2.

Step 2.4. Set the hidden-layer nodes; the number of hidden-layer nodes N is calculated by formula (1) or (2):

Figure GDA0002649363200000021

Figure GDA0002649363200000022

where N is the number of hidden-layer nodes, m and n are the numbers of input-layer and output-layer nodes respectively, and a is a constant.
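Formulas (1) and (2) survive only as equation images in this copy. A widely used empirical rule, N = floor(sqrt(m + n)) + a, is consistent with the preferred embodiment below (m = 50, n = 2 and 17 hidden nodes implies a = 10), so the following sketch assumes that form for formula (1):

```python
import math

def hidden_nodes(m: int, n: int, a: int) -> int:
    """Hidden-layer size N = floor(sqrt(m + n)) + a.

    Assumed form of the patent's formula (1); the constant a is
    restricted to 1..10 by the description (step 2.4)."""
    if not 1 <= a <= 10:
        raise ValueError("the description restricts a to 1..10")
    return math.floor(math.sqrt(m + n)) + a

# Embodiment: M = 5 window -> m = 5*5*2 = 50 inputs, n = 2 outputs
print(hidden_nodes(50, 2, 10))  # prints 17, the hidden-layer size used in the embodiment
```

The second embodiment's 45 hidden nodes (m = 60, n = 2) would come from formula (2), whose form is not recoverable from this copy.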

Step 2.5. The two output-layer nodes represent the text class and the non-text class respectively, and the output is a vector of two floating-point values; when labeling samples, pixels belonging to the text class are labeled (1,0) and non-text pixels are labeled (0,1).

Step 2.6. Train and test the neural network. In the output vector of a test-sample pixel, if the first value is greater than the second the pixel is classified as text, and if the second value is greater than the first it is classified as non-text; finally, all pixels judged to be text are marked.
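The input construction of steps 2.2-2.3 and the decision rule of step 2.6 can be sketched as follows; the network itself is omitted, and only the M*M*2 feature vector and the output-vector decision are shown:

```python
import numpy as np

def pixel_feature(gray: np.ndarray, corner: np.ndarray,
                  y: int, x: int, M: int = 5) -> np.ndarray:
    """Classifier input for one pixel: the M*M neighbourhood gray values
    followed by the M*M corner flags (0/1), i.e. m = M*M*2 values."""
    r = M // 2
    win_g = gray[y - r:y + r + 1, x - r:x + r + 1]
    win_c = corner[y - r:y + r + 1, x - r:x + r + 1]
    return np.concatenate([win_g.ravel(), win_c.ravel()]).astype(float)

def is_text(output: np.ndarray) -> bool:
    """Step 2.6 decision: class 'text' iff the first output value exceeds the second."""
    return bool(output[0] > output[1])

gray = np.arange(100, dtype=float).reshape(10, 10)
corner = np.zeros((10, 10))
v = pixel_feature(gray, corner, 5, 5)
print(v.shape)  # (50,), matching the 50 input nodes of the embodiment
```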

In step 3 of the present invention, the specific method of applying distance-based clustering to the text-class pixels to obtain candidate text regions is as follows:

Step 3.1. Set a distance threshold d1. Among all pixels classified as text, randomly select a pixel P1 as the base pixel, compute the Euclidean distance between P1 and every other text-class pixel, and add the pixels whose Euclidean distance is less than d1 to P1's set G1 until all qualifying pixels have been found; then take each pixel of G1 other than P1 in turn as the base pixel and repeat the same operation until no new pixel joins the set, at which point the set G1 is classified as class K1.

Step 3.2. Repeat the above operation for all text-class pixels outside class K1 until every text-class pixel has been classified, yielding all classes Kt, t ≥ 1.

Step 3.3. Remove all classes that contain too few pixels.

Step 3.4. Take the minimum bounding rectangle of each class; these rectangles are the candidate text regions.
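The clustering of steps 3.1-3.4 amounts to a breadth-first flood over the text-class pixels. A minimal sketch, with the 20-pixel minimum of step 3.3 exposed as a parameter:

```python
from collections import deque
import math

def cluster_text_pixels(points, d1, min_size=20):
    """Steps 3.1-3.4: grow a cluster from a seed pixel by repeatedly
    absorbing text pixels within Euclidean distance d1 of any member,
    drop clusters smaller than min_size, and return each survivor's
    minimum bounding rectangle as (x_min, y_min, x_max, y_max)."""
    remaining = set(points)
    boxes = []
    while remaining:
        seed = next(iter(remaining))        # step 3.1: pick a base pixel
        remaining.remove(seed)
        cluster, queue = [seed], deque([seed])
        while queue:                        # breadth-first growth
            px, py = queue.popleft()
            near = [q for q in remaining
                    if math.hypot(q[0] - px, q[1] - py) < d1]
            for q in near:
                remaining.remove(q)
                cluster.append(q)
                queue.append(q)
        if len(cluster) >= min_size:        # step 3.3: drop tiny clusters
            xs = [p[0] for p in cluster]
            ys = [p[1] for p in cluster]
            boxes.append((min(xs), min(ys), max(xs), max(ys)))  # step 3.4
    return boxes
```

For example, two well-separated runs of pixels produce two bounding boxes, while an isolated pixel is discarded.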

In step 4 of the present invention, the specific method of applying a fast Fourier transform to the obtained candidate text regions to obtain spectrograms is as follows:

Step 4.1. Binarize the image of the candidate text region.

Step 4.2. Compute the vertical gray-level projection of the binarized image.

Step 4.3. Apply a fast Fourier transform to the projection function, converting it from the time domain to the frequency domain to obtain the spectrogram.
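Steps 4.1-4.3 can be sketched with NumPy's FFT. The synthetic region below, with strokes every 8 columns, is only an illustration of why a text-like projection produces a distinctive spectral peak:

```python
import numpy as np

def region_spectrum(region: np.ndarray, thresh: float = 0.5):
    """Sketch of steps 4.1-4.3: binarize the candidate region, take the
    vertical gray projection (one sum per column), then FFT the
    projection and keep the amplitude spectrum."""
    binary = (region > thresh).astype(float)     # step 4.1: binarization
    projection = binary.sum(axis=0)              # step 4.2: vertical projection
    spectrum = np.abs(np.fft.rfft(projection))   # step 4.3: FFT amplitudes
    return projection, spectrum

# A text-like region: vertical strokes every 8 columns yield a spectral
# peak at the stroke period, which the step-5 classifier exploits.
region = np.zeros((16, 64))
region[:, ::8] = 1.0
proj, spec = region_spectrum(region)
```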

In step 5 of the present invention, the specific method of classifying the candidate text regions and filtering out the false positive regions is as follows:

Step 5.1. In the spectrogram, select a feature window of 2 to 3 times the average character width; this window does not include frequency 1.

Step 5.2. Build a BP neural network, taking as its inputs the amplitudes of the frequencies within the selected window and the frequency at which the amplitude is highest near the average character width; set the number of output-layer nodes to 2.

Step 5.3. Select the number of hidden-layer nodes using formula (1) or formula (2).

Step 5.4. The two output-layer nodes represent the true positive class (candidate text regions that contain text) and the false positive class (candidate text regions that do not) respectively, and the output is a vector of two floating-point values; when labeling samples, true positive text regions are labeled (1,0) and false positive regions are labeled (0,1).

Step 5.5. Train and test the neural network. In the output vector of a test-sample candidate text region, if the first value is greater than the second the candidate region is classified as a true positive, and if the second value is greater than the first it is classified as a false positive and filtered out.

Step 5.6. The true positive candidate text regions remaining after the false positive regions are filtered out are the final text localization regions.
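One way the step-5.1/5.2 input vector might be assembled is sketched below; the choice of 3 times the average character width for the window and the two-bin search neighbourhood around the average character width are assumptions, since the description gives only ranges:

```python
import numpy as np

def spectral_features(spectrum: np.ndarray, avg_char_width: int) -> np.ndarray:
    """Sketch of the step-5 classifier input: the amplitudes inside a
    feature window of about 3x the average character width, starting at
    frequency index 2 so that 'frequency 1' is excluded (step 5.1), plus
    the frequency at which the amplitude is highest near the average
    character width (step 5.2)."""
    lo, hi = 2, 2 + 3 * avg_char_width               # feature window, no frequency 1
    window = spectrum[lo:hi]
    a, b = max(lo, avg_char_width - 2), avg_char_width + 3  # assumed +/-2-bin search
    peak_freq = a + int(np.argmax(spectrum[a:b]))    # second kind of input value
    return np.concatenate([window, [float(peak_freq)]])
```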

The corner detection method used in step 2.1 of the present invention is the Harris corner detection method.

The constant a in step 2.4 of the present invention takes a value from 1 to 10.

In step 3.3 of the present invention, the classes removed are those containing fewer than 20 pixels.

The beneficial effects of the present invention are: the provided method classifies the pixels of a video frame with a BP neural network; the pixels classified as text are processed by a distance-based clustering algorithm to obtain candidate text regions; after the candidate regions are transformed to the frequency domain by a fast Fourier transform, a second neural network is built to classify the candidate regions from their spectrograms, and the regions classified as false positives are filtered out. Filtering out the false positives raises the accuracy of the algorithm, making the localization of video text areas more accurate and the method more broadly practical.

Brief Description of the Drawings

Figure 1 is a flowchart of the localization method of the present invention;

Figure 2 is the BP neural network model constructed by the present invention;

Figure 3 illustrates the distance-based breadth-first clustering algorithm of the present invention;

Figure 4 is an example of candidate text area localization by the present invention;

Figure 5 is an example of candidate text area localization containing false positives;

Figure 6 is an example of a true positive among the candidate text regions;

Figure 7 is an example of a false positive among the candidate text regions;

Figure 8 is the spectrogram of the true positive example of Figure 6;

Figure 9 is the spectrogram of the false positive example of Figure 7;

Figure 10 shows the result for Figure 5 after the false positives are filtered out;

Figure 11 compares the localization results of the present invention and existing methods under different conditions.

Detailed Description of Embodiments

Specific embodiments of the present invention are described below with reference to the accompanying drawings, so that those skilled in the art can better understand the invention.

Figure 1 is a flowchart of the method of the present invention. The video text area localization method based on BP neural network and spectrum analysis is divided into the following steps:

Step 1: Build the BP neural network model shown in Figure 2 as the classifier, using a 5*5 neighborhood window as the feature window; set the number of input-layer nodes to 50, the number of hidden-layer nodes to 17, and the number of output-layer nodes to 2. Manually label the video-frame pixels in the dataset, then train and test the neural network.

As shown in Figure 3, the pixels classified as text are marked in blue for ease of display.

Step 2: Apply distance-based breadth-first clustering to the text-class pixels obtained in step 1; Figure 3 is a schematic diagram of the algorithm.

Figure 4 shows the candidate text region localization result for the example image after clustering.

Step 3: Because step 1 cannot classify text-class and non-text-class pixels perfectly, false positive regions are generated in some cases; Figure 5 shows a localization result with several false positive candidate text regions.

All obtained candidate text regions are binarized, projected vertically as gray-level projections, and Fourier-transformed to obtain spectrograms. Figures 6 and 7 show examples of a true positive region and a false positive region. Figure 8 is the spectrogram of the true positive example of Figure 6, and Figure 9 is the spectrogram of the false positive example of Figure 7; comparing the two shows that the spectrograms of true positive and false positive regions differ markedly.

A second BP neural network is built as a classifier. In the spectrogram of each candidate region, the window from frequency 2 to frequency 60 is taken as the feature window; for candidate text regions narrower than 60, the amplitudes of the missing frequencies are all set to 0. The number of input-layer nodes is set to 60, the number of hidden-layer nodes to 45, and the number of output-layer nodes to 2. All generated candidate text regions are manually labeled, and the neural network is trained and tested.

Step 4: All candidate text regions classified as false positives are filtered out; the remaining true positive candidate text regions are the final text localization regions. Figure 10 is the final localization result for the example image of Figure 5 after false positive filtering.

Step 5: Figure 11 compares the localization results of this method with existing corner-based and edge-based methods in several different situations; the comparison shows that this method is more broadly applicable and localizes well across a variety of conditions.

The present invention uses the Harris corner detection method to detect corners at every pixel of the image, but it is not limited to the Harris algorithm; other corner detection algorithms may also be used. The Harris algorithm can detect interest points under gray-level changes, rotation and noise, has good noise resistance, balances efficiency and accuracy, has a low false detection rate and extracts corners reliably, which is why the present invention uses it for corner detection.
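The Harris response named in step 2.1 can be sketched in NumPy as follows; the parameters k = 0.04 and the 3x3 structure-tensor window are conventional choices, not values taken from the patent:

```python
import numpy as np

def harris_response(img: np.ndarray, k: float = 0.04, win: int = 3) -> np.ndarray:
    """Minimal Harris corner response R = det(M) - k*trace(M)^2, where M is
    the structure tensor summed over a win x win window. A pixel whose R
    exceeds a threshold would receive corner flag 1 in step 2.1."""
    iy, ix = np.gradient(img.astype(float))          # image gradients
    ixx, iyy, ixy = ix * ix, iy * iy, ix * iy
    r = win // 2

    def box_sum(a: np.ndarray) -> np.ndarray:
        """Sum of a over a win x win window, via shifted zero-padded copies."""
        p = np.pad(a, r)
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += p[r + dy:r + dy + a.shape[0], r + dx:r + dx + a.shape[1]]
        return out

    sxx, syy, sxy = box_sum(ixx), box_sum(iyy), box_sum(ixy)
    return (sxx * syy - sxy ** 2) - k * (sxx + syy) ** 2
```

On a synthetic white square, the response is positive at the square's corners and essentially zero in flat regions, which is the behavior the corner flag of step 2.1 relies on.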

The above describes the news video text area localization method based on BP neural network and spectrum analysis provided by the present invention. The protection scope of the present invention, however, is not limited to this; any person skilled in the art may make changes based on the idea of the invention, and the contents of this specification should therefore not be construed as limiting the invention.

Claims (6)

1. A video text area localization method based on BP neural network and spectrum analysis, characterized by comprising the following steps:

Step 1. Extract news video frames and convert the extracted frames into grayscale images.

Step 2. Build a BP neural network as a classifier, classify all pixels in each image, and obtain the pixels classified as text.

Step 3. Apply distance-based clustering to the text-class pixels obtained in step 2 to obtain candidate text regions.

Step 4. Apply a fast Fourier transform to the candidate text regions obtained in step 3 to obtain spectrograms.

Step 5. Build a second BP neural network as a classifier, classify the candidate text regions, and filter out the false positive regions; specifically:

Step 5.1. In the spectrogram, select a feature window of 2 to 3 times the average character width; this window does not include frequency 1.

Step 5.2. Build a BP neural network, taking as its inputs the amplitudes of the frequencies within the selected window and the frequency at which the amplitude is highest near the average character width; set the number of output-layer nodes to 2.

Step 5.3. Select the number of hidden-layer nodes using formula (1) or formula (2):

Figure FDA0002764695440000011

Figure FDA0002764695440000012

where N is the number of hidden-layer nodes, m and n are the numbers of input-layer and output-layer nodes respectively, and a is a constant.

Step 5.4. The two output-layer nodes represent the true positive class and the false positive class respectively, and the output is a vector of two floating-point values; when labeling samples, true positive text regions are labeled (1,0) and false positive regions are labeled (0,1).

Step 5.5. Train and test the BP neural network. In the output vector of a test-sample candidate text region, if the first value is greater than the second the candidate region is classified as a true positive, and if the second value is greater than the first it is classified as a false positive and filtered out.

Step 5.6. The true positive candidate text regions remaining after the false positive regions are filtered out are the final text localization regions.

2. The video text area localization method based on BP neural network and spectrum analysis according to claim 1, characterized in that in step 2 the specific method of building a BP neural network as a classifier and classifying all pixels in each image is:

Step 2.1. Perform corner detection on all pixels in the image; assign feature value 1 to pixels judged to be corners and feature value 0 to non-corners.

Step 2.2. Take each pixel in turn as the center pixel and use its M*M neighborhood window as the feature window.

Step 2.3. Build the BP neural network, taking the gray values and corner judgment values of all pixels in the window as its inputs; the number of input-layer nodes m is set to M*M*2 and the number of output-layer nodes n is set to 2.

Step 2.4. Set the hidden-layer nodes; the number of hidden-layer nodes N is calculated by formula (1) or (2):

Figure FDA0002764695440000021

Figure FDA0002764695440000022

where N is the number of hidden-layer nodes, m and n are the numbers of input-layer and output-layer nodes respectively, and a is a constant.

Step 2.5. The two output-layer nodes represent the text class and the non-text class respectively, and the output is a vector of two floating-point values; when labeling samples, pixels belonging to the text class are labeled (1,0) and non-text pixels are labeled (0,1).

Step 2.6. Train and test the BP neural network. In the output vector of a test-sample pixel, if the first value is greater than the second the pixel is classified as text, and if the second value is greater than the first it is classified as non-text; finally, all pixels judged to be text are marked.

3. The video text area localization method based on BP neural network and spectrum analysis according to claim 1, characterized in that in step 3 the specific method of applying distance-based clustering to the text-class pixels to obtain candidate text regions is:

Step 3.1. Set a distance threshold d1. Among all pixels classified as text, randomly select a pixel P1 as the base pixel, compute the Euclidean distance between P1 and every other text-class pixel, and add the pixels whose Euclidean distance is less than d1 to P1's set G1 until all qualifying pixels have been found; then take each pixel of G1 other than P1 in turn as the base pixel and repeat the same operation until no new pixel joins the set, at which point the set G1 is classified as class K1.

Step 3.2. Repeat the above operation for all text-class pixels outside class K1 until every text-class pixel has been classified, yielding all classes K1, K2, ..., Kt, t ≥ 1.

Step 3.3. Remove all classes containing fewer than 20 pixels.

Step 3.4. Take the minimum bounding rectangle of each class; these rectangles are the candidate text regions.

4. The video text area localization method based on BP neural network and spectrum analysis according to claim 1, characterized in that in step 4 the specific method of applying a fast Fourier transform to the obtained candidate text regions to obtain spectrograms is:

Step 4.1. Binarize the image of the candidate text region.

Step 4.2. Compute the vertical gray-level projection of the binarized image.

Step 4.3. Apply a fast Fourier transform to the projection function, converting it from the time domain to the frequency domain to obtain the spectrogram.

5. The video text area localization method based on BP neural network and spectrum analysis according to claim 2, characterized in that the corner detection method in step 2.1 is the Harris corner detection method.

6. The video text area localization method based on BP neural network and spectrum analysis according to claim 2, characterized in that the constant a in step 2.4 takes a value from 1 to 10.
CN201810148366.1A 2018-02-13 2018-02-13 Video text area localization method based on BP neural network and spectrum analysis Active CN108241874B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810148366.1A CN108241874B (en) 2018-02-13 2018-02-13 Video text area localization method based on BP neural network and spectrum analysis

Publications (2)

Publication Number Publication Date
CN108241874A (en) 2018-07-03
CN108241874B (en) 2020-12-18

Family

ID=62698848

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810148366.1A Active CN108241874B (en) 2018-02-13 2018-02-13 Video text area localization method based on BP neural network and spectrum analysis

Country Status (1)

Country Link
CN (1) CN108241874B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109284751A (en) * 2018-10-31 2019-01-29 Henan University of Science and Technology A non-text filtering method based on spectral analysis and SVM for text localization
CN109738058A (en) * 2019-02-15 2019-05-10 Jiangsu Hongran Intelligent Technology Co., Ltd. A self-learning vibration fault diagnosis method
CN109886330B (en) * 2019-02-18 2020-11-27 Tencent Technology (Shenzhen) Co., Ltd. Text detection method and device, computer-readable storage medium and computer equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156711A (en) * 2015-04-21 2016-11-23 Huazhong University of Science and Technology Text line localization method and device
CN107480665A (en) * 2017-08-09 2017-12-15 Beijing Xiaomi Mobile Software Co., Ltd. Text detection method, device and computer-readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Texture-Based Approach for Text Detection in Images Using Support Vector Machines and Continuously Adaptive Mean Shift Algorithm; Kwang In Kim et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2003-11-30; Vol. 25, No. 12; 1631-1639 *
A text detection and localization algorithm based on corner points and BP neural network; Tang Siyuan et al.; Modern Electronics Technique; 2016-02-15; Vol. 39, No. 4; 113-115 *

Similar Documents

Publication Publication Date Title
CN105608456B (en) A multi-directional text detection method based on fully convolutional networks
CN102968637B (en) Complicated background image and character division method
US8406554B1 (en) Image binarization based on grey membership parameters of pixels
Shivakumara et al. Accurate video text detection through classification of low and high contrast images
US20150039637A1 (en) Systems Apparatus and Methods for Determining Computer Apparatus Usage Via Processed Visual Indicia
CN103810716B (en) Image segmentation method based on gray scale movement and Renyi entropy
CN108629286B (en) Remote sensing airport target detection method based on subjective perception significance model
CN108241874B (en) Video text area localization method based on BP neural network and spectrum analysis
CN102081731A (en) Method and device for extracting text from image
CN101510260B (en) Apparatus and method for determining subtitle existence time
CN103699895A (en) Method for detecting and extracting text in video
Wang et al. Natural scene text detection with multi-channel connected component segmentation
CN102332097B (en) A Segmentation Method of Complex Background Text Image Based on Graph Cut
CN103946865B (en) Method and apparatus for facilitating detection of text in an image
Kim et al. Effective character segmentation for license plate recognition under illumination changing environment
CN108805884A (en) Mosaic area detection method, device and equipment
Shekar et al. Discrete wavelet transform and gradient difference based approach for text localization in videos
CN103295238B (en) Video real-time location method based on ROI motion detection on Android platform
Yang et al. Improving video anomaly detection performance with patch-level loss and segmentation map
CN104834891A (en) Method and system for filtering Chinese character image type spam
Roy et al. New tampered features for scene and caption text classification in video frame
CN106446920B (en) A stroke width transform method based on gradient amplitude constraint
CN107392115B (en) Traffic sign identification method based on hierarchical feature extraction
CN109800637A (en) A kind of remote sensing image small target detecting method
Nagarathinam et al. Moving shadow detection based on stationary wavelet transform and Zernike moments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant