CN109284751A

CN109284751A - A non-text filtering method based on spectral analysis and SVM for text localization

Info

Publication number: CN109284751A
Application number: CN201811281682.2A
Authority: CN
Inventors: 霍华; 桂洋; 吕靖; 杜琰
Original assignee: Henan University of Science and Technology
Current assignee: Henan University of Science and Technology
Priority date: 2018-10-31
Filing date: 2018-10-31
Publication date: 2019-01-29

Abstract

A non-text filtering method for text localization based on spectrum analysis and SVM, comprising the following steps: storing the pictures that require text localization in a folder to build a data set; performing preliminary localization on the stored pictures using candidate edge recombination and edge classification to obtain preliminary localization regions; binarizing each preliminary localization region and computing its grayscale projection in the vertical direction; transforming the one-dimensional array obtained from the grayscale projection in step 3 into the frequency domain by fast Fourier transform (FFT) to obtain the spectrum of the preliminary localization region, and analyzing the characteristics of the spectrum; and filtering non-text regions out of the preliminary localization regions using an SVM classifier. The beneficial effect of the invention is a higher text localization accuracy.

Description

Non-text filtering method for text localization based on spectrum analysis and SVM

Technical field

The present invention relates to the field of intelligent information processing and to related technologies such as image processing and artificial intelligence. Specifically, it is a non-text filtering method for text localization based on spectrum analysis and SVM.

Background art

In everyday life, natural scene images taken with mobile phones and cameras, as well as the pictures seen when browsing news web pages, contain a large amount of important text information. To convert the text in an image into editable text, locating the text in the image is an essential preliminary step. Locating text in images efficiently and accurately is therefore very important for obtaining this information.

However, as information technology develops, the content and quality of pictures keep changing. Because of factors such as picture size, the variety of fonts, text line orientation, differing lighting conditions, faint characters and complex backgrounds, locating the text in a picture accurately remains a challenging and important task. Text localization in pictures is a convenience for those who need to collect large amounts of data from the web for big data analysis, promotes related research, and provides technical support for many frontier topics such as artificial intelligence.

There are many existing text localization methods, but few methods for filtering out non-text regions. They generally fall into two classes: methods based on geometric attributes and text structure, and methods based on machine learning classification, of which the former are the most widely used. Methods based on geometric attributes and structural features combine structural analysis of text strokes with color assignment and eliminate non-text regions by filtering out background interference and similar means. These methods are simple and easy to implement, but they perform poorly when the interference is complex.

As research has progressed, recurrent neural networks (RNN) with long short-term memory (LSTM) have been proposed to recognize features extracted by a CNN from the whole picture; this approach exploits contextual information and avoids errors caused by segmentation. Others have proposed decomposing complex components into simple components with a node-removal algorithm, splicing them along disjoint skeletons to obtain the corresponding text regions, and finally filtering false positives with an HMM-based classification system. Although these methods are accurate, they are overly complex to carry out and are not widely applicable.

Summary of the invention

The technical problem to be solved by the invention is to provide a non-text filtering method for text localization based on spectrum analysis and SVM, which addresses the unsatisfactory results of current localization methods and improves the accuracy of text localization.

To solve the above technical problem, the invention adopts the following technical solution: a non-text filtering method for text localization based on spectrum analysis and SVM, comprising the following steps:

Step 1: store the pictures that require text localization in a folder to build a data set;

Step 2: perform preliminary localization on the pictures stored in step 1 using candidate edge recombination and edge classification, obtaining preliminary localization regions;

Step 3: binarize each preliminary localization region and compute its grayscale projection in the vertical direction;

Step 4: transform the one-dimensional array obtained from the grayscale projection in step 3 into the frequency domain by fast Fourier transform (FFT) to obtain the spectrum of the preliminary localization region, and analyze the characteristics of the spectrum;

Step 5: filter non-text regions out of the preliminary localization regions using an SVM classifier;

Step 6: calculate the localization accuracy.

In step 3 of the invention, the specific steps for binarizing the preliminary localization region and computing the grayscale projection in the vertical direction are:

Step 3.1: convert the preliminary localization region image obtained in step 2 into a grayscale image suitable for image processing;

Step 3.2: apply noise filtering to the grayscale image obtained in step 3.1;

Step 3.3: analyze the grayscale image processed in step 3.2 to obtain its global threshold T; pixels whose gray value is less than T are set to 1 after thresholding, and all other pixels are set to 0;

Step 3.4: project the binary image obtained in step 3.3 in the vertical direction, accumulating the black pixels in each column to obtain their count.

In step 4 of the invention, the specific method for transforming the projection into the frequency domain by fast Fourier transform and analyzing the characteristics of the spectrum is:

Step 4.1: apply the fast Fourier transform to the one-dimensional array obtained from the grayscale projection in the vertical direction, so that the row-wise periodic signal is transformed into the frequency domain, giving the spectrum of the preliminary localization region;

Step 4.2: obtain the spectra of all preliminary localization regions by the above steps, including the spectra of text regions and the spectra of non-text regions;

Step 4.3: number each spectrum and its corresponding preliminary localization region, analyze a large number of spectra, and summarize the spectral feature parameters that best represent the difference between non-text and text.

In step 5 of the invention, the specific method for filtering non-text regions out of the preliminary localization regions using the SVM classifier is:

Step 5.1: select an SVM classifier as the training tool;

Step 5.2: select the amplitudes corresponding to frequency 2 through frequency n, together with the peak amplitude between frequency 12 and frequency 31, as the input feature values;

Step 5.3: set the output of the SVM to two one-dimensional arrays;

Step 5.4: determine the functional relation of the output result by experiment.
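Steps 5.1 to 5.4 can be illustrated with a minimal training sketch. It assumes scikit-learn's `SVC` with a linear kernel, neither of which is named in the patent, and it uses made-up two-dimensional feature vectors in place of real spectral features. For simplicity it predicts a single label per region (1 for text, 0 for non-text) rather than the two output arrays of step 5.3.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical feature vectors: one row per preliminary localization region.
# Real inputs would be the spectral amplitudes described in step 5.2.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
# Labels (assumed encoding): 1 = text region, 0 = non-text region.
y_train = np.array([0, 0, 0, 1, 1, 1])

# Kernel and C are illustrative choices, not values from the patent.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_train, y_train)

# Classify two unseen regions, one near each training cluster.
X_new = np.array([[0.5, 0.5], [5.5, 5.5]])
pred = clf.predict(X_new)
```

Regions predicted as 0 would be discarded as non-text; regions predicted as 1 would be kept.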

The beneficial effects of the invention are as follows. The invention effectively removes the non-text regions that remain after preliminary localization by existing text localization methods, improving the accuracy of text localization. The method is widely applicable: it filters non-text regions well in news web page pictures and also gives a clear improvement in non-text filtering for natural scene pictures.

Brief description of the drawings

Fig. 1 is a flow chart of the non-text filtering method according to the invention;

Fig. 2 is the original image of a preliminary localization region;

Fig. 3 is the image of the preliminary localization region after grayscale conversion;

Fig. 4 is the grayscale projection in the vertical direction;

Fig. 5 shows a non-text region and its corresponding spectrum;

Fig. 6 shows a text region and its corresponding spectrum;

Fig. 7 shows a non-text region and its corresponding output results;

Fig. 8 shows a text region and its corresponding output results.

Detailed description of the embodiments

A specific embodiment of the invention is described below with reference to the accompanying drawings, so that those skilled in the art can better understand the invention.

The non-text filtering method for text localization based on spectrum analysis and SVM comprises the following steps:

Step S1: a Python program automatically refreshes web pages and captures news pictures, yielding a certain number of pictures as the first data set. The natural scene competition data set (ICDAR2011) is downloaded and stored in a folder as the second data set. Non-text filtering is performed on the two data sets separately, to obtain the non-text filtering results for news web page pictures and for natural scene pictures.

Step S2: perform preliminary localization on the pictures using candidate edge recombination and edge classification, obtaining preliminary localization regions.

Step S3: binarize the grayscale image of the preliminary localization region and compute its grayscale projection in the vertical direction, specifically as follows:

Step S31: first convert the color image (news or natural scene) of the preliminary localization region to grayscale according to the formula Gray = R × 0.299 + G × 0.587 + B × 0.114, turning the collected color image into a grayscale image suitable for image processing, as shown in Fig. 3.
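The weighted sum in step S31 can be sketched with NumPy as follows. The function name `to_gray` and the H × W × 3 RGB array layout are assumptions made for illustration; only the three weights come from the text.

```python
import numpy as np

# Luminance weights taken from the formula in step S31 (R, G, B order).
GRAY_WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_gray(rgb):
    """Convert an H x W x 3 RGB array to an H x W float grayscale array."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Matrix product over the last axis: 0.299*R + 0.587*G + 0.114*B.
    return rgb @ GRAY_WEIGHTS

# Usage: a 1 x 3 image holding one pure red, one green and one blue pixel.
pixels = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]])
gray = to_gray(pixels)
```

Green contributes most to the gray value and blue least, so equally saturated colors map to different gray levels.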

Step S32: because unnecessary information in the grayscale image affects the spectrum of the preliminary localization region, noise filtering is applied to the grayscale image.
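The text does not say which noise filter step S32 uses. As one possible choice, the sketch below applies a 3 × 3 median filter with edge replication; both the filter type and the helper name `median_filter3` are assumptions, not part of the described method.

```python
import numpy as np

def median_filter3(gray):
    """3x3 median filter with edge replication (an assumed filter choice)."""
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    # Stack the nine shifted views of the image, then take the
    # per-pixel median across them.
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(3) for dx in range(3)])
    return np.median(windows, axis=0)

# Usage: a single bright speck on a flat background is removed.
img = np.full((5, 5), 10.0)
img[2, 2] = 255.0
clean = median_filter3(img)
```

A median filter suppresses isolated specks while keeping stroke edges fairly sharp, which matters because the later projection counts dark pixels per column.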

Step S33: set a global threshold T and threshold the image: every pixel whose gray value is less than T takes the value 1 after thresholding; conversely, every pixel whose gray value is greater than T takes the value 0.
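A minimal sketch of the thresholding rule in step S33. The value T = 128 is only an illustrative choice, since the text does not state how T is obtained, and mapping pixels exactly equal to T to 0 is likewise an assumption.

```python
import numpy as np

def binarize(gray, T):
    """Return 1 where gray < T (dark pixels) and 0 elsewhere."""
    # Pixels equal to T are mapped to 0; the text leaves this case open.
    return (np.asarray(gray) < T).astype(np.uint8)

# Usage with an illustrative threshold.
gray = np.array([[10, 200],
                 [127, 128]])
binary = binarize(gray, T=128)
```

Note the inverted polarity: dark (text-like) pixels become 1, so that summing a column later counts the black pixels directly.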

Step S34: let the binary image of the preliminary localization region have width W and height H. The region is projected in the vertical direction according to:

$$V(j) = \sum_{i=0}^{H-1} \mathrm{NewPixel}(i, j), \qquad 0 \le j < W$$

where V(j) is the number of black pixels in column j and NewPixel(i, j) is the value of the pixel at row i and column j: NewPixel(i, j) = 0 if the pixel is white and NewPixel(i, j) = 1 if it is black; j is the column of the pixel, with 0 ≤ j < W, and i is its row. The grayscale projection in the vertical direction is shown in Fig. 4.
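The column-wise count V(j) of step S34 reduces to a sum over rows. A short sketch, with the function name `vertical_projection` chosen for illustration:

```python
import numpy as np

def vertical_projection(binary):
    """V(j): number of black (value 1) pixels in column j, for 0 <= j < W."""
    return np.asarray(binary).sum(axis=0)

# Usage: a 3 x 4 binary region (H = 3, W = 4).
binary = np.array([[1, 0, 1, 0],
                   [1, 0, 0, 0],
                   [1, 1, 0, 0]])
V = vertical_projection(binary)
```

The result is a one-dimensional array of length W, which is the signal passed to the Fourier transform in the next step.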

Step S4: transform the projection into the frequency domain by fast Fourier transform and analyze the characteristics of the spectrum, specifically as follows:

Step S41: apply the fast Fourier transform to the one-dimensional array obtained from the grayscale projection in the vertical direction, so that the row-wise periodic signal is transformed into the frequency domain, giving the spectrum of the preliminary localization region.
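Step S41 can be sketched with NumPy's real-input FFT. Using `numpy.fft.rfft` and taking the magnitude of each bin as the amplitude are assumptions; the text only states that a fast Fourier transform is applied to the projection.

```python
import numpy as np

def amplitude_spectrum(projection):
    """Amplitude |X[k]| of the real FFT of a 1-D projection, k = 0 .. N//2."""
    x = np.asarray(projection, dtype=np.float64)
    return np.abs(np.fft.rfft(x))

# Usage: a synthetic projection that repeats every 4 columns over 32 columns,
# loosely imitating evenly spaced character strokes.
proj = np.tile([3.0, 3.0, 0.0, 0.0], 8)
amp = amplitude_spectrum(proj)
# Strongest non-DC component: expected at index 32 / 4 = 8.
peak = int(np.argmax(amp[1:])) + 1
```

A regular stroke pattern concentrates its energy in a few bins, which is the kind of structure the spectrum analysis looks for in text regions.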

Step S42: obtain the spectra of all preliminary localization regions by the above steps, including non-text regions with their spectra (as shown in Fig. 5) and text regions with their spectra (as shown in Fig. 6).

Step S43: analyze a large number of non-text region spectra and a large number of text region spectra. The analysis shows that text and non-text regions differ little at frequencies 0 to 1, whereas the amplitudes from frequency 2 to frequency 30 fluctuate markedly; it is concluded that the frequency of the text signal is approximately equal to the physical width of the text.

Step S431: label the preliminary localization regions, marking true-positive regions as (1, 0) and false-positive regions as (0, 1). By experimental comparison, summarize the spectral feature parameters that best represent the difference between false positives and non-false positives.

Step S5: use an SVM classifier as the training tool. The amplitudes corresponding to frequency 2 through frequency n (n being two to three times the average text width), together with the peak amplitude between frequency 12 and frequency 31, are used as input feature values, and the output is two one-dimensional arrays:
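The feature selection described in step S5 can be sketched as below. Treating "frequency k" as index k of the amplitude spectrum and treating both ranges as inclusive are assumptions, as is the helper name `spectral_features`; the value of n is left as a parameter because the text gives it only in terms of the average text width.

```python
import numpy as np

def spectral_features(amp, n):
    """Amplitudes at indices 2..n plus the peak amplitude over indices 12..31."""
    amp = np.asarray(amp, dtype=np.float64)
    if amp.size < 32 or not 2 <= n < amp.size:
        raise ValueError("spectrum too short for the requested indices")
    band = amp[2:n + 1]        # frequencies 2 .. n, inclusive (assumed)
    peak = amp[12:32].max()    # largest amplitude over frequencies 12 .. 31
    return np.append(band, peak)

# Usage: a stand-in spectrum where amp[k] == k makes the selection visible.
amp = np.arange(40, dtype=float)
features = spectral_features(amp, n=5)
```

With n = 5 the feature vector holds the four amplitudes at indices 2 to 5 followed by the band peak, so its length is n.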

$$(y_{11}, y_{21}, \ldots, y_{i1}), \qquad (y_{12}, y_{22}, \ldots, y_{i2})$$

The output results for non-text regions and text regions are shown in Fig. 7 and Fig. 8, respectively. The functional relation of the output is determined by experiment: for example, a region whose output satisfies $w_i = y_{i1} - y_{i2} > \Delta t$ is a text region, and a region whose output satisfies $w_i = y_{i1} - y_{i2} \le \Delta t$ is a non-text region.
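The decision rule on the two output arrays can be sketched as follows. The numeric outputs and the threshold value 0.3 are made up for illustration; the text states only that the relation and Δt are determined by experiment.

```python
import numpy as np

def is_text(y1, y2, delta_t):
    """Apply w_i = y_i1 - y_i2 > delta_t; True marks a text region."""
    w = np.asarray(y1, dtype=np.float64) - np.asarray(y2, dtype=np.float64)
    return w > delta_t

# Usage with hypothetical classifier outputs for three regions.
y1 = np.array([0.9, 0.2, 0.6])
y2 = np.array([0.1, 0.8, 0.5])
mask = is_text(y1, y2, delta_t=0.3)
```

Only the first region clears the margin here; the third is rejected even though its first output is larger, because the difference does not exceed Δt.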

Step S6: compute the precision, recall and F-measure after non-text filtering:

$$P = \frac{n_1}{n_1 + m_1}, \qquad R = \frac{n_1}{n_1 + m_2}, \qquad F = \frac{2PR}{P + R}$$

where P, R and F are the precision, recall and F value, respectively; $n_1$ is the number of correctly detected text blocks, $m_1$ is the number of non-text blocks detected as text, and $m_2$ is the number of missed text blocks.
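The evaluation in step S6 can be sketched as below, assuming the standard definitions P = n1 / (n1 + m1), R = n1 / (n1 + m2) and F = 2PR / (P + R), with n1, m1 and m2 as defined above. The counts in the usage line are invented, not results from the patent.

```python
def precision_recall_f(n1, m1, m2):
    """Return (P, R, F) from correct (n1), false (m1) and missed (m2) blocks."""
    p = n1 / (n1 + m1) if (n1 + m1) else 0.0
    r = n1 / (n1 + m2) if (n1 + m2) else 0.0
    # Harmonic mean of precision and recall; 0.0 when both are zero.
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f

# Usage with made-up counts: 80 correct, 20 false detections, 10 missed.
p, r, f = precision_recall_f(n1=80, m1=20, m2=10)
```

Removing false detections lowers m1 and therefore raises P, which is how the effect of non-text filtering shows up in these scores.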

Claims

1. A non-text filtering method for text localization based on spectrum analysis and SVM, characterized in that it comprises the following steps:

Step 1: store the pictures that require text localization in a folder to build a data set;

Step 2: perform preliminary localization on the pictures stored in step 1 using candidate edge recombination and edge classification, obtaining preliminary localization regions;

Step 3: binarize each preliminary localization region and compute its grayscale projection in the vertical direction;

Step 4: transform the one-dimensional array obtained from the grayscale projection in step 3 into the frequency domain by fast Fourier transform to obtain the spectrum of the preliminary localization region, and analyze the characteristics of the spectrum;

Step 5: filter non-text regions out of the preliminary localization regions using an SVM classifier;

Step 6: calculate the localization accuracy.

2. The non-text filtering method for text localization based on spectrum analysis and SVM according to claim 1, characterized in that in step 3, the specific steps for binarizing the preliminary localization region and computing the grayscale projection in the vertical direction are:

Step 3.1: convert the preliminary localization region image obtained in step 2 into a grayscale image suitable for image processing;

Step 3.2: apply noise filtering to the grayscale image obtained in step 3.1;

Step 3.3: analyze the grayscale image processed in step 3.2 to obtain its global threshold T; pixels whose gray value is less than T are set to 1 after thresholding, and all other pixels are set to 0;

Step 3.4: project the binary image obtained in step 3.3 in the vertical direction, accumulating the black pixels in each column to obtain their count.

3. The non-text filtering method for text localization based on spectrum analysis and SVM according to claim 1, characterized in that in step 4, the specific method for transforming the projection into the frequency domain by fast Fourier transform and analyzing the characteristics of the spectrum is:

Step 4.1: apply the fast Fourier transform to the one-dimensional array obtained from the grayscale projection in the vertical direction, so that the row-wise periodic signal is transformed into the frequency domain, giving the spectrum of the preliminary localization region;

Step 4.2: obtain the spectra of all preliminary localization regions by the above steps, including the spectra of text regions and the spectra of non-text regions;

Step 4.3: number each spectrum and its corresponding preliminary localization region, analyze a large number of spectra, and summarize the spectral feature parameters that best represent the difference between non-text and text.

4. The non-text filtering method for text localization based on spectrum analysis and SVM according to claim 1, characterized in that in step 5, the specific method for filtering non-text regions out of the preliminary localization regions using the SVM classifier is:

Step 5.1: select an SVM classifier as the training tool;

Step 5.2: select the amplitudes corresponding to frequency 2 through frequency n, together with the peak amplitude between frequency 12 and frequency 31, as the input feature values;

Step 5.3: set the output of the SVM to two one-dimensional arrays;

Step 5.4: determine the functional relation of the output result by experiment.