CN109284751A - A non-text filtering method based on spectral analysis and SVM for text localization
- Publication number
- CN109284751A (application CN201811281682.2A)
- Authority
- CN
- China
- Prior art keywords
- text
- svm
- spectrogram
- frequency
- preliminary positioning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
A non-text filtering method for text localization based on spectral analysis and SVM, comprising the following steps: storing the pictures that require text localization in a folder to build a data set; performing preliminary localization on the stored pictures using candidate edge recombination and edge classification to obtain preliminary localization regions; binarizing each preliminary localization region and computing its gray projection in the vertical direction; transforming the one-dimensional array obtained from the gray projection in step 3 into the frequency domain by fast Fourier transform (FFT) to obtain the spectrogram of the preliminary localization region, and analyzing the characteristics of the spectrogram; and filtering non-text regions out of the preliminary localization regions using an SVM classifier. The beneficial effect of the invention is a higher text localization accuracy.
Description
Technical field
The present invention relates to the field of intelligent information processing and involves related technologies such as image processing and artificial intelligence. Specifically, it is a non-text filtering method for text localization based on spectral analysis and SVM.
Background art
Both the natural scene images that people capture with mobile phones and cameras and the pictures they see when browsing news web pages contain a large amount of highly important text information. When we want to convert the text in an image into editable text right away, locating the text in the image becomes an indispensable prerequisite. Locating text in images efficiently and accurately is therefore very important for obtaining this text information.
However, as information continues to develop, the content and quality of pictures change constantly. Because of factors such as picture size, the variety of fonts, line orientation, differing lighting conditions, weak characters, and complex backgrounds, accurately locating the text in a picture remains a challenging and important task. Text localization in pictures helps workers who need to crawl large amounts of data from the network for big-data analysis, promotes the development of related research, and provides technical support for many frontier topics such as artificial intelligence.
Many text localization methods exist, but methods for filtering out non-text are few. They can generally be divided into two categories: methods based on geometric attributes and text structure, and methods based on machine-learning classification. The former are the most widely used. Methods based on geometric attributes and structural features combine structural analysis of text strokes with color assignment and eliminate non-text regions by means such as filtering out background interference. Such methods are simple and easy to carry out, but they do not work well when the interfering factors are too complex.
As research methods developed, recurrent neural networks (RNN) with long short-term memory (LSTM) were proposed to recognize features extracted by a CNN from the entire picture; this approach explores contextual information and avoids errors caused by segmentation. Another proposed approach decomposes complex components into simple components with a node-removal algorithm, splices them on disjoint skeleton combinations to obtain the corresponding text regions, and finally filters out false positives with an HMM-based classification system. Although these methods are accurate, they are overly complex to carry out and are not widely applicable.
Summary of the invention
The technical problem to be solved by the present invention is to provide a non-text filtering method for text localization based on spectral analysis and SVM that addresses the unsatisfactory results of current localization methods and improves the accuracy of text localization.
The technical solution adopted by the present invention to solve the above technical problem is a non-text filtering method for text localization based on spectral analysis and SVM, comprising the following steps:
Step 1: store the pictures that require text localization in a folder to build a data set;
Step 2: perform preliminary localization on the pictures stored in step 1 using candidate edge recombination and edge classification to obtain preliminary localization regions;
Step 3: binarize each preliminary localization region and compute its gray projection in the vertical direction;
Step 4: transform the one-dimensional array obtained from the gray projection in step 3 into the frequency domain by fast Fourier transform (FFT) to obtain the spectrogram of the preliminary localization region, and analyze the characteristics of the spectrogram;
Step 5: filter non-text regions out of the preliminary localization regions using an SVM classifier;
Step 6: calculate the localization accuracy.
In step 3 of the present invention, the specific steps for binarizing the preliminary localization region and computing its gray projection in the vertical direction are:
Step 3.1: convert the preliminary localization region image obtained in step 2 into a grayscale image that is convenient for image processing;
Step 3.2: apply noise filtering to the grayscale image obtained in step 3.1;
Step 3.3: analyze the grayscale image processed in step 3.2 to obtain its global threshold T; pixels in the grayscale image whose gray value is less than T are set to 1 after thresholding, and all other pixels are set to 0;
Step 3.4: project the binary image obtained in step 3.3 in the vertical direction by accumulating the black pixels in each column to obtain their count.
In step 4 of the present invention, the specific method for transforming the projection into the frequency domain by fast Fourier transform and analyzing the characteristics of the spectrogram is:
Step 4.1: apply the fast Fourier transform to the one-dimensional array obtained from the gray projection in the vertical direction, so that the row-aligned periodic signal is transformed into the frequency domain and the spectrogram of the preliminary localization region is obtained;
Step 4.2: obtain the spectrograms of all preliminary localization regions through the above steps, including text-region spectrograms and non-text-region spectrograms;
Step 4.3: number the spectrograms and their corresponding preliminary localization regions, analyze a large number of spectrograms, and summarize the spectral characteristic parameters that best represent the difference between non-text and text.
In step 5 of the present invention, the specific method for filtering non-text regions out of the preliminary localization regions using an SVM classifier is:
Step 5.1: select an SVM classifier as the training tool;
Step 5.2: select the amplitudes corresponding to frequency 2 through frequency n, together with the amplitude peak between frequency 12 and frequency 31, as the input feature values;
Step 5.3: set the output of the SVM to two one-dimensional arrays;
Step 5.4: determine the functional relation of the output results by experiment.
The beneficial effects of the present invention are as follows. The invention effectively removes the non-text regions left over after preliminary localization by existing text localization methods, which improves the accuracy of text localization. The method is also widely applicable: it filters non-text well in news web page pictures and also noticeably improves non-text filtering for natural scene pictures.
Brief description of the drawings
Fig. 1 is the flow chart of the non-text filtering method according to the present invention;
Fig. 2 is the original image of a preliminary localization region;
Fig. 3 is the image of the preliminary localization region after grayscale conversion;
Fig. 4 is the gray projection in the vertical direction;
Fig. 5 shows non-text regions and their corresponding spectrograms;
Fig. 6 shows text regions and their corresponding spectrograms;
Fig. 7 shows non-text regions and their corresponding output results;
Fig. 8 shows text regions and their corresponding output results.
Detailed description of embodiments
A specific embodiment of the present invention is described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention.
The non-text filtering method for text localization based on spectral analysis and SVM comprises the following steps:
Step S1: A Python program automatically refreshes web pages and captures news pictures, yielding a number of pictures that serve as the first data set. The competition data set of natural scene pictures (ICDAR2011) is downloaded and stored in a folder as the second data set. Non-text filtering is performed on the two data sets separately to obtain the non-text filtering results for news web page pictures and for natural scene pictures.
Step S2: Perform preliminary localization on the pictures using candidate edge recombination and edge classification to obtain the preliminary localization regions.
Step S3: Binarize the grayscale image of the preliminary localization region and compute its gray projection in the vertical direction, which specifically comprises the following steps:
Step S31: First, convert the color image of the preliminary localization region (from news or natural scenes) to grayscale according to the formula Gray = R × 0.299 + G × 0.587 + B × 0.114, turning the collected color image into a grayscale image that is convenient for image processing, as shown in Fig. 3.
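The weighted conversion in step S31 can be sketched as follows. This is a minimal illustration assuming NumPy and an image stored as an H × W × 3 array in R, G, B channel order; the function name `to_gray` and the sample pixels are hypothetical and not part of the patent.

```python
import numpy as np

def to_gray(rgb):
    """Weighted grayscale conversion: Gray = 0.299*R + 0.587*G + 0.114*B."""
    rgb = np.asarray(rgb, dtype=np.float64)
    weights = np.array([0.299, 0.587, 0.114])
    # Contract the channel axis against the weights; assumes R, G, B order.
    return rgb[..., :3] @ weights

# Illustrative 1x2 image: one pure-white pixel and one pure-red pixel.
region = np.array([[[255, 255, 255], [255, 0, 0]]], dtype=np.uint8)
gray = to_gray(region)
```

The result is kept as floating point here; rounding to 8-bit values is left to the caller.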
Step S32: Because unnecessary information in the grayscale image affects the spectrogram of the preliminary localization region, noise filtering is applied to the grayscale image.
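The text does not say which noise filter is used in step S32. As one possible sketch, a 3 × 3 median filter is shown below; the choice of a median filter, the edge-replication padding, and the name `median_filter3` are all assumptions made for illustration.

```python
import numpy as np

def median_filter3(gray):
    """3x3 median filter with edge replication (an assumed choice of filter)."""
    gray = np.asarray(gray, dtype=np.float64)
    padded = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    # Stack the nine shifted views of the padded image, then take the median.
    windows = [padded[di:di + h, dj:dj + w] for di in range(3) for dj in range(3)]
    return np.median(np.stack(windows), axis=0)

noisy = np.full((5, 5), 200.0)
noisy[2, 2] = 0.0  # a single isolated dark speck
clean = median_filter3(noisy)
```

A median filter removes isolated specks like the one above while leaving uniform areas unchanged, which is why it is a common choice before binarization.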
Step S33: Set a global threshold T and threshold the image: every pixel whose gray value is less than T takes the value 1 after thresholding; conversely, every pixel whose gray value is greater than T takes the value 0 after thresholding.
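The thresholding rule in step S33 can be sketched as follows. The value T = 128 is an arbitrary placeholder because the text does not say how T is chosen, and mapping pixels exactly equal to T to 0 is likewise an assumption; the name `binarize` is hypothetical.

```python
import numpy as np

def binarize(gray, T):
    """Pixels darker than T become 1 (black/text candidate); all others become 0."""
    return (np.asarray(gray) < T).astype(np.uint8)

gray = np.array([[10, 200, 30],
                 [220, 15, 240]])
binary = binarize(gray, T=128)  # T=128 is a placeholder, not a value from the text
```

Note that dark pixels map to 1 rather than 0, so summing the binary image later counts black pixels directly.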
Step S34: Let the binary image of the preliminary localization region have width W and height H. The region is projected in the vertical direction according to the formula:
$$V(j) = \sum_{i=0}^{H-1} \mathrm{NewPixel}(i, j), \quad 0 \le j < W$$
Here V(j) is the number of black pixels in column j, and NewPixel(i, j) is the value of each pixel: NewPixel(i, j) = 0 if the pixel is white and NewPixel(i, j) = 1 if the pixel is black. The index j is the column of the pixel, with 0 ≤ j < W, and i is the row of the pixel. The gray projection in the vertical direction is shown in Fig. 4.
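The column-wise count V(j) described in step S34 amounts to summing the binary image over its rows. A minimal sketch, assuming NumPy and a binary image in which black pixels are 1; the name `vertical_projection` and the sample array are illustrative only.

```python
import numpy as np

def vertical_projection(binary):
    """V(j): number of black (value 1) pixels in column j, for 0 <= j < W."""
    return np.asarray(binary).sum(axis=0)

binary = np.array([[1, 0, 1, 0],
                   [1, 0, 0, 0],
                   [1, 0, 1, 0]])
V = vertical_projection(binary)
```

The result is a one-dimensional array of length W, which is the signal passed to the Fourier transform in the next step.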
Step S4: Transform the projection into the frequency domain by fast Fourier transform and analyze the characteristics of the spectrogram. The specific method is:
Step S41: Apply the fast Fourier transform to the one-dimensional array obtained from the gray projection in the vertical direction, so that the row-aligned periodic signal is transformed into the frequency domain and the spectrogram of the preliminary localization region is obtained.
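Step S41 can be sketched with NumPy's real-input FFT. The sketch assumes that "frequency k" in the text corresponds to FFT bin index k, which the patent does not state explicitly; the name `amplitude_spectrum` and the synthetic projection are hypothetical.

```python
import numpy as np

def amplitude_spectrum(projection):
    """Amplitude of each non-negative frequency bin of the projection."""
    projection = np.asarray(projection, dtype=np.float64)
    return np.abs(np.fft.rfft(projection))

# Synthetic projection: a stroke pattern repeating every 8 columns over 64 columns.
columns = np.arange(64)
projection = 5.0 + 3.0 * np.cos(2 * np.pi * columns / 8)
spectrum = amplitude_spectrum(projection)
peak_bin = int(np.argmax(spectrum[1:])) + 1  # skip the DC term at index 0
```

For this synthetic signal the dominant non-DC bin is 64 / 8 = 8, which illustrates how a regular character spacing shows up as a peak in the spectrum.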
Step S42: Obtain the spectrograms of all preliminary localization regions through the above steps, including non-text regions with their spectrograms (as shown in Fig. 5) and text regions with their spectrograms (as shown in Fig. 6).
Step S43: Analyze a large number of non-text-region spectrograms and a large number of text-region spectrograms. The analysis finds that text and non-text regions differ little at frequencies 0 to 1, that the amplitudes from frequency 2 to frequency 30 fluctuate more noticeably, and that the frequency of the text signal is approximately equal to the physical width of the text.
Step S431: Label the preliminary localization regions: true positive regions are labeled (1,0) and false positive regions are labeled (0,1). Experimental comparison then summarizes the spectral characteristic parameters that best represent the difference between false positives and non-false positives.
Step S5: Use an SVM classifier as the training tool. The input feature values are the amplitudes corresponding to frequency 2 through frequency n (where n lies within two to three times the average text width), together with the amplitude peak between frequency 12 and frequency 31. The output is two one-dimensional arrays:
$(y_{11}, y_{21}, \ldots, y_{i1})$, $(y_{12}, y_{22}, \ldots, y_{i2})$;
The output results for non-text regions and text regions are shown in Fig. 7 and Fig. 8, respectively. The functional relation of the output results is determined by experiment: for example, a region whose output satisfies $w_i = y_{i1} - y_{i2} > \Delta t$ is a text region, and a region whose output satisfies $w_i = y_{i1} - y_{i2} \le \Delta t$ is a non-text region.
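The feature selection and decision rule of step S5 can be sketched as follows. The SVM training itself is omitted; `y1` and `y2` stand in for the two classifier outputs. Treating frequencies as FFT bin indices, taking the 12 to 31 range as inclusive, and the names `spectrum_features` and `is_text` are assumptions made for illustration, as are the dummy input values.

```python
import numpy as np

def spectrum_features(spectrum, n):
    """Amplitudes at bins 2..n plus the peak amplitude over bins 12..31.

    Assumes the spectrum has at least 32 bins and that both ranges are inclusive.
    """
    spectrum = np.asarray(spectrum, dtype=np.float64)
    band = spectrum[2:n + 1]
    peak = spectrum[12:32].max()
    return np.concatenate([band, [peak]])

def is_text(y1, y2, delta_t):
    """Decision rule from the text: a region is text when w = y1 - y2 > delta_t."""
    return (np.asarray(y1) - np.asarray(y2)) > delta_t

spectrum = np.arange(40, dtype=np.float64)  # dummy spectrum: amplitude == bin index
features = spectrum_features(spectrum, n=10)
labels = is_text([0.9, 0.2], [0.1, 0.8], delta_t=0.0)  # delta_t=0.0 is a placeholder
```

In practice the feature vectors of labeled regions would be passed to an SVM implementation for training, and Δt would be tuned experimentally as the text describes.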
Step S6: Calculate the precision, recall, and F-measure after non-text filtering, denoted P, R, and F respectively, where n1 is the number of correctly detected text blocks, m1 is the number of non-text blocks, and m2 is the number of missed text blocks.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811281682.2A CN109284751A (en) | 2018-10-31 | 2018-10-31 | A non-text filtering method based on spectral analysis and SVM for text localization |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811281682.2A CN109284751A (en) | 2018-10-31 | 2018-10-31 | A non-text filtering method based on spectral analysis and SVM for text localization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109284751A true CN109284751A (en) | 2019-01-29 |
Family
ID=65174664
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811281682.2A Pending CN109284751A (en) | 2018-10-31 | 2018-10-31 | A non-text filtering method based on spectral analysis and SVM for text localization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109284751A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20000060673A (en) * | 1999-03-18 | 2000-10-16 | 이준환 | Method of extracting caption regions and recognizing character from compressed news video image |
CN102144236A (en) * | 2008-09-03 | 2011-08-03 | 索尼公司 | Text localization for image and video OCR |
CN102208023A (en) * | 2011-01-23 | 2011-10-05 | 浙江大学 | Method for recognizing and designing video captions based on edge information and distribution entropy |
CN102547147A (en) * | 2011-12-28 | 2012-07-04 | 上海聚力传媒技术有限公司 | Method for realizing enhancement processing for subtitle texts in video images and device |
CN103136523A (en) * | 2012-11-29 | 2013-06-05 | 浙江大学 | Arbitrary direction text line detection method in natural image |
CN105825216A (en) * | 2016-03-17 | 2016-08-03 | 中国科学院信息工程研究所 | Method of locating text in complex background image |
CN107145888A (en) * | 2017-05-17 | 2017-09-08 | 重庆邮电大学 | Method for real-time translation of video subtitles |
CN107945125A (en) * | 2017-11-17 | 2018-04-20 | 福州大学 | It is a kind of to merge spectrum estimation method and the fuzzy image processing method of convolutional neural networks |
CN108241874A (en) * | 2018-02-13 | 2018-07-03 | 河南科技大学 | Video Text Region Location Method Based on BP Neural Network and Spectrum Analysis |
Non-Patent Citations (5)
Title |
---|
CHONG YU ET AL: "Text detection and recognition in natural scene with edge analysis", 《IET COMPUTER VISION》 * |
MOHAMMAD KHODADADI AZADBONI ET AL: "Text detection and character extraction in color images using FFT domain filtering and SVM classification", 《6TH INTERNATIONAL SYMPOSIUM ON TELECOMMUNICATIONS (IST)》 * |
PALAIAHNAKOTE SHIVAKUMARA ET AL: "New Fourier-Statistical Features in RGB Space for Video Text Detection", 《IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY》 * |
傅泽田等: "《面向移动终端的农业信息智能获取》", 30 September 2015 * |
孙红星等: "基于小波变换和SVM的文本区域定位", 《东北大学学报(自然科学版)》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Peyrard et al. | ICDAR2015 competition on text image super-resolution | |
WO2018145470A1 (en) | Image detection method and device | |
CN106228159A (en) | A kind of gauge table meter copying device based on image recognition and method thereof | |
CN106548169B (en) | Fuzzy literal Enhancement Method and device based on deep neural network | |
CN112232371B (en) | American license plate recognition method based on YOLOv3 and text recognition | |
CN105718866B (en) | A kind of detection of sensation target and recognition methods | |
CN112862849B (en) | Image segmentation and full convolution neural network-based field rice ear counting method | |
Hati et al. | Plant recognition from leaf image through artificial neural network | |
CN109740639A (en) | A method, system and electronic device for detecting cloud in remote sensing image of Fengyun satellite | |
CN105335716A (en) | Improved UDN joint-feature extraction-based pedestrian detection method | |
CN111738048B (en) | Pedestrian re-identification method | |
CN110334703B (en) | A method for ship detection and recognition in day and night images | |
CN104794479A (en) | Method for detecting text in natural scene picture based on local width change of strokes | |
CN108805102A (en) | A kind of video caption detection and recognition methods and system based on deep learning | |
CN108509950B (en) | Railway contact net support number plate detection and identification method based on probability feature weighted fusion | |
Ma et al. | Towards improved accuracy of UAV-based wheat ears counting: A transfer learning method of the ground-based fully convolutional network | |
CN111507967A (en) | A high-precision detection method for mangoes in a natural orchard scene | |
CN111260645A (en) | Tampered image detection method and system based on deep learning of block classification | |
CN115082776A (en) | Electric energy meter automatic detection system and method based on image recognition | |
Feng et al. | A novel saliency detection method for wild animal monitoring images with WMSN | |
CN115578364A (en) | Weak target detection method and system based on mixed attention and harmonic factor | |
CN111414855B (en) | Telegraph pole sign target detection and identification method based on end-to-end regression model | |
CN109460767A (en) | Rule-based convex print bank card number segmentation and recognition methods | |
CN116188943A (en) | Solar radio spectrum burst information detection method and device | |
CN102915449A (en) | Photo classification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190129 |