CN107133929A

CN107133929A - Low-quality document image binarization method based on background estimation and energy minimization

Info

Publication number: CN107133929A
Application number: CN201710289747.7A
Authority: CN
Inventors: 熊炜; 徐晶晶; 李敏; 熊子婕; 王改华; 刘敏; 赵楠; 王鑫睿; 冯川
Original assignee: Hubei University of Technology
Current assignee: Hubei University of Technology
Priority date: 2017-04-27
Filing date: 2017-04-27
Publication date: 2017-09-05
Anticipated expiration: 2037-04-27
Also published as: CN107133929B

Abstract

The invention discloses a low-quality document image binarization method based on background estimation and energy minimization. First, a color document image undergoes grayscale preprocessing; bilateral filtering is then used for noise reduction, followed by image background estimation, background subtraction and image enhancement; an energy function and a network graph are constructed; finally, a graph cut algorithm based on augmenting paths is used to minimize the energy function. The invention significantly improves document image binarization under complex backgrounds and is suitable for binarizing document images with multi-color writing, gradual stroke changes, ink bleed-through, stained or textured pages, uneven illumination, low contrast and other complex backgrounds.

Description

Low-quality document image binarization method based on background estimation and energy minimization

Technical Field

The invention belongs to the technical field of digital image processing, pattern recognition and machine learning, and particularly relates to a low-quality document image binarization method based on background estimation and energy minimization.

Background

Document Analysis and Recognition (DAR) technology has been widely applied in fields such as ancient book digitization, layout analysis and character recognition, video subtitle extraction and text information retrieval. It mainly comprises image acquisition, binarization, skew correction, character segmentation and recognition. Image binarization is one of the key preprocessing steps: it converts a gray image into a binary image so as to separate the character foreground from the document background. The quality of the binarization algorithm directly affects the performance of the whole DAR system, so in recent years many scholars have studied the problem and proposed many algorithms. However, binarization of low-quality document images remains a challenge due to factors such as poor image contrast, ink bleed-through, page stains or uneven illumination.

Binarization algorithms can be roughly classified into global threshold methods and local threshold methods. A global threshold method uses a single threshold to divide a document image into two classes, characters (foreground) and background. For example, the Otsu algorithm selects an optimal threshold from the gray histogram of the image so that the between-class variance of the foreground and background pixels after thresholding is maximized. Global threshold methods segment well those images whose foreground and background differ greatly, i.e. whose histograms have a clear bimodal shape, but some or even all foreground details are lost when processing low-quality document images.
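As a minimal sketch of the Otsu criterion mentioned above (pure Python; the function name `otsu_threshold` and the flat list of integer gray levels are illustrative assumptions, not part of the invention), the threshold that maximizes the between-class variance could be searched for as follows:

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) maximizing the between-class variance.

    pixels: flat list of integer gray levels in 0..255 (illustrative input format).
    Pixels with value <= t form one class, the rest form the other class.
    """
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(g * n for g, n in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with gray level <= t
    sum0 = 0    # sum of the gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance is undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to a constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

A single threshold of this kind works when the histogram is clearly bimodal, which is exactly the assumption that fails for the degraded documents discussed here.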

The local thresholding method (also called adaptive thresholding) sets different thresholds on different parts of the image by sliding a window over the document image. For example, algorithms such as Niblack, Sauvola and Wolf use the gray mean and variance in the neighborhood of each pixel to construct a threshold surface, and their performance depends on the size of the sliding window, the thickness of the character strokes and so on. The window size has to be adjusted dynamically for document images of different quality to obtain the best thresholding result, and when the image contrast is low, a large number of noise points or misclassifications are produced.

In addition, researchers at home and abroad have proposed a number of more complex algorithms, such as the local contrast method, the background estimation and stroke edge detection method, the Laplace energy method and the convolutional neural network method. However, none of the above methods handles image binarization well under complex document backgrounds such as low contrast, ink bleed-through, gradient illumination, stains and textures.

Disclosure of Invention

In order to solve the above technical problems, the invention provides a low-quality document image binarization method based on background estimation and energy minimization, which significantly improves document image binarization under complex backgrounds and is suitable for binarizing document images with multi-color writing, gradual stroke changes, ink bleed-through, stained or textured pages, uneven illumination, low contrast and other complex backgrounds.

The technical scheme adopted by the invention is as follows: a low-quality document image binarization method based on background estimation and energy minimization is characterized by comprising the following steps:

Step 1: performing grayscale preprocessing on the color document image;

Step 2: performing noise reduction on the image by bilateral filtering;

Step 3: image background estimation, which specifically comprises the following sub-steps:

Step 3.1: performing stroke width transformation on the image processed in step 2;

Step 3.2: calculating the simulated distance and the imaging height;

Step 3.3: weakening dark features in the document image by applying two morphological closing operations to the image processed in step 2;

Step 3.4: combining the results of step 3.2 and step 3.3 to down-sample and up-sample the image;

Step 4: background subtraction and image enhancement, which specifically comprise the following sub-steps:

Step 4.1: background subtraction;

calculating the absolute difference between the bilaterally filtered image of step 2 and the background estimation image of step 3, wherein pixels with zero gray level in the difference image are high-confidence background pixels, and their gray value is set to 255;

Step 4.2: histogram equalization;

inverting the non-zero pixels in the background subtraction image to obtain their gray values, and then performing histogram equalization on the whole image to increase the contrast between the foreground and the background of the image;

Step 5: constructing an energy function;

Step 6: constructing a network graph;

Step 7: minimizing the energy function by a graph cut algorithm based on augmenting paths.

Compared with existing algorithms, the invention has the following notable advantages:

(1) the invention uses the minimum mean value method for grayscale preprocessing of the color document image, and the resulting gray image is color independent, which both increases the contrast between foreground and background pixels and reduces the gray variance among foreground pixels;

(2) the invention uses a nonlinear bilateral filtering algorithm for image noise reduction, which takes into account both the spatial proximity and the gray similarity of the image and thus removes noise while preserving edges;

(3) the stroke width in the document image is estimated by stroke width transformation, whose advantage is that stroke features are essentially unique to characters (admittedly, interference from some degradation factors cannot be ruled out and must be removed by subsequent operations), so the method applies to texts in different languages;

(4) based on a visual acuity test model, the invention uses morphological closing operations to estimate the image background and performs histogram equalization on the background-subtracted image, which effectively suppresses the influence of degradation factors and enhances the local contrast of the image;

(5) the invention binarizes the document image with a maximum flow/minimum cut combinatorial optimization algorithm; the graph cut algorithm is general, practical and fast (close to real time), and is suitable for various degraded low-quality document images.

Drawings

FIG. 1 is a flow chart of an embodiment of the invention;

FIG. 2 is a schematic diagram of the resolution angle of the visual acuity test model of the embodiment of the invention.

Detailed Description

To help those of ordinary skill in the art understand and implement the present invention, the invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the embodiments described herein are merely illustrative and explanatory of the present invention and do not limit it.

The main idea of the invention is as follows: as the target image moves farther away from the observer, less and less detail (stroke) information can be observed, but the perceived background gray level and depth are not affected by the distance. The approximate background of the image can therefore be estimated by simulating the scene of observing the image from a distance; an energy function is then constructed for the image after the estimated background has been removed, and image binarization is achieved with a graph cut algorithm.

Referring to fig. 1, the method for binarizing a low-quality document image based on background estimation and energy minimization provided by the present invention includes the following steps:

Step 1: minimum-mean graying;

The invention uses the minimum mean value method to perform grayscale preprocessing on the color document image f(x, y), calculated as:

$$f_{gray}(x, y) = \frac{1}{2}\left[\min_i f_i(x, y) + \frac{1}{3}\sum_i f_i(x, y)\right],$$

where f_i(x, y) are the R, G and B color component images, respectively, and f_gray(x, y) is the transformed gray image.

The obtained gray level image has color independence, namely, in the gray level image, the contrast between the foreground pixels and the background pixels is high, and meanwhile, the gray level difference between the foreground pixels is small.
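The minimum-mean graying of step 1 can be sketched in a few lines. This is an illustrative pure-Python version; the function name `min_mean_gray` and the representation of the image as rows of (R, G, B) tuples are assumptions made for the example only.

```python
def min_mean_gray(rgb_image):
    """Minimum-mean graying: average of the channel minimum and the channel mean.

    rgb_image: list of rows, each row a list of (R, G, B) tuples (assumed format).
    Returns a list of rows of floating-point gray values.
    """
    return [
        [0.5 * (min(px) + sum(px) / 3.0) for px in row]
        for row in rgb_image
    ]
```

For example, a saturated red pixel (255, 0, 0) maps to 0.5 × (0 + 85) = 42.5, so colored ink is pulled toward the dark end of the gray scale while white paper stays at 255.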

Step 2: bilateral filtering and denoising;

The invention uses a nonlinear bilateral filtering algorithm for image noise reduction. The output pixel value f̂(i, j) depends on the weighted combination of the pixel values f(k, l) in the neighborhood S, calculated as:

$$\hat{f}(i, j) = \frac{\sum_{(k,l) \in S} f(k, l)\, w(i, j, k, l)}{\sum_{(k,l) \in S} w(i, j, k, l)},$$

where the weighting factor w(i, j, k, l) is the product of the domain kernel and the range kernel, whose parameters are the Gaussian distance variance and the Gaussian gray variance, respectively.

Because the bilateral filter considers the spatial proximity and the gray similarity of the image at the same time, the purpose of edge-preserving and denoising can be achieved.
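A minimal sketch of the weighted combination in step 2 follows. It assumes the standard Gaussian form for both kernels; the names `sigma_d` (distance spread), `sigma_r` (gray-level spread) and `radius` (half-width of the square neighborhood S) are illustrative assumptions and are not taken from the invention.

```python
import math

def bilateral_filter(img, radius=1, sigma_d=1.0, sigma_r=25.0):
    """Bilateral filter on a gray image given as a list of rows of numbers.

    Each output pixel is the weighted mean of its neighborhood, where the
    weight is the product of a spatial (domain) Gaussian and a gray-level
    (range) Gaussian. Neighbors outside the image are simply skipped.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            num = den = 0.0
            for k in range(max(0, i - radius), min(h, i + radius + 1)):
                for l in range(max(0, j - radius), min(w, j + radius + 1)):
                    # Exponent of the domain kernel: squared spatial distance.
                    d = ((i - k) ** 2 + (j - l) ** 2) / (2.0 * sigma_d ** 2)
                    # Exponent of the range kernel: squared gray difference.
                    r = (img[i][j] - img[k][l]) ** 2 / (2.0 * sigma_r ** 2)
                    weight = math.exp(-d - r)
                    num += img[k][l] * weight
                    den += weight
            out[i][j] = num / den
    return out
```

Across a sharp stroke edge the range kernel drives the weights of pixels on the other side toward zero, which is why the edge survives while flat regions are smoothed.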

Step 3: estimating the image background;

Step 3.1 stroke width transformation (SWT): the Canny operator is used to perform edge detection on the bilaterally filtered gray image. For each edge pixel p, the corresponding edge pixel q on the opposite side is searched for along the gradient direction of p. The Euclidean distance ||p − q|| between the two points is taken as the stroke width estimate of every pixel on the path [p, q], unless the pixel has already been assigned a smaller width value. The stroke width SWE of the image is the mathematical expectation of the stroke width estimates of all non-zero pixels, calculated as:

$$SWE = \frac{1}{n}\sum_{s(x,y) \neq 0} s(x, y),$$

where n is the total number of non-zero pixels in the stroke width transformation output image s(x, y).
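Given an SWT output image, the SWE is simply the mean over its non-zero entries. The sketch below covers only this last averaging step, not the edge tracing itself; the function name and the list-of-rows input are assumptions for illustration.

```python
def stroke_width_estimate(swt):
    """Mean of the non-zero stroke width values in an SWT output image.

    swt: list of rows of per-pixel stroke width estimates, 0 where no
    stroke was found (assumed format). Returns 0.0 for an empty result.
    """
    nonzero = [v for row in swt for v in row if v != 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0
```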

Step 3.2, calculating the simulated distance and the imaging height: based on the visual acuity test model, the smallest image the human eye can perceive corresponds to the minimum resolution angle (a visual angle of 1'), as shown in FIG. 2. Because the contrast of a low-quality document image is usually lower than that of the binary images on a visual acuity chart, the minimum visual angle of the corresponding target is usually larger than that of the visual acuity test; and the thicker the strokes of the image, the greater the observation distance required before stroke details can no longer be perceived. The resolution angle corresponding to the stroke width of the document image is therefore assumed to be 3', and the simulated observation distance d₀ is determined from the stroke width estimated in step 3.1, calculated as:

$$d_0 = SWE \times \cot\theta,$$

where θ is the observation resolution angle, here a 3' visual angle.

Because the crystalline lens of the human eye is similar to a convex lens, the height h_i of the image formed on the retina at a distance d₀ from the target image can be obtained from the lens imaging rule and the focal length equation, where f is the distance between the lens of the human eye and the retina, i.e. the focal length of the lens (about 17 mm), and h₀ is the original height of the target image.
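The two quantities of step 3.2 can be sketched numerically as follows. The distance follows the stated relation d₀ = SWE × cot θ. The imaging height is computed under an explicit assumption: the image plane is taken to lie at distance f behind the lens, so that similar triangles give h_i ≈ f · h₀ / d₀. This relation and the function names are illustrative assumptions; the exact expression used by the invention is not reproduced in this text, and all lengths are assumed to be expressed in consistent units.

```python
import math

def simulated_distance(swe, theta_arcmin=3.0):
    """Simulated observation distance d0 = SWE * cot(theta), theta in arc minutes."""
    theta = math.radians(theta_arcmin / 60.0)
    return swe / math.tan(theta)

def retinal_height(h0, d0, f=17.0):
    """Assumed similar-triangle relation: h_i = f * h0 / d0 (illustrative only)."""
    return f * h0 / d0
```

With a stroke width of 4 and θ = 3', cot θ is roughly 1146, so d₀ is about 4584 in the same length unit as the stroke width.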

Step 3.3 morphological closing operation: dark features (character strokes) in the document image are weakened by two morphological closing operations, both of which use circular structuring elements. The diameter of the structuring element in the first closing is set to the stroke width of the image, and the diameter in the second closing is 12 pixels larger than the stroke width of the image.
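The two closings of step 3.3 can be sketched in pure Python as a gray-level dilation followed by an erosion with a disk-shaped structuring element. The helper names, the rounding of the stroke width to an integer diameter and the border handling (pixels outside the image are ignored) are assumptions of this sketch.

```python
def disk_offsets(diameter):
    """Offsets (dy, dx) inside a disk of the given diameter, centred on (0, 0)."""
    r = diameter / 2.0
    n = int(r)
    return [(dy, dx) for dy in range(-n, n + 1) for dx in range(-n, n + 1)
            if dy * dy + dx * dx <= r * r]

def _morph(img, offsets, pick):
    """Apply max (dilation) or min (erosion) over the structuring element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[y + dy][x + dx] for dy, dx in offsets
                    if 0 <= y + dy < h and 0 <= x + dx < w]
            out[y][x] = pick(vals)
    return out

def closing(img, diameter):
    """Gray-level closing: dilation followed by erosion with the same disk."""
    offsets = disk_offsets(diameter)
    return _morph(_morph(img, offsets, max), offsets, min)

def weaken_strokes(img, swe):
    """Two successive closings with diameters SWE and SWE + 12 pixels."""
    d1 = max(1, round(swe))
    return closing(closing(img, d1), d1 + 12)
```

Closing removes dark structures narrower than the structuring element, so thin strokes are filled in with the surrounding paper gray level while large dark regions are kept.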

Step 3.4, image down-sampling and up-sampling: at a distance d₀ from the target image, the height of the observed image is h_i. The image after the morphological closing operations is therefore scaled down to height h_i by bilinear down-sampling, and the scaled image is then restored to its original size by bilinear interpolation; the resulting image is the estimated background image. The aspect ratio of the image is kept unchanged during scaling.
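The down-sampling and up-sampling of step 3.4 could be sketched as below. This is a plain bilinear resize without any anti-aliasing; the pixel-centre coordinate mapping, the rounding of h_i to at least one row and the function names are assumptions of this example rather than details given by the invention.

```python
def resize_bilinear(img, new_h, new_w):
    """Bilinear resize of a gray image given as a list of rows."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(new_h):
        # Map the output pixel centre back into source coordinates and clamp.
        sy = min(max((y + 0.5) * h / new_h - 0.5, 0.0), h - 1.0)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(new_w):
            sx = min(max((x + 0.5) * w / new_w - 0.5, 0.0), w - 1.0)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

def estimate_background(closed_img, h_i):
    """Shrink to height h_i (aspect ratio kept), then restore the original size."""
    h, w = len(closed_img), len(closed_img[0])
    small_h = max(1, round(h_i))
    small_w = max(1, round(w * small_h / h))
    small = resize_bilinear(closed_img, small_h, small_w)
    return resize_bilinear(small, h, w)
```

The round trip through a much smaller image acts as a strong low-pass filter, which is what leaves only the slowly varying background.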

Step 4: background subtraction and image enhancement;

Step 4.1 background subtraction: the absolute difference between the bilaterally filtered image and the background estimation image is calculated. Pixels with zero gray level in the difference image are high-confidence background pixels, and their gray value is set to 255 (white).

Step 4.2 histogram equalization: the non-zero pixels in the background subtraction image are inverted to obtain their gray values, and histogram equalization is then performed on the whole image to increase the contrast between the foreground and the background of the image.
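Steps 4.1 and 4.2 can be sketched together as follows. The inversion is taken here as 255 minus the absolute difference, and the equalization uses the usual cumulative-histogram mapping; both choices, as well as the function names, are assumptions made for illustration.

```python
def subtract_background(filtered, background):
    """Absolute difference, zero differences set to 255, the rest inverted."""
    out = []
    for f_row, b_row in zip(filtered, background):
        row = []
        for f, b in zip(f_row, b_row):
            diff = abs(int(f) - int(b))
            # diff == 0: high-confidence background, forced to white.
            row.append(255 if diff == 0 else 255 - diff)
        out.append(row)
    return out

def equalize_histogram(img):
    """Histogram equalization of an 8-bit gray image (list of rows of ints)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    cdf, running = [0] * 256, 0
    for g in range(256):
        running += hist[g]
        cdf[g] = running
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        return [row[:] for row in img]  # single gray level, nothing to stretch
    lut = [max(0, round(255 * (cdf[g] - cdf_min) / (total - cdf_min)))
           for g in range(256)]
    return [[lut[v] for v in row] for row in img]
```

In the small example used for testing, a pixel that differs from the background by 80 gray levels becomes 175 after inversion and is stretched to 0 by equalization, while the background stays at 255.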

Step 5: constructing an energy function;

The Laplace energy function consists of data terms and boundary terms. A data term represents the cost of assigning a label to a pixel, i.e. the cost of assigning pixel p_ij the label 0 or the label 1; a boundary term represents the discontinuity cost of adjacent pixels, i.e. the cost of giving two adjacent pixels different labels.

The Laplacian of the image reflects where the gray level of the image changes abruptly. When the Laplace value of a pixel is positive, the pixel generally lies at a trough (dark) of the gray image; conversely, when the Laplace value of a pixel is negative, the pixel lies at a peak (bright) of the gray image. The data terms of the Laplace energy function of the invention are therefore defined in terms of the Laplace value of each pixel p_ij.

The boundary terms are divided into boundary terms in the horizontal direction and boundary terms in the vertical direction. The invention uses the Canny edge detection operator to determine the boundary terms: pixels near an edge are likely to be discontinuous, so the discontinuity cost between the pixels on the two sides of an edge can be set directly to zero. In the boundary terms, E_ij denotes the edge detection result at pixel p_ij, I_ij denotes the gray value of pixel p_ij, and c is an arbitrary constant (c > 0).
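Since the exact expressions of the energy function are not reproduced in this text, the following is only a sketch under stated assumptions. It assumes that label 0 denotes dark foreground, that the data cost of label 0 is the negated Laplace value and the data cost of label 1 is the Laplace value, and that a pair of neighbouring pixels with different labels costs c unless an edge was detected at either pixel. The 4-neighbour Laplacian, the simplified edge test and all names are hypothetical.

```python
def laplacian(img):
    """4-neighbour discrete Laplacian with replicated borders."""
    h, w = len(img), len(img[0])

    def at(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[at(y - 1, x) + at(y + 1, x) + at(y, x - 1) + at(y, x + 1)
             - 4 * at(y, x) for x in range(w)] for y in range(h)]

def energy(labels, lap, edges, c=10.0):
    """Evaluate the assumed energy of a labeling (0 = foreground, 1 = background)."""
    h, w = len(labels), len(labels[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            # Data term: a positive Laplace value (dark trough) makes label 0 cheap.
            total += -lap[y][x] if labels[y][x] == 0 else lap[y][x]
            # Boundary terms toward the next row and the next column.
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < h and nx < w and labels[y][x] != labels[ny][nx]:
                    if not (edges[y][x] or edges[ny][nx]):
                        total += c
    return total
```

For a single dark pixel on a bright background, labeling only that pixel as foreground yields a lower energy than labeling everything as background, which is the behaviour the data term is meant to encourage.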

Step 6: constructing a network graph;

Each pixel p_ij of the image forms an intermediate node of the network graph, and two further terminal nodes s and t are added. An edge connecting two intermediate nodes is called an n-link, and its weight is determined by the boundary terms of the energy function; an edge connecting an intermediate node and a terminal node is called a t-link, and its weight is determined by the data terms of the energy function. Accordingly, the weights of the edges (p_ij, s) and (p_ij, t) are given by the data terms of pixel p_ij, and the weights of the edges (p_ij, p_i+1,j) and (p_ij, p_i,j+1) are given by the boundary terms in the corresponding directions.
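A minimal sketch of such a network graph as a dictionary of capacities is given below. The inputs `cost0` and `cost1` (non-negative data costs for labels 0 and 1), `w_i` and `w_j` (boundary weights toward the neighbours (i+1, j) and (i, j+1)) and the convention that the source side corresponds to label 0 are assumptions of this example.

```python
def build_graph(cost0, cost1, w_i, w_j):
    """Build cap[u][v], the weight of edge u -> v, for a pixel grid plus s and t.

    A pixel that ends on the sink side (label 1) cuts its edge from s, so that
    edge carries cost1; a pixel on the source side (label 0) cuts its edge to t,
    so that edge carries cost0. n-links are added in both directions.
    """
    h, w = len(cost0), len(cost0[0])
    cap = {"s": {}, "t": {}}
    for i in range(h):
        for j in range(w):
            cap[(i, j)] = {}
    for i in range(h):
        for j in range(w):
            p = (i, j)
            cap["s"][p] = cost1[i][j]   # t-link from the source
            cap[p]["t"] = cost0[i][j]   # t-link to the sink
            if i + 1 < h:               # n-link toward p_(i+1, j)
                q = (i + 1, j)
                cap[p][q] = cap[q][p] = w_i[i][j]
            if j + 1 < w:               # n-link toward p_(i, j+1)
                q = (i, j + 1)
                cap[p][q] = cap[q][p] = w_j[i][j]
    return cap
```

Max-flow algorithms need non-negative capacities, so data costs that can be negative would have to be shifted by a constant per pixel before being passed in.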

Step 7: minimizing the energy function by a graph cut algorithm based on augmenting paths;

Two search trees S and T are built on the network graph, with their root nodes at the source s and the sink t, respectively. The nodes of the search trees are divided into two types, active nodes and passive nodes; an active node can turn free nodes into active nodes through unsaturated edges, which is how the trees grow.

Step 7.1 growth stage: the two trees keep growing until active nodes of the two trees meet, at which point a path from the source to the sink has been found;

Step 7.2 augmentation stage: the path obtained in step 7.1 is augmented. The augmentation saturates at least one edge, the child nodes attached to such edges become orphan nodes, and the trees S and T are split into several subtrees;

Step 7.3 adoption stage: a new parent node is sought for each orphan node; if no parent node satisfying the conditions exists, the orphan node becomes a free node. This continues until all orphan nodes have been processed.

The three steps above are repeated until the two trees can no longer grow and are separated by saturated edges. The minimum cut of the graph, i.e. the minimum of the energy function, is thereby obtained, which gives the final binarization of the image.
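The two-tree search described above is more involved than fits in a short example. As a hedged illustration of the augmenting-path idea only, the sketch below substitutes the simpler breadth-first (Edmonds-Karp style) augmenting-path method, which yields the same minimum cut value but is not the search strategy of step 7. The graph format (a dictionary of dictionaries of capacities with terminals "s" and "t") is an assumption of the example.

```python
from collections import deque

def min_cut(cap, s="s", t="t"):
    """Return (max_flow, source_side) using breadth-first augmenting paths."""
    # Residual capacities; reverse edges start at 0 when not present.
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0)

    def reachable():
        """Breadth-first search over unsaturated edges; returns the parent map."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, r in res[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable()
        if t not in parent:
            # No augmenting path left: nodes reachable from s form the source side.
            return flow, set(parent)
        # Find the bottleneck capacity along the path from s to t.
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        # Push the flow and update the residual graph.
        v = t
        while parent[v] is not None:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck
```

The returned source-side set directly induces the binary labeling: under the convention of this sketch, nodes in the set take one label and all remaining pixel nodes take the other.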

It should be understood that parts of the specification not set forth in detail are well within the prior art.

It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A low-quality document image binarization method based on background estimation and energy minimization is characterized by comprising the following steps:

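step 1: performing grayscale preprocessing on the color document image;

step 2: performing noise reduction on the image by bilateral filtering;

step 3: image background estimation, which specifically comprises the following sub-steps:

step 3.1: performing stroke width transformation on the image processed in step 2;

step 3.2: calculating the simulated distance and the imaging height;

step 3.3: weakening dark features in the document image by applying two morphological closing operations to the image processed in step 2;

step 3.4: combining the results of step 3.2 and step 3.3 to down-sample and up-sample the image;

step 4: background subtraction and image enhancement, which specifically comprise the following sub-steps:

step 4.1: background subtraction;

calculating the absolute difference between the bilaterally filtered image of step 2 and the background estimation image of step 3, wherein pixels with zero gray level in the difference image are high-confidence background pixels, and their gray value is set to 255;

step 4.2: histogram equalization;

inverting the non-zero pixels in the background subtraction image to obtain their gray values, and then performing histogram equalization on the whole image to increase the contrast between the foreground and the background of the image;

step 5: constructing an energy function;

step 6: constructing a network graph;

step 7: minimizing the energy function by a graph cut algorithm based on augmenting paths.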

2. The low-quality document image binarization method based on background estimation and energy minimization according to claim 1, characterized in that: in step 1, performing gray scale preprocessing on the color document image f (x, y) by using a minimum mean value method, wherein a preprocessing formula is as follows:

$$f_{gray}(x, y) = \frac{1}{2}\left[\min_i f_i(x, y) + \frac{1}{3}\sum_i f_i(x, y)\right],$$

3. The low-quality document image binarization method based on background estimation and energy minimization according to claim 1, characterized in that: in step 2, a nonlinear bilateral filtering algorithm is used for image noise reduction, and the output pixel value f̂(i, j) depends on the weighted combination of the pixel values f(k, l) in the neighborhood S, calculated as:

$$\hat{f}(i, j) = \frac{\sum_{(k,l) \in S} f(k, l)\, w(i, j, k, l)}{\sum_{(k,l) \in S} w(i, j, k, l)},$$

4. The low-quality document image binarization method based on background estimation and energy minimization according to claim 1, characterized in that: in step 3.1, the Canny operator is used to perform edge detection on the bilaterally filtered gray image; for each edge pixel p, the corresponding edge pixel q on the opposite side is searched for along the gradient direction of p; the Euclidean distance ||p − q|| between the two points is taken as the stroke width estimate of every pixel on the path [p, q], unless the pixel has already been assigned a smaller width value; the stroke width SWE of the image is the mathematical expectation of the stroke width estimates of all non-zero pixels, calculated as:

$$SWE = \frac{1}{n}\sum_{s(x,y) \neq 0} s(x, y),$$

5. The low-quality document image binarization method based on background estimation and energy minimization according to claim 1, characterized in that: in step 3.2, the simulated observation distance d₀ is determined from the stroke width SWE estimated in step 3.1, calculated as:

$$d_0 = SWE \times \cot\theta,$$

wherein θ is the observation resolution angle;

the height h_i of the image formed on the retina at a distance d₀ from the target image is obtained from the lens imaging rule and the focal length equation, wherein f is the distance between the lens of the human eye and the retina, i.e. the focal length of the lens, and h₀ is the original height of the target image.

6. The low-quality document image binarization method based on background estimation and energy minimization according to claim 1, characterized in that: in step 3.3, circular structuring elements are used in both closing operations; the diameter of the structuring element in the first closing is set to the stroke width of the image, and the diameter of the structuring element in the second closing is 12 pixels larger than the stroke width of the image.

7. The low-quality document image binarization method based on background estimation and energy minimization according to claim 1, characterized in that: in step 3.4, at a distance d₀ from the target image, the height of the observed image is h_i; the image after the morphological closing operations is therefore scaled down to height h_i by bilinear down-sampling; the scaled image is then restored to its original size by bilinear interpolation, and the resulting image is the estimated background image; the aspect ratio of the image is kept unchanged during scaling.

8. The low-quality document image binarization method based on background estimation and energy minimization according to claim 1, characterized in that: in step 5, the Laplace energy function consists of data terms and boundary terms;

wherein a data term represents the cost of assigning a label to a pixel, i.e. the cost of assigning pixel p_ij the label 0 or the label 1, and is defined in terms of the Laplace value of pixel p_ij; a boundary term represents the discontinuity cost of adjacent pixels, i.e. the cost of giving two adjacent pixels different labels; the boundary terms are divided into boundary terms in the horizontal direction and boundary terms in the vertical direction; E_ij denotes the edge detection result at pixel p_ij, I_ij denotes the gray value of pixel p_ij, and c is an arbitrary constant, c > 0.

9. The low-quality document image binarization method based on background estimation and energy minimization according to claim 8, characterized in that step 6 is implemented as follows: each pixel p_ij of the image forms an intermediate node of the network graph, and two further terminal nodes s and t are added; an edge connecting two intermediate nodes is called an n-link, and its weight is determined by the boundary terms of the energy function; an edge connecting an intermediate node and a terminal node is called a t-link, and its weight is determined by the data terms of the energy function; the weights of the edges (p_ij, s) and (p_ij, t) are given by the data terms of pixel p_ij, and the weights of the edges (p_ij, p_i+1,j) and (p_ij, p_i,j+1) are given by the boundary terms in the corresponding directions.

10. The low-quality document image binarization method based on background estimation and energy minimization according to any one of claims 1-9, characterized in that step 7 is implemented as follows: two search trees S and T are built on the network graph, with their root nodes at the source s and the sink t, respectively; the nodes of the search trees are divided into two types, active nodes and passive nodes, wherein an active node can turn free nodes into active nodes through unsaturated edges, which is how the trees grow;

step 7.1: growth stage;

the two trees keep growing until active nodes of the two trees meet, at which point a path from the source to the sink has been found;

step 7.2: augmentation stage;

the path obtained in step 7.1 is augmented; the augmentation saturates at least one edge, the child nodes attached to such edges become orphan nodes, and the trees S and T are split into several subtrees;

step 7.3: adoption stage;

a new parent node is sought for each orphan node; if no parent node satisfying the conditions exists, the orphan node becomes a free node; this continues until all orphan nodes have been processed;

step 7.4: the three steps above are repeated until the two trees can no longer grow and are separated by saturated edges, whereby the minimum cut of the graph, i.e. the minimum of the energy function, is obtained, which gives the final binarization of the image.