CN114202766A - Method and device for extracting text field and electronic equipment - Google Patents
- Publication number: CN114202766A
- Application number: CN202111428606.1A
- Authority
- CN
- China
- Prior art keywords: image, text field, target, text, original image
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/044: Recurrent networks, e.g. Hopfield networks (G: Physics; G06: Computing, calculating or counting; G06N: Computing arrangements based on specific computational models; G06N3/00: Computing arrangements based on biological models; G06N3/02: Neural networks; G06N3/04: Architecture, e.g. interconnection topology)
- G06N3/045: Combinations of networks (same hierarchy)
Abstract
A method and device for extracting a text field, and an electronic device, are provided. The method comprises: acquiring an original image containing text fields; determining each target image area in the original image; performing text recognition on the text field in each target image area to obtain the text field corresponding to each target image area; and extracting, according to a preset extraction rule, the target text fields that meet business requirements from the recognized text fields. Based on this method, target text fields can be extracted from a tax completion certificate image, which solves the prior-art problem that target text fields meeting business requirements cannot be extracted from such images and effectively improves the accuracy of target text field extraction.
Description
Technical Field
The present application relates to the field of image processing, and in particular, to a method, an apparatus, and an electronic device for extracting a text field.
Background
With the development of image processing technology, text fields in certification images such as tax completion certificates can be recognized by image processing techniques such as HOG (Histogram of Oriented Gradients) and LBP (Local Binary Patterns). However, such methods extract all text fields in the tax completion certificate image indiscriminately; they cannot extract only the target text fields that meet business requirements.
Disclosure of Invention
The application provides a method and a device for extracting a text field, and an electronic device, which are used to extract target text fields from a tax completion certificate image, solve the prior-art problem that target text fields meeting business requirements cannot be extracted from such images, and effectively improve the accuracy of target text field extraction.
In a first aspect, the present application provides a method for extracting a text field, the method comprising:
acquiring an original image containing a text field, and determining each target image area in the original image, wherein the target image area is an area containing the text field in the image to be processed;
performing text recognition on the text fields in the target image areas to obtain the text fields corresponding to the target image areas;
and extracting a target text field meeting the service requirement from the text field according to a preset extraction rule.
In one possible design, obtaining an original image containing a text field includes: obtaining an image to be processed that contains a text field; rotating the image to be processed by preset angles to obtain N rotated images with different rotation angles; projecting the text field of each rotated image in a given direction and superposing the projections in that direction to obtain the projection value of each rotated image, thereby determining the N projection values corresponding to the N rotated images; and selecting the rotated image corresponding to the smallest of the N projection values as the original image.
In one possible design, after the original image containing the text field is obtained, the method further includes: dividing the original image into a plurality of image blocks; calculating the Euclidean distance between every two image blocks, and determining two image blocks whose Euclidean distance is smaller than a preset threshold as similar image blocks; identifying the similar image blocks as a similar area to obtain one or more similar areas in the original image; and denoising each similar area in the original image to obtain a denoised original image.
In one possible design, determining each target image region in the original image includes: based on the target detection model, the image characteristics in the original image are extracted, and then each target image area in the original image is determined according to the image characteristics.
In one possible design, extracting a target text field meeting the business requirements from the text fields according to a preset extraction rule includes: obtaining the association relation between target text fields and text fields based on a preset database, and extracting, according to the association relation, the target text fields meeting the business requirements from the text fields in the original image.
In one possible design, after extracting a target text field satisfying a business requirement from the text field, the method further includes: and sending the target text field to a front-end display interface for display.
By the method, the target text field can be extracted, which solves the prior-art problem that target text fields meeting business requirements cannot be extracted from a tax completion certificate image, and achieves the following technical effects:
1. Through image preprocessing of the original image, based on angle correction and noise filtering, the original image is corrected to a set state and its noise pixels are filtered out, restoring the image information of the original image to the greatest extent and helping to improve the accuracy of determining each target image area in the original image;
2. Through image detection of the original image, the detection task is modeled as a regression problem based on a target detection model, achieving end-to-end, fast text field detection: small convolution kernels provide lightweight feature computation, a fully connected layer predicts the text boxes, and the target image areas in the image are detected quickly;
3. Through text recognition, a CTC model is introduced to combine pixel information with the temporal sequence so that the character content in the image is aligned; through the one-to-one "image -> character" conversion, the text field corresponding to each target image area is recognized and the recognition accuracy is improved;
4. Through the preset extraction rule, a coordinate analysis method is introduced into the processing of target field retrieval, line text alignment, and multi-line text merging, ensuring the accuracy of the output target text field.
In a second aspect, the present application provides an apparatus for extracting a text field, the apparatus comprising:
a module for determining a target image area, which is used for acquiring an original image containing a text field and determining each target image area in the original image, wherein the target image area is an area containing the text field in the image to be processed;
a text field identification module for performing text identification on the text field in each target image area to obtain the text field corresponding to each target image area;
and the target text field extraction module is used for extracting a target text field meeting the service requirement from the text field according to a preset extraction rule.
In one possible design, the target image area determining module is specifically configured to: obtain an image to be processed that contains a text field; rotate the image to be processed by preset angles to obtain N rotated images with different rotation angles, where N is a positive integer greater than or equal to 1; project the text field of each rotated image in a given direction and superpose the projections in that direction to obtain the projection value of each rotated image, thereby determining the N projection values corresponding to the N rotated images; and select the rotated image corresponding to the smallest of the N projection values as the original image.
In one possible design, the target image area determining module is specifically configured to: divide the original image into a plurality of image blocks, where each image block represents a partial image of the original image; calculate the Euclidean distance between two image blocks, and determine two image blocks whose Euclidean distance is smaller than a preset threshold as similar image blocks; identify the similar image blocks as a similar area to obtain one or more similar areas in the original image; and denoise each similar area in the original image to obtain the denoised original image.
In one possible design, the target image area determining module is specifically configured to extract image features from the original image based on a target detection model, and to determine each target image area in the original image according to the image features.
In one possible design, the target text field extraction module is specifically configured to obtain the association relation between target text fields and text fields based on a preset database, and to extract, according to the association relation, the target text fields meeting the business requirements from the text fields in the original image.
In one possible design, the target text field extraction module is further configured to send the target text field to a front-end display interface for display after the target text field is extracted.
In a third aspect, the present application provides an electronic device, comprising:
a memory for storing a computer program;
and a processor, configured to implement the steps of the above method for extracting a text field when executing the computer program stored in the memory.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the above-mentioned method steps of extracting a text field.
For the second to fourth aspects and the possible technical effects of each aspect, refer to the description above of the first aspect and the possible technical effects of its solutions; details are not repeated here.
Drawings
FIG. 1 is a flow chart of a method for extracting text fields provided herein;
FIG. 2 is a schematic diagram of a projection-based angle correction algorithm provided herein;
FIG. 3 is a schematic view of an angle correction provided herein;
FIG. 4 is a schematic diagram of a network architecture of a YOLO model provided in the present application;
FIG. 5 is a schematic illustration of target detection provided herein;
FIG. 6 is a diagram illustrating extraction of a target text field according to the present application;
FIG. 7 is a diagram illustrating an output target text field provided herein;
FIG. 8 is a diagram illustrating an apparatus for extracting text fields according to the present disclosure;
fig. 9 is a schematic diagram of a structure of an electronic device provided in the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The particular methods of operation in the method embodiments may also be applied to apparatus or system embodiments. It should be noted that in the description of the present application, "a plurality" is understood as "at least two". "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. "A is connected with B" may mean: A and B are directly connected, or A and B are connected through C. In addition, the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or order.
The embodiment of the application provides a method and a device for extracting a text field, and an electronic device, which solve the problem that the prior art cannot extract the target text fields meeting business requirements from a tax completion certificate image.
It is to be understood that the preferred embodiments described herein are for purposes of illustration and explanation only and are not intended to limit the present application, and that the embodiments and the features in the embodiments may be combined with each other without conflict.
Referring to fig. 1, an embodiment of the present application provides a method for extracting a text field, which includes the following specific processes:
step 101: acquiring an original image containing a text field, and determining each target image area in the original image;
after the original image is obtained, each target image area in the original image can be determined by performing operations of image preprocessing and image detection on the original image, where the image preprocessing can include angle correction on the original image and/or noise filtering on the original image, and the target image area is an area containing a text field in the original image.
It should be noted that the original image is not limited to the tax completion certificate image; it may be another certification image or any image containing text fields. The tax completion certificate image is taken as an example in the description below.
The specific procedures of S1 angle correction, S2 noise filtering, and S3 image detection are described below.
And S1 angle correction:
the angular correction of the original image can be implemented based on a projected angular correction algorithm. Such as Radon transform algorithm, that is, finding the angle at the maximum projection value by projection superposition in a given direction, thereby determining the tilt angle of the original image.
Specifically, the projection-based angle correction algorithm mainly exploits the property that the image projection is longest in the normal direction and shortest in the horizontal direction. If the original image is represented as a binary function f(x, y), its projection in a given direction can be represented as the line integral of f(x, y) along that direction. Referring to fig. 2, the projection of f(x, y) in the x direction can be represented as the line integral of f(x, y) in the vertical direction; the projection in the y direction as the line integral in the horizontal direction; and the projection in the x' direction as the line integral in the y' direction.
It is worth mentioning that f(x, y) can be projected along any given direction to obtain the line integral in that direction. Taking the projection of f(x, y) along the x' direction as an example, the line integral of f(x, y) along the y' direction is given by formula 1:

$$R_\theta(x') = \int_{-\infty}^{+\infty} f(x'\cos\theta - y'\sin\theta,\ x'\sin\theta + y'\cos\theta)\,dy' \qquad (1)$$

where (x, y) are the original coordinates of the original image, (x', y') are the coordinates obtained by rotating (x, y) by the angle θ, and R_θ(x') is the line integral of the original image along the y' direction.
In addition, the above projection-based algorithm is one possible angle correction algorithm; other correction algorithms are also applicable and are not described here.
For example, it is determined whether the original image is in the set state; if so, no angle correction is performed; if not, the original image is adjusted to the set state by the angle correction algorithm. For example, with the horizontal direction as the given direction, the set state is the state in which the text fields in the original image are parallel to the horizontal direction. As shown in fig. 3, when the original image is not in the set state, its angle is adjusted so that the adjusted image is in the set state, and the adjusted image is then taken as the original image.
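To make the projection-based selection concrete, the following Python sketch rotates a binarized image over a range of candidate angles and scores each candidate by its horizontal projection profile. It is an illustration, not the patent's implementation: the variance-of-projection score, the angle range, and the step size are assumptions chosen for the example (the embodiment above selects among N rotated images by projection value).

```python
import numpy as np
from scipy.ndimage import rotate

def correct_skew(binary_img, angle_range=10.0, step=0.5):
    """Sketch of projection-based skew correction.

    binary_img: 2-D array with text pixels = 1, background = 0.
    Each candidate rotation is scored by the variance of its row
    projection; a peaked profile means the text rows are horizontal.
    """
    best_angle, best_score = 0.0, -np.inf
    for angle in np.arange(-angle_range, angle_range + step, step):
        rotated = rotate(binary_img, angle, reshape=False, order=0)
        projection = rotated.sum(axis=1)   # line integral along each row
        score = projection.var()
        if score > best_score:
            best_angle, best_score = angle, score
    return rotate(binary_img, best_angle, reshape=False, order=0), best_angle
```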
This angle correction method helps to improve the accuracy of determining the target image areas.
S2 noise filtering:
Noise filtering of the original image can be implemented with a non-local means (NLM) filtering algorithm. Other filtering algorithms can of course also be used; the noise filtering process is described in detail below using the non-local means filtering algorithm as an example.
In particular, the non-local means filtering algorithm mainly exploits the non-local self-similarity of an image: textures and structures recur in non-local areas of the image, and this property can be used to preserve the edges and details of the image effectively. For example, image blocks with the same pixels are numerous in an image and the noise in them is uncorrelated, so processing these image blocks can effectively remove the noise in the image.
For example, the original image is divided into a plurality of image blocks; the Euclidean distance between image blocks is calculated to determine similar image blocks; the similar image blocks are identified as a similar area, determining one or more similar areas in the original image; and each similar area is then averaged to remove the noise in the original image.
Formula 2 below gives the calculation for noise filtering of the original image with the non-local means filtering algorithm:

$$\tilde{u}(i) = \sum_{j} w(i,j)\,u(j), \qquad w(i,j) = \frac{1}{Z(i)}\exp\!\left(-\frac{\lVert u(N_i) - u(N_j)\rVert_2^2}{h^2}\right) \qquad (2)$$

where u(N_i) denotes the image block (neighborhood) centered at pixel i, w(i, j) is the weight of the weighted average, representing the similarity of two image blocks and determined by the Euclidean distance between them, and h controls the filter strength. Since the similarity between pixels is strongly affected by their physical distance, the weights must be normalized; Z(i) is the corresponding normalization factor, the sum of the exponential terms over all image blocks j.
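As one concrete, non-normative realization, OpenCV ships a non-local means implementation; the sketch below applies it to a grayscale scan. The file names and parameter values are illustrative assumptions, not values prescribed by the patent.

```python
import cv2

# "tax_certificate.png" is a hypothetical input file; h (filter strength),
# the 7x7 template window (the image-block size) and the 21x21 search window
# (the non-local neighbourhood averaged over) are typical illustrative values.
gray = cv2.imread("tax_certificate.png", cv2.IMREAD_GRAYSCALE)
denoised = cv2.fastNlMeansDenoising(gray, None, h=10,
                                    templateWindowSize=7, searchWindowSize=21)
cv2.imwrite("tax_certificate_denoised.png", denoised)
```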
It should be noted that the Euclidean distance is one possible way to determine similar image blocks and is not intended to limit the embodiments of the present application.
By the noise filtering method, the accuracy of subsequent text field extraction is improved.
S3 image detection:
image detection of the original image may be implemented based on an object detection model algorithm. Such as a YOLO (young Only Look Once) model algorithm, based on which a target image region in an original image can be determined by recognizing a text field in the original image.
Specifically, a network architecture of the YOLO model is shown in fig. 4. YOLO is a one-stage target detection model that models the text field detection task as a regression problem. In practical applications, the YOLO model uses an end-to-end network to go from the input original image directly to the output positions and categories of the target image areas: image features are extracted from the original image by a convolutional neural network (CNN), and the predicted probability values for the position and category of each target image area are then obtained by a fully connected layer (FC).
For example, referring to fig. 5, after image detection processing by the YOLO model, each target image area in the original image can be determined.
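As a hedged illustration of this detection step, the sketch below runs a YOLO checkpoint with the ultralytics package and prints the predicted target image areas. The patent does not prescribe this library, and "text_regions.pt" stands for a hypothetical model fine-tuned on text-region annotations.

```python
from ultralytics import YOLO

# Hypothetical weights fine-tuned to detect text regions; not from the patent.
model = YOLO("text_regions.pt")
results = model("tax_certificate.png")  # hypothetical input image

# Each predicted box is one candidate target image area with a confidence.
for box, conf in zip(results[0].boxes.xyxy, results[0].boxes.conf):
    x1, y1, x2, y2 = box.tolist()
    print(f"text region ({x1:.0f},{y1:.0f})-({x2:.0f},{y2:.0f}), conf={conf:.2f}")
```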
It should be noted that the YOLO model is one possible model for target detection; other suitable target detection models are also applicable and are not described here.
By the image detection method, each target image area in the original image is determined, the accuracy of determining the target image area is effectively improved, and the accuracy of subsequently extracting the text field is improved.
Step 102: performing text recognition on the text fields in the target image areas to obtain the text fields corresponding to the target image areas;
In the embodiment of the application, text recognition of the text field in each target image area can be realized in two stages: "feature extraction" and "text prediction". Text recognition applies corresponding processing to different text fields, that is, it converts image information in various writing forms into uniformly represented text information.
Specifically, in the "feature extraction" stage, image features are first extracted by a convolutional neural network (CNN), and character sequence features are then extracted by a recurrent neural network (RNN). Because the extracted image features are affected by writing style, character size, font, and other factors, the same character may be predicted multiple times in the predicted text field; for example, "Hello" may be predicted as "helloloooo".
Further, in the "text prediction" stage, a CTC (Connectionist Temporal Classification) model is introduced to perform text prediction and to align and compress the redundant features; for example, the incorrect prediction "helloloooo" is adjusted to "Hello". This ensures that the predicted text fields correspond one-to-one with the detected target image areas.
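The alignment that CTC performs can be illustrated with a minimal greedy decoder: collapse consecutive repeated labels, then drop blanks. This is a sketch of the standard CTC decoding rule, not the patent's recognition network; the character set below is an illustrative assumption.

```python
def ctc_greedy_decode(frame_labels, blank=0, charset="-Helo"):
    """Collapse repeats, then drop blanks (index `blank`), per the CTC rule."""
    out, prev = [], None
    for idx in frame_labels:
        if idx != prev and idx != blank:
            out.append(charset[idx])
        prev = idx
    return "".join(out)

# charset: 0='-' (blank), 1='H', 2='e', 3='l', 4='o'.
# Frame-wise predictions H,e,l,l,-,l,o,o collapse to the intended word:
print(ctc_greedy_decode([1, 2, 3, 3, 0, 3, 4, 4]))  # -> "Hello"
```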
By the text recognition mode, the accuracy of recognizing the text fields in the target image areas can be effectively improved.
Step 103: extracting a target text field meeting the business requirements from the text fields according to a preset extraction rule.
In the embodiment of the present application, the preset extraction rule extracts target text fields from the text fields corresponding to the target image areas (the target text fields may be as shown in fig. 6) and outputs them in a specified format (the output may be as shown in fig. 7). The extraction process may include three parts: target field retrieval, line text alignment, and multi-line text merging, which are described in detail below with reference to the accompanying drawings.
First part: target field retrieval. A retrieval rule for the target fields is first set according to the characteristics of the target fields, and the text fields are then retrieved and matched based on the retrieval rule.
Here, the association relation between target text fields and text fields may be obtained based on a preset database, and the target text fields meeting the business requirements are then extracted from the text fields corresponding to the target image areas according to the association relation.
It should be noted that, to describe the technical solution more completely in combination with the second and third parts, the extracted target text field is referred to below as the target field. Those skilled in the art should understand that the target text field can already be extracted by the processing of the first part alone; combining the first part with the second part and/or the third part is an optimized solution provided by the embodiments of the present application.
Specifically, for an original image of a tax completion certificate, text fields such as "China", "tax receipt", "verification code", "taxpayer identification number", "20", and "M" may be recognized. A retrieval rule for the target fields is set according to the actual business requirements, that is, the database is preset: if fields such as "China" and "tax" are not set as target fields in the retrieval rule, those text fields are not retrieved; only target fields such as "verification code", "taxpayer identification number", "20", and "M" are retrieved. After a target field is retrieved, it is matched according to a preset matching rule, for example by regular-expression matching.
For example, a matching rule is preset according to the actual business requirements: the specific content corresponding to the "taxpayer identification number" is "digits + capital letters", the specific content corresponding to the "verification code" is "digits beginning with 20", and so on; matching of the target fields can thus be achieved.
As shown in fig. 6, according to the preset matching rule, the text field "verification code" is matched with the text field "20××××", giving the matching result "verification code: 20××××"; the text field "taxpayer identification number" is matched with the text field "××M××", giving the matching result "taxpayer identification number: ××M××"; and so on.
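A minimal sketch of such rule-based retrieval and regular-expression matching follows. The field names mirror the examples above, while the exact patterns and sample values are illustrative assumptions, not the preset database itself.

```python
import re

# Assumed patterns: taxpayer IDs are digits plus capital letters (with at
# least one letter); verification codes are digit strings beginning with 20.
MATCH_RULES = {
    "taxpayer identification number": re.compile(r"[0-9A-Z]*[A-Z][0-9A-Z]*"),
    "verification code": re.compile(r"20\d+"),
}

def match_target_fields(text_fields):
    """text_fields: strings recognized from the target image areas."""
    results = {}
    for name, pattern in MATCH_RULES.items():
        if name in text_fields:  # the target field label was retrieved
            for value in text_fields:
                if pattern.fullmatch(value):
                    results[name] = value
    return results

# Hypothetical recognized fields ("China" is ignored by the retrieval rule):
print(match_target_fields(
    ["China", "verification code", "20615873",
     "taxpayer identification number", "9135M02X7"]))
# -> {'taxpayer identification number': '9135M02X7',
#     'verification code': '20615873'}
```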
Second part: line text alignment. When multiple lines of target fields are matched in the first part, a coordinate analysis method is introduced to analyze the multiple lines of target fields, improving the matching accuracy in multi-line target field scenarios.
Specifically, for the original image of the tax completion certificate, multiple lines may be matched for target fields such as "original certificate number", "tax type", "item name", "period to which the tax belongs", "date of entry into (refund from) the treasury", and "amount actually paid (refunded)". With rule matching alone, rows may be matched in a staggered way.
As shown in fig. 6, three lines are matched under each of "original certificate number", "tax type", "item name", "period to which the tax belongs", "date of entry into (refund from) the treasury", and "amount actually paid (refunded)". Suppose the first line "31" under "original certificate number" corresponds to the "tax type" "local education surcharge", the "item name" "value-added tax local education surcharge", the "period to which the tax belongs" "2021-06-01 to 2021-06-30", the "date of entry into (refund from) the treasury" "2021-07-08", and the "amount actually paid (refunded)" "1339.00"; and the second line "31" corresponds to the "tax type" "education fee surcharge", the "item name" "value-added tax education surcharge", and a "period to which the tax belongs" likewise in 2021. The problem of staggered multi-row matching is that, for example, the second-line "education fee surcharge" is matched with the first-line "1339.00".
To solve this problem, in the line text alignment part, the target fields spanning multiple lines are first determined; a reference text and its ordinate are then determined, together with the average height of the target fields in the same line as the reference text. The criterion for text alignment is given by formula 3:

$$|y' - y_R| < 0.5 \times h_{avg} \qquad (3)$$

where y_R is the ordinate of the reference text, y' is the ordinate of a target field in the same line as the reference text, and h_avg is the average height of the text in that line.
It should be noted that "0.5" in the formula 3 is a possible preset threshold, and can be set according to the actual application requirements, and the formula 3 is described with reference to the accompanying drawings.
Referring to fig. 6, the first line "31" under the left "original certificate number" is first taken as the reference text, whose ordinate is y_R. A floating height range above and below this ordinate is then set according to the practical application, and the other target fields in the same line as the reference text are determined based on this range. After the relationships of the first line are determined, the process is repeated with the second line "31" under "original certificate number" as the reference text, until the alignment of all lines of target fields is completed.
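A minimal sketch of this line-grouping rule follows; the box representation (dicts with x, y coordinates) and the sample numbers are illustrative assumptions.

```python
def fields_in_line(reference, candidates, h_avg):
    """Formula 3: a field is in the reference line when its ordinate differs
    from the reference ordinate by less than 0.5 * h_avg."""
    return [c for c in candidates if abs(c["y"] - reference["y"]) < 0.5 * h_avg]

# Hypothetical detected fields; y is the field's ordinate in pixels.
reference = {"text": "31", "x": 40, "y": 205}
candidates = [
    {"text": "local education surcharge", "x": 180, "y": 207},
    {"text": "1339.00", "x": 820, "y": 204},
    {"text": "education fee surcharge", "x": 180, "y": 243},  # next line down
]
print([f["text"] for f in fields_in_line(reference, candidates, h_avg=20)])
# -> ['local education surcharge', '1339.00']
```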
Third part: multi-line text merging. This mainly addresses the case of multi-line text. As shown in fig. 6, suppose the target field "tax authority" corresponds to two recognized target fields, "national tax administration and economic development area" and "tax authority first tax authority"; in the actual application scenario these should be one target field, "national tax administration and economic development area tax authority first tax authority", so a multi-line text merging operation is required.
Specifically, "tax authority" is first determined as the reference text, and the target fields to be merged, namely "national tax administration and economic development area" and "tax authority first tax authority", are determined through the ordinate and abscissa of the reference text. A merging criterion consistent with the variables of formula 4 is:

$$|x' - x_R| \le 0.5 \times w_R \quad \text{and} \quad |y' - y_R| \le 1.5 \times h_{avg} \qquad (4)$$

where y_R is the ordinate of the reference text, x_R is the abscissa of the reference text, w_R is the width of the reference text, y' and x' are the ordinate and abscissa of a target field to be merged, and h_avg is the average height of the text in the line of the reference text.
It should be noted that "0.5" and "1.5" in equation 4 are all possible preset thresholds, and may be set according to actual application requirements.
In addition, the drawings and example descriptions provided in the embodiments of the application are used for illustration only and for no other purpose, and their use complies with the relevant national regulations.
Thus, the target text field can be extracted through the processing of the above three parts, and the processing of target field retrieval, line text alignment, and multi-line text merging effectively improves the accuracy of target text field extraction.
By the method, the target text field can be extracted, which solves the prior-art problem that target text fields meeting business requirements cannot be extracted from a tax completion certificate image, and achieves the following technical effects:
1. Through image preprocessing of the original image, based on angle correction and noise filtering, the original image is corrected to a set state and its noise pixels are filtered out, restoring the image information of the original image to the greatest extent and helping to improve the accuracy of determining each target image area in the original image;
2. Through image detection of the original image, the detection task is modeled as a regression problem based on a target detection model, achieving end-to-end, fast text field detection: small convolution kernels provide lightweight feature computation, a fully connected layer predicts the text boxes, and the target image areas in the image are detected quickly;
3. Through text recognition, a CTC model is introduced to combine pixel information with the temporal sequence so that the character content in the image is aligned; through the one-to-one "image -> character" conversion, the text field corresponding to each target image area is recognized and the recognition accuracy is improved;
4. Through the preset extraction rule, a coordinate analysis method is introduced into the processing of target field retrieval, line text alignment, and multi-line text merging, ensuring the accuracy of the output target text field.
Based on the method provided by the embodiment of the application, the target text field can be sent to the front-end display interface, and personalized display service is provided for the user according to actual application requirements.
Based on the same inventive concept, the present application further provides a device for extracting a text field, so as to extract target text fields from a tax completion certificate image, solve the prior-art problem that target text fields meeting business requirements cannot be extracted from such images, and effectively improve the accuracy of target text field extraction. Referring to fig. 8, the device includes:
a target image area determining module 801, configured to obtain an original image containing a text field and determine each target image area in the original image, where a target image area is an area of the image to be processed that contains a text field;
a text field identification module 802, configured to perform text identification on the text field in each target image area to obtain a text field corresponding to each target image area;
and a target text field extraction module 803, configured to extract target text fields meeting the business requirements from the text fields according to a preset extraction rule.
In one possible design, the target image area determining module 801 is specifically configured to obtain an image to be processed including a text field; rotating the image to be processed according to a preset angle to obtain N rotating images with different rotating angles corresponding to the image to be processed, wherein N is a positive integer greater than or equal to 1; projecting a text field in the rotating image in a given direction, superposing the projections of the rotating image in the given direction to obtain projection values of the rotating image, and determining N projection values corresponding to the N rotating images; and selecting a rotating image corresponding to the minimum projection value from the N projection values, and taking the rotating image as an original image.
In one possible design, the target image area determining module 801 is specifically configured to divide the original image into a plurality of image blocks, where the image blocks represent part of images in the original image; calculating a Euclidean distance between two image blocks, and determining the two image blocks with the Euclidean distance smaller than a preset threshold value as similar image blocks; identifying the similar image blocks as similar areas to obtain one or more similar areas in the original image; and denoising each similar region in the original image to obtain the denoised original image.
In one possible design, the target image region determining module 801 is specifically configured to extract image features in the original image based on a target detection model; and determining each target image area in the original image according to the image characteristics.
In a possible design, the target text field extraction module 803 is specifically configured to obtain the association relation between target text fields and text fields based on a preset database, and to extract, according to the association relation, the target text fields meeting the business requirements from the text fields in the original image.
In one possible design, the target text field extraction module 803 is further configured to send the target text field to a front-end display interface for display after the target text field is extracted.
Based on this device, target text fields are extracted from the tax completion certificate image, which solves the prior-art problem that target text fields meeting business requirements cannot be extracted from such images and effectively improves the accuracy of target text field extraction.
Based on the same inventive concept, an embodiment of the present application further provides an electronic device, where the electronic device may implement the function of the foregoing apparatus for extracting a text field, and with reference to fig. 9, the electronic device includes:
at least one processor 901 and a memory 902 connected to the at least one processor 901. The specific connection medium between the processor 901 and the memory 902 is not limited in this application; in fig. 9, the processor 901 and the memory 902 are connected through a bus 900 as an example. The bus 900 is shown as a thick line in fig. 9; the connection manner between the other components is merely illustrative and not limiting. The bus 900 may be divided into an address bus, a data bus, a control bus, and so on; it is drawn with only one thick line for ease of illustration, but this does not mean that there is only one bus or one type of bus. Alternatively, the processor 901 may also be referred to as a controller, and the name is not limited.
In the embodiment of the present application, the memory 902 stores instructions executable by the at least one processor 901, and by executing the instructions stored in the memory 902 the at least one processor 901 can perform the method for extracting a text field discussed above. The processor 901 can implement the functions of the respective modules of the device shown in fig. 8.
The processor 901 is the control center of the device; it can connect various parts of the entire control device by using various interfaces and lines and, by running or executing the instructions stored in the memory 902 and calling the data stored in the memory 902, perform the various functions of the device and process data, thereby monitoring the device as a whole.
In one possible design, the processor 901 may include one or more processing units, and the processor 901 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, and the like, and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 901. In some embodiments, the processor 901 and the memory 902 may be implemented on the same chip, or in some embodiments, they may be implemented separately on separate chips.
The processor 901 may be a general-purpose processor, such as a Central Processing Unit (CPU), a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof, that may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method for extracting text fields disclosed in the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
By programming the processor 901, the code corresponding to the method for extracting a text field described in the foregoing embodiments can be solidified into the chip, so that the chip can execute the steps of the method of the embodiment shown in fig. 1 at runtime. How to program the processor 901 is well known to those skilled in the art and is not described here.
Based on the same inventive concept, the embodiment of the present application further provides a storage medium storing computer instructions, which when run on a computer, cause the computer to execute the method for extracting text fields discussed above.
In some possible embodiments, the various aspects of the method for extracting a text field provided by the present application may also be implemented in the form of a program product, which includes program code for causing the control apparatus to perform the steps of the method for extracting a text field according to various exemplary embodiments of the present application described above in this specification, when the program product runs on a device.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
Claims (10)
1. A method of extracting a text field, the method comprising:
acquiring an original image containing a text field, and determining each target image area in the original image, wherein the target image area is an area containing the text field in the image to be processed;
performing text recognition on the text fields in the target image areas to obtain the text fields corresponding to the target image areas;
and extracting a target text field meeting the service requirement from the text field according to a preset extraction rule.
2. The method of claim 1, wherein said obtaining an original image containing a text field comprises:
acquiring an image to be processed containing a text field;
rotating the image to be processed according to a preset angle to obtain N rotating images with different rotating angles corresponding to the image to be processed, wherein N is a positive integer greater than or equal to 1;
projecting a text field in the rotating image in a given direction, superposing the projections of the rotating image in the given direction to obtain projection values of the rotating image, and determining N projection values corresponding to the N rotating images;
and selecting a rotating image corresponding to the minimum projection value from the N projection values, and taking the rotating image as an original image.
3. The method of any of claims 1-2, further comprising, after the obtaining an original image containing a text field:
dividing the original image into a plurality of image blocks, wherein the image blocks represent partial images in the original image;
calculating a Euclidean distance between two image blocks, and determining two image blocks whose Euclidean distance is smaller than a preset threshold as similar image blocks;
identifying the similar image blocks as similar areas to obtain one or more similar areas in the original image;
and denoising each similar region in the original image to obtain the denoised original image.
4. The method of claim 1, wherein said determining each target image region in said original image comprises:
extracting image features in the original image based on a target detection model;
and determining each target image area in the original image according to the image characteristics.
5. The method of claim 1, wherein the extracting a target text field satisfying a service requirement from the text fields according to a preset extraction rule comprises:
acquiring an incidence relation between a target text field and a text field based on a preset database;
and extracting a target text field meeting the service requirement from the text fields in the original image according to the incidence relation.
6. The method of claim 1, wherein after extracting a target text field satisfying a business requirement from the text field, further comprising:
and sending the target text field to a front-end display interface for display.
7. An apparatus for extracting a text field, the apparatus comprising:
a module for determining a target image area, which is used for acquiring an original image containing a text field and determining each target image area in the original image, wherein the target image area is an area containing the text field in the image to be processed;
a text field identification module for performing text identification on the text field in each target image area to obtain the text field corresponding to each target image area;
and the target text field extraction module is used for extracting a target text field meeting the service requirement from the text field according to a preset extraction rule.
8. The apparatus according to claim 7, wherein the target text field extraction module is specifically configured to obtain an association relation between target text fields and text fields based on a preset database; and extract, according to the association relation, the target text field meeting the service requirement from the text fields in the original image.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1-6 when executing the computer program stored on the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1-6.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202111428606.1A | 2021-11-29 | 2021-11-29 | Method and device for extracting text field and electronic equipment |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202111428606.1A | 2021-11-29 | 2021-11-29 | Method and device for extracting text field and electronic equipment |
Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| CN114202766A | 2022-03-18 |
Family
ID=80649291

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202111428606.1A (Pending) | Method and device for extracting text field and electronic equipment | 2021-11-29 | 2021-11-29 |

Country Status (1)

| Country | Link |
| --- | --- |
| CN (1) | CN114202766A (en) |
Citations (4)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN109447067A * | 2018-10-24 | 2019-03-08 | 北方民族大学 | Bill angle detection and correction method and automatic ticket checking system |
| CN110414517A * | 2019-04-18 | 2019-11-05 | 河北神玥软件科技股份有限公司 | Fast, high-accuracy identity card text recognition algorithm for photographing scenes |
| CN112052858A * | 2020-09-02 | 2020-12-08 | 中国银行股份有限公司 | Method for extracting target field in bill image and related device |
| CN113011426A * | 2021-03-16 | 2021-06-22 | 上饶市中科院云计算中心大数据研究院 | Method and device for identifying certificate |
Non-Patent Citations (1)

| Title |
| --- |
| 仇伟涛 (Qiu Weitao): "Research on skew correction algorithms for complex text images" (复杂文本图像倾斜校正算法研究), China Master's Theses Full-text Database, Information Science and Technology, 15 March 2017, pages 138-4516 * |
Legal Events

| Date | Code | Title | Description |
| --- | --- | --- | --- |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |