CN109740606B - Image identification method and device
- Publication number
- CN109740606B (application CN201811563632.3A)
- Authority
- CN
- China
- Prior art keywords
- character
- image
- region
- pixel
- characters
- Prior art date
- Legal status
- Active
Landscapes
- Image Analysis (AREA)
- Character Discrimination (AREA)
Abstract
The embodiment of the invention provides an image identification method and device, relating to the technical field of image recognition. The method comprises the following steps: performing morphological gradient calculation on an image to be recognized to obtain a first gradient map; determining, in the first gradient map, a region corresponding to the image region where the characters in the image to be recognized are located, as a first image region; determining the number of characters in the first image region; determining a first grouping mode of the characters based on that number; performing character segmentation on the first image region based on the first grouping mode to obtain single-character regions; and performing character recognition on each single-character region to obtain the character recognition result of the image to be recognized. Applying the scheme provided by the embodiment of the invention to image recognition improves its anti-interference performance and accuracy.
Description
Technical Field
The present invention relates to the field of image recognition technologies, and in particular, to an image recognition method and an image recognition device.
Background
In the internet and big data era, enterprises' demand for information data has grown dramatically and the ways of collecting it have become increasingly diverse; choosing an appropriate input method brings convenience to enterprises and users alike.
With the continuous progress of image recognition technology, steps that once required users to enter information manually can now be completed by photographing and recognizing an image, which is convenient for users and avoids input errors. For example, a user's bank card number and identification number can be entered through image recognition.
Taking bank card number recognition as an example, the prior art binarizes the bank card image, locates the card number region through vertical projection of the resulting binary image, segments characters according to the histogram peaks of the binary image's horizontal projection, and finally recognizes each segmented character region to obtain the card number.
In the course of implementing the invention, the inventor found at least the following problems in the prior art: because bank card background patterns come in many styles, recognition of some cards fails or yields wrong results. Existing bank card image recognition technology therefore has weak anti-interference performance and low accuracy.
Disclosure of Invention
The embodiment of the invention aims to provide an image identification method and device so as to improve the anti-interference performance and accuracy of image identification. The specific technical scheme is as follows:
the embodiment of the invention provides an image identification method, which comprises the following steps:
performing morphological gradient calculation on an image to be recognized to obtain a first gradient map;
determining, in the first gradient map, a region corresponding to the image region where the characters in the image to be recognized are located, as a first image region;
determining the number of characters in the first image area;
determining a first grouping mode of the characters in the first image region based on the number of the characters;
performing character segmentation on the first image area based on the first grouping mode to obtain a single character area;
and performing character recognition on each single character area to further obtain a character recognition result of the image to be recognized.
In an implementation manner of the present invention, after obtaining the character recognition result of the image to be recognized, the method further includes:
and verifying whether the character recognition result is a valid recognition result or not to obtain a verification result.
In an implementation manner of the present invention, the performing character recognition on each single character region to obtain a character recognition result of the image to be recognized includes:
inputting each obtained single-character region into a character recognition model for character recognition, and obtaining the character recognition result of each region as the first-type recognition result of that region, wherein the character recognition model is a model obtained by pre-training a convolutional neural network model with first sample character regions and used for detecting the characters contained in a region; a first sample character region is a region of a first sample gradient map where one character is located; and a first sample gradient map is an image obtained by performing morphological gradient calculation on a first sample image;
and determining character recognition results of the single character areas based on the first type recognition results.
In one implementation manner of the present invention, the determining the character recognition result of each single-character region based on the first type recognition result includes:
determining, as candidate regions of each single-character region, the regions obtained by offsetting that single-character region by a preset number of pixel points along preset directions in the first image region;
inputting each obtained candidate region into a character judgment model to judge whether each candidate region contains characters, and obtaining a character judgment result for each candidate region, wherein the character judgment model is a model obtained by pre-training a convolutional neural network model with second sample character regions and used for judging whether a region contains characters; a second sample character region is a region of a second sample gradient map where one character is located or where no character is located; and a second sample gradient map is an image obtained by performing morphological gradient calculation on a second sample image;
determining, based on the obtained character judgment results, the candidate region with the highest confidence among the candidate regions of each single-character region as the correction region of that single-character region;
inputting the correction area of each single character area into the character recognition model for character recognition, and obtaining the character recognition result of the correction area of each single character area as the second type recognition result of each single character area;
and determining the recognition result with the highest confidence level in the first type recognition result and the second type recognition result of each single character region as the character recognition result of the character region.
In an implementation manner of the present invention, the determining the number of characters in the first image region includes:
inputting each pixel row of the first image region into a character number detection model to detect the number of characters to which the pixel points in that row belong, and obtaining a detection result for each pixel row, wherein the character number detection model is a neural network model obtained by pre-training a preset neural network model with each pixel row in a third sample gradient map and the labeled number of characters to which the pixel points in each such row belong, and is used for detecting the number of characters to which the pixel points in a pixel row belong; the third sample gradient map is a gradient map obtained by performing morphological gradient calculation on a third sample image;
and obtaining, based on the detection results, the number of characters in the first image region.
In one implementation of the present invention, the determining a first grouping manner of the characters in the first image region based on the number of the characters includes:
determining, according to the obtained number of characters, a grouping mode detection model for detecting the character grouping mode in the image, wherein the grouping mode detection model is a neural network model obtained by pre-training a preset neural network model with each pixel row in a fourth sample gradient map and the labeled grouping mode of the characters to which the pixel points in each such row belong, and is used for detecting the grouping mode of the characters to which the pixel points in a pixel row belong; the fourth sample gradient map is a gradient map obtained by performing morphological gradient calculation on a fourth sample image;
inputting each pixel row of the first image region into the grouping mode detection model to detect the grouping mode of the characters to which the pixel points in that row belong, and obtaining, for each pixel row, the probability that this grouping mode is each preset grouping mode;
for each preset grouping mode, calculating the sum, over all pixel rows of the first image region, of the probabilities that the grouping mode of the characters to which the pixel points belong is that preset grouping mode;
and determining the grouping mode corresponding to the maximum sum value as the first grouping mode of the characters in the first image area.
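For illustration only, a minimal Python sketch of this scoring step, assuming the grouping mode detection model has already produced, for every pixel row, one probability per candidate grouping mode (all names here are hypothetical):

```python
import numpy as np

def select_grouping_mode(row_probs: np.ndarray, mode_names: list) -> str:
    """Pick the grouping mode whose per-row probabilities sum highest.

    row_probs: array of shape (num_pixel_rows, num_candidate_modes); entry
    [i, j] is the model's probability that row i belongs to grouping mode j.
    """
    totals = row_probs.sum(axis=0)             # sum probabilities over all pixel rows
    return mode_names[int(np.argmax(totals))]  # mode with the largest sum wins

# e.g. for a 16-digit card number there may be a single candidate ["4-4-4-4"],
# while other character counts may have several candidates to choose among.
```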
In an implementation manner of the present invention, the character segmentation on the first image region based on the first grouping manner to obtain a single character region includes:
counting the number of character pixel points in each pixel column of the first image region, wherein character pixel points are pixel points belonging to a character;
obtaining, for each candidate character arrangement whose grouping mode is the first grouping mode, a first estimated quantity distribution of character pixel points, wherein in each arrangement the character width is a preset width and the character-group spacing is a preset spacing, and different arrangements differ in character width and/or character-group spacing;
determining, among the obtained first estimated quantity distributions, the one with the smallest difference from a first distribution, where the first distribution is the character-pixel-count distribution determined from the counted numbers;
and performing character segmentation on the first image area according to the character arrangement corresponding to the determined first estimated quantity distribution to obtain a single character area.
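The following hedged Python sketch illustrates one way such a distribution comparison could look. The binary expected profile, the candidate character widths and group spacings, and the L1 difference measure are illustrative assumptions, not values prescribed by the text:

```python
import numpy as np

def expected_profile(width_px, char_w, grouping, gap_w):
    """Build an idealized per-column character-pixel profile for one candidate
    arrangement: char_w 'ink' columns per character, gap_w blank columns
    between groups. Values are 1 where a character is expected, 0 elsewhere."""
    profile = []
    for gi, group_size in enumerate(grouping):
        if gi > 0:
            profile += [0] * gap_w              # spacing between character groups
        profile += [1] * (char_w * group_size)  # consecutively arranged characters
    profile += [0] * max(0, width_px - len(profile))
    return np.array(profile[:width_px], dtype=float)

def best_arrangement(col_counts, grouping, char_widths=(25, 27, 29), gaps=(25, 27, 29)):
    """Compare the observed per-column character-pixel counts against every
    candidate arrangement; return the (char_w, gap_w) pair with the smallest
    L1 difference. The candidate widths and gaps are illustrative values."""
    observed = col_counts / max(col_counts.max(), 1)   # normalize to [0, 1]
    candidates = [(w, g) for w in char_widths for g in gaps]
    return min(candidates, key=lambda wg: np.abs(
        observed - expected_profile(len(observed), wg[0], grouping, wg[1])).sum())
```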
In an implementation manner of the present invention, determining, in the first gradient map, a region corresponding to an image region where a character in an image to be recognized is located as a first image region includes:
inputting each pixel row of the first gradient map into a region detection model to obtain a first probability that the corresponding pixel row in the image to be recognized lies in an image region containing characters, wherein the region detection model is a two-class neural network model obtained by pre-training a preset neural network model with the pixel rows of a fifth sample gradient map; the fifth sample gradient map is a gradient map obtained by performing morphological gradient calculation on a fifth sample image;
calculating, for each group of a first preset number of consecutive pixel rows in the first gradient map, the sum of their first probabilities;
and determining the region corresponding, in the image to be recognized, to the first preset number of consecutive pixel rows with the maximum sum of first probabilities as the first image region.
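A small numpy sketch of this sliding-window search, assuming the region detection model has already produced one first probability per pixel row (function and variable names are hypothetical):

```python
import numpy as np

def locate_text_band(row_probs: np.ndarray, band_height: int) -> int:
    """Find the start row of the band of `band_height` consecutive pixel rows
    whose first probabilities (of containing characters) sum highest.

    row_probs: 1-D array, one probability per pixel row of the first gradient map.
    Returns the index of the band's top row.
    """
    window_sums = np.convolve(row_probs, np.ones(band_height), mode="valid")
    return int(np.argmax(window_sums))

# the first image region is then rows [top, top + band_height) of the gradient map
```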
In an implementation manner of the present invention, the performing morphological gradient calculation on the image to be recognized to obtain a first gradient map includes:
obtaining a gray component image and a chrominance component image of an image to be identified;
performing morphological gradient calculation on the gray component image and the chrominance component image respectively to obtain a gray component gradient image and a chrominance component gradient image;
and performing difference operation on the gray component gradient map and the chrominance component gradient map to obtain a first gradient map.
In an implementation manner of the present invention, the performing a difference operation on the gray component gradient map and the chrominance component gradient map to obtain a first gradient map includes:
carrying out binarization processing on the chrominance component gradient map to obtain a chrominance component binary map;
determining the pixel values of first pixel points in the gray component gradient map to be a first preset pixel value, so as to obtain the first gradient map, wherein the first preset pixel value is a pixel value whose represented gradient value is smaller than a preset threshold, and the first pixel points are the pixel points in the gray component gradient map corresponding to the pixel points in the chrominance component binary map whose pixel values are a second preset pixel value, the second preset pixel value being the pixel value of background pixel points in the chrominance component binary map.
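As a rough illustration of this difference operation, the following OpenCV sketch follows one plausible reading: the chrominance gradients of the Cr and Cb components are binarized, and gray-gradient responses are suppressed where the binary map marks the card's colored background pattern. The choice of YCrCb, Otsu thresholding, and the suppression polarity (treating strong chrominance edges as the colored background to be removed) are assumptions for illustration, not the patent's prescribed values:

```python
import cv2
import numpy as np

def first_gradient_map(bgr_image: np.ndarray) -> np.ndarray:
    """Suppress gray-gradient responses wherever the chrominance gradient marks
    strongly colored edges, keeping mostly the embossed, near-colorless digits."""
    y, cr, cb = cv2.split(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb))
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    gray_grad = cv2.morphologyEx(y, cv2.MORPH_GRADIENT, kernel)
    chroma_grad = cv2.max(cv2.morphologyEx(cr, cv2.MORPH_GRADIENT, kernel),
                          cv2.morphologyEx(cb, cv2.MORPH_GRADIENT, kernel))
    # binarize the chrominance gradient; white = strongly colored edges (assumed
    # to belong to the card's decorative background pattern)
    _, chroma_bin = cv2.threshold(chroma_grad, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    result = gray_grad.copy()
    result[chroma_bin == 255] = 0   # "first preset value": gradient below threshold
    return result
```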
An embodiment of the present invention further provides an image recognition apparatus, including:
the gradient calculation module is used for performing morphological gradient calculation on the image to be recognized to obtain a first gradient map;
the region determining module is used for determining, in the first gradient map, a region corresponding to the image region where the characters in the image to be recognized are located, as the first image region;
the number determining module is used for determining the number of the characters in the first image area;
the grouping mode determining module is used for determining a first grouping mode of characters in the first image area based on the number of the characters;
the region obtaining module is used for carrying out character segmentation on the first image region based on the first grouping mode to obtain a single character region;
and the recognition result obtaining module is used for carrying out character recognition on each single character area so as to obtain the character recognition result of the image to be recognized.
In an implementation manner of the present invention, the apparatus further includes:
and the result verification module is used for verifying whether the character recognition result is a valid recognition result or not after the recognition result obtaining module obtains the character recognition result of the image to be recognized, so as to obtain a verification result.
In an implementation manner of the present invention, the identification result obtaining module includes:
the recognition result obtaining submodule is used for inputting each obtained single-character region into a character recognition model for character recognition and obtaining the character recognition result of each region as the first-type recognition result of that region, wherein the character recognition model is a model obtained by pre-training a convolutional neural network model with first sample character regions and used for detecting the characters contained in a region; a first sample character region is a region of a first sample gradient map where one character is located; and a first sample gradient map is an image obtained by performing morphological gradient calculation on a first sample image;
and the recognition result determining submodule is used for determining the character recognition result of each single character area based on the first type of recognition result.
In one implementation manner of the present invention, the recognition result determining sub-module includes:
a candidate region determining unit, configured to determine, as candidate regions of each single-character region, the regions obtained by offsetting that single-character region by a preset number of pixel points along preset directions in the first image region;
a judgment result obtaining unit, configured to input each obtained candidate region into a character judgment model, judge whether each candidate region contains characters, and obtain a character judgment result for each candidate region, where the character judgment model is a model obtained by pre-training a convolutional neural network model with second sample character regions and used for judging whether a region contains characters; a second sample character region is a region of a second sample gradient map where one character is located or where no character is located; and a second sample gradient map is an image obtained by performing morphological gradient calculation on a second sample image;
a correction region determining unit, configured to determine, based on the obtained character judgment results, the candidate region with the highest confidence among the candidate regions of each single-character region as the correction region of that single-character region;
a recognition result obtaining unit, configured to input the correction region of each single-character region to the character recognition model for character recognition, and obtain a character recognition result of the correction region of each single-character region as a second-type recognition result of each single-character region;
and the result determining unit is used for determining the recognition result with the highest confidence degree in the first type recognition result and the second type recognition result of each single character area as the character recognition result of the character area.
In an implementation manner of the present invention, the number determining module includes:
a detection result obtaining submodule, configured to input each pixel row of the first image region into a character number detection model to detect the number of characters to which the pixel points in that row belong, and obtain a detection result for each pixel row, where the character number detection model is a neural network model obtained by pre-training a preset neural network model with each pixel row in a third sample gradient map and the labeled number of characters to which the pixel points in each such row belong, used for detecting the number of characters to which the pixel points in a pixel row belong; the third sample gradient map is a gradient map obtained by performing morphological gradient calculation on a third sample image;
and the number obtaining submodule is used for obtaining the number of the characters in the first image area based on the obtained detection result.
In an implementation manner of the present invention, the grouping manner determining module includes:
a model determining submodule, configured to determine, according to the obtained number of characters, a grouping mode detection model for detecting the character grouping mode in the image, where the grouping mode detection model is a neural network model obtained by pre-training a preset neural network model with each pixel row in a fourth sample gradient map and the labeled grouping mode of the characters to which the pixel points in each such row belong, used for detecting the grouping mode of the characters to which the pixel points in a pixel row belong; the fourth sample gradient map is a gradient map obtained by performing morphological gradient calculation on a fourth sample image;
a first probability obtaining submodule, configured to input each pixel row of the first image region into the grouping mode detection model to detect the grouping mode of the characters to which the pixel points in that row belong, and obtain, for each pixel row, the probability that this grouping mode is each preset grouping mode;
a first sum calculation submodule, configured to calculate, for each preset grouping mode, the sum over all pixel rows of the first image region of the probabilities that the grouping mode of the characters to which the pixel points belong is that preset grouping mode;
and the grouping mode determining submodule is used for determining the grouping mode corresponding to the maximum sum value as the first grouping mode of the characters in the first image area.
In an implementation manner of the present invention, the region obtaining module includes:
a number counting submodule, configured to count the number of character pixel points in each pixel column of the first image region, where character pixel points are pixel points belonging to a character;
a distribution obtaining submodule, configured to obtain, for each candidate character arrangement whose grouping mode is the first grouping mode, a first estimated quantity distribution of character pixel points, where in each arrangement the character width is a preset width and the character-group spacing is a preset spacing, and different arrangements differ in character width and/or character-group spacing;
a distribution determining submodule, configured to determine, among the obtained first estimated quantity distributions, the one with the smallest difference from a first distribution, where the first distribution is the character-pixel-count distribution determined from the counted numbers;
and the region obtaining submodule is used for carrying out character segmentation on the first image region according to the character arrangement corresponding to the determined first estimated quantity distribution to obtain a single character region.
In one implementation manner of the present invention, the region determining module includes:
a second probability obtaining submodule, configured to input each pixel row of the first gradient map into a region detection model to obtain a first probability that the corresponding pixel row in the image to be recognized lies in an image region containing characters, where the region detection model is a two-class neural network model obtained by pre-training a preset neural network model with the pixel rows of a fifth sample gradient map; the fifth sample gradient map is a gradient map obtained by performing morphological gradient calculation on a fifth sample image;
a second sum calculation submodule, configured to calculate, for each group of a first preset number of consecutive pixel rows in the first gradient map, the sum of their first probabilities;
and a region determining submodule, configured to determine the region corresponding, in the image to be recognized, to the first preset number of consecutive pixel rows with the maximum sum of first probabilities as the first image region.
In one implementation of the present invention, the gradient calculation module includes:
the image obtaining submodule is used for obtaining a gray component image and a chrominance component image of the image to be identified;
the first gradient map obtaining submodule is used for performing morphological gradient calculation on the gray component image and the chrominance component image respectively to obtain a gray component gradient map and a chrominance component gradient map;
and the second gradient map obtaining submodule is used for carrying out difference operation on the gray component gradient map and the chrominance component gradient map to obtain a first gradient map.
In one implementation manner of the present invention, the second gradient map obtaining sub-module includes:
the image obtaining unit is used for carrying out binarization processing on the chrominance component gradient image to obtain a chrominance component binary image;
a gradient map obtaining unit, configured to determine the pixel values of first pixel points in the gray component gradient map to be a first preset pixel value, so as to obtain the first gradient map, where the first preset pixel value is a pixel value whose represented gradient value is smaller than a preset threshold, and the first pixel points are the pixel points in the gray component gradient map corresponding to the pixel points in the chrominance component binary map whose pixel values are a second preset pixel value, the second preset pixel value being the pixel value of background pixel points in the chrominance component binary map.
The embodiment of the invention also provides an electronic device, which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing any of the steps of the image recognition method described above when executing a program stored in the memory.
In yet another aspect of the present invention, the present invention further provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform any of the steps of the image recognition method described above.
In yet another aspect of the present invention, the present invention further provides a computer program product containing instructions, which when run on a computer, causes the computer to execute any of the image recognition methods described above.
The image recognition method and device provided by the embodiment of the invention obtain the gradient map of an image, determine the region where the characters are located, determine the number of characters in that region and their first grouping mode, segment the region into single-character regions, and finally perform character recognition on each single-character region to obtain the recognition result of the image. Instead of determining the card number region by vertical projection of a binary image and segmenting characters by horizontal projection of the binary image, the scheme performs character region detection, character count detection and character grouping mode detection on the morphological gradient image, segments the characters according to the determined grouping mode, and only then recognizes them. This avoids recognition errors caused by mislocating the character region; and because segmentation is performed only after the character count and grouping mode have been determined in turn, segmentation errors are also avoided, improving the anti-interference performance and accuracy of image recognition. Of course, any product or method implementing the invention does not necessarily need to achieve all of the advantages described above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a schematic flowchart of an image recognition method according to an embodiment of the present invention;
fig. 2 is an image of a bank card according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a single character region in an image of a bank card according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
The embodiment of the invention provides an image recognition method and device, and concepts related to the embodiment of the invention are explained first.
Morphological gradient calculation: performing the morphological operations of dilation and erosion on the image respectively, then subtracting the eroded image from the dilated image to obtain a difference image. In the dilation and erosion, a 3×3 structuring element may be selected as the kernel.
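A minimal OpenCV sketch of this definition (the 3×3 kernel follows the text; the input file name is hypothetical):

```python
import cv2

# Dilate and erode the image with a 3x3 structuring element, then subtract the
# eroded image from the dilated one. cv2.MORPH_GRADIENT does the same in one call.
img = cv2.imread("card.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input file
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
dilated = cv2.dilate(img, kernel)
eroded = cv2.erode(img, kernel)
gradient = cv2.subtract(dilated, eroded)
gradient_builtin = cv2.morphologyEx(img, cv2.MORPH_GRADIENT, kernel)  # equivalent
```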
Grouping mode: how many characters are arranged consecutively in each group, and how the groups of characters are separated from one another.
Taking a bank card number as an example: if the card number contains 16 characters, the grouping mode may be 4-4-4-4, that is, every 4 characters are arranged together and adjacent groups are separated by the width of one character, e.g. 6200 0000 0000 0000; if the card number contains 19 characters, the grouping mode may be 6-13, that is, the first 6 characters form one group followed by a group of 13 characters, the two groups separated by the width of one character, e.g. 620000 0000000000000.
Neural network model: a complex network system formed by a large number of simple, widely interconnected processing units. A convolutional neural network model is a feedforward neural network model suitable for large-scale image processing; it includes convolutional layers and pooling layers. Convolutional neural network models come in one-dimensional, two-dimensional and three-dimensional variants: one-dimensional models are often applied to sequence data, two-dimensional models to image and text recognition, and three-dimensional models mainly to medical image and video data recognition.
Single-character region: the region where an individual character is located. When recognizing characters in an image, the character area is usually segmented first to determine the regions where individual characters are located, and character recognition is then performed on those regions one by one.
Distribution of the number of character pixel points: the discrete distribution of the number of character pixels of each pixel column in the image can be expressed in an array form or a vector form.
The following describes the image recognition method provided by the embodiment of the present invention in detail by using specific embodiments.
Referring to fig. 1, fig. 1 is a flowchart of an image recognition method according to an embodiment of the present invention, including the following steps:
and S101, performing morphological gradient calculation on the image to be recognized to obtain a first gradient map.
The image to be identified can be a gray image or a color image. If the image to be recognized is a gray image, morphological gradient calculation can be directly carried out on the image to be recognized to obtain a first gradient image; if the image to be recognized is a color image, a gray scale image of the image to be recognized can be obtained first, and then morphological gradient calculation is carried out on the gray scale image to obtain a first gradient image.
The above is only an example of obtaining the first gradient map; the present invention is not limited thereto.
Step S102, in the first gradient map, determining a region corresponding to an image region where characters in the image to be recognized are located as a first image region.
The image area where the characters are located includes both the character portion and the character-free portion on the same line as the characters. When determining this area, a binarization-based horizontal projection algorithm may be adopted: binarize the first gradient map to obtain a black-and-white binary image; count the distribution of white (or black) pixel points in each pixel row of the binary image; and determine the image area from the statistics.
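A hedged sketch of this binarization-based horizontal projection; Otsu thresholding and the 0.3 peak fraction are illustrative choices not specified in the text:

```python
import cv2
import numpy as np

def find_character_rows(gradient_map: np.ndarray) -> tuple:
    """Binarize the gradient map, count white pixels per row, and keep the band
    of rows whose counts exceed a fraction of the peak (0.3 is illustrative)."""
    _, binary = cv2.threshold(gradient_map, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    row_counts = (binary == 255).sum(axis=1)       # white pixels in each row
    active = np.where(row_counts > 0.3 * row_counts.max())[0]
    return int(active.min()), int(active.max())    # top and bottom of the band
```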
If all the images to be recognized are image areas where the characters are located, the whole first gradient image is the first image area. Of course, the image region where the character is located may also be a partial region in the image to be recognized, in which case, the first image region is a partial region of the first gradient map.
The characters may be numbers, letters, or Chinese characters, or a mixture of at least two of the above three, which is not limited in the embodiments of the present invention.
The image to be recognized may be a bank card image, as shown in fig. 2, the image area where the character is located may be a card number area of a bank card in the image.
And step S103, determining the number of characters in the first image area.
When determining the number of characters in the first image region, a binarization-based horizontal projection algorithm may be adopted, and the specific steps may include: carrying out binarization processing on the first image area; counting the distribution of pixel points with white or black colors in each pixel row of the obtained binary image; and determining the number of the characters in the first image area according to the statistical result.
The number of characters in the first image area can also be determined by calculating the quotient of the width of the first image area and a preset character width, and can also be determined by detecting pixel rows of the first image area through a pre-trained neural network model.
And step S104, determining a first grouping mode of characters in the first image area based on the number of the characters.
In some application scenarios, the grouping manner of the characters is fixed, so that for an image of such an application scenario, after the number of characters is determined, the grouping manner of the characters in the image can be determined according to the setting of the characters contained in the image.
For example: the image to be recognized is an image of a China UnionPay bank card, and the image region containing characters is the card number region. When the card number is determined to contain 16 digits, the grouping mode can be directly determined to be 4-4-4-4 according to China UnionPay's card number rules: every 4 digits are arranged together, and adjacent digit groups are separated by a blank area. When the card number contains 18 digits, the grouping mode is 6-6-6 according to the same rules: every 6 digits are arranged together, and adjacent digit groups are separated by a blank area.
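A tiny sketch of such a fixed count-to-grouping lookup, with a hypothetical table built from the UnionPay examples in this document:

```python
# Hypothetical count-to-grouping table; real deployments would populate it
# from the applicable card number rules.
GROUPING_BY_COUNT = {
    16: [(4, 4, 4, 4)],
    18: [(6, 6, 6)],
    19: [(6, 13)],   # several entries here would mean a model must decide
}

def grouping_for(count: int):
    modes = GROUPING_BY_COUNT.get(count, [])
    return modes[0] if len(modes) == 1 else None   # None: needs further detection
```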
For the case that there are multiple grouping modes with the same number of characters, after the number of characters is determined, it is further necessary to determine the grouping mode based on the number of characters, and how to determine the grouping mode based on the number of characters is described in detail in the following embodiments, which is not repeated herein.
Step S105, based on the first grouping mode, performing character segmentation on the first image area to obtain a single character area.
In some scenarios, the width of the region occupied by each character is roughly fixed and similar from character to character.
For example, if the width of one character is about 27 pixels, the first image region may be character-segmented according to the interval of 27 pixels, so as to obtain a plurality of character regions.
Based on the above example, as shown in FIG. 3, each white box in FIG. 3 represents a single-character region.
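A minimal sketch of this fixed-width segmentation; the 27-pixel width follows the example above and is a statistical value, not an exact one:

```python
import numpy as np

def split_fixed_width(region: np.ndarray, char_w: int = 27) -> list:
    """Cut the card-number band into equal-width single-character regions."""
    h, w = region.shape[:2]
    return [region[:, x:x + char_w] for x in range(0, w - char_w + 1, char_w)]
```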
And S106, performing character recognition on each single character area to further obtain a character recognition result of the image to be recognized.
The recognition result obtained by recognizing a single-character region is a single character, and the character recognition result of the image to be recognized is the characters obtained by recognizing the whole image. If there is only one single-character region, the result of recognizing it is the character recognition result of the image; if there are several, their recognition results can be combined in reading order to obtain the character recognition result of the image.
In one implementation, each single-character region may be recognized by a template matching algorithm: scale each single-character region to the size of the templates in the character database, match it against all templates, and take the best match as the result. In another implementation, each single-character region may be recognized by a neural network model: either features are extracted first and then input into the model to obtain a recognition result, or the image is input directly and the model performs both feature extraction and recognition.
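An illustrative sketch of the template matching variant; the template dictionary and scoring via normalized correlation are assumptions, not the patent's mandated choices:

```python
import cv2
import numpy as np

def match_character(char_region: np.ndarray, templates: dict) -> str:
    """Scale the single-character region to each template's size, correlate,
    and return the label of the best-scoring template. `templates` maps a
    character label (e.g. "7") to a grayscale template image."""
    best_label, best_score = None, -1.0
    for label, tmpl in templates.items():
        resized = cv2.resize(char_region, (tmpl.shape[1], tmpl.shape[0]))
        score = cv2.matchTemplate(resized, tmpl, cv2.TM_CCOEFF_NORMED)[0, 0]
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```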
The image recognition method provided by the embodiment of the invention obtains the gradient map of an image, determines the region where the characters are located, determines the number of characters in that region and their first grouping mode, then segments the region into single-character regions, and finally performs character recognition on each single-character region to obtain the recognition result of the image. Instead of determining the card number region by vertical projection of a binary image and segmenting characters by horizontal projection of the binary image, the scheme performs character region detection, character count detection and character grouping mode detection on the morphological gradient image, segments the characters according to the determined grouping mode, and only then recognizes them. This avoids recognition errors caused by mislocating the character region; and because segmentation is performed only after the character count and grouping mode have been determined in turn, segmentation errors are also avoided, improving the anti-interference performance and accuracy of image recognition.
In an implementation manner of the present invention, after the step S106, it is verified whether the character recognition result is a valid recognition result, and a verification result is obtained.
Recognizing each single-character region in step S106 yields a recognition confidence for each region, a value between 0 and 1. In this implementation, it is verified whether the sum of the recognition confidences of the single-character regions exceeds a preset threshold; if so, the recognition result is determined to be valid. The preset threshold may be the total number of single-character regions minus 1, or that total multiplied by 0.9 or 0.8; for example, if 18 single-character regions are obtained in step S105, the preset threshold may be 17 or 16.2.
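A one-function sketch of this confidence-sum check, using the multiply-by-0.9 threshold from the example above:

```python
def is_valid_result(confidences: list, factor: float = 0.9) -> bool:
    """Accept the recognition result only if the summed per-character
    confidences exceed count * factor (e.g. 18 * 0.9 = 16.2)."""
    return sum(confidences) > factor * len(confidences)
```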
In addition, in some application scenarios the recognition result must follow a preset rule, so after the result is obtained, erroneous results can be detected by checking it against that rule.
For example, when the image to be recognized is a bank card image and the recognition target is the card number, the first 6 digits of the recognized number may be matched against bank card BINs (Bank Identification Numbers); if no BIN corresponds to the first 6 digits, a recognition error is indicated. For a bank card whose issuing bank and card type are known, a recognition error is likewise indicated if the first 6 recognized digits differ from the BIN of that card type.
Likewise, when the image to be recognized is a bank card image and the recognition target is the card number, modulo-10 (Luhn) verification can be performed on the recognized number; the last digit of a bank card number is the check digit of this verification, and if the recognition result fails the check, a recognition error is indicated.
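Because the modulo-10 check on card numbers is the standard Luhn algorithm, it can be sketched directly (the trailing assertion uses a well-known Luhn test number):

```python
def luhn_check(card_number: str) -> bool:
    """Modulo-10 (Luhn) verification: double every second digit from the right,
    subtract 9 from doubles above 9, and require the total to be divisible by
    10. The last digit of a bank card number is the check digit."""
    total = 0
    for i, ch in enumerate(reversed(card_number)):
        d = int(ch)
        if i % 2 == 1:          # every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

assert luhn_check("79927398713")   # classic Luhn test number
```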
In this implementation, whether the character recognition result is a valid recognition result is verified. Since the characters in the recognized image follow preset rules, results that pass rule verification are determined to be valid and results that violate the rules are rejected, which improves the accuracy of the final recognition result.
In an implementation manner of the present invention, when performing character recognition on each single-character region in step S106 to obtain the character recognition result of the image to be recognized, each single-character region may first be input into a character recognition model for character recognition, yielding a character recognition result for each region as its first-type recognition result; the character recognition result of each single-character region is then determined based on the first-type recognition results.
The character recognition model is a model obtained by pre-training a convolutional neural network model with first sample character regions, used for detecting the characters contained in a region. A first sample character region is a region of a first sample gradient map where one character is located. A first sample gradient map is an image obtained by performing morphological gradient calculation on a first sample image.
The character recognition result may include the character recognized in each character region, and may also include the confidence that the character present in each region is the recognized character. This confidence can be understood as the probability that the character existing in the region is the recognized character.
The first sample image may be a grayscale image or a color image.
The first sample image may be an image including a plurality of characters or an image including one character.
When the first sample image contains several characters, morphological gradient calculation is first performed on it to obtain the first sample gradient map, which is then segmented into first sample character regions.
In the case where the first sample image is an image including one character, the morphological gradient calculation may be performed on the first sample image to obtain the first sample gradient map, and in this case, all of the first sample gradient map may be directly used as the first sample character region. In addition, although the first sample image only contains one character, the first sample image may contain other contents besides the character, and for this reason, after the first sample gradient map is obtained, the region where the character is located may be determined, and the region where the character is located may be determined as the first sample character region.
When performing character recognition on each character region, the obtained first-type recognition result of each character region may include only one recognized character, or may include several possible recognized characters.
When the first type character recognition result of each character region only contains one recognized character, the characters contained in the image to be recognized can be determined according to the position sequence of each character region in the image to be recognized.
When the first-type recognition result of each character region includes several possible characters, the characters contained in the image to be recognized can be determined from the character with the highest confidence in each region's result together with the regions' position order; alternatively, they can be determined according to how well the candidate characters, combined in position order, conform to the expected grammatical structure.
Therefore, the implementation mode can determine the characters contained in the image to be recognized according to the first type character recognition result of each character area, and can quickly obtain the characters contained in the recognized image.
The character recognition result of each single-character region is then determined based on the first-type recognition results. In one implementation, the first-type recognition result may be directly taken as the character recognition result of each single-character region; in another, the first-type recognition result may be compared with other recognition results and the more suitable one selected as the character recognition result.
In this implementation, each character region is input into the character recognition model to obtain its character recognition result. Characters are not recognized with a low-accuracy graphics algorithm; instead, a convolutional neural network model trained on a large number of samples detects each single-character region. Training on many samples lets the network learn character features against varied backgrounds, and because the training regions have undergone morphological gradient calculation, which highlights edges in the image content, the trained model can effectively recognize characters against complex backgrounds, improving recognition accuracy.
In the above implementation, the preset interval used to segment the first image region is only a statistical value, and actual character widths are not exactly equal; moreover, the image may be deformed or rotated by factors such as the shooting angle. As a result, among the single-character regions obtained in step S105, some may contain exactly one complete character while others contain only part of a character.
To address the possibility that a single-character region contains only part of a character, another implementation of the present invention is proposed on the basis of the above one, in which the character recognition result of each single-character region is determined based on the first-type recognition results through the following steps A1-A5:
step A1: and determining a region corresponding to a preset number of pixel points of each single character region in the first image region along a preset direction, and taking the region as a candidate region of each single character region.
The offset along the predetermined direction may be offset along a horizontal direction or offset along a vertical direction.
The number of candidate regions for each character region may be one, for example the region obtained by shifting the character region by a preset number of pixel points along a single direction in the image to be recognized; or it may be several, for example the regions obtained by shifting the character region by a preset number of pixel points along several directions.
The preset number may be 3 pixels, 4 pixels, and so on.
Since each candidate region is obtained by offsetting the character region by a preset number of pixel points along a preset direction, each candidate region has the same size as the character region.
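A small sketch of generating equal-sized candidate regions by offsetting a character box; the 3-pixel shift and the set of directions are illustrative values:

```python
def candidate_boxes(box, shift=3):
    """Generate candidate regions by offsetting a single-character box by a
    preset number of pixels in each direction; every candidate keeps the
    original box's size."""
    x, y, w, h = box
    offsets = [(0, 0), (shift, 0), (-shift, 0), (0, shift), (0, -shift)]
    return [(x + dx, y + dy, w, h) for dx, dy in offsets]
```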
Step A2: and inputting each obtained candidate area into a character judgment model to judge whether each candidate area is an area containing characters or not, and obtaining a character judgment result of each candidate area.
The character judgment model is as follows: and the model is obtained by training the convolutional neural network model in advance by adopting a second sample character area and is used for judging whether the area contains characters.
The second sample character area is: the second sample gradient map represents the region where a character is located or the region where a non-character is located.
The second sample gradient map is: and performing morphological gradient calculation on the second sample image to obtain an image.
The character judgment result may indicate that the candidate region contains a character, or that it does not (i.e., is a non-character region); it may also include the confidence of either judgment.
The second sample image may be a grayscale image or a color image.
The second sample image may be a character sample image and a non-character sample image, where the character sample image may be a sample image containing one character or a sample image containing a plurality of characters. The character sample image and the non-character sample image may be derived from one original image or from the same type of original image.
Taking a bank card as an example, the character sample images among the second sample images are taken from bank card images, and the non-character sample images may be obtained by offsetting the character sample regions by a preset number of pixel points within the bank card image.
Step A3: determining, based on the obtained character judgment results, the candidate region with the highest confidence among the candidate regions of each single-character region as the correction region of that single-character region.
The candidate region with the highest confidence coefficient among the plurality of candidate regions is used as the correction region.
Step A4: and inputting the correction area of each single character area into a character recognition model for character recognition, and obtaining the character recognition result of the correction area of each single character area as the second type recognition result of each single character area.
Step A5: and determining the recognition result with the highest confidence level in the first type recognition result and the second type recognition result of each single character region as the character recognition result of the character region.
In this implementation, the recognition result with the highest confidence among the first-type and second-type recognition results of each character region is determined as its final recognition result, which improves character recognition accuracy. Inputting each candidate region into the character judgment model and taking the candidate with the highest confidence as the correction region further improves accuracy. Moreover, the morphological gradient image is examined with a convolutional neural network model trained on a large number of samples: training the network on second sample character regions taken from second sample gradient maps strengthens the anti-interference ability of the character judgment model, so it can reliably judge whether characters are present against a complex background, again improving recognition accuracy.
In an implementation manner of the present invention, when the number of characters in the first image region is determined in step S103, each pixel row of the first image region may be input into the character number detection model to detect the number of characters to which a pixel point in each pixel row belongs, so as to obtain a detection result corresponding to each pixel row, and then the number of characters in the first image region is obtained based on the obtained detection result.
The character number detection model is a neural network model obtained by training a preset neural network model in advance using each pixel row in a third sample gradient map together with the labeled number of characters to which the pixel points of each pixel row belong; it is used to detect the number of characters to which the pixel points of a pixel row belong.
The third sample gradient map is a gradient map obtained by performing morphological gradient calculation on a third sample image.
The first image region is part of the first gradient map obtained by performing morphological gradient calculation on the image to be recognized, so the pixels of the first image region correspond to pixels of the image to be recognized. Because the first image region contains characters formed by pixel points, the characters to which the pixel points of a pixel row in the first image region belong are the characters to which the corresponding pixel points in the image to be recognized belong.
A pixel row input into the character number detection model may consist of a first preset number of pixel points, where the first preset number may take a value such as 240 or 300. If the number of pixel points in a pixel row of the first image region is larger than the first preset number, the first image region may be reduced so that its width equals the first preset number of pixel points; if the number of pixel points in a pixel row is smaller than the first preset number, the row may be completed with pixel points whose pixel values represent gradient values smaller than a preset threshold. For example, if in the first image region the gradient values represented by colors from white to black decrease in order, then when the number of pixel points in a pixel row is smaller than the first preset number, pixel points whose pixel value represents black may be used to complete the row to the first preset number of pixel points.
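A minimal sketch of this width normalization, assuming the gradient map is a 2-D uint8 array in which black (0) stands for a gradient value below the preset threshold; the helper name and defaults are illustrative assumptions:

```python
import cv2
import numpy as np

def normalize_row_width(region, target_width=240, pad_value=0):
    """Make every pixel row of a gradient-map region exactly target_width
    pixels wide. region is a 2-D uint8 array; pad_value stands for a
    gradient value below the preset threshold (black in this document's
    convention)."""
    h, w = region.shape
    if w > target_width:
        # Reduce the region so its width equals the preset number of pixels.
        return cv2.resize(region, (target_width, h),
                          interpolation=cv2.INTER_AREA)
    if w < target_width:
        # Complete each pixel row with low-gradient (black) pixel points.
        pad = np.full((h, target_width - w), pad_value, dtype=region.dtype)
        return np.hstack([region, pad])
    return region
```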
The detection result may be a specific character count, for example "18". It may also be a second preset number of candidate character counts together with a probability for each, for example "17:0.10, 18:0.70, 19:0.20", meaning the number of characters may be 17, 18, or 19 with probabilities 0.10, 0.70, and 0.20, respectively. It may also be the probabilities for one or more preset character counts, for example "0.05, 0.00, 0.75, 0.20" for the four preset counts 16, 17, 18, and 19, meaning the detected number of characters equals those preset counts with probabilities 0.05, 0.00, 0.75, and 0.20, respectively.
If the detection result is a specific character count, the count corresponding to each pixel row is read directly from that row's result, and the counts from the plurality of pixel rows are combined into the character count of the image to be recognized; for example, the count occurring most frequently among the detection results may be used.
If the detection result is a second preset number of candidate character counts with a probability for each, the detection results may be aggregated: for each candidate count, the probability values attached to that count across the detection results are summed, and the count with the largest summed probability is taken as the number of characters in the image to be recognized.
If the detection result is the probabilities of preset character counts, the probability values corresponding to each preset count may be summed across the detection results, and the preset count with the largest summed probability is taken as the number of characters in the image to be recognized.
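The aggregation over pixel rows for these result formats might look like the sketch below; the input encodings (an int per row, or a dict from candidate counts to probabilities per row) are assumptions modeled on the formats described above:

```python
from collections import Counter, defaultdict

def aggregate_counts(row_results):
    """Combine per-row character-count detections into one count. Each
    element is either an int (a specific count) or a dict mapping candidate
    counts to probabilities; both encodings are illustrative assumptions."""
    if all(isinstance(r, int) for r in row_results):
        # Specific counts: take the count occurring most frequently.
        return Counter(row_results).most_common(1)[0][0]
    # Probabilistic results: sum each candidate count's probability across
    # the rows and take the count with the largest summed probability.
    sums = defaultdict(float)
    for result in row_results:
        for count, prob in result.items():
            sums[count] += prob
    return max(sums, key=sums.get)
```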
In this implementation, the pixel rows of the first image region are input into a neural network model obtained by pre-training, and the number of characters in the first image region is then obtained from the model's output. The first image region is detected using a neural network model trained with a large number of samples; because samples exhibiting the differentiating features of different character counts are used for training, the model can effectively distinguish character counts, the count can be obtained accurately, and the accuracy of the final recognition result is improved.
In one implementation manner of the present invention, step S104 of determining the first grouping mode of the characters in the first image region based on the number of characters can be implemented by the following steps B1-B4:
Step B1: determine, according to the obtained number of characters, a grouping mode detection model for detecting the character grouping mode in the image.
The grouping mode detection model is a neural network model obtained by training a preset neural network model in advance using each pixel row in a fourth sample gradient map together with the labeled grouping mode of the characters to which the pixel points of each pixel row belong; it is used to detect the grouping mode of the characters to which the pixel points of a pixel row belong.
The fourth sample gradient map is a gradient map obtained by performing morphological gradient calculation on a fourth sample image.
Different character counts may correspond to different grouping mode detection models, and each grouping mode detection model may be trained with sample images whose character count equals a specific preset count.
Step B2: input each pixel row of the first image region into the grouping mode detection model to detect the grouping mode of the characters to which the pixel points of each row belong, obtaining the probability that this grouping mode is each preset grouping mode.
A pixel row input into the grouping mode detection model may consist of a first preset number of pixel points, where the first preset number may take a value such as 240 or 300. If the number of pixel points in a pixel row of the first image region is larger than the first preset number, the first image region may be reduced so that its width equals the first preset number of pixel points; if it is smaller than the first preset number, the row may be completed with pixel points whose pixel values represent gradient values smaller than a preset threshold, for example pixel points whose pixel value represents black when the gradient values represented by colors from white to black decrease in order.
The value of the probability may be a value between 0 and 1.
Step B3: for each preset grouping mode, calculate the sum, over the pixel rows of the first image region, of the probabilities that the grouping mode of the characters to which the pixel points of each row belong is that preset grouping mode.
Step B4: determine the grouping mode corresponding to the maximum sum as the first grouping mode of the characters in the first image region.
If two or more different grouping modes share the maximum sum, the individual probabilities that entered those sums can be compared, and the grouping mode associated with the largest single probability value is determined as the grouping mode of the characters in the image to be recognized.
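Steps B3-B4, including the tie-breaking rule, can be sketched as follows, assuming each row's detection result is encoded as a dict from preset grouping modes to probabilities:

```python
def pick_grouping_mode(row_probs):
    """row_probs[i][g] is the probability that the grouping mode of the
    characters in pixel row i is preset mode g. Sum per mode (step B3),
    take the mode with the largest sum (step B4), and break ties by the
    largest single probability. The encoding is an assumption."""
    modes = list(row_probs[0])
    sums = {g: sum(r[g] for r in row_probs) for g in modes}
    best_sum = max(sums.values())
    tied = [g for g, s in sums.items() if s == best_sum]
    if len(tied) == 1:
        return tied[0]
    # Tie-break: compare the maximum per-row probability of the tied modes.
    return max(tied, key=lambda g: max(r[g] for r in row_probs))
```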
In this implementation, a grouping mode detection model is selected according to the number of characters, the pixel rows of the first image region are input into this pre-trained neural network model to obtain, for each row, the probability that its grouping mode is each preset grouping mode, the probability sums are calculated, and the preset grouping mode with the maximum sum is determined as the grouping mode. The first image region, whose character count is already known, is detected using a neural network model trained with a large number of samples; because samples exhibiting the differentiating features of different grouping modes are used for training, the model can effectively distinguish grouping modes, can handle the case where the same character count admits different grouping modes, and can determine the grouping mode accurately, improving the accuracy of the final recognition result.
In an implementation manner of the present invention, step S105 of performing character segmentation on the first image region based on the first grouping mode to obtain single-character regions can be implemented by the following steps C1-C4:
Step C1: count the number of character pixel points in each pixel column of the first image region.
Character pixel points are pixel points belonging to a character. Since the first image region is obtained after morphological gradient calculation on the image to be recognized, the character pixel points are edge pixel points of the characters.
Step C2: obtain the first estimated quantity distribution of character pixel points for each character arrangement whose grouping mode is the first grouping mode.
The character width of the characters in each character arrangement is a preset width and the character group spacing is a preset spacing, while the character widths and/or character group spacings differ between character arrangements. The character group spacing is the distance between adjacent character groups and can be expressed as a number of pixel points. The character arrangement directly determines how character segmentation is performed on the image to be segmented.
The character arrangement may be obtained in ways including but not limited to:
(1) directly acquiring a preset character arrangement;
(2) selecting different sizes within a preset size range and deriving the character arrangement from the selected sizes.
The first estimated quantity distribution is the number of character pixel points in each pixel unit of the image region where the estimated characters would lie if they were distributed according to the character arrangement corresponding to that distribution. It may be obtained in ways including but not limited to:
(1) directly acquiring a preset first estimated quantity distribution;
(2) acquiring a preset estimated quantity distribution for a single character and combining copies of it according to the corresponding character arrangement and character group spacing to obtain the first estimated quantity distribution, where the single-character estimated quantity distribution represents the number of character pixel points in each pixel unit of the image region of one character (see the sketch after this list).
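A sketch of composition method (2); it assumes characters within a group sit side by side and that `single_char_dist` holds the estimated per-column character-pixel counts of one character, both being illustrative assumptions:

```python
import numpy as np

def compose_group_distribution(single_char_dist, chars_per_group, group_gap):
    """Concatenate copies of a single character's estimated per-column
    pixel-count distribution into the distribution of a whole character
    arrangement, inserting zero-count gaps between character groups."""
    gap = np.zeros(group_gap)
    parts = []
    for i, n in enumerate(chars_per_group):  # e.g. [4, 4, 4, 4] for 16 digits
        if i > 0:
            parts.append(gap)                # preset character group spacing
        parts.append(np.tile(single_char_dist, n))
    return np.concatenate(parts)
```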
Step C3: a first estimated quantity distribution with the smallest difference degree between the obtained first estimated quantity distribution and the first distribution is determined.
The first distribution is: and the distribution of the number of character pixel points is determined by the counted number.
When calculating the difference between the first estimated quantity distribution and the first distribution, the difference between corresponding elements in the first estimated quantity distribution and the first distribution may be calculated first, and then the absolute values of the obtained differences are summed up to be used as the difference; the sum of squares of each corresponding element in the first estimated number distribution and the first distribution may also be calculated as the degree of difference.
In one implementation, the first estimated quantity distribution and the first distribution are each normalized, and the degree of difference is calculated from the normalized distributions; this avoids mismatches in character pixel counts caused by different image sizes. With the influence of image size removed, the distribution of character pixel counts better reflects the shape formed by the character pixels.
The first estimated quantity distribution and the first distribution are both discrete, and since the number of elements in a first estimated quantity distribution is not necessarily equal to the number of elements in the first distribution, the two element counts may be compared and the distribution with fewer elements padded with a preset value until the counts are equal. The preset value may be 0; for distributions subjected to normalization, the preset value may be, for example, 0.3 or 0.5.
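Putting the normalization, padding, and difference calculation together, a sketch of the degree-of-difference computation might read:

```python
import numpy as np

def difference_degree(estimated, observed, pad_value=0.0):
    """Degree of difference between an estimated per-column count
    distribution and the counted one: normalize both, pad the shorter with
    a preset value, and sum absolute element-wise differences (a sum of
    squares works the same way). A sketch under those assumptions."""
    def normalize(d):
        d = np.asarray(d, dtype=float)
        return d / d.sum() if d.sum() > 0 else d

    a, b = normalize(estimated), normalize(observed)
    # Pad the shorter distribution so both have equal element counts.
    n = max(len(a), len(b))
    a = np.pad(a, (0, n - len(a)), constant_values=pad_value)
    b = np.pad(b, (0, n - len(b)), constant_values=pad_value)
    return np.abs(a - b).sum()        # or: ((a - b) ** 2).sum()
```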
Step C4: perform character segmentation on the first image region according to the character arrangement corresponding to the determined first estimated quantity distribution, obtaining the single-character regions.
As shown in FIG. 3, each white box in FIG. 3 represents a single character region.
In this implementation, the number of character pixel points in each pixel column of the image to be segmented is counted first, the estimated quantity distributions of character pixel points for the candidate character arrangements are obtained, the estimated quantity distribution with the smallest difference from the distribution formed by the counted numbers is determined, and character segmentation is finally performed on the image to be segmented according to the character arrangement corresponding to that distribution. The character features present in the image are thus converted into character-pixel count distribution data, the count distribution derived from the image to be segmented is compared against estimated distributions corresponding to different character segmentation parameters, and the segmentation parameters with the smallest difference are selected; compared with segmenting characters directly with default parameters, this improves the accuracy of character segmentation.
In an implementation manner of the present invention, step S102 of determining, in the first gradient map, the region corresponding to the image region where the characters of the image to be recognized are located as the first image region can be implemented by the following steps D1-D3:
Step D1: input each pixel row of the first gradient map into a region detection model to obtain, for each pixel row, a first probability that the corresponding pixel row in the image to be recognized is located in the image region containing characters.
The region detection model is a binary-classification neural network model obtained by training a preset neural network model in advance using the pixel rows of a fifth sample gradient map.
The fifth sample gradient map is a gradient map obtained by performing morphological gradient calculation on a fifth sample image.
The first probability is the probability that the pixel line corresponding to the input pixel line in the image to be recognized is located in the image area containing the character, and the numerical value may be between 0 and 1.
A pixel row input into the region detection model may consist of a third preset number of pixel points, where the third preset number may take a value such as 240 or 300. If the number of pixel points in a pixel row of the first gradient map is larger than the third preset number, the first gradient map may be reduced so that its width equals the third preset number of pixel points; if it is smaller than the third preset number, the row may be completed with pixel points whose pixel values represent gradient values smaller than a preset threshold. For example, if in the first gradient map the gradient values represented by colors from white to black decrease in order, then when the number of pixel points in a pixel row is smaller than the third preset number, pixel points whose pixel value represents black may be used to complete the row to the third preset number of pixel points.
Step D2: a sum of first probabilities for each of a first predetermined number of consecutive pixel rows in the first gradient map is calculated.
The first preset number represents how many pixels higher the determined image area containing the character is, and the first preset number may take a value of 27 or 30, etc. If a first predetermined number of consecutive pixel rows is considered as a group of pixel rows, then each consecutive first predetermined number of pixel rows represents: a plurality of groups of continuous first preset number of pixel rows of the selected pixel rows can be repeated; the sequential representation: each pixel row in a group of pixel rows is adjacent to each other two by two.
For example, when the first preset number is 27, each continuous 27 pixel rows in the first gradient map may be represented as: line 1 to line 27, line 2 to line 28, line 3 to line 29 … …
Step D3: determine the region corresponding, in the image to be recognized, to the first preset number of pixel rows with the maximum sum of first probabilities as the first image region.
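Steps D2-D3 amount to a sliding-window maximum over the per-row probabilities, which can be sketched as follows (the function name and return convention are illustrative assumptions):

```python
import numpy as np

def locate_character_band(row_probs, band_height=27):
    """Slide a window of band_height consecutive pixel rows over the
    per-row first probabilities and return the (top, bottom) rows of the
    window with the maximum probability sum, both inclusive."""
    p = np.asarray(row_probs, dtype=float)
    # Window sums of band_height consecutive rows via a cumulative sum.
    csum = np.concatenate([[0.0], np.cumsum(p)])
    window_sums = csum[band_height:] - csum[:-band_height]
    top = int(np.argmax(window_sums))
    return top, top + band_height - 1
```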
In this implementation, the pixel rows of the first gradient map are input into a pre-trained binary-classification neural network model to obtain the probability that each row lies in the image region containing characters; the sum of these probabilities is calculated for each group of a first preset number of consecutive pixel rows, and the region occupied by the group with the maximum sum is determined as the first image region containing characters. The first gradient map is detected using a neural network model trained with a large number of samples; because the differentiating features of characters versus background patterns are used for training, the model can effectively distinguish the characters to be recognized from background patterns, improving the accuracy of the determined first image region.
In an implementation manner of the present invention, in step S101, when performing morphological gradient calculation on the image to be recognized to obtain the first gradient map, a grayscale component image and a chrominance component image of the image to be recognized may be obtained first; morphological gradient calculation is performed on the grayscale component image and the chrominance component image respectively to obtain a grayscale component gradient map and a chrominance component gradient map; a difference operation is then performed on the grayscale component gradient map and the chrominance component gradient map to obtain the first gradient map.
As shown in FIG. 3, which is a first gradient map of a bank card image obtained by applying this implementation.
Depending on the chromaticity space adopted by the image to be recognized, more than one chrominance component image may be obtained, each representing the component of the image on one chromaticity. In that case, morphological gradient calculation yields a plurality of chrominance component gradient maps, and the difference operation is performed between the grayscale component gradient map and these chrominance component gradient maps to obtain the first gradient map. How this difference operation is performed is described in detail in the following embodiments and is not repeated here.
When the gray component image and the chrominance component image of the image to be recognized are obtained, a YCbCr color space model can be adopted to obtain a Y component of the image to be recognized as the gray component image and obtain a Cb component and a Cr component of the image to be recognized as two chrominance component images.
In this implementation, the image to be recognized is decomposed into a grayscale component and a chrominance component, morphological gradient calculation is performed on each, and a difference operation is performed on the two resulting gradient maps. A gradient map obtained by morphological gradient calculation reflects the pattern edges in the image; since the content to be recognized is not rich in color while the background patterns are, this approach weakens the interference of background patterns with determining the first image region containing characters, with character segmentation, and with single-character recognition, improving the accuracy of image recognition.
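A sketch of this decomposition using OpenCV; the YCrCb conversion matches the YCbCr model mentioned above, while the structuring-element size is an assumption for illustration, since the embodiment does not fix a kernel size:

```python
import cv2

def component_gradients(bgr_image, kernel_size=3):
    """Split the image to be recognized into a gray (Y) component and two
    chroma (Cb, Cr) components and take each one's morphological gradient."""
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    y, cr, cb = cv2.split(ycrcb)                 # OpenCV orders Y, Cr, Cb
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT,
                                       (kernel_size, kernel_size))
    grad = lambda ch: cv2.morphologyEx(ch, cv2.MORPH_GRADIENT, kernel)
    # Grayscale component gradient map and two chrominance gradient maps.
    return grad(y), grad(cb), grad(cr)
```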
Based on the foregoing implementation manner, in another implementation manner of the present invention, when performing the difference operation on the grayscale component gradient map and the chrominance component gradient map to obtain the first gradient map, binarization may first be performed on the chrominance component gradient map to obtain a chrominance component binary map; the pixel value of each first pixel point in the grayscale component gradient map is then set to a first preset pixel value, yielding the first gradient map.
The first preset pixel value is a pixel value whose represented gradient value is smaller than a preset threshold.
A first pixel point is a pixel point in the grayscale component gradient map that corresponds to a pixel point in the chrominance component binary map whose pixel value is the second preset pixel value.
The second preset pixel value is the pixel value of the background pixel points in the chrominance component binary map.
When setting the pixel value of a first pixel point in the grayscale component gradient map to the first preset pixel value, the value is left unchanged if it already equals the first preset pixel value; otherwise it is changed to the first preset pixel value.
If, when the morphological gradient calculation produces the grayscale component gradient map and the chrominance component gradient map, white indicates a large gradient value, black indicates a small gradient value, and gray indicates gradient values in between, then the first preset pixel value may be a pixel value that makes a pixel appear black.
The chrominance component binary map is obtained by binarizing the chrominance component gradient map, so its pixel points take only two pixel values. The pixel value marking locations where the original chrominance component gradient map had larger gradient values corresponds to the background patterns that must be removed for recognition, and is therefore the second preset pixel value.
Both the chrominance component binary map and the grayscale component gradient map are obtained by processing the image to be recognized. If the image sizes are not changed during that processing, the pixel point in the grayscale component gradient map corresponding to a pixel point in the chrominance component binary map is the pixel point with the same coordinates; if the image sizes are changed according to some rule during that processing, the corresponding pixel points are those whose coordinates correspond under that rule.
If there are multiple chrominance component images, each corresponding to a different chrominance component, then there are multiple chrominance component gradient maps and chrominance component binary maps, each corresponding to a different chrominance component. In this case a first pixel point is a pixel point in the grayscale component gradient map that corresponds to a pixel point whose pixel value is the second preset pixel value in any one of the chrominance component binary maps: for a pixel point in the grayscale component gradient map, as long as any of its corresponding pixel points in the chrominance component binary maps has the second preset pixel value, that pixel point is set to the first preset pixel value.
For example, in the grayscale component gradient map the first preset pixel value may be 0, represented as black; in the Cb component binary map and the Cr component binary map the second preset pixel value may be 1, represented as white, where both are chrominance component binary maps of the same size as the grayscale component gradient map. In this case the difference operation on the grayscale component gradient map and the chrominance component gradient maps may include the following steps:
step E1: determine the coordinates of the points whose pixel value is 1 in the Cb component binary map and in the Cr component binary map as first coordinates and second coordinates, respectively;
step E2: in the grayscale component gradient map, set to 0 the pixel values of the pixel points located at the first coordinates and at the second coordinates.
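Steps E1-E2 can be sketched as follows; the binarization threshold is an assumed parameter, since the embodiment leaves the thresholding details open:

```python
import numpy as np

def difference_operation(y_grad, cb_grad, cr_grad, threshold=32):
    """Binarize each chrominance gradient map (True marks a strong
    background edge, playing the role of the second preset pixel value) and
    set the corresponding grayscale-gradient pixels to 0 (the first preset
    pixel value, black)."""
    first_gradient = y_grad.copy()
    for chroma in (cb_grad, cr_grad):
        mask = chroma > threshold            # E1: chrominance binary map
        first_gradient[mask] = 0             # E2: zero those coordinates
    return first_gradient
```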
In this implementation, the pixel points representing the background are selected from the chrominance component gradient map through binarization, and the corresponding pixel points in the grayscale component gradient map are set to pixel values representing low gradient, thereby completing the difference operation between the grayscale component gradient map and the chrominance component gradient map.
Based on the same inventive concept, according to the image recognition method provided by the above embodiment of the present invention, correspondingly, an embodiment of the present invention further provides an image recognition apparatus, a schematic structural diagram of which is shown in fig. 4, and the image recognition apparatus specifically includes:
the gradient calculation module 401 is configured to perform morphological gradient calculation on an image to be recognized to obtain a first gradient map;
a region determining module 402, configured to determine, in the first gradient map, a region corresponding to an image region where a character in an image to be recognized is located, as a first image region;
a number determination module 403, configured to determine the number of characters in the first image area;
a grouping manner determining module 404, configured to determine a first grouping manner of the characters in the first image region based on the number of the characters;
a region obtaining module 405, configured to perform character segmentation on the first image region based on the first grouping manner to obtain a single character region;
a recognition result obtaining module 406, configured to perform character recognition on each single-character region, so as to obtain a character recognition result of the image to be recognized.
The image recognition device provided by the embodiment of the present invention can obtain the gradient map of an image, determine the region where characters are located, determine the number of characters in that region and their first grouping mode, perform character segmentation on the region to obtain single-character regions, and finally perform character recognition on each single-character region to obtain the recognition result of the image. In the scheme provided by the embodiment of the present invention, instead of determining the card number region through vertical projection of a binary image and performing character segmentation through horizontal projection of the binary image, character region detection, character count detection, and character grouping mode detection are performed on the morphological gradient image, and the characters are segmented according to the determined grouping mode and then recognized. This avoids recognition errors caused by mislocating the character region, and performing character segmentation only after the character count and grouping mode have been determined in turn avoids segmentation errors, thereby improving the anti-interference capability and accuracy of image recognition.
In an implementation manner of the present invention, the apparatus further includes:
and the result verification module is used for verifying whether the character recognition result is a valid recognition result or not after the recognition result obtaining module obtains the character recognition result of the image to be recognized, so as to obtain a verification result.
In this implementation, whether the character recognition result is a valid recognition result is verified. The characters in the recognized image follow preset rules; recognition results that pass rule verification are determined to be valid, and results that violate the rules are excluded, which improves the accuracy of the finally obtained recognition result.
In an implementation manner of the present invention, the identification result obtaining module 406 includes:
the recognition result obtaining sub-module is used for inputting the obtained single-character regions into a character recognition model for character recognition and obtaining the character recognition result of each character region as the first-type recognition result of that region, where the character recognition model is a model obtained by training a convolutional neural network model in advance with first sample character regions and used for detecting the character contained in a region; a first sample character region is a region of the first sample gradient map where one character is located, and the first sample gradient map is an image obtained by performing morphological gradient calculation on a first sample image;
and the recognition result determining submodule is used for determining the character recognition result of each single character area based on the first type of recognition result.
In this implementation, each character region is input into the character recognition model for character recognition, and the character recognition result of each region is obtained. Characters are not recognized with a low-accuracy graphics algorithm; instead, single-character regions are detected with a convolutional neural network model trained on a large number of samples. Training the convolutional neural network with many samples lets it learn the features of characters against varied backgrounds, and training it on character regions that have undergone morphological gradient calculation, which highlights the edges in the image content, enables the trained model to effectively recognize characters against complex backgrounds in the image, improving character recognition accuracy.
In one implementation manner of the present invention, the recognition result determining sub-module includes:
a candidate region determining unit, configured to determine, as the candidate regions of each single-character region, the regions obtained by offsetting that single-character region within the first image region by a preset number of pixel points along preset directions;
a judgment result obtaining unit, configured to input each obtained candidate region into a character judgment model, judge whether each candidate region is a region containing a character, and obtain the character judgment result of each candidate region, where the character judgment model is a model obtained by training a convolutional neural network model in advance with second sample character regions and used for judging whether a region contains a character; a second sample character region is a region of the second sample gradient map where a character or a non-character is located, and the second sample gradient map is an image obtained by performing morphological gradient calculation on a second sample image;
a correction region determining unit, configured to determine, based on the obtained character judgment result of each candidate region, the candidate region with the highest confidence among the candidate regions as the correction region of each single-character region;
a recognition result obtaining unit, configured to input the correction region of each single-character region to the character recognition model for character recognition, and obtain a character recognition result of the correction region of each single-character region as a second-type recognition result of each single-character region;
and the result determining unit is used for determining the recognition result with the highest confidence degree in the first type recognition result and the second type recognition result of each single character area as the character recognition result of the character area.
In this implementation, the recognition result with the highest confidence among the first-type and second-type recognition results of each character region is determined as the final recognition result of that region, which improves the accuracy of character recognition. Inputting each candidate region into the character judgment model, outputting a character judgment result for each, and taking the candidate region with the highest confidence as the correction region further improves accuracy. In addition, in this implementation the morphological-gradient image is detected using a convolutional neural network model trained with a large number of samples: the network is trained on second sample character regions of a second sample gradient map obtained by performing morphological gradient calculation on the second sample image, which strengthens the anti-interference capability of the character judgment model, so that the model can effectively judge whether characters are present against a complex background in the image, improving character recognition accuracy.
In an implementation manner of the present invention, the quantity determining module 403 includes:
a detection result obtaining submodule, configured to input each pixel row of the first image region into a character number detection model to detect the number of characters to which the pixel points of each row belong, obtaining a detection result corresponding to each pixel row, where the character number detection model is a neural network model obtained by training a preset neural network model in advance using each pixel row in a third sample gradient map together with the labeled number of characters to which the pixel points of each row belong, and is used to detect the number of characters to which the pixel points of a pixel row belong; the third sample gradient map is a gradient map obtained by performing morphological gradient calculation on a third sample image;
and the number obtaining submodule is used for obtaining the number of the characters in the first image area based on the obtained detection result.
In this implementation, the pixel rows of the first image region are input into a neural network model obtained by pre-training, and the number of characters in the first image region is then obtained from the model's output. The first image region is detected using a neural network model trained with a large number of samples; because samples exhibiting the differentiating features of different character counts are used for training, the model can effectively distinguish character counts, the count can be obtained accurately, and the accuracy of the final recognition result is improved.
In an implementation manner of the present invention, the grouping manner determining module 404 includes:
a model determining sub-module, configured to determine, according to the obtained number of characters, a grouping mode detection model for detecting the character grouping mode in the image, where the grouping mode detection model is a neural network model obtained by training a preset neural network model in advance using each pixel row in a fourth sample gradient map together with the labeled grouping mode of the characters to which the pixel points of each row belong, and is used to detect the grouping mode of the characters to which the pixel points of a pixel row belong; the fourth sample gradient map is a gradient map obtained by performing morphological gradient calculation on a fourth sample image;
the first probability obtaining submodule is used for respectively inputting each pixel row of the first image area into the grouping mode detection model to detect the grouping mode of the characters to which the pixel points belong in each pixel row, and obtaining the probability that the grouping mode of the characters to which the pixel points belong in each pixel row is a preset grouping mode;
a first sum calculating submodule, configured to calculate, for each preset grouping mode, the sum over the pixel rows of the first image region of the probabilities that the grouping mode of the characters to which the pixel points of each row belong is that preset grouping mode;
and the grouping mode determining submodule is used for determining the grouping mode corresponding to the maximum sum value as the first grouping mode of the characters in the first image area.
In this implementation, a grouping mode detection model is selected according to the number of characters, the pixel rows of the first image region are input into this pre-trained neural network model to obtain, for each row, the probability that its grouping mode is each preset grouping mode, the probability sums are calculated, and the preset grouping mode with the maximum sum is determined as the grouping mode. The first image region, whose character count is already known, is detected using a neural network model trained with a large number of samples; because samples exhibiting the differentiating features of different grouping modes are used for training, the model can effectively distinguish grouping modes, can handle the case where the same character count admits different grouping modes, and can determine the grouping mode accurately, improving the accuracy of the final recognition result.
In an implementation manner of the present invention, the region obtaining module 405 includes:
a number counting submodule, configured to count the number of character pixel points in each pixel column of the first image region, where character pixel points are pixel points belonging to a character;
the distribution obtaining submodule is used for obtaining a first estimated quantity distribution of character pixel points in each character arrangement with the grouping mode being the first grouping mode, wherein the character width of the characters in each character arrangement is a preset width, the character group interval is a preset interval, and the character widths and/or the character group intervals in different character arrangements are different;
a distribution determining submodule, configured to determine, among the obtained first estimated quantity distributions, the one with the smallest degree of difference from the first distribution, where the first distribution is the distribution of character pixel point counts determined by the counted numbers;
and the region obtaining submodule is used for carrying out character segmentation on the first image region according to the character arrangement corresponding to the determined first estimated quantity distribution to obtain a single character region.
In this implementation, the number of character pixel points in each pixel column of the image to be segmented is counted first, the estimated quantity distributions of character pixel points for the candidate character arrangements are obtained, the estimated quantity distribution with the smallest difference from the distribution formed by the counted numbers is determined, and character segmentation is finally performed on the image to be segmented according to the character arrangement corresponding to that distribution. The character features present in the image are thus converted into character-pixel count distribution data, the count distribution derived from the image to be segmented is compared against estimated distributions corresponding to different character segmentation parameters, and the segmentation parameters with the smallest difference are selected; compared with segmenting characters directly with default parameters, this improves the accuracy of character segmentation.
In an implementation manner of the present invention, the area determining module 402 includes:
a second probability obtaining submodule, configured to input each pixel row of the first gradient map into a region detection model to obtain, for each pixel row, a first probability that the corresponding pixel row in the image to be recognized is located in the image region containing characters, where the region detection model is a binary-classification neural network model obtained by training a preset neural network model in advance using the pixel rows of a fifth sample gradient map, and the fifth sample gradient map is a gradient map obtained by performing morphological gradient calculation on a fifth sample image;
a second sum calculating submodule, configured to calculate the sum of the first probabilities of each group of a first preset number of consecutive pixel rows in the first gradient map;
and the region determining submodule is used for determining a region corresponding to a first preset number of pixel rows corresponding to the maximum sum of the obtained first probabilities in the image to be identified as a first image region.
In this implementation, the pixel rows of the first gradient map are input into a pre-trained binary-classification neural network model to obtain the probability that each row lies in the image region containing characters; the sum of these probabilities is calculated for each group of a first preset number of consecutive pixel rows, and the region occupied by the group with the maximum sum is determined as the first image region containing characters. The first gradient map is detected using a neural network model trained with a large number of samples; because the differentiating features of characters versus background patterns are used for training, the model can effectively distinguish the characters to be recognized from background patterns, improving the accuracy of the determined first image region.
In one implementation manner of the present invention, the gradient calculation module 401 includes:
the image obtaining submodule is used for obtaining a gray component image and a chrominance component image of the image to be identified;
the first gradient map obtaining submodule is used for performing morphological gradient calculation on the gray component image and the chrominance component image respectively to obtain a gray component gradient map and a chrominance component gradient map;
and the second gradient map obtaining submodule is used for carrying out difference operation on the gray component gradient map and the chrominance component gradient map to obtain a first gradient map.
In this implementation, the image to be recognized is decomposed into a grayscale component and a chrominance component, morphological gradient calculation is performed on each, and a difference operation is performed on the two resulting gradient maps. A gradient map obtained by morphological gradient calculation reflects the pattern edges in the image; since the content to be recognized is not rich in color while the background patterns are, this approach weakens the interference of background patterns with determining the first image region containing characters, with character segmentation, and with single-character recognition, improving the accuracy of image recognition.
In one implementation manner of the present invention, the second gradient map obtaining sub-module includes:
the image obtaining unit is used for carrying out binarization processing on the chrominance component gradient image to obtain a chrominance component binary image;
a gradient map obtaining unit, configured to set the pixel value of each first pixel point in the grayscale component gradient map to a first preset pixel value to obtain the first gradient map, where the first preset pixel value is a pixel value whose represented gradient value is smaller than a preset threshold; a first pixel point is a pixel point in the grayscale component gradient map corresponding to a pixel point in the chrominance component binary map whose pixel value is a second preset pixel value; and the second preset pixel value is the pixel value of the background pixel points in the chrominance component binary map.
In this implementation, the pixel points representing the background are selected from the chrominance component gradient map through binarization, and the corresponding pixel points in the grayscale component gradient map are set to pixel values representing low gradient, thereby completing the difference operation between the grayscale component gradient map and the chrominance component gradient map.
Based on the same inventive concept, according to the image recognition method provided by the above embodiment of the present invention, correspondingly, the embodiment of the present invention further provides an electronic device, as shown in fig. 5, comprising a processor 501, a communication interface 502, a memory 503 and a communication bus 504, wherein the processor 501, the communication interface 502 and the memory 503 complete mutual communication through the communication bus 504,
a memory 503 for storing a computer program;
the processor 501 is configured to implement the steps of any of the image recognition methods in the above embodiments when executing the program stored in the memory 503.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components.
The electronic device for image recognition provided by the embodiment of the present invention can obtain the gradient map of an image, determine the region where characters are located, determine the number of characters in that region and their first grouping mode, perform character segmentation on the region to obtain single-character regions, and finally perform character recognition on each single-character region to obtain the recognition result of the image. In the scheme provided by the embodiment of the present invention, instead of determining the card number region through vertical projection of a binary image and performing character segmentation through horizontal projection of the binary image, character region detection, character count detection, and character grouping mode detection are performed on the morphological gradient image, and the characters are segmented according to the determined grouping mode and then recognized. This avoids recognition errors caused by mislocating the character region, and performing character segmentation only after the character count and grouping mode have been determined in turn avoids segmentation errors, thereby improving the anti-interference capability and accuracy of image recognition.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, which has instructions stored therein, and when the instructions are executed on a computer, the instructions cause the computer to perform the steps of any of the image recognition methods in the above embodiments.
In a further embodiment, the present invention also provides a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the image recognition methods of the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, the computer-readable storage medium, and the computer program product embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and in relation to them, reference may be made to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (19)
1. An image recognition method, comprising:
performing morphological gradient calculation on an image to be recognized to obtain a first gradient map;
determining, in the first gradient map, a region corresponding to the image region where characters in the image to be recognized are located, as a first image region;
determining the number of characters in the first image region;
determining a first grouping mode of the characters within the first image region based on the number of characters;
performing character segmentation on the first image region based on the first grouping mode to obtain single-character regions;
performing character recognition on each single-character region, thereby obtaining a character recognition result of the image to be recognized;
wherein the determining the number of characters in the first image region comprises:
inputting each pixel row of the first image region into a character number detection model to detect the number of characters to which the pixel points in that row belong, and obtaining a detection result corresponding to each pixel row, wherein the character number detection model is: a neural network model obtained by pre-training a preset neural network model with each pixel row in a third sample gradient map and the labeled number of characters to which the pixel points in each such row belong, used for detecting the number of characters to which the pixel points in a pixel row belong, and the third sample gradient map is: a gradient map obtained by performing morphological gradient calculation on a third sample image;
and obtaining the number of characters in the first image region based on the obtained detection results.
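For orientation, here is a minimal Python sketch of the front end of claim 1, assuming OpenCV as the morphology backend and a hypothetical `count_model` standing in for the trained character number detection model; combining the per-row detections by majority vote is one plausible reading of the final step, which the claim leaves open:

```python
import cv2
import numpy as np

def first_gradient_map(image_bgr, kernel_size=3):
    # Morphological gradient = dilation - erosion; it responds strongly
    # at character edges while flat background stays near zero.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_size, kernel_size))
    return cv2.morphologyEx(gray, cv2.MORPH_GRADIENT, kernel)

def estimate_char_count(first_region, count_model):
    # One detection per pixel row of the first image region; the per-row
    # results are combined here by majority vote (an assumption).
    votes = [int(count_model(row)) for row in first_region]
    return int(np.bincount(votes).argmax())
```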
2. The method according to claim 1, wherein, after the character recognition result of the image to be recognized is obtained, the method further comprises:
verifying whether the character recognition result is a valid recognition result, to obtain a verification result.
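The patent does not specify the verification rule; for bank-card numbers the industry-standard check is the Luhn checksum, so the following Python sketch is one plausible instantiation of claim 2:

```python
def luhn_valid(card_number: str) -> bool:
    # Double every second digit from the right; doubled digits above 9 lose 9.
    total = 0
    for i, ch in enumerate(reversed(card_number)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0
```

For example, `luhn_valid("4111111111111111")` returns True for the classic 16-digit test number.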
3. The method of claim 1, wherein the performing character recognition on each single-character region to obtain a character recognition result of the image to be recognized comprises:
inputting each obtained single-character region into a character recognition model for character recognition, and obtaining the character recognition result of each single-character region as a first-type recognition result of that region, wherein the character recognition model is: a model obtained by pre-training a convolutional neural network model with first sample character regions, used for detecting the character contained in a region, a first sample character region is: a region in a first sample gradient map where one character is located, and the first sample gradient map is: an image obtained by performing morphological gradient calculation on a first sample image;
and determining the character recognition result of each single-character region based on the first-type recognition results.
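By way of illustration, a single-character classifier of the kind claim 3 describes might resemble this PyTorch sketch; the 32×32 gradient-map crop size, the layer widths, and the ten output classes (card-number digits) are assumptions, as the patent fixes none of them:

```python
import torch.nn as nn

NUM_CLASSES = 10  # assumed: the ten digits of a card number

class CharNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32 -> 16
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
        )
        self.classifier = nn.Linear(32 * 8 * 8, NUM_CLASSES)

    def forward(self, x):  # x: (N, 1, 32, 32) single-character gradient crops
        return self.classifier(self.features(x).flatten(1))
```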
4. The method of claim 3, wherein determining the character recognition result of each single-character region based on the first-type recognition results comprises:
determining, for each single-character region, the regions in the first image region obtained by shifting that region by a preset number of pixel points along a preset direction, as candidate regions of the single-character region;
inputting each obtained candidate region into a character judgment model to judge whether it is a region containing a character, and obtaining a character judgment result for each candidate region, wherein the character judgment model is: a model obtained by pre-training a convolutional neural network model with second sample character regions, used for judging whether a region contains a character, a second sample character region is: a region in a second sample gradient map where a character is located or where no character is located, and the second sample gradient map is: an image obtained by performing morphological gradient calculation on a second sample image;
determining, based on the obtained character judgment results, the candidate region with the highest confidence among the candidate regions of each single-character region as the correction region of that single-character region;
inputting the correction region of each single-character region into the character recognition model for character recognition, and obtaining the character recognition result of the correction region as the second-type recognition result of that single-character region;
and determining the recognition result with the highest confidence between the first-type recognition result and the second-type recognition result of each single-character region as the character recognition result of that region.
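A sketch of claim 4's correction step, assuming candidate regions come from shifting each segmented box horizontally by up to a few pixels and that `char_judge` is a stand-in for the trained character judgment model returning a confidence in [0, 1]:

```python
def correct_region(gradient_map, box, char_judge, max_shift=4):
    # box = (x, y, w, h) of a segmented single-character region.
    x, y, w, h = box
    best_crop, best_conf = None, -1.0
    for dx in range(-max_shift, max_shift + 1):
        crop = gradient_map[y:y + h, x + dx:x + dx + w]
        if crop.shape != (h, w):
            continue  # the shifted window fell outside the image
        conf = float(char_judge(crop))
        if conf > best_conf:       # keep the highest-confidence candidate
            best_crop, best_conf = crop, conf
    return best_crop, best_conf
```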
5. The method of claim 1, wherein determining a first grouping mode of the characters within the first image region based on the number of characters comprises:
determining, according to the obtained number of characters, a grouping mode detection model for detecting the character grouping mode in the image, wherein the grouping mode detection model is: a neural network model obtained by pre-training a preset neural network model with each pixel row in a fourth sample gradient map and the labeled grouping mode of the characters to which the pixel points in each such row belong, used for detecting the grouping mode of the characters to which the pixel points in a pixel row belong, and the fourth sample gradient map is: a gradient map obtained by performing morphological gradient calculation on a fourth sample image;
inputting each pixel row of the first image region into the grouping mode detection model to detect the grouping mode of the characters to which the pixel points in that row belong, and obtaining, for each pixel row, the probability that this grouping mode is each preset grouping mode;
calculating, for each preset grouping mode, the sum of the probabilities, over all pixel rows of the first image region, that the grouping mode of the characters to which the pixel points belong is that preset grouping mode;
and determining the grouping mode corresponding to the maximum sum as the first grouping mode of the characters in the first image region.
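The vote in claim 5 reduces to summing the model's per-row probabilities for each preset grouping mode and taking the argmax. A sketch, assuming `grouping_model(row)` returns one probability per preset mode (e.g. the 4-4-4-4 layout of a 16-digit card number versus other preset layouts):

```python
import numpy as np

def pick_grouping_mode(first_region, grouping_model, modes):
    prob_sums = np.zeros(len(modes))
    for row in first_region:
        prob_sums += grouping_model(row)   # accumulate per-mode probabilities
    return modes[int(prob_sums.argmax())]  # mode with the largest summed probability
```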
6. The method according to claim 1, wherein said performing character segmentation on the first image region based on the first grouping mode to obtain single-character regions comprises:
counting the number of character pixel points in each pixel row of the first image region, wherein a character pixel point is: a pixel point belonging to a character;
obtaining a first estimated quantity distribution of character pixel points for each character arrangement whose grouping mode is the first grouping mode, wherein the character width in each character arrangement is a preset width and the character-group interval is a preset interval, and different character arrangements differ in character width and/or character-group interval;
determining, among the obtained first estimated quantity distributions, the one with the minimum degree of difference from a first distribution, wherein the first distribution is: the number distribution of character pixel points determined from the counted numbers;
and performing character segmentation on the first image region according to the character arrangement corresponding to the determined first estimated quantity distribution, to obtain single-character regions.
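Claim 6 is effectively template matching on a projection profile: each candidate arrangement predicts where character pixels should fall, and the arrangement whose predicted profile is closest to the counted one wins. A sketch, under the assumptions that the counts run along the direction of the character string and that L1 distance serves as the degree of difference:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Arrangement:
    groups: tuple      # e.g. (4, 4, 4, 4) under a 4-4-4-4 grouping mode
    char_width: int    # preset character width, in pixels
    char_gap: int      # gap between characters within a group
    group_gap: int     # gap between character groups
    offset: int        # left margin before the first character

def expected_profile(arr, length, peak):
    # Template profile: `peak` where a character should be, 0 in the gaps.
    profile = np.zeros(length)
    x = arr.offset
    for group in arr.groups:
        for _ in range(group):
            profile[x:x + arr.char_width] = peak
            x += arr.char_width + arr.char_gap
        x += arr.group_gap - arr.char_gap  # group gap replaces the last char gap
    return profile

def best_arrangement(counts, arrangements):
    counts = np.asarray(counts, dtype=float)
    scores = [np.abs(expected_profile(a, len(counts), counts.max()) - counts).sum()
              for a in arrangements]
    return arrangements[int(np.argmin(scores))]
```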
7. The method according to claim 1, wherein the determining, in the first gradient map, a region corresponding to the image region where characters in the image to be recognized are located, as a first image region, comprises:
inputting each pixel row of the first gradient map into a region detection model to obtain a first probability that the corresponding pixel row in the image to be recognized lies in an image region containing characters, wherein the region detection model is: a binary-classification neural network model obtained by pre-training a preset neural network model with each pixel row in a fifth sample gradient map, and the fifth sample gradient map is: a gradient map obtained by performing morphological gradient calculation on a fifth sample image;
calculating, for each group of a first preset number of consecutive pixel rows in the first gradient map, the sum of their first probabilities;
and determining the region corresponding, in the image to be recognized, to the first preset number of consecutive pixel rows with the maximum sum of first probabilities as the first image region.
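The search in claim 7 is a fixed-height sliding window maximizing the summed row probabilities, which a convolution with a box kernel computes in one line. A sketch, with `row_probs[i]` standing for the region detection model's output for gradient-map row i:

```python
import numpy as np

def locate_band(row_probs, band_height):
    # Sum of probabilities over every window of `band_height` consecutive rows;
    # mode="valid" keeps only windows that fit entirely inside the map.
    window_sums = np.convolve(row_probs, np.ones(band_height), mode="valid")
    top = int(window_sums.argmax())   # first row of the best-scoring band
    return top, top + band_height     # [top, bottom) row range
```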
8. The method according to any one of claims 1 to 7, wherein the performing morphological gradient calculation on the image to be recognized to obtain a first gradient map comprises:
obtaining a gray component image and a chrominance component image of the image to be recognized;
performing morphological gradient calculation on the gray component image and the chrominance component image respectively, to obtain a gray component gradient map and a chrominance component gradient map;
and performing a difference operation on the gray component gradient map and the chrominance component gradient map to obtain the first gradient map.
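A sketch of claim 8's decomposition, assuming a YUV color space with Y as the gray component and U as the chrominance component; the patent names the components but not the color space, so that choice is an assumption:

```python
import cv2

def component_gradient_maps(image_bgr, kernel_size=3):
    yuv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2YUV)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_size, kernel_size))
    gray_grad = cv2.morphologyEx(yuv[:, :, 0], cv2.MORPH_GRADIENT, kernel)
    chroma_grad = cv2.morphologyEx(yuv[:, :, 1], cv2.MORPH_GRADIENT, kernel)
    return gray_grad, chroma_grad
```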
9. The method of claim 8, wherein the performing a difference operation on the gray component gradient map and the chrominance component gradient map to obtain a first gradient map comprises:
carrying out binarization processing on the chrominance component gradient map to obtain a chrominance component binary map;
setting the pixel value of each first pixel point in the gray component gradient map to a first preset pixel value, to obtain the first gradient map, wherein the first preset pixel value is: a pixel value representing a gradient value smaller than a preset threshold; a first pixel point is: a pixel point in the gray component gradient map corresponding to a pixel point in the chrominance component binary map whose pixel value is a second preset pixel value; and the second preset pixel value is: the pixel value of background pixel points in the chrominance component binary map.
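The "difference operation" of claim 9 then amounts to masking: binarize the chrominance gradient and force the corresponding gray-gradient pixels to a low value wherever the binary map shows background. In this sketch, Otsu thresholding and the choice of which binary value counts as "background" are assumptions; the intuition is that embossed card digits produce gray-level edges but weak chrominance edges, while printed background art produces strong chrominance edges.

```python
import cv2

def difference_operation(gray_grad, chroma_grad, background_value=255, low_value=0):
    # Binarize the chrominance gradient map (threshold choice assumed: Otsu).
    _, chroma_bin = cv2.threshold(chroma_grad, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    first_grad = gray_grad.copy()
    # Suppress gray-gradient pixels where the chrominance binary map marks
    # background (assumed here to be the high/255 value: colorful background
    # art has strong chroma edges, embossed digits do not).
    first_grad[chroma_bin == background_value] = low_value
    return first_grad
```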
10. An image recognition apparatus, comprising:
the gradient calculation module is used for performing morphological gradient calculation on an image to be recognized to obtain a first gradient map;
the region determining module is used for determining, in the first gradient map, a region corresponding to the image region where characters in the image to be recognized are located, as a first image region;
the number determining module is used for determining the number of characters in the first image region;
the grouping mode determining module is used for determining a first grouping mode of the characters in the first image region based on the number of characters;
the region obtaining module is used for carrying out character segmentation on the first image region based on the first grouping mode to obtain a single character region;
the recognition result obtaining module is used for carrying out character recognition on each single character area so as to obtain a character recognition result of the image to be recognized;
wherein the number determination module comprises:
a detection result obtaining submodule, configured to input each pixel row of the first image region into a character number detection model to detect the number of characters to which the pixel points in that row belong, and obtain a detection result corresponding to each pixel row, wherein the character number detection model is: a neural network model obtained by pre-training a preset neural network model with each pixel row in a third sample gradient map and the labeled number of characters to which the pixel points in each such row belong, used for detecting the number of characters to which the pixel points in a pixel row belong, and the third sample gradient map is: a gradient map obtained by performing morphological gradient calculation on a third sample image;
and the number obtaining submodule is used for obtaining the number of characters in the first image region based on the obtained detection results.
11. The apparatus of claim 10, further comprising:
and the result verification module is used for verifying, after the recognition result obtaining module obtains the character recognition result of the image to be recognized, whether the character recognition result is a valid recognition result, so as to obtain a verification result.
12. The apparatus of claim 10, wherein the recognition result obtaining module comprises:
the recognition result obtaining sub-module is used for inputting each obtained single-character region into a character recognition model for character recognition, and obtaining the character recognition result of each single-character region as the first-type recognition result of that region, wherein the character recognition model is: a model obtained by pre-training a convolutional neural network model with first sample character regions, used for detecting the character contained in a region, a first sample character region is: a region in a first sample gradient map where one character is located, and the first sample gradient map is: an image obtained by performing morphological gradient calculation on a first sample image;
and the recognition result determining submodule is used for determining the character recognition result of each single-character region based on the first-type recognition results.
13. The apparatus of claim 12, wherein the recognition result determination sub-module comprises:
a candidate region determining unit, configured to determine, for each single-character region, the regions in the first image region obtained by shifting that region by a preset number of pixel points along a preset direction, as candidate regions of the single-character region;
a judgment result obtaining unit, configured to input each obtained candidate region into a character judgment model to judge whether it is a region containing a character, and obtain a character judgment result for each candidate region, wherein the character judgment model is: a model obtained by pre-training a convolutional neural network model with second sample character regions, used for judging whether a region contains a character, a second sample character region is: a region in a second sample gradient map where a character is located or where no character is located, and the second sample gradient map is: an image obtained by performing morphological gradient calculation on a second sample image;
a correction region determining unit, configured to determine, based on the obtained character judgment results, the candidate region with the highest confidence among the candidate regions of each single-character region as the correction region of that single-character region;
a recognition result obtaining unit, configured to input the correction region of each single-character region into the character recognition model for character recognition, and obtain the character recognition result of the correction region as the second-type recognition result of that single-character region;
and the result determining unit is used for determining the recognition result with the highest confidence between the first-type recognition result and the second-type recognition result of each single-character region as the character recognition result of that region.
14. The apparatus of claim 10, wherein the grouping determination module comprises:
a model determining sub-module, configured to determine, according to the obtained number of characters, a grouping mode detection model for detecting the character grouping mode in the image, wherein the grouping mode detection model is: a neural network model obtained by pre-training a preset neural network model with each pixel row in a fourth sample gradient map and the labeled grouping mode of the characters to which the pixel points in each such row belong, used for detecting the grouping mode of the characters to which the pixel points in a pixel row belong, and the fourth sample gradient map is: a gradient map obtained by performing morphological gradient calculation on a fourth sample image;
the first probability obtaining submodule is used for inputting each pixel row of the first image region into the grouping mode detection model to detect the grouping mode of the characters to which the pixel points in that row belong, and obtaining, for each pixel row, the probability that this grouping mode is each preset grouping mode;
the first sum calculation submodule is used for calculating, for each preset grouping mode, the sum of the probabilities, over all pixel rows of the first image region, that the grouping mode of the characters to which the pixel points belong is that preset grouping mode;
and the grouping mode determining submodule is used for determining the grouping mode corresponding to the maximum sum as the first grouping mode of the characters in the first image region.
15. The apparatus of claim 10, wherein the region obtaining module comprises:
the number counting submodule is used for counting the number of character pixel points in each pixel row of the first image region, wherein a character pixel point is: a pixel point belonging to a character;
the distribution obtaining submodule is used for obtaining a first estimated quantity distribution of character pixel points for each character arrangement whose grouping mode is the first grouping mode, wherein the character width in each character arrangement is a preset width and the character-group interval is a preset interval, and different character arrangements differ in character width and/or character-group interval;
a distribution determining submodule, configured to determine, among the obtained first estimated quantity distributions, the one with the minimum degree of difference from a first distribution, wherein the first distribution is: the number distribution of character pixel points determined from the counted numbers;
and the region obtaining submodule is used for performing character segmentation on the first image region according to the character arrangement corresponding to the determined first estimated quantity distribution, to obtain single-character regions.
16. The apparatus of claim 10, wherein the region determining module comprises:
a second probability obtaining submodule, configured to input each pixel row of the first gradient map into a region detection model to obtain a first probability that the corresponding pixel row in the image to be recognized lies in an image region containing characters, wherein the region detection model is: a binary-classification neural network model obtained by pre-training a preset neural network model with each pixel row in a fifth sample gradient map, and the fifth sample gradient map is: a gradient map obtained by performing morphological gradient calculation on a fifth sample image;
the second sum calculation submodule is used for calculating, for each group of a first preset number of consecutive pixel rows in the first gradient map, the sum of their first probabilities;
and the region determining submodule is used for determining the region corresponding, in the image to be recognized, to the first preset number of consecutive pixel rows with the maximum sum of first probabilities as the first image region.
17. The apparatus of any one of claims 10-16, wherein the gradient calculation module comprises:
the image obtaining submodule is used for obtaining a gray component image and a chrominance component image of the image to be recognized;
the first gradient map obtaining submodule is used for performing morphological gradient calculation on the gray component image and the chrominance component image respectively, to obtain a gray component gradient map and a chrominance component gradient map;
and the second gradient map obtaining submodule is used for performing a difference operation on the gray component gradient map and the chrominance component gradient map to obtain the first gradient map.
18. The apparatus of claim 17, wherein the second gradient map obtaining sub-module comprises:
the image obtaining unit is used for performing binarization processing on the chrominance component gradient map to obtain a chrominance component binary map;
a gradient map obtaining unit, configured to set the pixel value of each first pixel point in the gray component gradient map to a first preset pixel value, to obtain the first gradient map, wherein the first preset pixel value is: a pixel value representing a gradient value smaller than a preset threshold; a first pixel point is: a pixel point in the gray component gradient map corresponding to a pixel point in the chrominance component binary map whose pixel value is a second preset pixel value; and the second preset pixel value is: the pixel value of background pixel points in the chrominance component binary map.
19. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-9 when executing a program stored in the memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811563632.3A CN109740606B (en) | 2018-12-20 | 2018-12-20 | Image identification method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109740606A CN109740606A (en) | 2019-05-10 |
CN109740606B true CN109740606B (en) | 2021-02-05 |
Family
ID=66360716
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811563632.3A Active CN109740606B (en) | 2018-12-20 | 2018-12-20 | Image identification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109740606B (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110348449B (en) * | 2019-07-10 | 2023-04-18 | 电子科技大学 | Neural network-based identification card character recognition method |
CN111291794A (en) * | 2020-01-21 | 2020-06-16 | 上海眼控科技股份有限公司 | Character recognition method, character recognition device, computer equipment and computer-readable storage medium |
CN111340040B (en) * | 2020-02-26 | 2023-09-12 | 五八有限公司 | Paper character recognition method and device, electronic equipment and storage medium |
CN111582259B (en) * | 2020-04-10 | 2024-04-16 | 支付宝实验室(新加坡)有限公司 | Machine-readable code identification method, device, electronic equipment and storage medium |
CN111583159B (en) * | 2020-05-29 | 2024-01-05 | 北京金山云网络技术有限公司 | Image complement method and device and electronic equipment |
CN113870154A (en) * | 2020-06-30 | 2021-12-31 | 广州慧睿思通人工智能技术有限公司 | Image data processing method, image data processing device, computer equipment and storage medium |
CN112215236B (en) * | 2020-10-21 | 2024-04-16 | 科大讯飞股份有限公司 | Text recognition method, device, electronic equipment and storage medium |
CN112381057A (en) * | 2020-12-03 | 2021-02-19 | 上海芯翌智能科技有限公司 | Handwritten character recognition method and device, storage medium and terminal |
CN113989795B (en) * | 2021-12-02 | 2024-06-21 | 安徽翼迈科技股份有限公司 | Water meter bubble early warning method based on template graph traversal |
CN117557896A (en) * | 2022-08-05 | 2024-02-13 | 顺丰科技有限公司 | Method and device for determining quantity of fast objects, electronic equipment and storage medium |
CN118334639B (en) * | 2024-06-12 | 2024-08-23 | 深圳市瑞意博医疗设备有限公司 | Medicine rechecking method and system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009040922A1 (en) * | 2007-09-27 | 2009-04-02 | Glory Ltd. | Paper sheet processor |
CN105426891B (en) * | 2015-12-14 | 2019-04-09 | 广东安居宝数码科技股份有限公司 | Registration number character dividing method and its system based on image |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5787194A (en) * | 1994-11-08 | 1998-07-28 | International Business Machines Corporation | System and method for image processing using segmentation of images and classification and merging of image segments using a cost function |
CN101408941A (en) * | 2008-10-20 | 2009-04-15 | 中国科学院遥感应用研究所 | Method for multi-dimension segmentation of remote sensing image and representation of segmentation result hierarchical structure |
CN104616009A (en) * | 2015-02-13 | 2015-05-13 | 广州广电运通金融电子股份有限公司 | Character cutting and recognizing method |
CN105354574A (en) * | 2015-12-04 | 2016-02-24 | 山东博昂信息科技有限公司 | Vehicle number recognition method and device |
CN107305630A (en) * | 2016-04-25 | 2017-10-31 | 腾讯科技(深圳)有限公司 | Text sequence recognition methods and device |
CN107832756A (en) * | 2017-10-24 | 2018-03-23 | 讯飞智元信息科技有限公司 | Express delivery list information extracting method and device, storage medium, electronic equipment |
Non-Patent Citations (1)
Title |
---|
A gradient-domain method for converting color images to grayscale (一种基于梯度域的彩色图像转灰度图像的方法); Zhang Weixiang et al.; Image Technology (《影像技术》); 2007-06-30; pp. 20-22 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109740606B (en) | Image identification method and device | |
CN108710866B (en) | Chinese character model training method, chinese character recognition method, device, equipment and medium | |
CN110060237B (en) | Fault detection method, device, equipment and system | |
CN110413824B (en) | Retrieval method and device for similar pictures | |
CN109343920B (en) | Image processing method and device, equipment and storage medium thereof | |
CN108399386A (en) | Information extracting method in pie chart and device | |
CN107145829B (en) | Palm vein identification method integrating textural features and scale invariant features | |
CN108197644A (en) | A kind of image-recognizing method and device | |
CN110490190B (en) | Structured image character recognition method and system | |
CN109389110B (en) | Region determination method and device | |
CN115457565A (en) | OCR character recognition method, electronic equipment and storage medium | |
CN111626177A (en) | PCB element identification method and device | |
CN109389115B (en) | Text recognition method, device, storage medium and computer equipment | |
CN110738216A (en) | Medicine identification method based on improved SURF algorithm | |
CN113158895A (en) | Bill identification method and device, electronic equipment and storage medium | |
CN114038004A (en) | Certificate information extraction method, device, equipment and storage medium | |
CN111507337A (en) | License plate recognition method based on hybrid neural network | |
CN110942473A (en) | Moving target tracking detection method based on characteristic point gridding matching | |
JP3228938B2 (en) | Image classification method and apparatus using distribution map | |
CN112364974A (en) | Improved YOLOv3 algorithm based on activation function | |
Singh et al. | Digit recognition system using back propagation neural network | |
CN109726722B (en) | Character segmentation method and device | |
CN112883959B (en) | Identity card integrity detection method, device, equipment and storage medium | |
CN112200789B (en) | Image recognition method and device, electronic equipment and storage medium | |
CN113392455A (en) | House type graph scale detection method and device based on deep learning and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |