CN112085024A

CN112085024A - A method for character recognition on the surface of a can

Info

Publication number: CN112085024A
Application number: CN202010998223.7A
Authority: CN
Inventors: 罗印升; 周兴杰; 宋伟; 刘亚东; 陈传毅; 曹阳阳
Original assignee: Jiangsu University of Technology
Current assignee: Jiangsu University of Technology
Priority date: 2020-09-21
Filing date: 2020-09-21
Publication date: 2020-12-15

Abstract

The invention provides a method for recognizing characters on the surface of a can, comprising the following steps: acquiring a can surface image in real time; locating characters in the acquired image with a pre-established YOLO-V3 neural network detection model, to determine whether the image contains characters and to obtain character position information; outputting the position of the character prediction box and cropping out the character region; because the cropped character region does not fit the actual text region tightly enough, correcting it with a text correction algorithm; segmenting out the text characters of each line through histogram projection; recognizing the cropped text region with an improved end-to-end variable-length text CRNN model; and, depending on the system state at the time, uploading the recognition result to a database in online mode or storing it locally in offline mode. The method offers high robustness, high accuracy and high speed, and provides milk powder can manufacturers with an intelligent character localization and detection scheme.

Description

A method for character recognition on the surface of a can

Technical field

The invention relates to the field of character recognition on automated production lines, and in particular to a character recognition method based on YOLO-V3 and an improved CRNN.

Background art

In actual industrial production, character recognition is a very important technology. It has been successfully applied to industrial production and assembly, for example to the inkjet-printed codes and labels on the surfaces of many small electronic components, circuit boards and some large parts, which manufacturers use to identify and track product information. Because the human eye can hardly sustain efficient inspection for long periods, and harsh working environments such as high temperature and high pressure make manual inspection and recognition even more difficult, automatic and accurate character localization and recognition using machine vision has become an important link in the industrial production process.

Image processing methods based on contour detection and edge processing can also extract character regions, but each type of image requires its own targeted layout analysis, and the choice of contours is strongly affected by the image background, so these methods generalize poorly. Using a deep convolutional neural network to locate and extract the text information in a layout makes it possible to locate and extract the text regions of various kinds of certificates and documents, which widens the applicable range of character localization and recognition and improves recognition on images with complex backgrounds.

The OCR methods currently used in industry mainly have the following problems:

1) Traditional algorithms adapt poorly when extracting character regions and segmenting character text, and have difficulty extracting characters accurately from images with uneven brightness, touching characters or blurred characters;

2) The way features are extracted from the extracted characters must be designed by hand, and generality when detecting new fonts is poor. A method for character localization and recognition based on YOLO-V3 and CRNN is therefore provided.

Summary of the invention

To achieve the above purpose, the invention provides a method for recognizing characters on the surface of a can, comprising: acquiring a can surface image in real time; locating characters in the acquired image with a pre-established YOLO-V3 neural network detection model, to determine whether the image contains characters and to obtain character position information; outputting the position of the character prediction box and cropping out the character region; because the cropped character region does not fit the actual text region tightly enough, correcting it with a text correction algorithm; segmenting out the text characters of each line through histogram projection; recognizing the cropped text region with an improved end-to-end variable-length text CRNN model; and, depending on the system state at the time, uploading the recognition result to a database in online mode or saving it locally in offline mode.
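The order of the stages can be illustrated with a minimal Python sketch. It is not the patented implementation: the stage functions `detect`, `rectify`, `split_lines`, `recognize` and `store` are hypothetical callables supplied by the caller, and the image is represented as a plain list of pixel rows so that the example stays self-contained.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels, x2/y2 exclusive


def crop(image: Sequence[Sequence[int]], box: Box) -> List[List[int]]:
    """Cut a prediction box out of an image stored as a list of pixel rows."""
    x1, y1, x2, y2 = box
    x1, y1 = max(0, x1), max(0, y1)  # clamp boxes that spill over the border
    return [list(row[x1:x2]) for row in image[y1:y2]]


def recognize_can(image, detect, rectify, split_lines, recognize, store) -> List[str]:
    """Run one can image through the stages in the order the method lists them."""
    results = []
    for box in detect(image):                # character localization
        region = rectify(crop(image, box))   # tighten and deskew the cropped region
        for line in split_lines(region):     # one item per text line
            results.append(recognize(line))  # variable-length text recognition
    store(results)                           # database when online, local when offline
    return results
```

Passing the stages in as parameters keeps the sketch independent of any particular detector or recognizer; in the method described here they would be the YOLO-V3 detector, the text correction algorithm, the histogram projection and the improved CRNN.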

In the above solution, the method further includes: an actuator switches to the next workpiece to be recognized.

In the above solution, the can surface includes the surface of a milk powder can and the surface of a canned-food can.

In the above solution, the YOLO-V3 neural network detection model is built as follows:

S1: Collect a large number of character images from the surfaces of milk powder cans;

S2: Preprocess the milk powder can surface images. Preprocessing includes normalization: the proportionally scaled can surface image is placed into a blank image of a preset pixel size, and all pixels of the blank image outside the can image are filled with a set color. The blank image is 416*416 pixels and is large enough to contain the scaled can image; the set color includes gray;

S3: Annotate the character regions of the preprocessed images to obtain a character position label for each image. The character regions are marked on the preprocessed milk powder can images by manually drawing boxes, and the coordinates of the upper-left and lower-right corners of each marked region, together with the characters in that region, are stored as the character label;

S4: Divide the images and their position labels into training set samples and test set samples at a ratio of 9:1; using the training set samples and test set samples, train the pre-built YOLO-V3 deep neural network on a graphics processing unit (GPU) to obtain the YOLO-V3 neural network detection model.
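The normalization in step S2 can be sketched as follows. This is an illustration under stated assumptions rather than the patented code: the image is a list of rows of gray values, resampling is nearest-neighbour, the fill value 128 stands in for "gray", and the scaled image is centered on the canvas as the embodiment describes. The function names are hypothetical.

```python
def letterbox_geometry(width, height, target=416):
    """Return (scale, new_w, new_h, pad_x, pad_y) for an aspect-preserving fit."""
    scale = min(target / width, target / height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_x = (target - new_w) // 2  # left margin that centers the image
    pad_y = (target - new_h) // 2  # top margin that centers the image
    return scale, new_w, new_h, pad_x, pad_y


def letterbox(image, target=416, fill=128):
    """Scale an image proportionally and paste it onto a filled square canvas."""
    height, width = len(image), len(image[0])
    scale, new_w, new_h, pad_x, pad_y = letterbox_geometry(width, height, target)
    canvas = [[fill] * target for _ in range(target)]
    for y in range(new_h):
        src_y = min(height - 1, int(y / scale))  # nearest-neighbour source row
        for x in range(new_w):
            src_x = min(width - 1, int(x / scale))
            canvas[pad_y + y][pad_x + x] = image[src_y][src_x]
    return canvas


def box_to_canvas(box, scale, pad_x, pad_y):
    """Map an (x1, y1, x2, y2) label from original pixels to canvas pixels."""
    x1, y1, x2, y2 = box
    return (x1 * scale + pad_x, y1 * scale + pad_y,
            x2 * scale + pad_x, y2 * scale + pad_y)
```

If labels are drawn on the original image rather than on the preprocessed one, `box_to_canvas` applies the same scale and offset to them so that boxes and pixels stay aligned.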

In the above solution, preprocessing further includes data augmentation of the can surface character images obtained after normalization. The data augmentation includes:

Geometric transformations: flipping, rotation, cropping, deformation and scaling;

Color transformations: noise, blur, color transformation, erasing and filling;

Generating simulated sample pictures that follow the style of the real pictures.
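Two of the listed augmentations, one geometric and one color-based, can be sketched in a few lines. The sketch assumes grayscale images stored as lists of rows and box labels given as (x1, y1, x2, y2) with exclusive right and bottom edges; the function names are hypothetical.

```python
import random


def hflip(image, boxes):
    """Mirror an image left to right and move its box labels accordingly."""
    width = len(image[0])
    flipped = [list(reversed(row)) for row in image]
    # A box covering columns [x1, x2) covers [width - x2, width - x1) after the flip.
    new_boxes = [(width - x2, y1, width - x1, y2) for x1, y1, x2, y2 in boxes]
    return flipped, new_boxes


def add_noise(image, amplitude, rng):
    """Add uniform integer noise in [-amplitude, amplitude], clipped to 0..255."""
    return [
        [min(255, max(0, pixel + rng.randint(-amplitude, amplitude))) for pixel in row]
        for row in image
    ]
```

Geometric augmentations must transform the labels together with the pixels, which is why `hflip` returns both; color augmentations such as `add_noise` leave the labels untouched.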

In the above solution, training the pre-built YOLO-V3 neural network includes:

S1: Set the training parameters: set the number of iterations, epochs, to 10000; set the optimizer, optimizer, to 'adam'; set the number of samples per training batch, batch_size, to 64; and set the sizes of the 9 prior boxes (anchor boxes);

S2: Train the YOLO-V3 neural network: feed the training set samples and the milk powder can surface character data corresponding to each sample into the YOLO-V3 neural network for model training;

S3: Test the model: after each round of training, test the trained model on the test set samples. If the model's detection rate for defects in the test set samples exceeds 95% and its detection accuracy is not lower than 95%, take the model obtained in the last round of training as the final YOLO-V3 neural network detection model; otherwise, take that model as the current YOLO-V3 neural network to be trained and repeat steps S2-S3 until the final YOLO-V3 neural network detection model is obtained.
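The train-then-test loop of steps S1 to S3 can be sketched as below. The parameter values come from step S1; everything else is assumed for illustration: `train_round` and `evaluate` are hypothetical callables standing in for an actual training framework, and `max_rounds` is a safety limit that the text does not mention.

```python
TRAIN_CONFIG = {
    "epochs": 10000,      # number of iterations from step S1
    "optimizer": "adam",
    "batch_size": 64,
    "num_anchors": 9,     # sizes of the 9 prior boxes are set separately
}


def train_until_accepted(model, train_round, evaluate,
                         min_detection_rate=0.95, min_accuracy=0.95,
                         max_rounds=50):
    """Repeat S2 (train) and S3 (test) until both acceptance criteria hold."""
    for round_index in range(1, max_rounds + 1):
        model = train_round(model, TRAIN_CONFIG)        # S2: one round of training
        detection_rate, accuracy = evaluate(model)      # S3: test on the test set
        # The detection rate must exceed 95%; the accuracy must not be lower than 95%.
        if detection_rate > min_detection_rate and accuracy >= min_accuracy:
            return model, round_index
    raise RuntimeError("acceptance criteria not met within max_rounds")
```

The two comparisons are deliberately different (`>` and `>=`) to mirror the wording "exceeds 95%" and "not lower than 95%".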

In the above solution, the sizes of the prior boxes are set by k-means clustering: 9 character-region rectangles are randomly selected as the initial cluster centers of the training set; each rectangle is assigned to its nearest cluster center according to its distance from every center; the cluster center is recalculated each time a sample is assigned; and the assignment process is repeated until no cluster center changes. The values of the 9 cluster centers at that point are the sizes of the 9 prior boxes.
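A compact sketch of anchor clustering follows. Two points are assumptions, since the text does not fix them: "distance" is taken to be 1 minus the IoU of two boxes aligned at a common corner, a choice often used for YOLO anchors, and the centers are recomputed once per pass over the data rather than after every single assignment. The function names are hypothetical.

```python
import random


def iou_wh(box, center):
    """IoU of two (width, height) boxes aligned at a common corner."""
    inter = min(box[0], center[0]) * min(box[1], center[1])
    union = box[0] * box[1] + center[0] * center[1] - inter
    return inter / union


def kmeans_anchors(boxes, k=9, seed=0, max_iter=100):
    """Cluster (width, height) pairs into k prior-box sizes, sorted by area."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)  # random labeled boxes as initial centers
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is the same as smallest (1 - IoU) distance.
            nearest = max(range(k), key=lambda i: iou_wh(box, centers[i]))
            clusters[nearest].append(box)
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                new_centers.append((sum(w for w, _ in members) / len(members),
                                    sum(h for _, h in members) / len(members)))
            else:
                new_centers.append(centers[i])  # keep a center that attracted nothing
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])
```

Like any k-means variant, the result depends on the random initial centers, so the seed is exposed to make runs repeatable.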

In the above solution, the text correction algorithm includes:

S1: Convert the cropped image to grayscale, with white characters on a colored background;

S2: Apply Gaussian blur so that the text merges into a single connected block, then threshold the image to obtain a binary image;

S3: Fit minimum-area rectangles, then filter them by area to find the one covering the text region; this rectangle is the minimum bounding rectangle of the text;

S4: Obtain the center point of the minimum bounding rectangle and apply an affine transformation to the text region to correct the tilted text.
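The affine transformation of step S4 can be illustrated with a small sketch that builds a rotation about the rectangle center and applies it to points. It assumes the center and the tilt angle have already been obtained from the minimum bounding rectangle; here the tilt is estimated from two points on the text baseline instead, and all function names are hypothetical. An image library's affine warp would apply the same 2x3 matrix to every pixel.

```python
import math


def skew_angle(left, right):
    """Tilt of the line from `left` to `right`, in degrees, in image coordinates."""
    return math.degrees(math.atan2(right[1] - left[1], right[0] - left[0]))


def deskew_matrix(center, angle_deg):
    """2x3 affine matrix that rotates about `center` so a line tilted by
    `angle_deg` becomes horizontal."""
    cx, cy = center
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [[c, s, (1 - c) * cx - s * cy],
            [-s, c, s * cx + (1 - c) * cy]]


def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to an (x, y) point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])
```

Rotating about the rectangle center rather than the image origin keeps the text region in place while it is straightened, which is why step S4 obtains the center point first.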

In the above solution, the histogram horizontal projection algorithm includes:

S1: Binarize the corrected image obtained in step four;

S2: Compute the sum of pixels along the horizontal direction of the binarized image, and draw a histogram whose horizontal axis is the image's vertical-axis coordinate, ranging from 0 to the image width, and whose vertical axis is the pixel sum;

S3: Find the valleys of the histogram, which mark the dividing lines;

S4: Split out the image of each text line along the dividing lines.
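The projection steps can be sketched as follows, assuming a binary image stored as a list of rows in which text pixels are 1 and background pixels are 0, and treating a "valley" as a row whose sum is zero. A real image with noise would likely need a small positive threshold instead; that threshold and the function names are assumptions.

```python
def row_sums(binary):
    """Horizontal projection: one pixel sum per image row."""
    return [sum(row) for row in binary]


def split_lines(binary, min_height=1):
    """Return (top, bottom) row ranges of text lines, bottom exclusive."""
    sums = row_sums(binary)
    lines, start = [], None
    for y, value in enumerate(sums):
        if value > 0 and start is None:
            start = y                       # first row of a new text line
        elif value == 0 and start is not None:
            if y - start >= min_height:     # valley reached: close the line
                lines.append((start, y))
            start = None
    if start is not None and len(sums) - start >= min_height:
        lines.append((start, len(sums)))    # line that touches the bottom edge
    return lines
```

Each returned range can then be used to slice the corrected image into one image per text line, for example `binary[top:bottom]`.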

In the above solution, recognition with the improved end-to-end variable-length text CRNN model includes:

S1: Convert each text line image to grayscale, turning the three-channel image into a single-channel grayscale image;

S2: Scale and pad the length: fix the input image size to 32*width, scaling or padding proportionally so that the input resembles the training samples;

S3: Convert the data labels to a sparse matrix: process the label matrix and convert it into the required data format;

S4: Build the model with a changed structure: replace the original CNN+RNN+CTC transcription layer design with CNN+CTC. The backbone is a small convolutional neural network modified from VGG, with 7 convolutional layers and 5 pooling layers, and batch normalization is added twice in the middle layers to avoid vanishing gradients and accelerate convergence;

S5: Post-process the data: map the indices into the dictionary array that correspond to the true values back to the actual values.
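The CTC output of step S4 and the index-to-character mapping of step S5 can be illustrated with greedy decoding, a common way to read a CTC output. The sketch assumes that class index 0 is the CTC blank and that the dictionary is a list whose position i holds the character for class i; neither convention is stated in the text, and the function names are hypothetical.

```python
def ctc_greedy_decode(frame_scores, blank=0):
    """Pick the best class per frame, merge repeats, then drop blanks."""
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    decoded, previous = [], blank
    for index in best:
        # Keep an index only when it is not blank and differs from the previous frame.
        if index != blank and index != previous:
            decoded.append(index)
        previous = index
    return decoded


def indices_to_text(indices, dictionary):
    """Step S5: map dictionary indices back to the actual characters."""
    return "".join(dictionary[i] for i in indices)
```

A blank frame between two identical indices is what lets CTC represent doubled characters, so a frame sequence such as 2, blank, 2 decodes to the character twice rather than once.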

The can surface character recognition method provided by the invention uses the YOLO-V3 and CRNN deep learning methods to recognize text characters in real time: it quickly boxes the region where the characters are located and recognizes them. Because it uses deep learning, the method offers high robustness, high accuracy and high speed. It gives milk powder can manufacturers an intelligent character localization and detection scheme, and addresses the low efficiency, slow speed and high labor cost of manual recognition.

Description of the drawings

Fig. 1 is a schematic flow chart of milk powder can surface character recognition provided by an embodiment of the invention;

Fig. 2 is a diagram of the YOLO-V3 training process provided by an embodiment of the invention;

Fig. 3 is a partial network structure diagram of the improved CRNN provided by an embodiment of the invention.

Detailed description

To give a fuller understanding of the features and technical content of the invention, its implementation is described in detail below with reference to the accompanying drawings, which are provided for reference and illustration only and are not intended to limit the invention.

The invention provides a method for recognizing characters on the surface of a can. As shown in Fig. 1, a schematic flow chart of milk powder can surface character recognition provided by an embodiment of the invention, the method includes acquiring a can surface image in real time. During data collection, a camera or similar device should photograph the milk powder can surface from as many angles as possible, so that the images cover characters at various angles as well as different characters and the trained model achieves a high detection rate and accuracy.

Characters are located in the acquired can surface image with the pre-established YOLO-V3 neural network detection model, to determine whether the image contains characters and to obtain character position information; the position of the character prediction box is output and the character region is cropped out; because the cropped character region does not fit the actual text region tightly enough, it is corrected with the text correction algorithm; the text characters of each line are segmented out through histogram projection; the cropped text region is recognized with the improved end-to-end variable-length text CRNN model; and, depending on the system state at the time, the recognition result is uploaded to a database in online mode or saved locally in offline mode.
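The final branch, database in online mode and local storage in offline mode, can be sketched with the Python standard library. The text names neither a database nor a file format, so SQLite, the `results` table and the JSON-lines file used here are stand-ins chosen only to keep the example runnable, and `store_result` is a hypothetical name.

```python
import json
import sqlite3
from pathlib import Path


def store_result(text, online, db=None, local_path=None):
    """Upload a recognition result when online, otherwise append it to a local file."""
    if online:
        # Online mode: write to the database connection supplied by the caller.
        db.execute("CREATE TABLE IF NOT EXISTS results (text TEXT)")
        db.execute("INSERT INTO results (text) VALUES (?)", (text,))
        db.commit()
        return "database"
    # Offline mode: append one JSON object per line to a local file.
    with Path(local_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
    return "local"
```

Results saved locally in this form could later be replayed into the database once the system is back online, although the text does not describe such a synchronization step.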

The can surface character recognition method further includes: an actuator switches to the next workpiece to be recognized.

In the embodiment provided by the invention, the can surface may include the surface of a milk powder can, the surface of a canned-food can, and the surfaces of other canned packaging.

In the embodiment provided by the invention, the YOLO-V3 neural network detection model is built as follows:

S2: Preprocess the milk powder can surface images. Preprocessing includes normalization: the proportionally scaled can surface image is placed into a blank image of a preset pixel size, and all pixels of the blank image outside the can image are filled with a set color. The blank image is 416*416 pixels and is large enough to contain the scaled can image. During normalization, the image is scaled to the corresponding size while keeping its original aspect ratio and placed in the middle of the 416*416 blank image, and all other pixels are filled with white so as to preserve the image's original texture features as far as possible; the normalized defect images are then augmented by flipping and translation. The set color includes gray;

S3: Annotate the character regions of the preprocessed images to obtain a character position label for each image. The character regions are marked on the preprocessed milk powder can images by manually drawing boxes, and the coordinates of the upper-left and lower-right corners of each marked region, together with the characters in that region, are stored as the character label. In the image annotation stage, character image labels are produced as follows: annotation software can be used to mark the character regions manually with rectangular boxes, and the horizontal and vertical coordinates of the upper-left and lower-right corners of each box are stored in a txt file. The annotation software may be existing drawing and editing software, labelImg or similar; its only purpose is to make it easy to obtain the corresponding coordinate data, automatically or manually, after the boxes have been drawn by hand.

S4: Divide the images and their position labels into training set samples and test set samples. The training set samples and test set samples are divided randomly at a ratio of 9:1, which may vary up or down; using the training set samples and test set samples, train the pre-built YOLO-V3 deep neural network on a graphics processing unit (GPU) to obtain the YOLO-V3 neural network detection model.
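The random 9:1 division in step S4 can be sketched as a seeded shuffle followed by a cut. The function name, the `seed` parameter and the rounding of the cut position are assumptions made for the example.

```python
import random


def split_dataset(samples, train_ratio=0.9, seed=0):
    """Shuffle (image, label) samples and split them into training and test sets."""
    shuffled = list(samples)                   # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)      # seeded so the split is repeatable
    cut = round(len(shuffled) * train_ratio)   # 9:1 by default
    return shuffled[:cut], shuffled[cut:]
```

Keeping each image paired with its label before shuffling ensures that the two never get out of step; changing `train_ratio` covers the case where the ratio is allowed to vary.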

In the embodiment provided by the invention, preprocessing further includes data augmentation of the can surface character images obtained after normalization, using the geometric transformations, color transformations and simulated sample generation listed in the summary above.

In the embodiment provided by the invention, the pre-built YOLO-V3 neural network is trained following steps S1 to S3 of the training procedure set out in the summary above.

The final YOLO-V3 neural network detection model can then be used to detect characters in real time: a camera captures the surface image of the milk powder can to be inspected in real time, and the trained YOLO-V3 network detects the positions of the characters on the can surface in real time.

In the embodiment provided by the invention, the sizes of the prior boxes are set by k-means clustering: 9 character-region rectangles are randomly selected as the initial cluster centers of the training set; each rectangle is assigned to its nearest cluster center according to its distance from every center; the cluster center is recalculated each time a sample is assigned; and the assignment process is repeated until no cluster center changes. The values of the 9 cluster centers at that point are the sizes of the 9 prior boxes.

To make the character position detection method based on the YOLO-V3 network easier to understand, the working principle of the YOLO-V3 network is briefly described here:

a. The YOLO-V3 network divides the input image evenly into S×S cells;

b. Each cell predicts B bounding boxes and gives their information as vectors. The information for a bounding box includes its position (the coordinates of the center of the rectangle, its width and its height), its confidence, and the class of the predicted object.

c. For training data, once the image and label have been input, the five parameters output by a cell (center coordinates of the rectangle, width, height and confidence) are substituted into the loss function, which measures the gap between the five parameters computed by the YOLO-V3 network and the five annotated parameters. The weights are adjusted by backpropagation so that the confidence of correct character regions rises and the confidence of incorrect character regions falls. For data acquired in real time, the YOLO-V3 network likewise computes five parameters (center coordinates of the rectangle, width, height and confidence); after these five values are evaluated with the loss function, the bounding box with the lowest loss value is obtained, which is the box finally required, that is, the detection result.
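How a cell's five raw outputs become a box in pixels can be sketched with the box parameterization from the published YOLOv3 design, which the text above does not spell out and which is therefore an assumption here: the center offsets pass through a sigmoid and are added to the cell index, while the width and height scale a prior box exponentially. Selecting the highest-confidence box above a threshold is likewise one common reading of how the final detection is chosen, and the function names are hypothetical.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def decode_box(raw, cell, anchor, grid_size, image_size=416):
    """Turn raw outputs (tx, ty, tw, th, to) of one cell into pixel units.

    cell is (column, row) in the S x S grid; anchor is a prior box (pw, ph) in pixels.
    """
    tx, ty, tw, th, to = raw
    stride = image_size / grid_size          # pixels covered by one cell
    center_x = (sigmoid(tx) + cell[0]) * stride
    center_y = (sigmoid(ty) + cell[1]) * stride
    width = anchor[0] * math.exp(tw)
    height = anchor[1] * math.exp(th)
    return center_x, center_y, width, height, sigmoid(to)


def best_detection(decoded, threshold=0.5):
    """Return the most confident decoded box at or above the threshold, or None."""
    kept = [box for box in decoded if box[4] >= threshold]
    return max(kept, key=lambda box: box[4]) if kept else None
```

With a 416*416 input and a 13×13 grid each cell covers 32 pixels, so all-zero raw outputs in cell (6, 6) decode to a box centered at (208, 208) with exactly the size of its prior box.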

In the embodiment provided by the invention, the text correction algorithm follows steps S1 to S4 of the text correction algorithm set out in the summary above.

In the embodiment provided by the invention, the histogram horizontal projection algorithm follows steps S1 to S4 of the projection algorithm set out in the summary above.

In the embodiment provided by the invention, recognition with the improved end-to-end variable-length text CRNN model follows steps S1 to S5 of the recognition procedure set out in the summary above.

In the embodiment provided by the invention, the network structure of the pre-built improved CRNN neural network is as follows:

S1: Its structure consists of a convolutional neural network (CNN), with the original RNN structure removed. The CNN has 7 convolutional layers, and max pooling follows every convolutional layer except the third and the seventh;

S2: Adjust the parameters of the CRNN network: set the batch size to 16 or 32, the learning rate to 0.00001 and the number of epochs to 100;

S3: Divide the text images obtained in the previous step into a training set and a test set. Based on the text features in the pictures, write code that generates similar pictures and text pictures covering various conditions, which helps improve the generalization ability of the model. Shuffle the text dataset and randomly split it into two parts: 10000 images for the test set and 100000 for the training set;

S4: Feed the training set into the CRNN network for training, and test the current model while training;

S5: When the training and test loss functions converge, stop training and obtain the CRNN model;

S6: Use the CRNN recognition model to recognize the cropped text;

S7: Depending on the system state at the time, upload the recognition result to the database in online mode, or save it locally in offline mode;

S8: The actuator switches to the next workpiece to be recognized.
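The layer layout in step S1 can be checked with a short sketch that lists the layers and tracks the feature-map height. One assumption goes beyond the text: each max pooling is taken to halve the height. Kernel sizes, channel counts and the positions of the two batch normalization layers are not given, so they are left out; the function names are hypothetical.

```python
def build_layer_plan(num_conv=7, no_pool_after=(3, 7)):
    """List the conv and pool layers: pooling follows every conv except the 3rd and 7th."""
    plan = []
    for index in range(1, num_conv + 1):
        plan.append(f"conv{index}")
        if index not in no_pool_after:
            plan.append(f"pool{index}")
    return plan


def feature_height(plan, input_height=32):
    """Height of the feature map after the plan, assuming each pool halves it."""
    height = input_height
    for layer in plan:
        if layer.startswith("pool"):
            height //= 2
    return height
```

Under that assumption the five pooling layers reduce the fixed input height of 32 to 1, leaving a single row of features per horizontal position, which is the shape a CTC layer reads as a sequence.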

The can surface character recognition method provided by the invention uses the YOLO-V3 and CRNN deep learning methods to recognize text characters in real time: it quickly boxes the region where the characters are located and recognizes them. Because it uses deep learning, the method offers high robustness, high accuracy and high speed. It gives milk powder can manufacturers an intelligent character localization and detection scheme, and addresses the low efficiency, slow speed and high labor cost of manual recognition.

The above is only a specific embodiment of the invention, but the scope of protection of the invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be defined by the scope of protection of the claims.

Claims

1. A method for recognizing characters on the surface of a can, the method comprising:

acquiring a can surface image in real time;

locating characters in the can surface image acquired in real time by using a pre-established YOLO-V3 neural network detection model, to determine whether the image contains characters and to obtain character position information;

outputting the position of the character prediction box, and cropping out the character region;

correcting the cropped character region, which does not fit the actual text region tightly enough, by using a text correction algorithm;

segmenting out the text characters of each line through histogram projection;

recognizing the cropped text region by using an improved end-to-end variable-length text CRNN model;

depending on the system state at the time, uploading the recognition result to a database if the mode is an online mode, or storing the recognition result locally if the mode is an offline mode.

2. The can surface character recognition method of claim 1, further comprising: an actuator switching to the next workpiece to be recognized.

3. The can surface character recognition method of claim 1, wherein the can surface comprises: a milk powder can surface and a canned-food can surface.

4. The can surface character recognition method of claim 1, wherein the method for building the YOLO-V3 neural network detection model comprises:

S1: collecting a large number of character images from the surfaces of milk powder cans;

S2: preprocessing the milk powder can surface images; the preprocessing comprises normalizing the milk powder can surface images: placing the proportionally scaled milk powder can surface image into a blank image of a preset pixel size, and then filling all pixels of the blank image outside the milk powder can image with a set color; the pixel size of the blank image is 416*416, and the blank image can contain the scaled milk powder can image; the set color comprises gray;

S3: annotating the character regions of the preprocessed images to obtain a character position label for each image; marking the character regions on the preprocessed milk powder can images by manually drawing boxes, and storing the coordinates of the upper-left and lower-right corners of each marked region and the characters in the corresponding region as the character label;

S4: dividing the images and their position labels into training set samples and test set samples at a division ratio of 9:1; and training the pre-built YOLO-V3 deep neural network on a graphics processing unit (GPU) by using the training set samples and test set samples, to obtain the YOLO-V3 neural network detection model.

5. The can surface character recognition method of claim 4, wherein the preprocessing further comprises data augmentation of the can surface character images obtained after the normalization; the data augmentation of the can surface character images comprises:

geometric transformations: flipping, rotation, cropping, deformation and scaling;

color transformations: noise, blur, color transformation, erasing and filling;

and generating simulated sample pictures that follow the style of the real pictures.

6. The can surface character recognition method of claim 1, wherein training the pre-built YOLO-V3 neural network comprises:

S1: setting training parameters: setting the number of iterations, epochs, to 10000; setting the optimizer, optimizer, to 'adam'; setting the number of samples per training batch, batch_size, to 64; and setting the sizes of 9 prior boxes (anchor boxes);

S2: training the YOLO-V3 neural network: feeding the training set samples and the milk powder can surface character data corresponding to each sample into the YOLO-V3 neural network for model training;

S3: testing the model: after each round of training, testing the trained model on the test set samples, and if the model's detection rate for defects in the test set samples exceeds 95% and its detection accuracy is not lower than 95%, taking the model obtained in the last round of training as the final YOLO-V3 neural network detection model; otherwise, taking the model obtained in the last round of training as the current YOLO-V3 neural network to be trained, and repeating steps S2-S3 until the final YOLO-V3 neural network detection model is obtained.

7. The can surface character recognition method of claim 6, wherein the sizes of the prior boxes are set by a k-means clustering method comprising: randomly selecting 9 character-region rectangles as the cluster centers of the training set; assigning each rectangle to its nearest cluster center according to its distance from every cluster center; recalculating the cluster center each time a sample is assigned; and repeating the assignment process until no cluster center changes, the values of the 9 cluster centers at that point being the sizes of the 9 prior boxes.

8. The can surface character recognition method of claim 1, wherein the text correction algorithm comprises:

S1: converting the cropped image to grayscale, with white characters on a colored background;

S2: applying Gaussian blur so that the text merges into a single connected block, and thresholding the image to obtain a binary image;

S3: fitting minimum-area rectangles, and filtering them by area to find the one covering the text region, this rectangle being the minimum bounding rectangle of the text;

S4: obtaining the center point of the minimum bounding rectangle, and applying an affine transformation to the text region to correct the tilted text.

9. The can surface character recognition method of claim 1, wherein the histogram horizontal projection algorithm comprises:

S1: binarizing the corrected image obtained in step four;

S2: computing the sum of pixels along the horizontal direction of the binarized image, and drawing a histogram whose horizontal axis is the image's vertical-axis coordinate, ranging from 0 to the image width, and whose vertical axis is the pixel sum;

S3: finding the valleys of the histogram, which mark the dividing lines;

S4: splitting out the image of each text line along the dividing lines.

10. The can surface character recognition method of claim 1, wherein recognition with the improved end-to-end variable-length text CRNN model comprises:

S1: converting each text line image to grayscale, turning the three-channel image into a single-channel grayscale image;

S2: scaling and padding the length: fixing the input image size to 32*width, and scaling or padding proportionally so that the input resembles the training samples;

S3: converting the data labels to a sparse matrix: processing the label matrix and converting it into the required data format;

S4: building the model with a changed structure: replacing the original CNN+RNN+CTC transcription layer design with CNN+CTC, the backbone being a small convolutional neural network modified from VGG with 7 convolutional layers and 5 pooling layers, and adding batch normalization twice in the middle layers to avoid vanishing gradients and accelerate convergence;

S5: post-processing the data: mapping the indices into the dictionary array that correspond to the true values back to the actual values.