CN112348026A - A recognition method of magnetic hard disk serial code based on machine vision - Google Patents

Info

Publication number
CN112348026A
Authority
CN
China
Prior art keywords
character
hard disk
feature
gradient
magnetic hard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011235039.3A
Other languages
Chinese (zh)
Inventor
徐喆
刘晓鸽
汤健
李鹏昇
张自影
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202011235039.3A
Publication of CN112348026A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a machine-vision-based method for recognizing the serial code of a magnetic hard disk. Step 1: preprocess the picture of the magnetic hard disk serial code. Step 2: segment the serial code into characters, then extract HOG features from the segmented characters. Step 3: build a recognition model with the support vector machine (SVM) algorithm and use it to recognize the serial code picture of the magnetic hard disk.

Figure 202011235039

Description

Magnetic hard disk serial code recognition method based on machine vision
Technical Field
The invention belongs to the technical field of magnetic hard disks, and particularly relates to a magnetic hard disk serial code recognition method based on machine vision.
Background
Magnetic hard disks are an important information storage medium for computers. The main brands include Western Digital (WD), Seagate (ST) and Maxtor, and capacities have grown from the GB level of early drives to the TB level today. Magnetic hard disks record and transfer information efficiently, stably and conveniently, but this also exposes users to security risks: private information on a disk can be stolen, copied or used illegally. Information on discarded magnetic hard disks is therefore usually destroyed by degaussing. Commercial degaussing equipment is already on the market, such as the XBC-series and FD-series degaussers used in the security and insurance industry. These machines follow a standard and apply the same degaussing parameters to magnetic hard disks of different brands and ages. However, coercivity varies with the magnetic recording mode and the magnetic material, so disks of different brands, specifications and ages need different parameters for complete degaussing. Using degaussing parameters customized for each brand and specification can therefore improve the efficiency of information destruction. Customized degaussing first requires reading the identification code of the magnetic hard disk.
The identification information printed on the surface of a magnetic hard disk differs from brand to brand, but it always includes basic items such as the brand mark, serial number (SN), model number (MODEL), capacity and production date, which makes it possible to tell individual disks apart. Reading and entering this information by hand is slow and error-prone, so an automated way of recognizing the identification code is needed. This work therefore performs machine-vision-based recognition of the magnetic hard disk serial code, tailored to the characteristics of identification code images; in essence, this is character recognition within image recognition.
Optical character recognition (OCR) is widely used to digitize paper documents. Images of books and documents are obtained by scanning or photographing, and the text in them is recognized and converted into characters. Applications include printed manuscript recognition, invoice recognition and license plate recognition. A typical OCR pipeline consists of image processing, character segmentation, feature extraction and character recognition. For character feature extraction, structure-based features include horizontal-line features and crossing-count features, but structural features such as upper and lower horizontal lines or the number of closed loops work well only for classifying digits. Grid-based statistical features of characters are fast to compute and easy to implement, but they are usually fused with other features rather than used alone. Compared with other descriptors, the HOG descriptor operates on local grid cells of the image and is therefore largely invariant to photometric and geometric deformation. Character recognition models include neural networks, random forests and support vector machines (SVM); the SVM performs well on small-sample, nonlinear and high-dimensional pattern recognition problems. According to the above analysis, machine-vision-based recognition of magnetic hard disk images has not yet been reported.
Because the information and layout of magnetic hard disk images are complex, only serial code (SN) recognition is studied here.
Disclosure of Invention
Degaussing is a common way of destroying the information on discarded magnetic hard disks, and customizing the degaussing process for different disks improves the efficiency of information destruction. In this scenario, the identification information of discarded magnetic hard disks must be collected and recorded quickly and accurately. To meet this need, a magnetic hard disk serial code recognition method based on machine vision is proposed. The serial code image is first preprocessed; the serial code is then segmented into characters; HOG features are extracted from the segmented characters; finally, a recognition model is built with the support vector machine (SVM) algorithm. The effectiveness of the method is verified by a magnetic hard disk serial code image recognition experiment.
Drawings
FIG. 1(a) is a Seagate magnetic hard disk;
FIG. 1(b) is a Western Digital magnetic hard disk;
FIG. 1(c) is a Maxtor magnetic hard disk;
FIG. 2 shows identification information pictures of Seagate hard disks with the same MODEL code and different SN codes;
FIG. 3 is a flow chart of magnetic hard disk serial code recognition;
FIG. 4(a) shows the effect of grayscale preprocessing;
FIG. 4(b) shows the effect of binarization preprocessing;
FIG. 4(c) shows the effect of denoising preprocessing;
FIG. 5 shows the horizontal projection;
FIG. 6 shows the vertical projection;
FIG. 7(a) is a character picture before normalization;
FIG. 7(b) is a character picture after normalization;
FIG. 8 is a 3D plot of parameter optimization for a sliding step of 2 x 2;
FIG. 9 is a 3D plot of parameter optimization for a sliding step of 4 x 4;
FIG. 10 is a 3D plot of parameter optimization for a sliding step of 8 x 8.
Detailed Description
The identification information of different magnetic hard disks varies to some degree, but it generally includes a serial code (SN) and a model code (MODEL), as in the identification information pictures of the three brands Seagate, Western Digital and Maxtor shown in FIG. 1.
The serial code (hereinafter SN code) is the factory serial number assigned to each magnetic hard disk and serves as the disk's "identity card number". It is unique across disks of different brands and across disks of the same brand with different capacities. In FIG. 2, (a) and (b) are both Seagate hard disks with the same model code (MODEL) but different SN codes. SN codes consist of Arabic numerals and English letters in regular printed type, so recognizing the serial code amounts to recognizing printed Arabic numerals and English letters.
The machine-vision-based magnetic hard disk serial code recognition method comprises four modules: hard disk serial code image preprocessing, hard disk serial code character segmentation, mixed character feature extraction, and hard disk serial code recognition, as shown in FIG. 3.
In FIG. 3, the symbols (shown as formula images in the original) denote, in order of appearance: the set of magnetic hard disk serial code images; the image set obtained after image preprocessing; the set of atomic-level character images of the serial code; the printed character set, comprising printed digits and capital letters; Z, the feature file of the atomic-level character images; and the recognition result.
The functions of the modules are as follows:
(1) Hard disk serial code picture preprocessing module: performs graying, binarization and denoising on the magnetic hard disk serial code picture;
(2) Hard disk serial code character segmentation module: segments the preprocessed picture into characters by horizontal and vertical projection, and normalizes the segmented characters to obtain atomic-level pictures;
(3) Mixed character feature extraction module: extracts histogram of oriented gradients (HOG) features from the atomic-level characters and the printed characters to obtain a feature file;
(4) Hard disk serial code recognition module: builds a recognition model from the extracted feature file and recognizes the hard disk serial code.
Hard disk serial code picture preprocessing module
Image graying
The image is converted to grayscale with the weighted average method: the R, G and B components are averaged with different weights to obtain a reasonable grayscale image. The gray value is calculated as follows:
$$L(s,t)=0.299\,R(s,t)+0.587\,G(s,t)+0.114\,B(s,t) \qquad (1)$$
where R(s,t), G(s,t) and B(s,t) are the brightness values of the red, green and blue channels at pixel (s,t), and L(s,t) is the gray value at pixel (s,t).
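Equation (1) can be sketched in a few lines of plain Python. The function name `to_gray` and the nested-list image layout are illustrative assumptions, not part of the patent.

```python
def to_gray(rgb):
    """Weighted-average grayscale, equation (1).

    rgb is a list of rows; each pixel is an (R, G, B) tuple (assumed layout).
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb]


# Tiny 2 x 2 example image: white, black, pure red, pure blue.
image = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 0, 255)]]
gray = to_gray(image)
```

Because the three weights sum to 1, a white pixel stays at 255 and a black pixel stays at 0.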
Image binarization
A binarized image takes only two values, which reduces the complexity of subsequent image processing.
The gray value L(s,t) of each pixel is computed first and compared with a given threshold Thre. Pixels whose gray value is greater than the threshold are set to 255 (white), and pixels whose gray value is smaller are set to 0 (black):
$$B(s,t)=\begin{cases}255, & L(s,t)>Thre\\ 0, & \text{otherwise}\end{cases} \qquad (2)$$
where L(s,t) is the gray value at pixel (s,t), Thre is the binarization threshold, and B(s,t) is the binarized image.
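A minimal sketch of the thresholding rule follows. The default of 127 is the value reported later in the experiments; mapping a pixel exactly equal to the threshold to black is an assumption, since the text only covers the "greater" and "smaller" cases.

```python
def binarize(gray, thre=127):
    """Fixed-threshold binarization: 255 (white) above thre, else 0 (black).

    Pixels equal to thre are mapped to 0; this tie rule is an assumption.
    """
    return [[255 if value > thre else 0 for value in row] for row in gray]


binary_row = binarize([[0, 127, 128, 255]])
```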
Image denoising
Median filtering is used for image denoising. A two-dimensional template with an odd side length K is defined, that is, a K x K region, where K is usually 3 or 5. The pixel values inside the template are sorted to obtain their median, and the pixel at the center of the template is replaced by this median, which removes isolated noise points.
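The following is a small sketch of a K x K median filter on a nested-list image. Leaving the border pixels unchanged is an assumption made to keep the example short; the patent does not say how borders are handled.

```python
def median_filter(img, k=3):
    """K x K median filter; border pixels are copied unchanged (assumption)."""
    h, w, r = len(img), len(img[0]), k // 2
    out = [row[:] for row in img]
    for y in range(r, h - r):
        for x in range(r, w - r):
            window = sorted(img[yy][xx]
                            for yy in range(y - r, y + r + 1)
                            for xx in range(x - r, x + r + 1))
            out[y][x] = window[len(window) // 2]  # middle of the sorted window
    return out


# A white 5 x 5 image with one isolated black noise pixel in the middle.
noisy = [[255] * 5 for _ in range(5)]
noisy[2][2] = 0
clean = median_filter(noisy)
```

In this example the isolated black pixel is outvoted by its eight white neighbours and disappears.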
Hard disk serial code character segmentation module
Character segmentation based on projection method
The target characters can be segmented with the projection method: the pixel distribution histogram of the picture is analyzed to find the boundaries between adjacent characters. First, the binarized image of the magnetic hard disk identification code is projected horizontally, that is, the pixels of each row are counted to obtain a pixel distribution histogram in the horizontal direction. Analyzing this histogram gives the start and end position of each text line, and the image is cut into lines accordingly.
Second, each line image cut by the horizontal projection is projected vertically, that is, the pixels of each column are counted to obtain a pixel distribution histogram in the vertical direction. Analyzing this histogram gives the position coordinates of each character, and the binary line image is segmented at those coordinates to obtain single-character images.
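The two projection passes can be sketched as follows. The helper names `runs` and `segment` are hypothetical, and the sketch assumes black characters (0) on a white background (255) with at least one empty row or column between neighbouring lines and characters.

```python
def runs(profile):
    """Return (start, end) spans of consecutive non-zero counts; end is exclusive."""
    spans, start = [], None
    for i, count in enumerate(profile):
        if count and start is None:
            start = i
        elif not count and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(binary, ink=0):
    """Split a binarized image into character boxes (top, bottom, left, right)."""
    boxes = []
    # Horizontal projection: count ink pixels per row to find text lines.
    row_profile = [sum(1 for v in row if v == ink) for row in binary]
    for top, bottom in runs(row_profile):
        line = binary[top:bottom]
        # Vertical projection: count ink pixels per column within the line.
        col_profile = [sum(1 for row in line if row[x] == ink)
                       for x in range(len(line[0]))]
        boxes.extend((top, bottom, left, right)
                     for left, right in runs(col_profile))
    return boxes


W, K = 255, 0
page = [
    [W, W, W, W, W, W],
    [W, K, W, K, K, W],
    [W, K, W, K, K, W],
    [W, W, W, W, W, W],
    [W, K, K, W, W, W],
]
boxes = segment(page)
```

Here the page contains two text lines; the first splits into two characters and the second into one.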
Character normalization
Because characters are cut at the position coordinates of each individual character, a cut character fills its entire image. The cut characters are therefore normalized in both size and position to improve recognition accuracy. The following method is used here.
When cutting, every cut character picture is set to a uniform size of n x n pixels, and a white grayscale picture of m x m pixels is defined, where m > n. The n x n picture is pasted onto the m x m picture with (a, a) as the starting position, giving atomic-level pictures that are consistent in size and position.
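The pasting step can be sketched as below. It assumes the cut character has already been resized to n x n; the defaults m = 16 and a = 2 are the values reported later in the experiments, and the function name is hypothetical.

```python
def paste_on_canvas(char, m=16, a=2, white=255):
    """Paste an n x n character onto a white m x m canvas, starting at (a, a)."""
    n = len(char)
    if a + n > m:
        raise ValueError("character does not fit on the canvas")
    canvas = [[white] * m for _ in range(m)]
    for y in range(n):
        canvas[a + y][a:a + n] = char[y]  # overwrite one row of the canvas
    return canvas


# An all-black 12 x 12 character picture, centred on a 16 x 16 canvas.
char = [[0] * 12 for _ in range(12)]
atom = paste_on_canvas(char)
```

With n = 12, m = 16 and a = 2, the character is surrounded by a 2-pixel white margin on every side.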
Mixed character feature extraction module
Features are extracted with the histogram of oriented gradients (HOG), which is often combined with a support vector machine (SVM) to train high-accuracy object classifiers. The procedure consists of computing the image gradient, building the gradient histograms, and block normalization.
Calculating the image gradient:
Every pixel of the image has a gradient magnitude and a gradient direction. The gradient $g_s(s,t)$ along the abscissa (s) and the gradient $g_t(s,t)$ along the ordinate (t) of the atomic-level character image are calculated with formulas (3-1) and (3-2), where I(s,t) is the pixel value at (s,t):
$$g_s(s,t)=I(s+1,t)-I(s-1,t) \qquad (3\text{-}1)$$
$$g_t(s,t)=I(s,t+1)-I(s,t-1) \qquad (3\text{-}2)$$
The gradient magnitude g(s,t) and the gradient direction θ(s,t) at each pixel of the atomic-level character image are then obtained with formulas (4) and (5):
$$g(s,t)=\sqrt{g_s(s,t)^2+g_t(s,t)^2} \qquad (4)$$
$$\theta(s,t)=\arctan\frac{g_t(s,t)}{g_s(s,t)} \qquad (5)$$
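The gradient step can be sketched as follows. Two choices here are assumptions rather than statements from the patent: pixels outside the image are replaced by the nearest edge pixel, and `atan2` followed by a modulo of 180 is used instead of a plain arctangent so that a zero horizontal gradient does not cause a division by zero. The result is an unsigned direction in degrees.

```python
import math


def gradients(img):
    """Central-difference gradients; returns (magnitude, direction) maps.

    img[t][s]: row index t (ordinate), column index s (abscissa).
    Directions are unsigned, in degrees within [0, 180).
    """
    h, w = len(img), len(img[0])

    def px(s, t):
        # Clamp coordinates so border pixels reuse the nearest edge value.
        return img[min(max(t, 0), h - 1)][min(max(s, 0), w - 1)]

    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for t in range(h):
        for s in range(w):
            gs = px(s + 1, t) - px(s - 1, t)
            gt = px(s, t + 1) - px(s, t - 1)
            mag[t][s] = math.hypot(gs, gt)
            ang[t][s] = math.degrees(math.atan2(gt, gs)) % 180.0
    return mag, ang


# A vertical edge: dark on the left, bright on the right.
mag, ang = gradients([[0, 0, 255, 255]] * 4)
```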
Constructing a gradient histogram:
The atomic-level character image is divided into small regions of several pixels each, called cells, and a histogram of oriented gradients is built in each cell.
The gradient direction range is first divided evenly into I intervals. The gradient directions of all pixels in the cell are then accumulated into these intervals by weighted projection, where the weight is usually the gradient magnitude. The resulting histogram is an I-dimensional feature vector, which is the HOG feature of the cell:
$$\beta_j=[\eta_1,\eta_2,\dots,\eta_I],\quad i\in[1,I] \qquad (6)$$
where $\eta_i$ is the weighted projection of the gradient magnitude onto the i-th direction interval, and $\beta_j$ is the HOG feature vector of the j-th cell.
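A sketch of the per-cell histogram is given below. It assumes unsigned directions in [0, 180) degrees and assigns each pixel's full magnitude to a single bin; interpolating votes between neighbouring bins, as some HOG implementations do, is deliberately left out.

```python
def cell_histogram(mag, ang, bins=9):
    """Magnitude-weighted orientation histogram for one cell.

    mag, ang: 2-D lists covering the cell; ang in degrees within [0, 180).
    Each pixel votes into exactly one bin (hard assignment, an assumption).
    """
    width = 180.0 / bins
    hist = [0.0] * bins
    for mag_row, ang_row in zip(mag, ang):
        for m, a in zip(mag_row, ang_row):
            hist[int(a // width) % bins] += m
    return hist


hist = cell_histogram(mag=[[1.0, 2.0], [3.0, 4.0]],
                      ang=[[0.0, 10.0], [95.0, 179.0]])
```

With 9 bins each bin spans 20 degrees, so the four example pixels fall into bins 0, 0, 4 and 8.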
Block normalization:
Several adjacent cells are grouped into a region called a block. Concatenating the feature vectors of all cells in a block gives the HOG feature of that block:
$$\gamma_k=[\beta_1,\beta_2,\dots,\beta_J],\quad j\in[1,J] \qquad (7)$$
where $\beta_j$ is the HOG feature vector of the j-th cell, J is the total number of cells in each block, and $\gamma_k$ is the HOG feature vector of the k-th block.
Contrast normalization is then performed within each block using equation (8):
$$v_k=\frac{\gamma_k}{\sqrt{\|\gamma_k\|_2^2+\varepsilon^2}} \qquad (8)$$
where $\gamma_k$ is the vector to be normalized, $\|\gamma_k\|_2$ is the 2-norm of $\gamma_k$, ε is a very small constant, and $v_k$ is the normalized vector.
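Block normalization can be sketched as follows, assuming the common L2-norm form in which the block vector is divided by the square root of its squared 2-norm plus ε squared. The value of `eps` is an arbitrary small constant chosen for illustration.

```python
import math


def normalize_block(cell_hists, eps=1e-6):
    """Concatenate the cell histograms of one block and L2-normalize them."""
    gamma = [value for hist in cell_hists for value in hist]  # block vector
    norm = math.sqrt(sum(value * value for value in gamma) + eps ** 2)
    return [value / norm for value in gamma]


v = normalize_block([[3.0], [4.0]])
flat = normalize_block([[0.0, 0.0], [0.0, 0.0]])
```

The small constant keeps the division well defined for a completely flat block, whose vector is all zeros.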
Blocks are obtained with a sliding window that moves over the atomic-level character picture with a fixed step, from left to right and from top to bottom. This yields K blocks, and concatenating the normalized feature vectors of all blocks gives the multi-dimensional feature vector υ:
$$\upsilon=[v_1,v_2,\dots,v_K],\quad k\in[1,K] \qquad (9)$$
The HOG descriptor of the whole picture is thus a feature vector υ of dimension I x J x K, made up of the histogram components of all cells in all blocks.
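Putting the pieces together, the sliding-window assembly of the descriptor might look like the sketch below. It takes per-pixel magnitude and direction maps as input, assumes a square image, uses hard bin assignment and L2 normalization, and takes its defaults (4 x 4 pixel cells, 2 x 2 cells per block, 9 bins) from the experiment section. The function name is hypothetical.

```python
import math


def hog_descriptor(mag, ang, cell=4, block=2, stride=4, bins=9, eps=1e-6):
    """Concatenate L2-normalized block histograms over a sliding window.

    mag, ang: square per-pixel maps; ang in degrees within [0, 180).
    cell: cell side in pixels; block: cells per block side; stride: pixels.
    """
    size = len(mag)
    span = cell * block          # block side length in pixels
    width = 180.0 / bins
    feature = []
    for by in range(0, size - span + 1, stride):        # top to bottom
        for bx in range(0, size - span + 1, stride):    # left to right
            gamma = []
            for cy in range(by, by + span, cell):
                for cx in range(bx, bx + span, cell):
                    hist = [0.0] * bins
                    for y in range(cy, cy + cell):
                        for x in range(cx, cx + cell):
                            hist[int(ang[y][x] // width) % bins] += mag[y][x]
                    gamma.extend(hist)
            norm = math.sqrt(sum(v * v for v in gamma) + eps ** 2)
            feature.extend(v / norm for v in gamma)
    return feature


# Dummy 16 x 16 maps, used only to check the descriptor length I * J * K.
mag = [[1.0] * 16 for _ in range(16)]
ang = [[0.0] * 16 for _ in range(16)]
feature = hog_descriptor(mag, ang, stride=4)  # 9 * 4 * 9 = 324 values
```

Cells are computed relative to each block position, which is what allows a stride smaller than the cell size.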
Hard disk serial code recognition module
A support vector machine (SVM) is used to build the serial code recognition model. The SVM is a widely used supervised learning algorithm. It automatically finds the support vectors that best separate the classes, so the resulting classifier maximizes the margin between classes and has good adaptability and high discriminative power. The SVM type chosen here is C-SVC, a multi-class classifier that uses a penalty factor (cost, C) to handle noise.
Step (1): extract the character HOG features with a sliding-window step of 2 x 2 to obtain feature file Z_1;
Step (2): feed the obtained feature file Z_i (i = 1, 2, 3) into the SVM model, and split it into a training set and a test set using the default ratio;
Step (3): traverse the combinations of the parameters gamma and C with the grid search method, and evaluate each resulting model on the test set to obtain the classification accuracy of each parameter combination;
Step (4): plot the parameter combinations from Step (3) and their accuracies as a 3D graph, and analyze it to find the optimal parameter combination;
Step (5): extract the character HOG features with a sliding-window step of 4 x 4 to obtain feature file Z_2, and repeat Steps (2) to (4) to obtain the optimal parameter combination for this feature;
Step (6): extract the character HOG features with a sliding-window step of 8 x 8 to obtain feature file Z_3, and repeat Steps (2) to (4) to obtain the optimal parameter combination for this feature;
Step (7): compare the models obtained with the three sliding steps to determine the best sliding step and parameter combination, and save the trained SVM model.
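The grid search in Step (3) can be sketched independently of any particular SVM library. In the sketch below, `train_and_score` is a hypothetical callable that would train a C-SVC with the given gamma and C and return its test accuracy; `toy_score` is a purely synthetic stand-in used so the example runs on its own, and it does not reproduce any result from the patent. The candidate values are the ones listed in the experiment section.

```python
import math
from itertools import product


def grid_search(train_and_score, gammas, costs):
    """Evaluate every (gamma, C) pair and return the best one with all scores."""
    results = {(g, c): train_and_score(g, c) for g, c in product(gammas, costs)}
    (best_gamma, best_c), best_acc = max(results.items(), key=lambda kv: kv[1])
    return best_acc, best_gamma, best_c, results


GRID = [0.01, 0.1, 1, 10, 100]


def toy_score(gamma, c):
    # Synthetic score with an arbitrary peak at gamma=1, C=10 (NOT real data).
    # A real run would train an RBF-kernel C-SVC here and return its accuracy.
    return 1.0 / (1.0 + abs(math.log10(gamma)) + abs(math.log10(c) - 1.0))


best_acc, best_gamma, best_c, results = grid_search(toy_score, GRID, GRID)
```

The `results` dictionary holds one score per parameter pair, which is the data needed for the 3D accuracy plots of Step (4).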
Application verification
Description of test data
A mixed data set is used to train and test the model for magnetic hard disk serial codes. It consists of two parts: printed character pictures and atomic-level character pictures from magnetic hard disk serial codes. There are 36 character classes in total (the 10 Arabic numerals 0-9 and the 26 capital letters A-Z) with 279 pictures per class, giving 10044 samples.
Results of the experiment
Image preprocessing result
Based on the observed results, the binarization threshold Thre is set to 127 and the median filter template to 3 x 3. The preprocessing effect on a magnetic hard disk serial code image is shown in FIG. 4.
Character segmentation result
The start position of each line of characters is determined from the horizontal projection and used to cut the text into lines. The horizontal projection of a magnetic hard disk identification code picture is shown in FIG. 5.
Each cut text line is then projected vertically, and the position of each character is determined from its upper-left and lower-right coordinate points so that the character can be cut out. The vertical projection of a single line of characters is shown in FIG. 6.
During cutting, the size of each cut character picture is set to 12 x 12, as shown in FIG. 7(a). A 16 x 16 picture with gray value 255 (white) is also created, and the extracted 12 x 12 character picture is pasted onto it with pixel coordinates (2,2) as the starting point. This gives a centered atomic-level character picture and completes character normalization, as shown in FIG. 7(b).
Feature extraction results
The atomic-level character image is divided into cells by pixels, and the gradient direction range is divided into I intervals for accumulating gradient information. Each pixel in a cell is projected into the histogram according to its gradient direction, with weighting, which gives an I-dimensional feature vector for the cell. Unsigned gradients are used, that is, gradient directions lie between 0 and 180 degrees. If each block contains J cells and traversing the whole picture with the sliding window yields K blocks, the HOG feature extracted from one atomic-level character image is a feature vector of size 1 x (I x J x K).
The character pictures in the data set are 16 x 16 pixels. A region of 4 x 4 pixels is taken as a cell and I is set to 9, so the gradient information of each cell is accumulated in a 9-bin histogram. A block consists of 2 x 2 cells, so each block contains 4 cells. With a block sliding step (in pixels) of 2 x 2, traversing the whole picture yields 25 blocks, so I, J and K are 9, 4 and 25, and the HOG feature of an atomic-level character image is a 1 x 900 vector. Changing the sliding step changes K and can therefore reduce the vector dimension. For example, with a sliding step of 4 x 4, traversing the whole picture yields 9 blocks, so I, J and K are 9, 4 and 9, and the HOG feature vector is 1 x 324; with a sliding step of 8 x 8, 4 blocks are obtained and the HOG feature vector is 1 x 144. Finally, the HOG feature vectors of all atomic-level characters form an HOG feature file, which is fed into the SVM for recognition model training.
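The block counts and dimensions quoted above follow from simple arithmetic: a block of 2 x 2 cells of 4 x 4 pixels is 8 pixels wide, so a 16-pixel image holds (16 - 8) / step + 1 block positions per axis. A short sketch, with a hypothetical function name, makes the calculation explicit.

```python
def hog_dimension(image=16, cell=4, cells_per_side=2, stride=2, bins=9):
    """Return (I, J, K, I*J*K) for a square image and square blocks."""
    block = cell * cells_per_side            # block side in pixels
    per_axis = (image - block) // stride + 1 # block positions per axis
    k = per_axis ** 2                        # number of blocks K
    j = cells_per_side ** 2                  # cells per block J
    return bins, j, k, bins * j * k


for stride in (2, 4, 8):
    print(stride, hog_dimension(stride=stride))
# prints:
# 2 (9, 4, 25, 900)
# 4 (9, 4, 9, 324)
# 8 (9, 4, 4, 144)
```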
Analysis of recognition results
Here the SVM type is C-SVC, the kernel function is the RBF (radial basis function) kernel, and the model training parameters are gamma and C. Parameters are optimized with the grid search method, with gamma taken from {0.01, 0.1, 1, 10, 100} and C taken from {0.01, 0.1, 1, 10, 100}. The parameter optimization results for the different sliding-window steps are shown in FIGS. 8-10, where the x-axis is the value of gamma, the y-axis is the value of C, and the z-axis is the prediction accuracy of the model. Because the parameter values span several orders of magnitude, the x- and y-axes are plotted on a logarithmic scale, which makes the plots more intuitive and easier to analyze.
Different sliding-window steps during HOG feature extraction lead to different classifier model parameters. Table 1 compares the classifier models obtained with the different sliding steps.
TABLE 1. Comparison of classifier model results for different sliding steps
As can be seen from Table 1: with a window sliding step of 2 x 2, the feature vector dimension is 900, the optimal (gamma, C) is (0.1, 100), and the accuracy reaches 0.98; with a sliding step of 4 x 4, the feature vector dimension is 324, the optimal (gamma, C) is still (0.1, 100), and the accuracy reaches 0.98; with a sliding step of 8 x 8, the feature vector dimension is 144, the optimal (gamma, C) is (1, 100), and the accuracy reaches 0.98.
The HOG feature extraction method is then compared with the traditional pixel-statistics feature extraction method on the recognition of a magnetic hard disk serial code. The first three groups of experiments give the recognition results of HOG feature extraction with the different sliding steps, and the fourth group gives the result obtained with the traditional pixel-statistics feature extraction method. The magnetic hard disk serial code is Y203TK1E, and the recognition results are shown in Table 2.
TABLE 2. Serial code recognition results
As can be seen from Table 2, for the same magnetic hard disk serial code the fourth group of experiments shows obvious misrecognition: the digits 0 and 1 are recognized as the letters O and I, so the recognition is poor. The first and third groups also confuse similar characters, with the digit 0 recognized as the capital letter O. The second group, with a window sliding step of 4 x 4, recognizes the code correctly. The feature vector dimension decreases as the sliding step increases, which also shortens the recognition time. When the sliding step increases from 2 x 2 to 4 x 4, the feature vector dimension drops from 900 to 324 and the recognition time falls from 1.15391 s to 0.36998 s; when the step increases further to 8 x 8, the recognition time falls to 0.23237 s. The experimental results show that a sliding step of 4 x 4 gives the best recognition together with a clearly shorter recognition time. They also demonstrate that combining HOG features with an SVM is effective and feasible for magnetic hard disk serial code recognition, although the recognition process still needs further optimization.
A machine-vision-based magnetic hard disk serial code recognition method is proposed for the characteristics of serial codes in magnetic hard disk images. Compared with the traditional feature extraction approach of counting pixels, the HOG descriptor better describes the appearance and shape of local targets and better distinguishes similar characters or similar features. Machine vision is applied to magnetic hard disk serial code recognition for the first time, and atomic-level characters from serial codes are mixed with printed characters to train the recognition model; the HOG features of the characters are extracted to form a feature file for serial code recognition. The experimental results show that HOG features combined with an SVM classifier work well for recognizing magnetic hard disk serial codes. Given the characteristics of magnetic hard disk identification code pictures, future work comprises two parts: collecting pictures of magnetic hard disks of different brands and effectively recognizing the brand marks; and analyzing the layout of the identification code information so that the useful information can be accurately located and recognized.

Claims (4)

1. A machine-vision-based method for recognizing magnetic hard disk serial codes, characterized by comprising:
Step 1: preprocessing the picture of the magnetic hard disk serial code;
Step 2: segmenting the serial code into characters, then extracting HOG features from the segmented characters;
Step 3: building a recognition model with the support vector machine (SVM) algorithm and recognizing the magnetic hard disk serial code picture.

2. The machine-vision-based method for recognizing magnetic hard disk serial codes according to claim 1, characterized in that the projection-based character segmentation in Step 2 comprises:
First, horizontally projecting the binarized picture of the magnetic hard disk identification code, i.e. counting the pixels in each row, to obtain a histogram of the pixel distribution in the horizontal direction;
Second, vertically projecting each row image cut out by the horizontal projection, i.e. counting the pixels in each column, to obtain a histogram of the pixel distribution in the vertical direction; determining the position coordinates of each single character by analyzing the vertical histogram, and segmenting the binary row image according to those coordinates to obtain single-character images;
wherein, during character cutting, the cut character pictures are set to a uniform size of n*n, and a white grayscale picture of size m*m is defined, where m > n; with a*a as the starting position, the n*n picture is pasted onto the m*m picture, giving atomic-level pictures of consistent size and position.

3. The machine-vision-based method for recognizing magnetic hard disk serial codes according to claim 1, characterized in that the mixed-character feature extraction in Step 2 comprises three steps, namely computing the image gradient, building the gradient histogram, and block normalization, specifically:
(1) Computing the image gradient:
Equations (3-1) and (3-2) give the gradient $g_s(s,t)$ of the atomic-level character image along the abscissa ($s$) and the gradient $g_t(s,t)$ along the ordinate ($t$):
$$g_s(s,t) = I(s+1,t) - I(s-1,t) \tag{3-1}$$
$$g_t(s,t) = I(s,t+1) - I(s,t-1) \tag{3-2}$$
Equations (4) and (5) then give the gradient magnitude $g(s,t)$ and the gradient direction $\theta(s,t)$ at each pixel position of the atomic-level character image:
$$g(s,t) = \sqrt{g_s(s,t)^2 + g_t(s,t)^2} \tag{4}$$
$$\theta(s,t) = \arctan\frac{g_t(s,t)}{g_s(s,t)} \tag{5}$$
(2) Building the gradient histogram:
The atomic-level character image is divided into units of several pixels each, called cells, and the histogram of oriented gradients is built within each cell.
First, the gradient direction is divided evenly into $I$ direction bins, and a histogram of the gradient directions of all pixels is accumulated over those bins. The accumulation is a weighted projection in which the gradient magnitude is the weight. Computing the gradient histogram yields an $I$-dimensional feature vector, which is the HOG feature of the cell:
$$\beta_j = [\eta_1, \eta_2, \ldots, \eta_I], \quad i \in [1, I] \tag{6}$$
where $\eta_i$ is the weighted projection of the gradient magnitude onto the $i$-th gradient direction bin, and $\beta_j$ is the HOG feature vector of the $j$-th cell.
(3) Block normalization:
Several adjacent cells form a region called a block. Concatenating the feature vectors of all cells in the block gives the HOG feature of the block:
$$\gamma_k = [\beta_1, \beta_2, \ldots, \beta_J], \quad j \in [1, J] \tag{7}$$
where $\beta_j$ is the HOG feature vector of the $j$-th cell, $J$ is the total number of cells in each block, and $\gamma_k$ is the HOG feature vector of the $k$-th block.
Contrast normalization is performed within each block using Equation (8):
$$v_k = \frac{\gamma_k}{\sqrt{\|\gamma_k\|_2^2 + \varepsilon^2}} \tag{8}$$
where $\gamma_k$ is the vector to be normalized, $\|\gamma_k\|_2$ is the 2-norm of $\gamma_k$ used for normalization, $\varepsilon$ is a very small constant, and $v_k$ is the normalized vector.
HOG obtains blocks with a sliding window: the window slides over the atomic-level character picture from left to right and from top to bottom with a fixed step, yielding $K$ blocks. Concatenating the normalized feature vectors of all blocks gives the multidimensional feature vector $\upsilon$:
$$\upsilon = [v_1, v_2, \ldots, v_K], \quad k \in [1, K] \tag{9}$$
The HOG descriptor of the whole picture is the feature vector $\upsilon$ of dimension I*J*K, composed of the histogram components of all cells in all blocks.
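The three steps of claim 3 can be sketched in plain Python as follows. This is a minimal illustration, not the applicants' implementation. Several details are assumptions because the claim does not fix them: unsigned directions in [0°, 180°), clamped image borders, hard assignment of each pixel to a single bin, an L2 form for the block normalization, and the cell, block, and step sizes used in the example.

```python
import math


def gradients(img):
    """Central differences as in (3-1) and (3-2); img[t][s] is row t, column s."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for t in range(h):
        for s in range(w):
            # Assumption: neighbours outside the image are clamped to the border.
            gs = img[t][min(s + 1, w - 1)] - img[t][max(s - 1, 0)]
            gt = img[min(t + 1, h - 1)][s] - img[max(t - 1, 0)][s]
            mag[t][s] = math.hypot(gs, gt)
            # Assumption: unsigned direction, folded into [0, 180) degrees.
            ang[t][s] = math.degrees(math.atan2(gt, gs)) % 180.0
    return mag, ang


def cell_histogram(mag, ang, top, left, cell, bins):
    """Histogram of one cell; each pixel votes with its gradient magnitude."""
    hist = [0.0] * bins
    width = 180.0 / bins
    for t in range(top, top + cell):
        for s in range(left, left + cell):
            i = min(int(ang[t][s] / width), bins - 1)
            hist[i] += mag[t][s]
    return hist


def hog(img, cell=4, cells_per_side=2, step=4, bins=9, eps=1e-6):
    """Concatenate L2-normalized block vectors, left to right, top to bottom."""
    mag, ang = gradients(img)
    h, w = len(img), len(img[0])
    block = cell * cells_per_side
    feature = []
    for top in range(0, h - block + 1, step):
        for left in range(0, w - block + 1, step):
            gamma = []  # block vector: all cell histograms in the block
            for cy in range(cells_per_side):
                for cx in range(cells_per_side):
                    gamma += cell_histogram(
                        mag, ang, top + cy * cell, left + cx * cell, cell, bins
                    )
            norm = math.sqrt(sum(x * x for x in gamma) + eps * eps)
            feature += [x / norm for x in gamma]
    return feature


if __name__ == "__main__":
    # Hypothetical 16 x 16 test image: dark left half, bright right half.
    image = [[0 if s < 8 else 255 for s in range(16)] for _ in range(16)]
    print(len(hog(image, step=4)))  # 9 bins * 4 cells * 9 blocks = 324
```

With the assumed 16 × 16 image and 8 × 8 block, changing `step` from 4 to 2 raises the block count from 9 to 25 and the descriptor length from 324 to 900.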
4. The machine-vision-based method for recognizing magnetic hard disk serial codes according to claim 1, characterized in that Step 3 specifically comprises: selecting C-SVC as the SVM type, i.e. a multi-class classifier that uses a penalty factor (Cost) to handle noise, specifically:
Step (1): first, extracting character HOG features with a sliding-window step of 2*2 to obtain feature file $Z_1$;
Step (2): next, passing the obtained feature file $Z_i$ (i = 1, 2, 3) to the SVM model as input, and splitting it into a training set and a test set using the default ratio;
Step (3): next, for the parameters gamma and C, traversing the parameter combinations by grid search, testing each resulting model on the test set, and obtaining the classification accuracy of every parameter combination;
Step (4): next, plotting the parameter combinations from Step (3) and their accuracies as a 3D image, and analyzing it to obtain the optimal parameter combination;
Step (5): next, extracting character HOG features with a sliding-window step of 4*4 to obtain feature file $Z_2$, and repeating Step (2), Step (3), and Step (4) to obtain the optimal parameter combination for this feature;
Step (6): next, extracting character HOG features with a sliding-window step of 8*8 to obtain feature file $Z_3$, and repeating Step (2), Step (3), and Step (4) to obtain the optimal parameter combination for this feature;
Step (7): finally, comparing the models obtained with the three sliding steps to obtain the optimal sliding step and parameter combination, and saving the trained SVM model.
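Steps (2) to (7) of claim 4 amount to a nested search: for each sliding step, traverse a grid of (C, gamma) pairs, score each pair on the test set, and keep the best. The sketch below shows only that control flow. The `evaluate` callbacks, the grid values, and the `toy_evaluator` scoring function are hypothetical stand-ins with synthetic numbers unrelated to the reported experiments; in practice each callback would train a C-SVC model on the training split of a feature file and return its test accuracy.

```python
import math
from itertools import product


def grid_search(evaluate, c_values, gamma_values):
    """Score every (C, gamma) pair; return (best_pair, best_accuracy, table).

    `table` maps each pair to its accuracy, e.g. for a 3D accuracy plot.
    """
    table = {(c, g): evaluate(c, g) for c, g in product(c_values, gamma_values)}
    best_pair = max(table, key=table.get)
    return best_pair, table[best_pair], table


def select_step(evaluators, c_values, gamma_values):
    """Run one grid search per sliding step and keep the best result overall."""
    results = {}
    for step, evaluate in evaluators.items():
        pair, accuracy, _ = grid_search(evaluate, c_values, gamma_values)
        results[step] = (pair, accuracy)
    best_step = max(results, key=lambda s: results[s][1])
    return best_step, results[best_step][0], results[best_step][1]


def toy_evaluator(best_c, best_gamma, ceiling):
    """Hypothetical scorer: accuracy peaks at (best_c, best_gamma)."""
    def evaluate(c, gamma):
        distance = abs(math.log10(c / best_c)) + abs(math.log10(gamma / best_gamma))
        return max(0.0, ceiling - 0.05 * distance)
    return evaluate


# Hypothetical grids and synthetic scorers, one per sliding step.
c_values = [0.1, 1, 10, 100]
gamma_values = [0.001, 0.01, 0.1, 1]
evaluators = {
    2: toy_evaluator(10, 0.01, 0.90),
    4: toy_evaluator(1, 0.1, 0.95),
    8: toy_evaluator(100, 0.001, 0.85),
}

if __name__ == "__main__":
    print(select_step(evaluators, c_values, gamma_values))
```

Keeping the full accuracy table from `grid_search` is what makes the 3D plot of Step (4) possible; only the arg-max is needed for the final comparison in Step (7).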
CN202011235039.3A 2020-11-08 2020-11-08 A recognition method of magnetic hard disk serial code based on machine vision Pending CN112348026A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011235039.3A CN112348026A (en) 2020-11-08 2020-11-08 A recognition method of magnetic hard disk serial code based on machine vision

Publications (1)

Publication Number Publication Date
CN112348026A true CN112348026A (en) 2021-02-09

Family

ID=74429948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011235039.3A Pending CN112348026A (en) 2020-11-08 2020-11-08 A recognition method of magnetic hard disk serial code based on machine vision

Country Status (1)

Country Link
CN (1) CN112348026A (en)


* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2008200669A1 (en) * 2002-02-01 2008-03-06 Godo Kaisha Ip Bridge 1 Moving picture coding method and moving picture decoding method
CN108133216A (en) * 2017-11-21 2018-06-08 武汉中元华电科技股份有限公司 The charactron Recognition of Reading method that achievable decimal point based on machine vision is read
CN109800616A (en) * 2019-01-17 2019-05-24 柳州康云互联科技有限公司 A kind of two dimensional code positioning identification system based on characteristics of image
CN109948432A (en) * 2019-01-29 2019-06-28 江苏裕兰信息科技有限公司 A kind of pedestrian detection method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113414136A (en) * 2021-08-25 2021-09-21 阿里云计算有限公司 Storage medium destroying method, device and system and storage medium
CN113414136B (en) * 2021-08-25 2022-02-18 阿里云计算有限公司 Storage medium destroying method, device and system and storage medium
CN115904916A (en) * 2023-02-08 2023-04-04 天翼云科技有限公司 Hard disk failure prediction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20210209)