CN112348026A

CN112348026A - A recognition method of magnetic hard disk serial code based on machine vision

Info

Publication number: CN112348026A
Application number: CN202011235039.3A
Authority: CN
Inventors: 徐喆; 刘晓鸽; 汤健; 李鹏昇; 张自影
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2020-11-08
Filing date: 2020-11-08
Publication date: 2021-02-09

Abstract

The invention discloses a machine-vision-based method for recognizing the serial code of a magnetic hard disk. Step 1: preprocess the serial code picture of the magnetic hard disk. Step 2: segment the serial code into characters, and then extract HOG features from the segmented characters. Step 3: use a support vector machine (SVM) algorithm to construct a recognition model that recognizes the serial code picture of the magnetic hard disk.

Description

Magnetic hard disk serial code identification method based on machine vision

Technical Field

The invention belongs to the technical field of magnetic hard disks, and in particular relates to a magnetic hard disk serial code identification method based on machine vision.

Background

Magnetic hard disks are important information storage media for computers. The main brands include Western Digital (WD), Seagate (ST) and Maxtor, and capacities have grown from the initial GB level to the current TB level. Because magnetic hard disks record and transmit information efficiently, stably and conveniently, users face security problems such as the private information on a hard disk being illegally stolen, copied or used. Information on waste magnetic hard disks is therefore usually destroyed by demagnetization (degaussing). Commercial degaussing equipment is already on the market, such as the XBC series and FD series degaussers used in the security and insurance industry; following the standards, these machines apply the same demagnetization parameters to magnetic hard disks of different brands and ages. However, the coercive force of magnetic hard disks differs with brand, specification and age because of differences in magnetic recording mode and magnetic material, so the parameters needed to demagnetize different disks completely also differ. Using customized demagnetization parameters for each brand and specification can therefore improve the efficiency of information destruction. To implement customized degaussing, the identification code of the magnetic hard disk must first be acquired.

The identification code information on the surface of a magnetic hard disk differs from brand to brand, but it always includes basic information such as the brand identification, serial number (SN), model number (MODEL), capacity and production date, which makes it possible to tell different magnetic hard disks apart. Distinguishing disks and entering this information manually is slow and error-prone, so an intelligent means of recognizing the identification code is urgently needed. Machine-vision-based recognition of the magnetic hard disk serial code is therefore carried out according to the characteristics of the identification code image; in essence it is a character recognition task within image recognition.

Optical character recognition (OCR) is widely used to digitize paper documents. Its principle is that pictures of books and documents are obtained by scanning or photographing, and the text in those pictures is then recognized and converted into characters. Application fields include printed manuscript recognition, invoice recognition and license plate recognition. The usual OCR pipeline consists of image processing, character cutting, feature extraction and character recognition. Among character feature extraction techniques, methods based on the structural shape of characters use features such as horizontal line features and crossing-count features, but structural features such as upper and lower horizontal lines and the number of closed loops work well only for digit classification. Statistical grid features of characters are fast to compute and easy to implement, but are usually fused with other features rather than used alone. Compared with other feature descriptors, the HOG descriptor operates on local grid cells of the image and therefore remains largely invariant to photometric and geometric deformations of the image. Character recognition models include neural networks, random forests and support vector machines (SVM); the SVM performs well on small-sample, nonlinear and high-dimensional pattern recognition problems. As far as the above analysis shows, research on machine-vision-based recognition of magnetic hard disk images has not yet been reported.

Because the information and layout of magnetic hard disk images are complex, only serial code (SN) recognition is studied herein.

Disclosure of Invention

Demagnetization of waste magnetic hard disks is a common means of information destruction, and customizing the demagnetization for different magnetic hard disks improves the efficiency of information destruction. In this scenario, the identification information of waste magnetic hard disks must be collected and recorded quickly and accurately. To meet this need, a magnetic hard disk serial code identification method based on machine vision is proposed. First, the magnetic hard disk serial code image is preprocessed; next, the serial code is segmented into characters; then HOG features are extracted from the segmented characters; finally, a support vector machine (SVM) algorithm is used to construct a recognition model. The effectiveness of the method is verified by a magnetic hard disk serial code image recognition experiment.

Drawings

FIG. 1(a) is a Seagate magnetic hard disk;

FIG. 1(b) is a Western Digital magnetic hard disk;

FIG. 1(c) is a Maxtor magnetic hard disk;

FIG. 2 shows identification information pictures of Seagate hard disks with the same MODEL code but different SN codes;

FIG. 3 is a flow chart of magnetic hard disk serial code identification;

FIG. 4(a) is the grayscale image preprocessing effect;

FIG. 4(b) is the binarized image preprocessing effect;

FIG. 4(c) is the denoised image preprocessing effect;

FIG. 5 is a horizontal projection effect diagram;

FIG. 6 is a vertical projection effect diagram;

FIG. 7(a) is a character picture before normalization;

FIG. 7(b) is a character picture after normalization;

FIG. 8 is a 3D plot of parameter optimization for a sliding step of 2 × 2;

FIG. 9 is a 3D plot of parameter optimization for a sliding step of 4 × 4;

FIG. 10 is a 3D plot of parameter optimization for a sliding step of 8 × 8.

Detailed Description

Identification information differs somewhat between magnetic hard disks, but it generally includes a serial code (SN), a model code (MODEL) and so on, as in the identification information pictures of the three brands Seagate, Western Digital and Maxtor shown in FIG. 1.

The serial code (SN code) is the factory serial number assigned to each magnetic hard disk and serves as its "identity card number". The SN code is unique both across magnetic hard disks of different brands and across disks of the same brand with different capacities. As shown in FIG. 2, (a) and (b) are both Seagate hard disks with the same model number (MODEL code) but different SN codes. The SN codes of magnetic hard disks consist of Arabic numerals and English letters in regular printed type, so recognizing a magnetic hard disk serial code is essentially recognizing printed Arabic numerals and English letters.

The machine-vision-based magnetic hard disk serial code identification method comprises four modules: hard disk serial code picture preprocessing, hard disk serial code character segmentation, mixed character feature extraction, and hard disk serial code identification, as shown in FIG. 3.

The symbols in FIG. 3 denote, in order: the set of magnetic hard disk serial code images (written {X₁, …}); the set of magnetic hard disk serial code images obtained after image preprocessing; the set of atomic-level character images of the serial codes; the printed character set, comprising printed digits and capital letters; Z, the feature file of the atomic-level character images; and the recognition result.

The functions of the modules are as follows:

(1) Hard disk serial code picture preprocessing module: performs graying, binarization and denoising on the magnetic hard disk serial code picture;

(2) Hard disk serial code character segmentation module: segments the preprocessed serial code picture into characters by horizontal and vertical projection, and normalizes the segmented characters to obtain atomic-level character pictures;

(3) Mixed character feature extraction module: extracts histogram of oriented gradients (HOG) features from the atomic-level characters and the printed characters to obtain a feature file;

(4) Hard disk serial code identification module: constructs a recognition model from the extracted feature file and recognizes the hard disk serial code.

Hard disk serial code picture preprocessing module

Image graying

Graying uses the weighted average method: the R, G and B components are averaged with different weights to obtain a reasonable grayscale image. The gray value is calculated as follows:

L(s,t)＝0.299*R(s,t)+0.587*G(s,t)+0.114*B(s,t) (1)

where R(s,t), G(s,t) and B(s,t) represent the brightness values of the red, green and blue channels at pixel (s,t), and L(s,t) represents the gray value at pixel (s,t).
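Equation (1) can be sketched in a few lines of Python. This is a minimal illustration, assuming the image is held as rows of (R, G, B) tuples; the function name `to_gray` and the sample pixels are hypothetical and not part of the original method description.

```python
def to_gray(rgb_image):
    """Apply equation (1) to an image given as rows of (R, G, B) tuples."""
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]

# Hypothetical 2 x 2 sample: white, black, pure red, pure blue.
image = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 0, 255)]]
gray = to_gray(image)
```

Because the three weights sum to 1, a white pixel stays at (approximately) 255 and a black pixel stays at 0, while green contributes most to the gray value and blue least.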

Image binarization

A binarized image contains only two pixel values, which reduces the complexity of subsequent image processing.

The gray value L(s,t) of each pixel is first calculated and compared with a given threshold Thre. Pixels whose gray value is greater than the threshold are set to 255 (white), and pixels whose gray value is smaller than the threshold are set to 0 (black), as follows:

B(s,t)＝255, if L(s,t)＞Thre; B(s,t)＝0, if L(s,t)＜Thre (2)

where L(s,t) represents the gray value at pixel (s,t), Thre represents the binarization threshold, and B(s,t) represents the binarized image.

Image denoising

Median filtering is used for image denoising. A two-dimensional template with odd side length K is defined, that is, a K × K region, where K is usually 3 or 5. The pixel values inside the template region are sorted by magnitude to obtain their median, and the pixel value at the centre of the template is replaced by that median, which eliminates isolated noise points.
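The binarization and median-filtering steps can be sketched as follows. This is an illustrative pure-Python version, not the original implementation: the function names are hypothetical, pixels equal to the threshold are assigned to black (the text does not say how that case is handled), and border pixels use only the neighbours that fall inside the image. The defaults 127 and 3 are the values reported in the experiments.

```python
def binarize(gray, thre=127):
    """Pixels above thre become 255 (white); all others become 0 (black).
    Treating gray == thre as black is an assumption."""
    return [[255 if v > thre else 0 for v in row] for row in gray]

def median_filter(img, k=3):
    """Replace each pixel by the median of its k x k neighbourhood (k odd).
    At the borders the window is clipped to the image (an assumption); for a
    clipped window of even size the upper middle value is taken."""
    h, w, r = len(img), len(img[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for s in range(h):
        for t in range(w):
            window = sorted(
                img[i][j]
                for i in range(max(0, s - r), min(h, s + r + 1))
                for j in range(max(0, t - r), min(w, t + r + 1))
            )
            out[s][t] = window[len(window) // 2]
    return out

# A white patch with one isolated black noise pixel in the middle.
noisy = [[255] * 5 for _ in range(5)]
noisy[2][2] = 0
cleaned = median_filter(noisy)
```

In this example the single black pixel is outvoted by its white neighbours in every window that contains it, so `cleaned` is entirely white.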

Hard disk serial code character segmentation module

Character segmentation based on projection method

The target characters can be segmented by the projection method: the pixel distribution histogram of the picture is analyzed to find the boundary points between adjacent characters. The first step is to project the binarized image of the magnetic hard disk identification code horizontally, that is, to count the pixels of each row to obtain the pixel distribution histogram in the horizontal direction. Analyzing this histogram determines the start and end position of each text line, and the image is cut into lines accordingly.

The second step is to project each line image obtained from the horizontal cut vertically, that is, to count the pixels of each column to obtain the pixel distribution histogram in the vertical direction. Analyzing this histogram determines the position coordinates of each single character, and the binarized line image is segmented at those coordinates to obtain single-character images.
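The two projection passes can be sketched as below. This is a minimal sketch under stated assumptions: characters are black (0) on a white (255) background, a row or column with no character pixels counts as a gap, and the names `find_runs` and `segment_characters` are hypothetical.

```python
def find_runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero counts."""
    runs, start = [], None
    for idx, count in enumerate(profile):
        if count > 0 and start is None:
            start = idx
        elif count == 0 and start is not None:
            runs.append((start, idx))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

def segment_characters(binary, foreground=0):
    """Split a binarized image into single-character sub-images by projection."""
    # Horizontal projection: number of character pixels in every row.
    row_profile = [sum(1 for v in row if v == foreground) for row in binary]
    characters = []
    for top, bottom in find_runs(row_profile):
        line = binary[top:bottom]
        # Vertical projection: number of character pixels in every column of the line.
        col_profile = [sum(1 for row in line if row[c] == foreground)
                       for c in range(len(line[0]))]
        for left, right in find_runs(col_profile):
            characters.append([row[left:right] for row in line])
    return characters

W, B = 255, 0
sample = [
    [W, W, W, W, W, W, W],
    [W, B, B, W, B, W, W],
    [W, B, B, W, B, W, W],
    [W, W, W, W, W, W, W],
    [W, B, W, W, W, W, W],
]
chars = segment_characters(sample)
```

The sample contains two text lines; the first holds two characters and the second holds one, so `chars` has three entries, returned line by line and left to right. Each character keeps the full height of its line in this sketch.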

Character normalization

Because the cut is made at the position coordinates of each single character, the cut character "fills" its entire image. The cut characters are therefore normalized, in both size and position, to improve recognition accuracy. The following method is used for character normalization herein.

When the characters are cut, the size of every cut character picture is set to n × n, and a white grayscale picture of size m × m is defined, where m is larger than n. The n × n picture is pasted onto the m × m picture with a as the starting position, which gives atomic-level pictures of consistent size and position.
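The normalization step can be sketched as follows. This is an illustration only: the text does not say how the cut character is scaled to n × n, so nearest-neighbour sampling is assumed here, the starting position is interpreted as the pixel (a, a), and both function names are hypothetical. The defaults n = 12, m = 16 and a = 2 are the values reported in the experiments.

```python
def resize_nearest(img, n):
    """Resize a 2-D image to n x n by nearest-neighbour sampling (assumed method)."""
    h, w = len(img), len(img[0])
    return [[img[i * h // n][j * w // n] for j in range(n)] for i in range(n)]

def normalize_character(img, n=12, m=16, a=2, white=255):
    """Scale the character to n x n and paste it on an m x m white canvas at (a, a)."""
    if not (m > n and a >= 0 and a + n <= m):
        raise ValueError("need m > n and the pasted picture inside the canvas")
    small = resize_nearest(img, n)
    canvas = [[white] * m for _ in range(m)]
    for i in range(n):
        for j in range(n):
            canvas[a + i][a + j] = small[i][j]
    return canvas

# A hypothetical 5 x 3 all-black character crop.
crop = [[0] * 3 for _ in range(5)]
atomic = normalize_character(crop)
```

With n = 12, m = 16 and a = 2 the margin is (16 - 12) / 2 = 2 pixels on every side, so the pasted character is centered on the canvas.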

Mixed character feature extraction module

Feature extraction uses the histogram of oriented gradients (HOG), which is often combined with a support vector machine (SVM) to train high-precision target classifiers. It comprises three steps: calculating the image gradient, constructing the gradient histogram, and block normalization.

Calculating the image gradient:

Every pixel of the image has a gradient magnitude and a gradient direction. The gradient g_s(s,t) in the abscissa (s) direction and the gradient g_t(s,t) in the ordinate (t) direction of the atomic-level character image are calculated according to formulas (3-1) and (3-2), where I(s,t) denotes the pixel value at (s,t):

g_s(s,t)＝I(s+1,t)-I(s-1,t) (3-1)

g_t(s,t)＝I(s,t+1)-I(s,t-1) (3-2)

The gradient magnitude g(s,t) and the gradient direction θ(s,t) at each pixel position of the atomic-level character image are then obtained according to formulas (4) and (5):
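Formulas (4) and (5) are not reproduced in this text. Assuming the standard HOG definitions, which are consistent with the symbols g_s, g_t, g and θ used above, they would read:

```latex
\begin{align}
g(s,t) &= \sqrt{g_s(s,t)^2 + g_t(s,t)^2} \tag{4} \\
\theta(s,t) &= \arctan\frac{g_t(s,t)}{g_s(s,t)} \tag{5}
\end{align}
```

The exact form used in the original filing may differ in detail, for example in how the case g_s(s,t) = 0 is handled.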

Constructing the gradient histogram:

The atomic-level character image is divided into small regions of several pixels each, called cell units (cells), and a histogram of oriented gradients is constructed within each cell.

The gradient direction range is first divided evenly into I direction intervals. The gradient directions of all pixels in the cell are then accumulated into these intervals by weighted projection, where the weight is generally the gradient magnitude. The resulting gradient histogram is an I-dimensional feature vector, which is the HOG feature of the cell unit:

β_j＝[η₁,η₂,…,η_I],i∈[1,I] (6)

where η_i is the weighted projection of the gradient magnitudes onto the i-th gradient direction interval, and β_j is the HOG feature vector of the j-th cell unit.

Block normalization:

Several adjacent cell units (cells) are grouped into a region called a block. Concatenating the feature vectors of all cell units in the block gives the HOG feature of the block:

γ_k＝[β₁,β₂,…,β_J],j∈[1,J] (7)

where β_j is the HOG feature vector of the j-th cell unit, J is the total number of cell units in each block, and γ_k is the HOG feature vector of the k-th block.

Contrast normalization is performed within each block using equation (8), as follows:

where γ_k is the vector to be normalized, ||γ_k||² is the norm term of γ_k used for normalization, ε is a very small constant, and v_k is the normalized vector.
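The normalization formula itself is not reproduced in this text. Assuming the common L2-norm scheme, which uses exactly the quantities defined above, it would take the following form:

```latex
v_k = \frac{\gamma_k}{\sqrt{\lVert \gamma_k \rVert^2 + \varepsilon^2}} \tag{8}
```

Whether the original adds ε or ε² under the root cannot be determined from the surviving description; either choice only serves to avoid division by zero for blocks with no gradient.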

Blocks are obtained with a sliding window. The window slides over the atomic-level character picture with a given step from left to right and from top to bottom, which yields K blocks. Concatenating the normalized feature vectors of all blocks gives the multi-dimensional feature vector υ:

υ＝[v₁,v₂,…,v_K],k∈[1,K] (9)

The HOG descriptor of the whole picture is thus a feature vector υ of dimension I × J × K, composed of the histogram components of all cell units in every block.
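The three HOG steps can be put together in a short pure-Python sketch. Several choices below are assumptions rather than details from the source: border pixels are replicated so the central difference is defined at the edges, each pixel votes into a single bin without interpolation, directions are unsigned in [0°, 180°), cells are laid out relative to each block position, and blocks are L2-normalized with a small ε. The function name `hog_descriptor` and the default parameter values are illustrative.

```python
import math

def hog_descriptor(img, cell=4, bins=9, block_cells=2, stride=4, eps=1e-6):
    """Return the HOG feature vector of a gray image given as a list of rows."""
    h, w = len(img), len(img[0])

    def px(s, t):
        # Replicate border pixels so the central difference exists everywhere.
        return img[min(max(s, 0), h - 1)][min(max(t, 0), w - 1)]

    # Step 1: gradient magnitude and orientation bin of every pixel.
    mag = [[0.0] * w for _ in range(h)]
    obin = [[0] * w for _ in range(h)]
    for s in range(h):
        for t in range(w):
            gs = px(s + 1, t) - px(s - 1, t)
            gt = px(s, t + 1) - px(s, t - 1)
            mag[s][t] = math.hypot(gs, gt)
            ang = math.degrees(math.atan2(gt, gs)) % 180.0  # unsigned direction
            obin[s][t] = min(int(ang * bins / 180.0), bins - 1)

    # Steps 2 and 3: per-cell histograms inside each sliding block, then
    # L2 normalization of the concatenated block vector.
    block_px = block_cells * cell
    features = []
    for top in range(0, h - block_px + 1, stride):
        for left in range(0, w - block_px + 1, stride):
            block = []
            for cy in range(block_cells):
                for cx in range(block_cells):
                    hist = [0.0] * bins
                    for s in range(top + cy * cell, top + (cy + 1) * cell):
                        for t in range(left + cx * cell, left + (cx + 1) * cell):
                            hist[obin[s][t]] += mag[s][t]  # magnitude-weighted vote
                    block.extend(hist)
            norm = math.sqrt(sum(v * v for v in block) + eps * eps)
            features.extend(v / norm for v in block)
    return features

# Hypothetical 16 x 16 test pattern.
pattern = [[(i * j) % 256 for j in range(16)] for i in range(16)]
vec = hog_descriptor(pattern, stride=4)
```

For a 16 × 16 input with 4 × 4 cells, 9 bins and 2 × 2 cells per block, a stride of 4 pixels gives 3 × 3 = 9 blocks, so `vec` has 9 × 4 × 9 = 324 entries.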

Hard disk serial code identification module

A support vector machine (SVM) is used to construct the serial code recognition model. The SVM is a widely used supervised learning algorithm that automatically finds the support vectors with the strongest discriminative power, so that the resulting classifier maximizes the margin between classes and offers good adaptability and high discrimination. The SVM type chosen herein is C-SVC, a multi-class classifier that uses a penalty factor (cost) to handle noise.

Step (1): extract the character HOG features with a sliding-window step of 2 × 2 to obtain the feature file Z₁;

Step (2): feed the obtained feature file Z_i (i = 1, 2, 3) into the SVM model as input, and split it into a training set and a test set using the default proportion;

Step (3): traverse the combinations of the parameters gamma and C by grid search, and test each resulting model on the test set to obtain the classification accuracy of every parameter combination;

Step (4): draw a 3D plot of the parameter combinations from Step (3) against their accuracies, and analyze it to obtain the optimal parameter combination;

Step (5): extract the character HOG features with a sliding-window step of 4 × 4 to obtain the feature file Z₂, and repeat Step (2), Step (3) and Step (4) to obtain the optimal parameter combination for this feature;

Step (6): extract the character HOG features with a sliding-window step of 8 × 8 to obtain the feature file Z₃, and repeat Step (2), Step (3) and Step (4) to obtain the optimal parameter combination for this feature;

Step (7): compare the models obtained with the three sliding steps to determine the optimal sliding step and parameter combination, and save the model to obtain the trained SVM model.
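The search loop in Steps (1) to (7) can be sketched independently of any particular SVM library. In the sketch below, `evaluate(gamma, C)` is a hypothetical callable that would train a C-SVC on the training split of one feature file and return its test accuracy; `toy_accuracy` is only a placeholder score so that the example runs, not a trained model or a reported result. The grid values are those used in the experiments.

```python
import itertools

def grid_search(evaluate, gammas, costs):
    """Try every (gamma, C) pair; return (best_accuracy, gamma, C, all_results)."""
    results = {(g, c): evaluate(g, c) for g, c in itertools.product(gammas, costs)}
    best_gamma, best_c = max(results, key=results.get)
    return results[(best_gamma, best_c)], best_gamma, best_c, results

def select_stride(evaluators, gammas, costs):
    """evaluators maps a sliding step to the evaluate callable of its feature file.
    Returns (accuracy, sliding_step, gamma, C) of the best model found."""
    best = None
    for stride, evaluate in evaluators.items():
        acc, gamma, c, _ = grid_search(evaluate, gammas, costs)
        if best is None or acc > best[0]:
            best = (acc, stride, gamma, c)
    return best

GRID = [0.01, 0.1, 1, 10, 100]

def toy_accuracy(gamma, c):
    # Placeholder score that peaks at gamma = 1, C = 10; NOT a trained SVM.
    return 1.0 / (1.0 + abs(gamma - 1) + abs(c - 10))

best_acc, best_gamma, best_c, table = grid_search(toy_accuracy, GRID, GRID)
```

The `table` dictionary holds one accuracy per (gamma, C) pair, which is the data a 3D plot as in Step (4) would display. When several combinations tie, this sketch keeps the first one encountered; the source does not say how ties are resolved.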

Application verification

Description of test data

A mixed data set is used to train and test the magnetic hard disk serial code model. It consists of two parts: printed characters and atomic-level character pictures from magnetic hard disk serial codes. The pictures cover 36 character classes, namely the 10 Arabic numerals 0-9 and the 26 capital letters A-Z, with 279 pictures per class, for a total of 10044 samples.

Results of the experiment

Image preprocessing result

Based on the observed effect, the binarization threshold Thre is set to 127 and the median filtering template to 3 × 3. The preprocessing effect on the magnetic hard disk serial code image is shown in FIG. 4.

Character segmentation result

The start position of each line of characters can be determined from the horizontal projection and used to cut the text into lines. The horizontal projection of the magnetic hard disk identification code picture is shown in FIG. 5.

The cut line of text is then projected vertically, and the position coordinates of each character are determined from its upper-left and lower-right corner points so that the character can be cut out. The vertical projection of a single line of characters is shown in FIG. 6.

During cutting, the size of the cut character picture is set to 12 × 12, as shown in FIG. 7(a). A 16 × 16 picture with gray value 255 (white) is also created, and the extracted 12 × 12 character picture is pasted onto it with pixel coordinates (2,2) as the starting point. This yields a centered atomic-level character picture and completes character normalization, as shown in FIG. 7(b).

Feature extraction results

The atomic-level character image is divided by pixels into several cell units (cells). The gradient direction range is divided into I direction intervals for accumulating gradient information: every pixel in a cell is projected, with a weight, into the histogram interval of its gradient direction, which gives the I-dimensional feature vector of the cell. Unsigned gradients are used, that is, the gradient direction lies between 0° and 180°. If each block contains J cell units and traversing the whole picture with a sliding window yields K blocks, the HOG feature extracted from one atomic-level character image is a feature vector of dimension 1 × (I × J × K).

The character pictures in the data set are 16 × 16 pixels. A region of 4 × 4 pixels is taken as a cell and I is set to 9, so a 9-bin histogram accumulates the gradient information of each cell. A block consists of 2 × 2 cells, so each block contains 4 cells. With a block sliding step (unit: pixel) of 2 × 2, traversing the whole picture with the sliding window yields 25 blocks, that is, I, J and K are 9, 4 and 25, and the HOG feature of one atomic-level character image is a 1 × 900 feature vector. Changing the sliding step changes K and thus reduces the vector dimension. For example, with a sliding step of 4 × 4, traversing the picture yields 9 blocks (I, J and K are 9, 4 and 9) and a 1 × 324 HOG feature vector; with a sliding step of 8 × 8, it yields 4 blocks and a 1 × 144 HOG feature vector. Finally, the HOG feature vectors of the different atomic-level characters are assembled into an HOG feature file, which is input to the SVM to train the recognition model.
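The block counts and feature dimensions quoted above follow from simple arithmetic: a block of 2 × 2 cells of 4 × 4 pixels is 8 × 8 pixels, so along each axis the window fits (16 - 8) / step + 1 times. The helper below, whose name `hog_dimension` is hypothetical, reproduces the three cases.

```python
def hog_dimension(image=16, cell=4, bins=9, block_cells=2, stride=2):
    """Return (I, J, K, I*J*K) for a square image and a square sliding block."""
    block_px = block_cells * cell                # block side length in pixels
    per_axis = (image - block_px) // stride + 1  # window positions along one axis
    k = per_axis ** 2                            # number of blocks K
    j = block_cells ** 2                         # cells per block J
    return bins, j, k, bins * j * k

dims = {step: hog_dimension(stride=step) for step in (2, 4, 8)}
```

Here `dims[2]`, `dims[4]` and `dims[8]` come out as (9, 4, 25, 900), (9, 4, 9, 324) and (9, 4, 4, 144), matching the 900-, 324- and 144-dimensional vectors stated in the text.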

Analysis of recognition results

Herein, C-SVC is selected as the SVM type and the RBF (radial basis function) is selected as the kernel; the model training parameters are gamma and C. Parameters are optimized by grid search, with gamma taking values in {0.01, 0.1, 1, 10, 100} and C taking values in {0.01, 0.1, 1, 10, 100}. The optimization results for the different sliding-window steps are shown in FIGS. 8 to 10, where the x-axis is the value of gamma, the y-axis is the value of C, and the z-axis is the prediction accuracy of the model. In view of the spacing of the parameter values, the x- and y-axes are plotted on logarithmic scales, which makes the plots more intuitive and easier to analyze.

Different sliding-window steps in HOG feature extraction lead to different classifier model parameters. Table 1 compares the classifier models obtained with the different sliding steps.

Table 1. Comparison of classifier model results for different sliding steps

As can be seen from Table 1: with a window sliding step of 2 × 2, the feature vector dimension is 900, the optimal values of gamma and C are {0.1, 100}, and the accuracy reaches 0.98; with a window sliding step of 4 × 4, the feature vector dimension is 324, the optimal values of gamma and C are still {0.1, 100}, and the accuracy reaches 0.98; with a window sliding step of 8 × 8, the feature vector dimension is 144, the optimal values of gamma and C are {1, 100}, and the accuracy reaches 0.98.

The experiment compares the HOG feature extraction method with the traditional pixel statistical feature extraction method for recognizing a magnetic hard disk serial code. Experiments 1 to 3 give the recognition effect of the HOG feature extraction method with different sliding steps, and experiment 4 gives the recognition effect of the traditional pixel statistical feature extraction method. The magnetic hard disk serial code is Y203TK1E, and the recognition results are shown in Table 2.

Table 2. Serial code recognition results

As can be seen from Table 2, for the same magnetic hard disk serial code, the fourth experiment shows obvious misrecognition: the digits 0 and 1 are misrecognized as the letters O and I, and the recognition effect is poor. Similar-character misrecognition also appears in the first and third experiments, where the digit 0 is misrecognized as the capital letter O. In the second experiment, with a window sliding step of 4 × 4, the serial code is recognized correctly. The feature vector dimension decreases as the sliding step increases, which also shortens the recognition time. When the sliding step increases from 2 × 2 to 4 × 4, the feature vector dimension drops from 900 to 324 and the recognition time falls from 1.15391 s to 0.36998 s; when the step is further increased to 8 × 8, the recognition time falls to 0.23237 s. The experimental results show that a sliding step of 4 × 4 gives the best recognition effect while also shortening the recognition time markedly. The results also demonstrate the effectiveness and feasibility of combining HOG features with an SVM for magnetic hard disk serial code recognition, although the recognition process still needs further optimization.

In view of the characteristics of the serial code in magnetic hard disk images, a magnetic hard disk serial code identification method based on machine vision is proposed. Compared with the traditional feature extraction approach of counting pixels, the HOG feature descriptor better describes the appearance and shape of local targets and better distinguishes similar characters or similar features. Machine vision is applied to magnetic hard disk serial code recognition for the first time: atomic-level characters from magnetic hard disk serial codes are mixed with printed characters to train the recognition model, and the HOG features of the characters are extracted to form a feature file for serial code recognition. The experimental results show that HOG features combined with an SVM classifier work well for magnetic hard disk serial code recognition. In view of the characteristics of magnetic hard disk identification code pictures, future research comprises two parts: collecting pictures of magnetic hard disks of different brands and effectively recognizing the brand identification; and analyzing the layout characteristics of the identification code information so as to locate and recognize the useful information accurately.

Claims

1. A magnetic hard disk serial code identification method based on machine vision, characterized by comprising:

Step 1: preprocessing the serial code picture of the magnetic hard disk;

Step 2: performing character segmentation of the serial code, and then performing HOG feature extraction on the segmented characters;

Step 3: using a support vector machine (SVM) algorithm to construct a recognition model that recognizes the serial code picture of the magnetic hard disk.

2. The magnetic hard disk serial code identification method based on machine vision as claimed in claim 1, characterized in that, in step 2, character segmentation based on the projection method comprises:

The first step is to project the binarized image of the magnetic hard disk identification code horizontally, that is, to count the pixels of each row to obtain the pixel distribution histogram in the horizontal direction;

The second step is to project the line image cut by the horizontal projection vertically, that is, to count the pixels of each column to obtain the pixel distribution histogram in the vertical direction; the position coordinates of each single character are determined by analyzing the pixel distribution histogram in the vertical direction, and the binarized line image is segmented according to the character position coordinates to obtain single-character images;

Wherein, during character cutting, the size of every cut character picture is set to n × n, and a white grayscale picture of size m × m is defined, where m > n; with a*a as the starting position, the n × n picture is pasted onto the m × m picture to obtain atomic-level pictures of consistent size and position.

3. The magnetic hard disk serial code identification method based on machine vision as claimed in claim 1, characterized in that, in step 2, mixed character feature extraction comprises three steps, namely calculating the image gradient, constructing the gradient histogram, and block normalization, specifically:

(1) Calculating the image gradient:

Formulas (3-1) and (3-2) are used to calculate the abscissa (s) direction gradient g_s(s,t) and the ordinate (t) direction gradient g_t(s,t) of the atomic-level character image:

g_s(s,t)=I(s+1,t)-I(s-1,t) (3-1)

g_t(s,t)=I(s,t+1)-I(s,t-1) (3-2)

From formulas (4) and (5), the gradient magnitude g(s,t) and the gradient direction θ(s,t) at each pixel position of the atomic-level character image are obtained:

(2) Constructing the gradient histogram:

The atomic-level character image is divided into units of several pixels, called cells, and a histogram of oriented gradients is constructed within each cell.

First, the gradient direction range is divided equally into I direction intervals, and the gradient directions of all pixels are accumulated into these intervals by weighted projection, the weight being the gradient magnitude; calculating the gradient histogram gives an I-dimensional feature vector, which is the HOG feature of the cell unit, as shown below:

β_j=[η₁,η₂,…,η_I],i∈[1,I] (6)

where η_i represents the weighted projection of the gradient magnitudes onto the i-th gradient direction interval, and β_j is the HOG feature vector of the j-th cell unit;

(3) Block normalization:

Several adjacent cells are grouped into a region called a block, and the feature vectors of all cells in the block are concatenated to obtain the HOG feature of the block, as shown below:

γ_k=[β₁,β₂,…,β_J],j∈[1,J] (7)

where β_j is the HOG feature vector of the j-th cell unit, J is the total number of cell units in each block, and γ_k represents the HOG feature vector of the k-th block;

Within the block, formula (8) is used to perform the contrast normalization operation, as follows:

where γ_k represents the vector to be normalized, ||γ_k||² represents the norm term of γ_k used for normalization, ε represents a very small constant, and v_k represents the normalized vector;

Blocks are obtained with a sliding window: the window slides over the atomic-level character image with a given step from left to right and from top to bottom to obtain K blocks, and the normalized feature vectors of all blocks are concatenated to obtain the multi-dimensional feature vector υ, as follows:

υ=[v₁,v₂,…,v_K],k∈[1,K] (9)

The HOG descriptor of the whole picture is a feature vector υ of dimension I*J*K composed of the histogram components of all cell units in every block.

4. The magnetic hard disk serial code identification method based on machine vision as claimed in claim 1, characterized in that step 3 specifically comprises: selecting C-SVC as the SVM type, namely a multi-class classifier that uses a penalty factor (cost) to handle noise, specifically:

Step (1): extract the character HOG features with a sliding-window step of 2*2 to obtain the feature file Z₁;

Step (2): input the obtained feature file Z_i (i=1, 2, 3) into the SVM model, and obtain the training set and the test set using the default proportion;

Step (3): traverse the combinations of the parameters gamma and C by grid search, and test each resulting model with the test set to obtain the classification accuracy of every parameter combination;

Step (4): draw a 3D plot of the parameter combinations obtained in Step (3) against the corresponding accuracies, and analyze it to obtain the optimal parameter combination;

Step (5): extract the character HOG features with a sliding-window step of 4*4 to obtain the feature file Z₂, and repeat Step (2), Step (3) and Step (4) to obtain the optimal parameter combination for this feature;

Step (6): extract the character HOG features with a sliding-window step of 8*8 to obtain the feature file Z₃, and repeat Step (2), Step (3) and Step (4) to obtain the optimal parameter combination for this feature;

Step (7): compare the models obtained with the three sliding steps, determine the optimal sliding step and parameter combination, and save the model to obtain the trained SVM model.