CN112348026A - Magnetic hard disk sequence code identification method based on machine vision - Google Patents
- Publication number
- CN112348026A (application CN202011235039.3A)
- Authority
- CN
- China
- Prior art keywords
- hard disk
- character
- gradient
- magnetic hard
- sequence code
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/50—Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention discloses a machine-vision-based method for recognizing the sequence code of a magnetic hard disk, comprising the following steps: step 1, preprocessing the magnetic hard disk sequence code picture; step 2, segmenting the characters of the sequence code and then extracting HOG features from the segmented characters; and step 3, constructing a recognition model using a Support Vector Machine (SVM) algorithm and recognizing the sequence code picture of the magnetic hard disk.
Description
Technical Field
The invention belongs to the technical field of magnetic hard disks, and particularly relates to a magnetic hard disk sequence code identification method based on machine vision.
Background
Magnetic hard disks are important information storage media for computers. The main brands include Western Digital (WD), Seagate (ST) and Maxtor, and capacities have grown from the initial GB level to the current TB level. Magnetic hard disks record and transfer information efficiently, stably and conveniently, which also exposes users to security risks such as private information on the disk being illegally stolen, copied or used. For this reason, the information on waste magnetic hard disks is usually destroyed by degaussing. Commercial degaussing equipment is already in use, such as the XBC series and FD series degaussers used in the security industry; these machines degauss magnetic hard disks of different brands and ages with the same standard degaussing parameters. However, magnetic hard disks of different brands, specifications and ages differ in magnetic recording mode and magnetic material, and therefore in coercivity, so the parameters needed to degauss them completely also differ. Using degaussing parameters customized for each brand and specification can therefore improve the efficiency of information destruction. To implement customized degaussing, the identification code of the magnetic hard disk must first be acquired.
The identification information printed on the surface of a magnetic hard disk differs from brand to brand, but it always includes basic information such as the brand mark, serial number (SN), model number (MODEL), capacity and production date, which makes it possible to tell different magnetic hard disks apart. Reading and entering this information by hand is slow and error-prone, so an automated way of recognizing the identification code is urgently needed. Machine-vision-based recognition of the magnetic hard disk sequence code is therefore developed here around the characteristics of identification code images; in essence it is a character recognition task within image recognition.
Optical Character Recognition (OCR) is widely used to digitize paper documents. After images of books and documents are obtained by scanning or photographing, the images are recognized and their content is converted into text. Application fields include the recognition of printed manuscripts, invoices and license plates. The usual OCR pipeline consists of image processing, character cutting, feature extraction and character recognition. Among character feature extraction techniques, methods based on the structural shape of characters use features such as horizontal strokes and crossing counts, but structural features such as upper and lower horizontal strokes and the number of closed loops work well only for classifying digits. Statistical grid features of characters are fast to compute and easy to implement, but are usually combined with other features as a fused character feature. Compared with other feature descriptors, the HOG descriptor operates on local grid cells of the image and therefore remains largely invariant to photometric and geometric deformations of the image. Character recognition models include neural networks, random forests and Support Vector Machines (SVM). The SVM performs well on small-sample, nonlinear and high-dimensional pattern recognition problems. According to the above analysis, research on machine-vision-based recognition of magnetic hard disk images has not yet been reported.
Due to the complexity of the magnetic hard disk image information and layout format, only sequence code (SN) identification is studied herein.
Disclosure of Invention
Degaussing is a common means of destroying the information on waste magnetic hard disks, and customizing the degaussing process for different magnetic hard disks improves the efficiency of information destruction. In this scenario, the identification information of waste magnetic hard disks must be collected and recorded quickly and accurately. To meet this need, a machine-vision-based magnetic hard disk sequence code identification method is proposed. The sequence code image is first preprocessed; the characters of the sequence code are then segmented; HOG features are extracted from the segmented characters; and finally a recognition model is constructed with a Support Vector Machine (SVM) algorithm. The effectiveness of the method is verified by a magnetic hard disk sequence code image recognition experiment.
Drawings
FIG. 1(a) shows a Seagate magnetic hard disk;
FIG. 1(b) shows a Western Digital magnetic hard disk;
FIG. 1(c) shows a Maxtor magnetic hard disk;
FIG. 2 shows identification information pictures of Seagate hard disks with the same MODEL code and different SN codes;
FIG. 3 is a flow chart of magnetic hard disk sequence code identification;
FIG. 4(a) shows the effect of grayscale image preprocessing;
FIG. 4(b) shows the effect of binarized image preprocessing;
FIG. 4(c) shows the effect of denoised image preprocessing;
FIG. 5 shows the horizontal projection effect;
FIG. 6 shows the vertical projection effect;
FIG. 7(a) shows character pictures before normalization;
FIG. 7(b) shows character pictures after normalization;
FIG. 8 is a 3D parameter optimization plot for a sliding step of 2 x 2;
FIG. 9 is a 3D parameter optimization plot for a sliding step of 4 x 4;
FIG. 10 is a 3D parameter optimization plot for a sliding step of 8 x 8.
Detailed Description
The identification information of different magnetic hard disks differs to some degree, but it generally includes a sequence code (SN), a model code (MODEL) and so on, as in the identification information pictures of the three brands Seagate, Western Digital and Maxtor shown in FIG. 1.
The sequence code (hereinafter, SN code) is the factory serial number assigned to each magnetic hard disk and serves as its "identity card number". The SN code is unique, both across brands and among disks of the same brand with different capacities. As shown in FIG. 2, (a) and (b) are both Seagate hard disks with the same model code (MODEL) but different SN codes. SN codes consist of Arabic numerals and English letters in regular printed type, so recognizing a magnetic hard disk sequence code essentially amounts to recognizing printed Arabic numerals and English letters.
The machine-vision-based magnetic hard disk sequence code identification method comprises four modules: hard disk sequence code image preprocessing, hard disk sequence code character segmentation, mixed character feature extraction and hard disk sequence code identification, as shown in FIG. 3.
In FIG. 3, the data passed between the modules are, in order: the set of magnetic hard disk sequence code images {X1}; the image set obtained after image preprocessing; the set of atomic-level character images of the sequence code; the print character set, comprising printed digits and capital letters; the feature file Z of the atomic-level character images; and the recognition result.
The functions of the modules are as follows:
(1) Hard disk sequence code picture preprocessing module: performs graying, binarization and denoising on the magnetic hard disk sequence code picture;
(2) Hard disk sequence code character segmentation module: segments the preprocessed picture into characters by horizontal and vertical projection, and normalizes the segmented characters to obtain atomic-level pictures;
(3) Mixed character feature extraction module: extracts Histogram of Oriented Gradients (HOG) features from the atomic-level characters and the print characters to obtain a feature file;
(4) Hard disk sequence code identification module: constructs a recognition model from the extracted feature file and recognizes the hard disk sequence code.
Hard disk sequence code picture preprocessing module
Image graying
The image is converted to grayscale by the weighted average method: the R, G and B components are averaged with different weights to obtain a reasonable grayscale image. The gray value is calculated as follows:
L(s,t)=0.299*R(s,t)+0.587*G(s,t)+0.114*B(s,t) (1)
wherein, R (s, t), G (s, t), B (s, t) respectively represent the brightness values of the red, green, blue channels at the pixel point (s, t), and L (s, t) represents the gray value at the pixel point (s, t).
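Formula (1) can be illustrated with a minimal, dependency-free sketch. The function name `to_gray` and the representation of an image as nested lists of (R, G, B) tuples are assumptions made for illustration and are not part of the original text.

```python
def to_gray(rgb):
    """Apply formula (1) to an image given as rows of (R, G, B) tuples.

    The nested-list image format is an illustrative assumption.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb]


# A 2 x 2 test image: pure red, pure green, pure blue and white.
image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
gray = to_gray(image)
```

Because the three weights sum to 1, a white pixel keeps the value 255 and a black pixel stays at 0, so the gray values remain in the range of the input channels.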
Image binarization
A binarized image is a logical array that takes only two values (written here as 0 and 255), which reduces the complexity of subsequent image processing.
The gray value L(s,t) of each pixel is compared with a given threshold Thre. Pixels whose gray value is greater than the threshold are set to 255 (white), and the remaining pixels are set to 0 (black):
$$B(s,t) = \begin{cases} 255, & L(s,t) > Thre \\ 0, & \text{otherwise} \end{cases} \tag{2}$$
where L(s,t) is the gray value at pixel (s,t), Thre is the binarization threshold, and B(s,t) is the binarized image.
Image denoising
Median filtering is used for image denoising. A two-dimensional template with odd side length K, i.e. a K x K region, is defined, where K is usually 3 or 5. The pixel values inside the template region are sorted by magnitude to obtain their median, and the pixel under the template is replaced by that median, which eliminates isolated noise points.
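The thresholding and median filtering steps can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the nested-list image format, the choice to replace only the pixel at the template center, and the choice to leave border pixels unchanged are all assumptions.

```python
def binarize(gray, thre=127):
    """Set pixels above the threshold to 255 (white) and the rest to 0 (black)."""
    return [[255 if v > thre else 0 for v in row] for row in gray]


def median_filter(img, k=3):
    """K x K median filter; border pixels are copied unchanged (assumption)."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [row[:] for row in img]
    for s in range(r, h - r):
        for t in range(r, w - r):
            window = sorted(img[y][x]
                            for y in range(s - r, s + r + 1)
                            for x in range(t - r, t + r + 1))
            out[s][t] = window[len(window) // 2]  # median of the K*K values
    return out


# A bright 5 x 5 patch with one dark noise pixel in the middle.
gray = [[200] * 5 for _ in range(5)]
gray[2][2] = 10
binary = binarize(gray)
clean = median_filter(binary)
```

In this example the isolated black pixel survives binarization but is removed by the 3 x 3 median, since it is outnumbered by white pixels in every window that contains it.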
Hard disk sequence code character segmentation module
Character segmentation based on projection method
The characters carrying the target information can be segmented by the projection method: the pixel distribution histogram of the picture is analyzed to find the boundaries between adjacent characters. First, the binarized image of the magnetic hard disk identification code is projected horizontally, i.e. the pixels of each row are counted to obtain the pixel distribution histogram in the horizontal direction. Analyzing this histogram gives the start and end position of each text line, and the image is cut into lines accordingly.
Second, each line image cut out by horizontal projection is projected vertically, i.e. the pixels of each column are counted to obtain the pixel distribution histogram in the vertical direction. Analyzing this histogram gives the position coordinates of each character, and the binarized line image is segmented at those coordinates to obtain single-character images.
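The two projection passes can be sketched in a few lines. The sketch assumes black characters (value 0) on a white background (value 255) and treats any row or column without black pixels as a gap; the function names and the box format (top, bottom, left, right, with exclusive end indices) are hypothetical choices for illustration.

```python
def runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(binary, ink=0):
    """Cut a binarized image into character boxes (top, bottom, left, right)."""
    # Horizontal projection: count ink pixels in every row to find text lines.
    row_profile = [sum(1 for v in row if v == ink) for row in binary]
    boxes = []
    for top, bottom in runs(row_profile):
        line = binary[top:bottom]
        # Vertical projection: count ink pixels in every column of the line.
        col_profile = [sum(1 for row in line if row[x] == ink)
                       for x in range(len(line[0]))]
        for left, right in runs(col_profile):
            boxes.append((top, bottom, left, right))
    return boxes


W, B = 255, 0  # white background, black ink
img = [
    [W, W, W, W, W, W, W],
    [W, B, B, W, B, W, W],
    [W, B, B, W, B, W, W],
    [W, W, W, W, W, W, W],
    [W, W, B, B, B, W, W],
    [W, W, W, W, W, W, W],
]
boxes = segment(img)
```

Here the first text line (rows 1 to 2) splits into two characters and the second line (row 4) yields one. Touching characters would merge into a single box under this simple zero-gap rule.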
Character normalization
Because the cut is made at the position coordinates of each individual character, a cut character "fills" its entire image. The cut characters are therefore normalized in both size and position to improve recognition accuracy. The following method is used for character normalization.
When the characters are cut, the cut character pictures are all set to a size of n x n, and a white grayscale picture of size m x m is defined, where m is larger than n. The n x n picture is pasted onto the m x m picture with a as the starting position, giving atomic-level pictures of consistent size and position.
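The pasting step can be sketched as below. The sketch assumes the character has already been resized to n x n (the resizing itself is not shown), and uses as defaults the values reported in the experiments section of this document (m = 16, starting position (2, 2)); the function name `paste_on_canvas` is hypothetical.

```python
def paste_on_canvas(char_img, m=16, a=(2, 2), background=255):
    """Paste an n x n character picture onto a white m x m canvas at position a."""
    n_rows, n_cols = len(char_img), len(char_img[0])
    row0, col0 = a
    if row0 + n_rows > m or col0 + n_cols > m:
        raise ValueError("character picture does not fit on the canvas")
    canvas = [[background] * m for _ in range(m)]
    for i, row in enumerate(char_img):
        for j, v in enumerate(row):
            canvas[row0 + i][col0 + j] = v
    return canvas


# A solid black 12 x 12 "character" stands in for a real cut character.
char = [[0] * 12 for _ in range(12)]
atomic = paste_on_canvas(char)
```

With n = 12, m = 16 and a = (2, 2), the character is surrounded by a two-pixel white margin on every side, which is what makes the result centered.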
Mixed character feature extraction module
Features are extracted with the Histogram of Oriented Gradients (HOG), which is often combined with a Support Vector Machine (SVM) to train high-precision object classifiers. The procedure consists of calculating the image gradient, constructing the gradient histograms and block normalization.
Calculating the image gradient:
Every pixel of the image has a gradient magnitude and a gradient direction. The gradient $g_s(s,t)$ along the abscissa (s) and the gradient $g_t(s,t)$ along the ordinate (t) of the atomic-level character image are calculated by formulas (3-1) and (3-2), where $I(s,t)$ is the pixel value at (s,t):
$$g_s(s,t) = I(s+1,t) - I(s-1,t) \tag{3-1}$$
$$g_t(s,t) = I(s,t+1) - I(s,t-1) \tag{3-2}$$
The gradient magnitude $g(s,t)$ and the gradient direction $\theta(s,t)$ at each pixel position of the atomic-level character image are then obtained by formulas (4) and (5):
$$g(s,t) = \sqrt{g_s(s,t)^2 + g_t(s,t)^2} \tag{4}$$
$$\theta(s,t) = \arctan\frac{g_t(s,t)}{g_s(s,t)} \tag{5}$$
Constructing the gradient histogram:
The atomic-level character image is divided into small regions of several pixels each, called cell units (cells), and a histogram of oriented gradients is constructed within each cell.
The range of gradient directions is first divided evenly into I direction intervals. The gradient directions of all pixels in the cell are then accumulated into these intervals by weighted projection, where the weight is usually the gradient magnitude. The resulting gradient histogram is an I-dimensional feature vector, which is the HOG feature of the cell unit:
$$\beta_j = [\eta_1, \eta_2, \ldots, \eta_I], \quad i \in [1, I] \tag{6}$$
where $\eta_i$ is the weighted projection of the gradient magnitudes onto the i-th gradient direction interval, and $\beta_j$ is the HOG feature vector of the j-th cell unit.
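The gradient computation and the per-cell histogram can be sketched as follows. Several choices here are assumptions rather than details stated in the text: formulas (4) and (5) are taken to be the usual Euclidean magnitude and arctangent direction, neighbours outside the image are clamped to the border, directions are folded into [0, 180) degrees, and each pixel votes for a single interval without interpolation. The function names are hypothetical.

```python
import math


def gradients(img):
    """Central differences as in (3-1) and (3-2), then magnitude and direction."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for s in range(h):
        for t in range(w):
            # Neighbours outside the image are clamped to the border (assumption).
            gs = img[min(s + 1, h - 1)][t] - img[max(s - 1, 0)][t]
            gt = img[s][min(t + 1, w - 1)] - img[s][max(t - 1, 0)]
            mag[s][t] = math.hypot(gs, gt)
            ang[s][t] = math.degrees(math.atan2(gt, gs)) % 180.0  # unsigned
    return mag, ang


def cell_histogram(mag, ang, top, left, cell=4, bins=9):
    """Magnitude-weighted direction histogram of one cell, as in formula (6)."""
    width = 180.0 / bins
    hist = [0.0] * bins
    for s in range(top, top + cell):
        for t in range(left, left + cell):
            i = min(int(ang[s][t] // width), bins - 1)  # guard against rounding
            hist[i] += mag[s][t]
    return hist


# A 4 x 4 ramp whose value grows only along the second index.
ramp = [[10 * t for t in range(4)] for _ in range(4)]
mag, ang = gradients(ramp)
hist = cell_histogram(mag, ang, 0, 0)
```

For this ramp every gradient points along the second index (90 degrees), so all of the magnitude lands in the interval covering 80 to 100 degrees and the other eight intervals stay empty.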
Block normalization:
Several adjacent cell units (cells) are grouped into a larger region called a block. Concatenating the feature vectors of all cell units in a block gives the HOG feature of that block:
$$\gamma_k = [\beta_1, \beta_2, \ldots, \beta_J], \quad j \in [1, J] \tag{7}$$
where $\beta_j$ is the HOG feature vector of the j-th cell unit, J is the total number of cell units in each block, and $\gamma_k$ is the HOG feature vector of the k-th block.
Contrast normalization is performed within each block using formula (8):
$$v_k = \frac{\gamma_k}{\sqrt{\|\gamma_k\|_2^2 + \varepsilon^2}} \tag{8}$$
where $\gamma_k$ is the vector to be normalized, $\|\gamma_k\|_2$ is the 2-norm of $\gamma_k$, $\varepsilon$ is a very small constant, and $v_k$ is the normalized vector.
Blocks are obtained with a sliding window that moves over the atomic-level character picture with a fixed step, from left to right and from top to bottom. This yields K blocks, and concatenating the normalized feature vectors of all blocks gives the multi-dimensional feature vector $\upsilon$:
$$\upsilon = [v_1, v_2, \ldots, v_K], \quad k \in [1, K] \tag{9}$$
The HOG descriptor of the whole picture is thus a feature vector $\upsilon$ of dimension I x J x K, composed of the histogram components of all cell units in all blocks.
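The block assembly can be sketched as follows. The block normalization is assumed to take the common L2 form v = γ / sqrt(‖γ‖² + ε²), and the per-cell histogram is abstracted as a caller-supplied function `cell_hist(top, left)`, which is a hypothetical interface; the default sizes (16 x 16 picture, 4 x 4 pixel cells, 2 x 2 cells per block) are the ones reported in the experiments section.

```python
import math


def normalize_block(gamma, eps=1e-6):
    """L2-normalize a block vector (assumed form of the block normalization)."""
    norm = math.sqrt(sum(x * x for x in gamma) + eps * eps)
    return [x / norm for x in gamma]


def hog_descriptor(cell_hist, size=16, cell=4, cells_per_block=2, stride=4):
    """Slide a block window over the picture and concatenate normalized blocks.

    cell_hist(top, left) must return the histogram of the cell whose
    upper-left pixel is (top, left); it is a hypothetical callback.
    """
    block = cell * cells_per_block
    feature = []
    for top in range(0, size - block + 1, stride):        # top to bottom
        for left in range(0, size - block + 1, stride):   # left to right
            gamma = []
            for dy in range(cells_per_block):
                for dx in range(cells_per_block):
                    gamma.extend(cell_hist(top + dy * cell, left + dx * cell))
            feature.extend(normalize_block(gamma))
    return feature


# A dummy cell histogram (nine equal entries) is enough to check the layout.
def dummy(top, left):
    return [1.0] * 9


feature = hog_descriptor(dummy, stride=4)
```

Passing the cell histogram as a callback keeps the sketch independent of how gradients are computed; with a stride smaller than the cell size, blocks are not aligned to a fixed cell grid, so cells are evaluated relative to each block position.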
Hard disk sequence code identification module
A Support Vector Machine (SVM) is used to construct the sequence code recognition model. The SVM is a widely used supervised learning algorithm that automatically finds the support vectors that best separate the classes, so the resulting classifier maximizes the margin between classes and offers good adaptability and high discrimination. The SVM type chosen here is C-SVC, i.e. a multi-class classifier that uses a penalty factor (Cost) to handle noise.
Step (1): extract the character HOG features with a sliding window step of 2 x 2 to obtain feature file Z1;
Step (2): pass the obtained feature file Zi (i = 1, 2, 3) to the SVM model as input, and split it into a training set and a test set using the default proportion;
Step (3): traverse the combinations of the values of the parameters gamma and C by grid search, and test each resulting model on the test set to obtain the classification accuracy of every parameter combination;
Step (4): draw a 3D plot of the parameter combinations from Step (3) against their accuracies, and analyze it to obtain the optimal parameter combination;
Step (5): extract the character HOG features with a sliding window step of 4 x 4 to obtain feature file Z2, and repeat Step (2), Step (3) and Step (4) to obtain the optimal parameter combination for this feature;
Step (6): extract the character HOG features with a sliding window step of 8 x 8 to obtain feature file Z3, and repeat Step (2), Step (3) and Step (4) to obtain the optimal parameter combination for this feature;
Step (7): finally, compare the models obtained with the three sliding steps to determine the optimal sliding step and parameter combination, and save the model to obtain the trained SVM model.
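The grid traversal in Step (3) can be sketched independently of any particular SVM library. In the sketch below, training and testing an SVM is abstracted as a caller-supplied function `evaluate(gamma, c)` that returns an accuracy; this interface and the `toy_accuracy` stand-in are hypothetical, and the toy scores are placeholders, not experimental data.

```python
import math
from itertools import product


def grid_search(evaluate, gammas, costs):
    """Evaluate every (gamma, C) pair and return the best pair plus all scores."""
    results = {(g, c): evaluate(g, c) for g, c in product(gammas, costs)}
    best_gamma, best_c = max(results, key=results.get)
    return best_gamma, best_c, results


def toy_accuracy(gamma, c):
    # Placeholder score, NOT experimental data: it peaks at gamma = 1, C = 10.
    return 1.0 - 0.1 * abs(math.log10(gamma)) - 0.05 * abs(math.log10(c) - 1)


values = [0.01, 0.1, 1, 10, 100]
best_gamma, best_c, results = grid_search(toy_accuracy, values, values)
```

In a real run, `evaluate` would train a C-SVC on the training set with the given parameters and return its accuracy on the test set; the `results` dictionary holds exactly the (gamma, C, accuracy) triples needed for the 3D plot of Step (4).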
Application verification
Description of test data
A mixed data set is used to train and test the model for magnetic hard disk sequence codes. It consists of two parts: print characters and atomic-level character pictures of magnetic hard disk sequence codes. The pictures cover 36 character classes, namely the 10 Arabic numerals 0-9 and the 26 capital letters A-Z, with 279 pictures per class, for a total of 10044 samples.
Results of the experiment
Image preprocessing result
Based on the observed effect, the binarization threshold Thre is set to 127 and the two-dimensional median filtering template to 3 x 3. The preprocessing effect on a magnetic hard disk sequence code image is shown in FIG. 4.
Character segmentation result
The starting position of each line of characters is determined from the horizontal projection and used to cut the text into lines. The horizontal projection of a magnetic hard disk identification code picture is shown in FIG. 5.
Each cut text line is then projected vertically, and the position of each character is determined from its upper-left and lower-right coordinate points so that the character can be cut out. The vertical projection of a single line of characters is shown in FIG. 6.
During cutting, the size of each cut character picture is set to 12 x 12, as shown in FIG. 7(a). A 16 x 16 picture with gray value 255 (white) is also created, and the extracted 12 x 12 character picture is pasted onto it with pixel coordinates (2,2) as the starting point, giving a centered atomic-level character picture and completing character normalization, as shown in FIG. 7(b).
Feature extraction results
The atomic-level character image is divided by pixels into several cell units (cells). The gradient direction is divided into I direction intervals for the statistics of gradient information, and each pixel in a cell is projected into the histogram according to its gradient direction with a weight, giving the I-dimensional feature vector of the cell. Unsigned gradients are used, i.e. gradient direction values lie between 0 and 180 degrees. If each block contains J cell units and traversing the whole picture with the sliding window yields K blocks, the HOG feature extracted from one atomic-level character image is a feature vector of size 1 x (I x J x K).
The character pictures in the data set are 16 x 16. A region of 4 x 4 pixels is taken as a cell and I is set to 9, so the gradient information of each cell is accumulated in a 9-bin histogram. 2 x 2 cells form a block, so each block contains 4 cells. With a block sliding step (unit: pixel) of 2 x 2, traversing the whole picture with the sliding window yields 25 blocks, i.e. I, J and K are 9, 4 and 25, and the HOG feature of one atomic-level character image is a 1 x 900 feature vector. Changing the sliding step changes K and thereby reduces the vector dimension. For example, with a sliding step of 4 x 4, traversal yields 9 blocks, i.e. I, J and K are 9, 4 and 9, and the HOG feature vector is 1 x 324; with a sliding step of 8 x 8, traversal yields 4 blocks and the HOG feature vector is 1 x 144. Finally, the HOG feature vectors of the individual atomic-level characters are collected into an HOG feature file, which is input to the SVM for recognition model training.
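The block counts and vector dimensions quoted above can be checked with a short calculation. The symbols W (picture side length in pixels), B (block side length in pixels) and d (sliding step in pixels) are introduced here only for illustration and do not appear in the original text; the formula assumes that the window always stays entirely inside the picture.

$$K = \left(\frac{W - B}{d} + 1\right)^2, \qquad \dim(\upsilon) = I \cdot J \cdot K$$

With W = 16, B = 2 x 4 = 8, I = 9 and J = 4:

$$d = 2:\; K = \left(\tfrac{8}{2} + 1\right)^2 = 25,\; \dim(\upsilon) = 9 \cdot 4 \cdot 25 = 900$$
$$d = 4:\; K = \left(\tfrac{8}{4} + 1\right)^2 = 9,\; \dim(\upsilon) = 9 \cdot 4 \cdot 9 = 324$$
$$d = 8:\; K = \left(\tfrac{8}{8} + 1\right)^2 = 4,\; \dim(\upsilon) = 9 \cdot 4 \cdot 4 = 144$$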
Analysis of recognition results
The SVM type is C-SVC, the kernel is the RBF (radial basis function) kernel, and the model training parameters are gamma and C. The parameters are optimized by grid search, with gamma taking values in {0.01, 0.1, 1, 10, 100} and C taking values in {0.01, 0.1, 1, 10, 100}. The parameter optimization results for the different sliding window steps are shown in FIGS. 8-10, where the x-axis is the value of gamma, the y-axis is the value of C and the z-axis is the prediction accuracy of the model. Because the parameter values span several orders of magnitude, the x and y axes are plotted logarithmically, which makes the plots easier to read and analyze.
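To make the role of gamma concrete, the standard RBF kernel K(x, y) = exp(-gamma * ||x - y||^2) can be written out directly. The sketch below is a general illustration of that kernel and not code from the patent; the function name `rbf_kernel` is chosen here for illustration.

```python
import math


def rbf_kernel(x, y, gamma):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


same = rbf_kernel([0.0, 0.0], [0.0, 0.0], gamma=0.1)
near = rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=0.1)
sharp = rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=10.0)
```

A larger gamma makes the kernel value fall off faster with distance, so each training sample influences a smaller neighbourhood, while C controls how strongly misclassified training samples are penalized. This is why the two parameters are searched jointly.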
Different sliding window steps in HOG feature extraction lead to different classifier model parameters. Table 1 compares the classifier models obtained with different sliding steps.
TABLE 1 Comparison of classifier model results based on different sliding step lengths
Sliding step | Feature vector dimension | gamma | C | Accuracy
2 x 2 | 900 | 0.1 | 100 | 0.98
4 x 4 | 324 | 0.1 | 100 | 0.98
8 x 8 | 144 | 1 | 100 | 0.98
As can be seen from Table 1: when the window sliding step is 2 x 2, the feature vector dimension is 900, the optimal values of gamma and C are {0.1, 100}, and the accuracy reaches 0.98; when the window sliding step is 4 x 4, the feature vector dimension is 324, the optimal values of gamma and C are still {0.1, 100}, and the accuracy reaches 0.98; when the window sliding step is 8 x 8, the feature vector dimension is 144, the optimal values of gamma and C are {1, 100}, and the accuracy reaches 0.98.
The experiment compares the HOG feature extraction method with the traditional pixel statistical feature extraction method for recognizing the magnetic hard disk sequence code. The first three groups give the recognition results of the HOG feature extraction method with different sliding steps, and the fourth group gives the recognition result of the traditional pixel statistical feature extraction method. The magnetic hard disk sequence code used is Y203TK1E, and the recognition results are shown in Table 2.
TABLE 2 sequence code identification results
As can be seen from Table 2: for the same magnetic hard disk sequence code, the fourth group shows obvious misrecognition, with the digits 0 and 1 misrecognized as the letters O and I, and its recognition effect is poor. The first and third groups misrecognize a similar character, with the digit 0 recognized as the capital letter O. The second group, with a window sliding step of 4 x 4, recognizes the code correctly. The feature vector dimension decreases as the sliding step increases, which also shortens the recognition time. When the sliding step increases from 2 x 2 to 4 x 4, the feature vector dimension falls from 900 to 324 and the recognition time drops from 1.15391 s to 0.36998 s; when the step is further increased to 8 x 8, the recognition time drops to 0.23237 s. The experimental results show that a sliding step of 4 x 4 gives the best recognition effect together with a markedly shorter recognition time. The results also demonstrate that combining HOG features with an SVM is effective and feasible for magnetic hard disk sequence code recognition, although the recognition process still needs further optimization.
Aiming at the characteristics of the sequence code in magnetic hard disk images, a magnetic hard disk sequence code identification method based on machine vision is provided. Compared with the traditional feature extraction approach of counting pixels, the HOG feature descriptor better describes the appearance and shape of a local target and better distinguishes similar characters or similar features. Machine vision is applied to magnetic hard disk sequence code recognition for the first time: the atomic-level characters and printed characters of the sequence code are mixed for recognition model training, and the HOG features of the characters are extracted to form a feature file for sequence code recognition. Experimental results show that HOG features combined with an SVM classifier work well for identifying magnetic hard disk sequence codes. Given the characteristics of magnetic hard disk identification code pictures, follow-up research will comprise two parts: collecting pictures of magnetic hard disks of different brands and effectively identifying the brand marks; and analyzing the typesetting characteristics of the identification code information so as to accurately locate and identify the effective information.
Claims (4)
1. A magnetic hard disk sequence code identification method based on machine vision, characterized by comprising the following steps:
step 1, preprocessing a sequence code picture of the magnetic hard disk;
step 2, performing character segmentation on the sequence code, and then performing HOG feature extraction on the segmented characters;
step 3, constructing an identification model using a support vector machine (SVM) algorithm, and identifying the sequence code picture of the magnetic hard disk.
2. The machine vision-based magnetic hard disk sequence code identification method of claim 1, wherein the projection-based character segmentation in step 2 comprises:
firstly, horizontally projecting the binary image of the magnetic hard disk identification code, namely counting the pixels of each row to obtain a pixel distribution histogram in the horizontal direction;
secondly, vertically projecting the row image cut out by the horizontal projection, namely counting the pixels of each column to obtain a pixel distribution histogram in the vertical direction; determining the position coordinates of each single character by analyzing the vertical pixel distribution histogram, and segmenting the row binary image according to the character position coordinates to obtain single-character images;
thirdly, setting the size of each cut character picture to n × n, and defining a white grayscale picture of size m × m, wherein m is larger than n; and pasting the n × n picture onto the m × m picture with a as the starting position, to obtain atomic-level pictures of consistent size and position.
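The projection-based segmentation and size normalization described in this claim can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes a binary image in which 1 marks character pixels, a nearest-neighbour resize (the text does not say how characters are scaled to n × n), and the starting position a applied as the same offset on both axes. All function names are hypothetical.

```python
import numpy as np


def find_runs(profile):
    """Return (start, end) index pairs of consecutive non-zero entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_characters(binary):
    """Split a binary image (1 = ink, 0 = background) into character crops."""
    chars = []
    # Horizontal projection: count ink pixels in each row to find text rows.
    for top, bottom in find_runs(binary.sum(axis=1)):
        row_img = binary[top:bottom]
        # Vertical projection: count ink pixels in each column of the row image.
        for left, right in find_runs(row_img.sum(axis=0)):
            chars.append(row_img[:, left:right])
    return chars


def normalize_character(char, n, m, a):
    """Resize a crop to n x n and paste it on a white m x m canvas at (a, a)."""
    rows = np.arange(n) * char.shape[0] // n    # nearest-neighbour row indices
    cols = np.arange(n) * char.shape[1] // n    # nearest-neighbour column indices
    resized = char[np.ix_(rows, cols)]
    canvas = np.full((m, m), 255, dtype=np.uint8)           # white background
    canvas[a:a + n, a:a + n] = np.where(resized > 0, 0, 255)  # ink drawn in black
    return canvas


# Toy image with two "characters" separated by blank columns.
demo = np.zeros((7, 9), dtype=np.uint8)
demo[1:6, 1:3] = 1
demo[1:6, 5:8] = 1
chars = segment_characters(demo)
tiles = [normalize_character(c, n=8, m=12, a=2) for c in chars]
print(len(chars), tiles[0].shape)
```

Characters that touch each other produce a single run in the vertical projection, so a real pipeline would need an additional splitting rule for such cases.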
3. The machine vision-based magnetic hard disk sequence code identification method of claim 1, wherein the extraction of the mixed character features in step 2 comprises three steps, namely calculating the image gradient, constructing the gradient histogram, and block normalization, specifically as follows:
(1) calculating the image gradient:
calculating the gradient $g_s(s,t)$ in the abscissa ($s$) direction and the gradient $g_t(s,t)$ in the ordinate ($t$) direction of the atomic-level character image by formulas (3-1) and (3-2):
$$g_s(s,t) = I(s+1,t) - I(s-1,t) \tag{3-1}$$
$$g_t(s,t) = I(s,t+1) - I(s,t-1) \tag{3-2}$$
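obtaining the gradient amplitude $g(s,t)$ and the gradient direction $\theta(s,t)$ at each pixel position of the atomic-level character image according to formulas (4) and (5):
$$g(s,t) = \sqrt{g_s(s,t)^2 + g_t(s,t)^2} \tag{4}$$
$$\theta(s,t) = \arctan\frac{g_t(s,t)}{g_s(s,t)} \tag{5}$$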
(2) constructing a gradient histogram:
dividing the atomic-level character image into a plurality of pixel regions called cell units (cells), and constructing a histogram of oriented gradients within each cell:
firstly, dividing the gradient direction range evenly into I direction intervals, and accumulating a histogram of the gradient directions of all pixels in the cell over these intervals, wherein the statistics use weighted projection with the gradient amplitude as the weight; the resulting gradient histogram is an I-dimensional feature vector, which is the HOG feature of the cell unit, as follows:
$$\beta_j = [\eta_1, \eta_2, \ldots, \eta_I], \quad i \in [1, I] \tag{6}$$
wherein $\eta_i$ represents the weighted projection of the gradient amplitude onto the $i$-th gradient direction interval, and $\beta_j$ is the HOG feature vector of the $j$-th cell unit;
(3) block normalization:
forming a region from a plurality of adjacent cell units (cells), called a block, and concatenating the feature vectors of all the cell units in the block to obtain the HOG feature of the block, as follows:
$$\gamma_k = [\beta_1, \beta_2, \ldots, \beta_J], \quad j \in [1, J] \tag{7}$$
wherein $\beta_j$ is the HOG feature vector of the $j$-th cell unit, $J$ is the total number of cell units in each block, and $\gamma_k$ is the HOG feature vector of the $k$-th block;
the contrast normalization operation is performed within each block using formula (8), as follows:
$$v_k = \frac{\gamma_k}{\sqrt{\|\gamma_k\|_2^2 + \varepsilon^2}} \tag{8}$$
wherein $\gamma_k$ represents the vector to be normalized, $\|\gamma_k\|_2$ represents the 2-norm of $\gamma_k$, $\varepsilon$ represents a very small constant, and $v_k$ represents the normalized vector;
the HOG obtains blocks through a sliding window: based on the atomic-level character picture, the window slides by a fixed step from left to right and from top to bottom to obtain K blocks, and the normalized feature vectors of all blocks are concatenated to obtain a multi-dimensional feature vector $\upsilon$, as follows:
$$\upsilon = [v_1, v_2, \ldots, v_K], \quad k \in [1, K] \tag{9}$$
the HOG descriptor of the whole picture is thus the feature vector $\upsilon$ of dimension $I \times J \times K$, composed of the histogram components of all cell units in every block.
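The three stages of this claim (gradient, cell histogram, block normalization with a sliding window) can be sketched in NumPy as below. This is an illustrative sketch under assumptions, not the patented implementation: the cell size (4 × 4 pixels), block size (2 × 2 cells), number of direction intervals (9), the 16 × 16 picture size and the value of `eps` are not given in this text. The direction is computed with `arctan2` and folded into [0°, 180°), a common unsigned-direction convention used here in place of a plain arctangent. Border pixels are given zero gradient, and each block is normalized by its 2-norm. The function name `hog_descriptor` is hypothetical.

```python
import numpy as np


def hog_descriptor(img, cell=4, block=2, stride=4, bins=9, eps=1e-6):
    """Minimal HOG: central-difference gradients, per-cell histograms and
    L2-normalized sliding blocks. `block` is in cells, `stride` in pixels."""
    img = img.astype(np.float64)
    gs = np.zeros_like(img)
    gt = np.zeros_like(img)
    # Array axis 1 is the abscissa s, axis 0 is the ordinate t.
    gs[:, 1:-1] = img[:, 2:] - img[:, :-2]    # g_s = I(s+1,t) - I(s-1,t)
    gt[1:-1, :] = img[2:, :] - img[:-2, :]    # g_t = I(s,t+1) - I(s,t-1)
    mag = np.hypot(gs, gt)                    # gradient amplitude
    ang = np.degrees(np.arctan2(gt, gs)) % 180.0   # unsigned direction
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)

    def cell_hist(top, left):
        """beta_j: amplitude-weighted direction histogram of one cell."""
        idx = bin_idx[top:top + cell, left:left + cell].ravel()
        weights = mag[top:top + cell, left:left + cell].ravel()
        return np.bincount(idx, weights=weights, minlength=bins)

    side = cell * block
    height, width = img.shape
    features = []
    for top in range(0, height - side + 1, stride):        # top to bottom
        for left in range(0, width - side + 1, stride):    # left to right
            gamma = np.concatenate([
                cell_hist(top + r * cell, left + c * cell)
                for r in range(block) for c in range(block)
            ])
            # v_k: block vector divided by its (eps-stabilized) 2-norm
            features.append(gamma / np.sqrt(np.sum(gamma ** 2) + eps ** 2))
    return np.concatenate(features)


rng = np.random.default_rng(0)
picture = rng.integers(0, 256, size=(16, 16))   # stand-in for a character picture
for stride in (2, 4, 8):
    print(stride, hog_descriptor(picture, stride=stride).shape)
```

In this sketch the cells are laid out relative to each block position, which allows strides smaller than a cell. With the assumed 16 × 16 picture, strides of 2, 4 and 8 pixels yield vectors of length 900, 324 and 144.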
4. The machine vision-based magnetic hard disk sequence code identification method of claim 1, wherein step 3 specifically comprises: selecting C-SVC as the SVM type, namely a multi-class classifier that handles noise by means of a penalty factor C (cost), and specifically:
Step (1), extracting the character HOG features with a sliding window step of 2 × 2 to obtain a feature file Z1;
Step (2), passing the obtained feature file Zi (i = 1, 2, 3) into the SVM model as input, and splitting it into a training set and a test set using the default proportion;
Step (3), traversing the parameter combinations by grid search over the value ranges of the parameters gamma and C, and testing each resulting model on the test set to obtain the classification accuracy of each parameter combination;
Step (4), plotting a 3D image of the parameter combinations from Step (3) against the corresponding accuracies, and analyzing it to obtain the optimal parameter combination;
Step (5), extracting the character HOG features with a sliding window step of 4 × 4 to obtain a feature file Z2, and repeating Step (2), Step (3) and Step (4) to obtain the optimal parameter combination for these features;
Step (6), extracting the character HOG features with a sliding window step of 8 × 8 to obtain a feature file Z3, and repeating Step (2), Step (3) and Step (4) to obtain the optimal parameter combination for these features;
Step (7), finally, comparing the models obtained with the three sliding steps to obtain the optimal sliding step and parameter combination, and saving the result as the trained SVM model.
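The parameter search in Steps (2) to (7) can be sketched with scikit-learn, whose `SVC` class is a C-SVC. This is a hedged illustration only. The text does not name the SVM library, the kernel, the search ranges or the split ratio, so the RBF kernel, the gamma and C grids, and scikit-learn's default 75/25 split used below are all assumptions. The synthetic data stands in for the feature files Z1, Z2 and Z3, and the names `grid_search_svm` and `fake_feature_file` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def grid_search_svm(features, labels, gammas, costs, seed=0):
    """Try every (gamma, C) pair; return (best_acc, best_gamma, best_C, table)."""
    # Step (2): split into training and test sets (scikit-learn default 75/25).
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, random_state=seed, stratify=labels)
    table = {}
    # Step (3): traverse the grid and score each model on the test set.
    for gamma in gammas:
        for cost in costs:
            model = SVC(kernel="rbf", gamma=gamma, C=cost)  # C-SVC, assumed RBF
            model.fit(x_train, y_train)
            table[(gamma, cost)] = model.score(x_test, y_test)
    (best_gamma, best_cost), best_acc = max(table.items(), key=lambda kv: kv[1])
    return best_acc, best_gamma, best_cost, table


rng = np.random.default_rng(0)


def fake_feature_file(dim, per_class=40):
    """Synthetic stand-in for a feature file Z_i: two well-separated classes."""
    x = np.vstack([rng.normal(0.0, 1.0, (per_class, dim)),
                   rng.normal(3.0, 1.0, (per_class, dim))])
    y = np.array([0] * per_class + [1] * per_class)
    return x, y


# Steps (1), (5), (6): one feature file per sliding step; Step (7): compare.
results = {}
for stride, dim in ((2, 900), (4, 324), (8, 144)):
    x, y = fake_feature_file(dim)
    results[stride] = grid_search_svm(
        x, y, gammas=(0.0001, 0.001, 0.01), costs=(1, 10, 100))
best_stride = max(results, key=lambda s: results[s][0])
print(best_stride, results[best_stride][:3])
```

The `table` returned for each stride holds one accuracy per (gamma, C) pair, which is the data a 3D accuracy plot such as the one in Step (4) would be drawn from. Ties between sliding steps are resolved here by taking the first maximum, whereas the text also weighs recognition time when choosing the step.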
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011235039.3A CN112348026A (en) | 2020-11-08 | 2020-11-08 | Magnetic hard disk sequence code identification method based on machine vision |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112348026A (en) | 2021-02-09 |
Family
ID=74429948
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011235039.3A Pending CN112348026A (en) | 2020-11-08 | 2020-11-08 | Magnetic hard disk sequence code identification method based on machine vision |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112348026A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||