WO2020248782A1 - Intelligent method for establishing an Asian face database (一种亚洲人脸库智能建立方法) - Google Patents
Intelligent method for establishing an Asian face database
- Publication number
- WO2020248782A1 WO2020248782A1 PCT/CN2020/091145 CN2020091145W WO2020248782A1 WO 2020248782 A1 WO2020248782 A1 WO 2020248782A1 CN 2020091145 W CN2020091145 W CN 2020091145W WO 2020248782 A1 WO2020248782 A1 WO 2020248782A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- face
- data
- asian
- clustering
- database according
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/50—Maintenance of biometric data or enrolment thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30168—Image quality inspection
Definitions
- the invention belongs to the technical field of computer vision, and specifically relates to a method for intelligently establishing an Asian face database.
- face recognition accuracy is determined mainly by the algorithm and the face data set.
- algorithms now converge on deep learning models, and the face recognition models of several leading companies with high recognition rates are already mature or public, so they can simply be adopted and reused.
- most publicly available face databases, however, are Western data sets; because of ethnic differences, models trained on them fit Western data well while performing poorly on Asian faces.
- domestic giants are unwilling to publish their own face databases, so even with the same algorithm the expected results are hard to achieve; building one's own Asian face database has therefore become a core problem for companies.
- the technical problem to be solved by the present invention is to provide a low-cost and low-manpower intelligent method for establishing an Asian face database in view of the above-mentioned shortcomings of the prior art.
- An intelligent method for establishing an Asian face database, comprising the following steps: selecting a data source; video decoding; face detection; removing blurred pictures; collating and classifying the clear data set.
- the aforementioned data source is Asian movies.
- the foregoing video decoding adopts a frame extraction method to reduce the time complexity and space complexity of the video data; the face detection adopts an improved yolov3-tiny face detection algorithm.
- the face detection model is improved on the basis of the yolov3-tiny design concept, specifically:
- since the number of detection classes is small, the number of convolutional layers is set to eight;
- in early training, the improved yolov3-tiny is trained as a classifier, i.e., a softmax two-class layer is attached after the feature layer to obtain an initialization model;
- finally, the trained classification model initializes the improved yolov3-tiny for large-scale face detection training.
- the above-mentioned removal of blurred images uses the texture detection mechanism of the Sobel operator and a fast convolution function to realize texture detection of each face.
- the above-mentioned collation and classification of the clear data set includes designing a face feature extraction model, inter-class clustering of face data, intra-class clustering of face data, inter-class merging of face data, secondary cleaning of face data, and manual naming;
- the face feature extraction model is improved on the basis of the residual network ResNet-18, specifically: one block is added to conv4_x and two blocks to conv5_x, and the number of filters in each layer is halved;
- the loss layer of the last layer adopts a Triplet loss design;
- the CASIA-WebFace data set is used to train the model, realizing the extraction of facial features.
- the inter-class clustering of face data adopts the K-Means clustering method to cluster and separate the mixed face set of the video data, finally generating K face collection boxes, where K takes the value 40.
- the intra-class clustering of face data is specifically: the ResNet_clustering algorithm performs subject-category screening on each of the K face collection boxes and cleans the data misclassified in the previous round, where the number of clusters is determined adaptively by the algorithm.
- the inter-class merging of face data is specifically: the similarity between different collection boxes is judged by computing the mean feature of the samples in each box, and boxes are merged reasonably according to a similarity threshold, so that faces of the same person in different collection boxes are merged.
- the secondary cleaning of face data includes the following steps: (1) compute the mean feature of the face collection box; (2) compute the distance between each face feature in the box and the mean face feature and sort by distance; (3) extract the face feature at the median index of the sorted list, taking the mean of the two middle features if the count is even; (4) compute the distance between every face feature in the box and the median face feature, and remove face data whose distance exceeds the judgment threshold.
- the manual naming assigns IDs to the collection boxes via Baidu recognition, so that subsequent face collection boxes can be effectively merged.
- the method of the present invention avoids excessive financial, material, and labor costs, and the face data sets it builds are mostly multi-pose and multi-background, which helps improve the generalization ability of models; given the sheer number of Asian movies, a million-scale database is easy to build.
- Figure 1 is a schematic flowchart of the present invention;
- Figure 2 is a schematic flowchart of an embodiment of the present invention.
- a method for intelligently establishing an Asian face database of the present invention includes the following steps:
- the data source selected in the embodiment of the present invention is Asian movies, for the following reasons: (1) Asia is a region of high movie output; China, South Korea, and Japan are all prolific movie-producing countries, which guarantees quantity; (2) frequent scene changes, large casts, and rich variation in facial pose guarantee quality.
- video decoding mainly serves face detection. Because videos contain too many frames with too much redundancy, the embodiment of the present invention decodes by frame extraction to reduce time and space complexity: taking one movie as the unit, one frame is extracted per second. On average, a movie yields about 7,200 pictures in roughly 8 minutes.
- current face detection algorithms are fairly mature. Comparing the test results of SeetaFace and Dlib face detection shows that, because of their different detection mechanisms, Dlib's performance trends downward when image resolution exceeds 800.
- since the data source of the Asian face database is movies, whose resolution is usually high, SeetaFace detection has a slight advantage.
- the test results are as follows:
- SeetaFace parameter settings: minimum face 40×40, face confidence 4.f, pyramid scale factor 0.8f, sliding-window step 4. With these settings, collecting all faces from the pictures of one 1080p movie takes an average of 20 minutes, which is rather slow; hence the improved yolov3-tiny algorithm is adopted.
- the face detection applies the improved yolov3-tiny face detection algorithm to every extracted frame, reducing the processing time for one 1080p movie to about a quarter of the original.
- a blur-judgment mechanism is adopted to filter the face pictures one by one.
- the texture detection mechanism of the Sobel operator is used to remove blurred images;
- a fast convolution function realizes the texture detection of each face, as follows:
- the face image A is normalized to a size of 150 ⁇ 150 to achieve a unified judgment standard.
- the judgment threshold is set to Tm; face images scoring below the threshold are moved into a blurred data set for later development, and what remains is essentially a face data set of higher definition.
- the specific formulas are as follows: G_x and G_y denote the grayscale values of the horizontal and vertical edge-detection images, respectively; each pixel combines the two directions as |G| = sqrt(G_x^2 + G_y^2), usually approximated as |G| = |G_x| + |G_y| for efficiency.
- the image-edge binarization sets an edge-map pixel to 1 when |G| meets or exceeds a specified threshold T, and to 0 otherwise.
- the blur degree FU is then computed from the binarized edge map (the original formula is reproduced only as an image); the larger the value, the sharper the image. If FU < Tm, the face image is considered blurred and is eliminated directly; otherwise it is retained.
- the collation and classification of the clear data set includes designing a face feature extraction model, inter-class clustering of face data, intra-class clustering of face data, inter-class merging of face data, secondary cleaning of face data, and manual naming;
- the deep convolutional network designed in the embodiment of the present invention is a residual learning network (Residual Network), which includes 24 convolutional layers.
- the loss layer of the last layer uses Triplet loss.
- the training data set uses the public CASIA-WebFace data set, and the final accuracy rate tested on LFW is 95.43%, which is sufficient for face classification.
- the face feature extraction model is improved from the residual network ResNet-18, and the implementation method is as follows:
- the loss layer uses the Triplet loss function, and the input is a 150×150 three-channel face image.
- the kernel size of the convolutional layers in this network is 3×3 and the initialization method is MSRA; the first pooling layer has kernel size 3×3 and stride 2; the next three pooling layers have kernel size 2×2 and stride 2, all using max pooling; the last is a global average pooling layer with kernel size 2×2 and stride 2, and the final output feature length is 128.
- the model training uses the CASIA-WebFace data set, and the final accuracy rate tested on LFW is 95.43%.
- the loss function is the standard Triplet loss: L = Σ max(0, ||f(x^a) − f(x^p)||^2 − ||f(x^a) − f(x^n)||^2 + α), where x^a, x^p, and x^n are the anchor, positive, and negative samples.
- the margin hyperparameter α mainly controls the intra-class and inter-class distances.
- the data set left after cleaning blurred images is still a mixed face collection box; the key of this step is gathering identical faces together while separating different faces.
- a purpose-built deep convolutional network extracts the features of all faces in the collection box, and a K-Means strategy clusters all face features; since faces are aggregated one movie at a time, K is set to 40 (an empirical value).
- the ResNet_face model extracts features for the whole face set with feature dimension 128, producing an (N, 128) feature matrix, which is then clustered by K-Means with K = 40, finally generating 40 face-cluster collection boxes.
- the K-Means cost function is J = Σ_k Σ_{f_i ∈ C_k} ||f_i − μ_k||^2, where f_i is the feature of a face and μ_k is the feature of its assigned cluster center.
- K-Means clustering only achieves an overall separation of the aggregate into different categories; it merely ensures that most similar individuals gather into the same collection box, i.e., the 40 boxes above.
- the amount of face data in each box is large and complex, and misclassification remains the key problem.
- intra-class clustering is therefore essential, and the embodiment of the present invention adopts the ResNet_clustering algorithm to realize it.
- the ResNet network is mainly used to extract the features of the face in the collection box, and at the same time, the K-Means intra-class clustering is realized with the collection box as a unit.
- the number of cluster centers K is not specified but determined by the algorithm, and the cluster containing the most samples is finally filtered out as the subject category of the collection box.
- the implementation is as follows: starting from the 40 face collection boxes produced by inter-class clustering, each box is clustered separately with cluster centers determined adaptively, so each box yields M clusters.
- the M clusters are screened by sample count, and the cluster with the most samples becomes the subject category of the corresponding box.
- the adaptive algorithm is bisecting K-means: before clustering, the two samples farthest apart are chosen as the initial two cluster centers; one of the resulting clusters is then selected to split further, and so on, stopping when the distance between the two farthest samples within a cluster falls below a threshold.
- after the above three rounds, the face data set of one movie has basically taken shape.
- because leading actors appear frequently and their appearance, pose, and scenes vary greatly, the same person is very easily clustered into different categories, making inter-class merging indispensable.
- the implementation algorithm of this step in the embodiment of the present invention is: the similarity between different collection boxes is judged by the method of averaging the characteristics of the samples in each collection box, and reasonable merging is performed according to the similarity threshold.
- the implementation is as follows: starting from the 40 subject face collection boxes obtained by intra-class clustering, features are extracted from the faces in each box and mean features are computed, namely mean_feature_1, mean_feature_2, ..., mean_feature_40; pairwise Euclidean distances are then computed among the 40 mean features, and when a distance is below dis, the two corresponding collection boxes are merged.
- the mean feature formula is mean_feature_i = (1/N_i) Σ_n f_i^n, where i indexes the collection box, f_i^n is the feature of the n-th sample in the i-th box, and N_i is the total number of samples in the box.
- the distance is D_ij = sqrt(Σ_{d=1..128} (mean_feature_i(d) − mean_feature_j(d))^2), where D_ij is the distance between collection boxes i and j and 128 is the feature dimension.
- the secondary cleaning applies a median-filter-like operation to every collection box on top of the previous steps.
- reasonable screening is carried out according to the distance between the median feature and each sample feature.
- the secondary cleaning is performed on the P face collection boxes obtained from inter-class merging.
- the strategy resembles median filtering: (1) compute the mean feature of the box; (2) compute and sort the distances between each face feature and the mean feature; (3) take the face feature at the median index, averaging the two middle features if the count is even; (4) remove every face whose distance to the median feature exceeds the judgment threshold.
- with all the above steps, the collation of one movie's face database is basically complete, but the collection-box IDs are all virtual and need manual naming.
- the implemented strategy: randomly draw one picture from a collection box, run face recognition with Baidu's image recognition tool, and name the corresponding face collection box according to the recognition result.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Quality & Reliability (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An intelligent method for establishing an Asian face database, comprising the following steps: selecting a data source; video decoding; face detection; removing blurred pictures; collating and classifying the clear data set. The method avoids excessive financial, material, and labor costs, and the face data sets it builds are mostly multi-pose and multi-background, which helps improve the generalization ability of models. Meanwhile, the sheer number of Asian movies makes it possible to build a million-scale database.
Description
The present invention belongs to the technical field of computer vision, and specifically relates to an intelligent method for establishing an Asian face database.
In recent years, the security industry has seen a wave of enthusiasm for face recognition, with numerous vendors launching corresponding products; for a time, face recognition became the industry's hot spot. According to statistics, at the 2017 China International Exhibition on Public Safety and Security, at least 40 companies exhibited their own face recognition products, including large security vendors such as Dahua Technology and Hikvision as well as intelligent-system vendors such as Hanvon and Yinchen. Meanwhile, many media outlets reported the great achievements of face recognition in academia and industry: Tencent achieved a high recognition rate on the LFW face recognition data set, beating the record Google had set earlier in the year, and Alibaba executive chairman Jack Ma demonstrated the combination of face recognition with Alipay at a trade fair in Germany, bringing "pay by face" toward daily life. These exciting reports seem to tell us clearly that face recognition has moved from "dream" into "reality".
However, face recognition accuracy is determined mainly by the algorithm and the face data set. Algorithms now converge on deep learning models, and the face recognition models of several leading companies with high recognition rates are already mature or public, so they can simply be adopted. Unfortunately, almost all publicly available face databases are Western data sets; because of ethnic differences, models trained on them fit Western data well but perform poorly on Asian faces. Domestic giants, meanwhile, are unwilling to publish their own face databases, so even with the same algorithm the expected goals are hard to reach. Therefore, building one's own Asian face database has become a core problem for companies today.
Summary of the invention
The technical problem to be solved by the present invention is to address the above shortcomings of the prior art by providing a low-cost, low-manpower intelligent method for establishing an Asian face database.
To achieve the above technical objective, the present invention adopts the following technical solution:
An intelligent method for establishing an Asian face database, comprising the following steps:
selecting a data source;
video decoding;
face detection;
removing blurred pictures;
collating and classifying the clear data set.
To optimize the above technical solution, the specific measures adopted further include:
The data source is Asian movies.
The video decoding adopts a frame-extraction method to reduce the time and space complexity of the video data; the face detection adopts an improved yolov3-tiny face detection algorithm.
The face detection model is improved on the basis of the yolov3-tiny design concept, specifically (a hedged sketch of the two-stage initialization follows these steps):
since the number of detection classes is small, the number of convolutional layers is set to eight;
next, in early training, the improved yolov3-tiny is trained as a classifier, i.e., softmax two-class classification is applied after the feature layer to obtain an initialization model;
finally, the trained classification model is used to initialize the improved yolov3-tiny for large-scale face detection training.
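The following is a minimal sketch (not the patented model) of this two-stage initialization idea in PyTorch: pretrain a small convolutional backbone as a face / non-face softmax classifier, then reuse its weights to initialize the detector. The layer widths and the training loop are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1),
        nn.MaxPool2d(2),
    )

class TinyBackbone(nn.Module):
    """Eight convolutional layers, as specified for the reduced detector."""
    def __init__(self):
        super().__init__()
        chans = [3, 16, 32, 64, 128, 256, 512, 512, 512]  # widths are assumptions
        self.features = nn.Sequential(
            *[conv_block(chans[i], chans[i + 1]) for i in range(8)]
        )

    def forward(self, x):
        return self.features(x)

class FacePretrainNet(nn.Module):
    """Backbone plus a softmax two-class head, used only to obtain the initialization model."""
    def __init__(self):
        super().__init__()
        self.backbone = TinyBackbone()
        self.head = nn.Linear(512, 2)  # face / non-face

    def forward(self, x):
        f = self.backbone(x).mean(dim=(2, 3))  # global average pooling
        return self.head(f)                    # nn.CrossEntropyLoss applies the softmax

# Stage 1: train FacePretrainNet on face / non-face crops with nn.CrossEntropyLoss.
# Stage 2: copy the pretrained backbone into the detection network before detector training:
#   detector.backbone.load_state_dict(pretrain_net.backbone.state_dict())
```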
The removal of blurred pictures uses the texture detection mechanism of the Sobel operator and a fast convolution function to perform texture detection on each face.
The collation and classification of the clear data set includes designing a face feature extraction model, inter-class clustering of face data, intra-class clustering of face data, inter-class merging of face data, secondary cleaning of face data, and manual naming;
The face feature extraction model is improved on the basis of the residual network ResNet-18, specifically:
one block is added to conv4_x and two blocks are added to conv5_x;
the number of filters in each layer is halved;
the loss layer of the last layer adopts a Triplet loss design;
the CASIA-WebFace data set is used to train the model, realizing the extraction of facial features.
The inter-class clustering of face data adopts the K-Means clustering method to cluster and separate the mixed face set of the video data, finally generating K face collection boxes, where K takes the value 40.
The intra-class clustering of face data is specifically: the ResNet_clustering algorithm performs subject-category screening on each of the K face collection boxes and cleans the data misclassified in the previous round, where the number of clusters is determined adaptively by the algorithm.
The inter-class merging of face data is specifically: the similarity between different collection boxes is judged by computing the mean feature of the samples in each box, and boxes are merged reasonably according to a similarity threshold, so that faces of the same person in different collection boxes are merged.
The secondary cleaning of face data includes the following steps:
(1) compute the mean feature of the face collection box;
(2) compute the distance between each face feature in the collection box and the mean face feature, and sort by distance;
(3) extract the face feature at the median index of the sorted list; if the count is even, take the mean of the two middle features;
(4) compute the distance between every face feature in the collection box and the median face feature, and remove face data whose distance exceeds the judgment threshold.
The manual naming assigns IDs to the collection boxes via Baidu recognition, so that subsequent face collection boxes can be effectively merged.
The present invention has the following beneficial effects:
The method of the present invention avoids excessive financial, material, and labor costs, and the face data sets it builds are mostly multi-pose and multi-background, which helps improve the generalization ability of models. Meanwhile, given the sheer number of Asian movies, building a million-scale database is straightforward.
Figure 1 is a schematic flowchart of the present invention;
Figure 2 is a schematic flowchart of an embodiment of the present invention.
The embodiments of the present invention are described in further detail below with reference to the drawings.
As shown in Figures 1 and 2, the intelligent method for establishing an Asian face database of the present invention comprises the following steps:
S1: selecting a data source;
The data source selected in this embodiment is Asian movies, for the following reasons: (1) Asia is a region of high movie output; China, South Korea, and Japan are all prolific movie-producing countries, which guarantees quantity; (2) scenes change frequently in movies, casts are large, and facial poses vary richly, which guarantees quality.
S2: video decoding;
Video decoding mainly serves face detection. Because videos contain too many frames with too much redundancy, this embodiment decodes by frame extraction to reduce time and space complexity: taking one movie as the unit, one frame is extracted per second. On average, a movie yields about 7,200 pictures in roughly 8 minutes, as in the sketch below.
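A minimal sketch of the one-frame-per-second decoding with OpenCV follows; the patent does not name a library, and the file paths are placeholders.

```python
import cv2
import os

def extract_frames(video_path: str, out_dir: str) -> int:
    """Decode a movie by saving one frame per second of video."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS is unreported
    step = int(round(fps))                   # one frame per second of video
    saved, index = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved  # a two-hour movie yields roughly 7,200 frames

# extract_frames("movie.mp4", "frames/")
```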
S3: face detection;
Current face detection algorithms are fairly mature. Comparing the test results of SeetaFace and Dlib face detection shows that, because of their different detection mechanisms, Dlib's performance trends downward when image resolution exceeds 800. Since the Asian face database draws its data from movies, whose resolution is usually high, SeetaFace detection has a slight advantage. The test results are as follows:
SeetaFace parameter settings: minimum face 40×40, face confidence 4.f, pyramid scale factor 0.8f, sliding-window step 4. With these settings, collecting all faces from one 1080p movie takes an average of 20 minutes, which is rather slow; hence the improved yolov3-tiny algorithm is adopted.
In this embodiment, the face detection applies the improved yolov3-tiny face detection algorithm to every extracted frame, reducing the processing time for one 1080p movie to about a quarter of the original; a hedged sketch follows.
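The sketch below runs a yolov3-tiny-style detector on each extracted frame via OpenCV's DNN module. The patent's improved model and weights are not public, so "face.cfg" / "face.weights" and both thresholds are illustrative assumptions.

```python
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("face.cfg", "face.weights")  # placeholder files
layer_names = net.getUnconnectedOutLayersNames()

def detect_faces(frame, conf_thresh=0.5, nms_thresh=0.4):
    """Return [x, y, w, h] face boxes for one decoded frame."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    boxes, scores = [], []
    for output in net.forward(layer_names):
        for det in output:  # det = [cx, cy, bw, bh, objectness, class scores...]
            score = float(det[4] * det[5:].max())
            if score < conf_thresh:
                continue
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            scores.append(score)
    keep = cv2.dnn.NMSBoxes(boxes, scores, conf_thresh, nms_thresh)
    return [boxes[i] for i in np.array(keep).flatten()]
```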
S4: removing blurred pictures;
Given the nature of movie data, motions and poses change continuously in video, so the collected faces often suffer from motion blur. A blur-judgment mechanism filters the face pictures one by one.
In this embodiment, the removal of blurred pictures uses the texture detection mechanism of the Sobel operator, with a fast convolution function performing texture detection on each face, as follows:
Before detection, each face image A is normalized to a size of 150×150 to unify the judgment standard. The judgment threshold is set to Tm; face images scoring below it are moved into a blurred data set for later development, and what remains is essentially a face data set of higher definition. The specific formulas are as follows:
G_x and G_y denote the grayscale values of the horizontal and vertical edge-detection images, respectively.
Each pixel combines its horizontal and vertical responses as:
|G| = sqrt(G_x^2 + G_y^2) (1)
Usually, to improve efficiency, the approximation without the square root is used:
|G| = |G_x| + |G_y| (2)
The image-edge binarization formula sets an edge-map pixel to 1 when |G(x, y)| meets or exceeds the threshold T and to 0 otherwise (3), where T = 110 is the specified threshold.
The blur degree is then computed from the binarized edge map; the original formula (4) is reproduced only as an image and measures the amount of edge texture. FU is the blur-degree value of the face image: the larger the value, the sharper the image. If FU < Tm, the face image is considered blurred and is eliminated directly; otherwise it is retained. A sketch of this screen follows.
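The sketch below implements the Sobel-based screen, assuming FU is the fraction of edge pixels after binarization (the patent's exact FU formula is reproduced only as an image); Tm here is an illustrative value to be tuned empirically.

```python
import cv2
import numpy as np

T = 110    # edge binarization threshold from the patent
Tm = 0.02  # assumed blur threshold (not from the patent; tune empirically)

def is_sharp(face_bgr) -> bool:
    """Keep a face crop only if its edge texture exceeds the blur threshold."""
    img = cv2.resize(face_bgr, (150, 150))           # unified judgment size
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)  # horizontal edges (G_x)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)  # vertical edges (G_y)
    g = np.abs(gx) + np.abs(gy)                      # |G| = |G_x| + |G_y|
    edges = (g >= T).astype(np.uint8)                # binarized edge map
    fu = edges.mean()                                # assumed FU: edge density
    return fu >= Tm                                  # below Tm -> blurred, drop
```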
S5: collating and classifying the clear data set.
In this embodiment, collating and classifying the clear data set includes designing a face feature extraction model, inter-class clustering of face data, intra-class clustering of face data, inter-class merging of face data, secondary cleaning of face data, and manual naming;
Designing the face feature extraction model:
Face feature extraction is the key to collating and classifying the data, but since the goal is building a face database, the demand on recognition accuracy is not high. The deep convolutional network designed in this embodiment is therefore a residual learning network (Residual Network) containing 24 convolutional layers. The loss layer of the last layer adopts Triplet loss. The training set is the public CASIA-WebFace data set, and the final accuracy tested on LFW is 95.43%, which is sufficient for face classification.
Specifically, the face feature extraction model is adapted from the residual network ResNet-18 as follows:
One block is added to conv4_x and two blocks to conv5_x, the number of filters in each layer is halved, the loss layer adopts the Triplet loss function, and the input is a 150×150 three-channel face image.
The kernel size of the convolutional layers in this network is 3×3 and the initialization method is MSRA; the first pooling layer has kernel size 3×3 and stride 2; the next three pooling layers have kernel size 2×2 and stride 2, all using max pooling; the last is a global average pooling layer with kernel size 2×2 and stride 2, and the final output feature length is 128. The model is trained on the CASIA-WebFace data set, with a final LFW accuracy of 95.43%. The loss function is the standard Triplet loss (the original formula is an image): L = Σ max(0, ||f(x^a) − f(x^p)||^2 − ||f(x^a) − f(x^n)||^2 + α), where the margin α controls intra-class versus inter-class distance. A minimal sketch of this loss follows.
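The sketch below gives the named Triplet loss in its standard form; the margin value is an assumption, since the patent shows the formula only as an image.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, alpha: float = 0.2):
    """anchor/positive/negative: (batch, 128) face feature tensors."""
    d_ap = (anchor - positive).pow(2).sum(dim=1)  # intra-class distance
    d_an = (anchor - negative).pow(2).sum(dim=1)  # inter-class distance
    return F.relu(d_ap - d_an + alpha).mean()     # hinge with margin alpha
```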
Inter-class clustering of face data:
The data set left after cleaning blurred images is still a mixed face collection box; the key of this step is gathering identical faces together while separating different faces. This embodiment uses the purpose-built deep convolutional network to extract the features of all faces in the collection box and applies a K-Means strategy to cluster all face features; since faces are aggregated one movie at a time, K takes the value 40 (an empirical value).
Specifically, given roughly N face pictures from one movie, the ResNet_face model extracts features for the whole set with feature dimension 128, producing an (N, 128) feature matrix. K-Means clustering is then applied with K = 40, finally generating 40 face-cluster collection boxes. The K-Means cost function is:
J = Σ_k Σ_{f_i ∈ C_k} ||f_i − μ_k||^2
where f_i is the feature of a face and μ_k is the feature of its cluster center. A minimal sketch follows.
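The sketch below implements the inter-class step, assuming scikit-learn's KMeans (the patent does not name an implementation); features is the (N, 128) matrix described above.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_movie_faces(features: np.ndarray, k: int = 40):
    """Partition one movie's face features into k collection boxes."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    boxes = [np.where(km.labels_ == i)[0] for i in range(k)]  # indices per box
    return boxes, km.cluster_centers_  # the centers play the role of mu_k
```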
Intra-class clustering of face data:
K-Means clustering only achieves an overall separation of the aggregate into different categories; in other words, it merely ensures that most similar individuals gather into the same collection box, i.e., the 40 boxes above. However, each box still holds a large, complex volume of face data, and misclassification remains. To eliminate it, intra-class clustering is essential, so this embodiment adopts the ResNet_clustering algorithm: the ResNet network extracts the features of the faces in each collection box, and K-Means intra-class clustering is performed box by box. In this round the number of cluster centers K is not specified but determined by the algorithm, and the cluster containing the most samples is finally filtered out as the subject category of the box.
The embodiment proceeds as follows: starting from the 40 face collection boxes produced by inter-class clustering, each box is clustered separately with cluster centers determined adaptively, so each box yields M clusters. The M clusters are screened by sample count, and the cluster with the most samples becomes the subject category of that box. The adaptive algorithm is bisecting K-means: before clustering, the two samples farthest apart are chosen as the initial two cluster centers; one of the resulting clusters is then selected to split further, and so on, stopping when the distance between the two farthest samples within a cluster falls below a threshold. A hedged sketch follows.
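The sketch below is a hedged take on the adaptive bisecting K-means: to stay fast, the cluster diameter is approximated by twice the largest distance to the centroid rather than the exact farthest-pair distance the patent describes, and the stopping threshold is an illustrative assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def diameter(points: np.ndarray) -> float:
    """Cheap proxy for the farthest-pair distance (assumption, not from the patent)."""
    if len(points) < 2:
        return 0.0
    return 2.0 * np.linalg.norm(points - points.mean(axis=0), axis=1).max()

def bisecting_kmeans(features: np.ndarray, stop_dist: float = 0.8):
    """Split clusters in two until every cluster is tighter than stop_dist."""
    clusters = [np.arange(len(features))]
    while True:
        widths = [diameter(features[c]) for c in clusters]
        i = int(np.argmax(widths))
        if widths[i] < stop_dist or len(clusters[i]) < 2:
            break
        idx = clusters.pop(i)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features[idx])
        clusters += [idx[labels == 0], idx[labels == 1]]
    return clusters

def subject_category(features: np.ndarray) -> np.ndarray:
    """Indices of the largest cluster, taken as the box's subject identity."""
    return max(bisecting_kmeans(features), key=len)
```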
Inter-class merging of face data:
After the above three rounds of operations, the face data set of one movie has basically taken shape. However, because the leading actors appear frequently and their appearance, pose, and scenes vary greatly, the same person is very easily clustered into different categories, making inter-class merging indispensable. In this embodiment the step is implemented as follows: the similarity between different collection boxes is judged by computing the mean feature of the samples in each box, and boxes are merged reasonably according to the similarity threshold.
The embodiment proceeds as follows: starting from the 40 subject face collection boxes obtained by intra-class clustering, features are extracted from the faces in each box and mean features are computed, namely mean_feature_1, mean_feature_2, ..., mean_feature_40. Pairwise Euclidean distances are then computed among the 40 mean features; when a distance falls below dis, the two corresponding collection boxes are merged. The mean feature formula is:
mean_feature_i = (1/N_i) Σ_{n=1..N_i} f_i^n (7)
where i indexes the collection box, f_i^n denotes the feature of the n-th sample in the i-th collection box, and N_i is the total number of samples in the box.
The distance formula is:
D_ij = sqrt(Σ_{d=1..128} (mean_feature_i(d) − mean_feature_j(d))^2) (8)
where D_ij is the distance between collection box i and collection box j, and 128 is the feature dimension.
When D_ij ≤ dis, collection boxes i and j are merged, with dis = 0.31 the specified threshold. A minimal sketch follows.
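The sketch below implements the merging step with the patent's threshold dis = 0.31; chaining merges through union-find (so that boxes linked indirectly end up together) is a design choice of this sketch, not stated in the patent.

```python
import numpy as np

def merge_boxes(boxes: dict, dis: float = 0.31) -> dict:
    """boxes maps box id -> (n_i, 128) feature array; returns merged boxes."""
    ids = list(boxes)
    means = {i: boxes[i].mean(axis=0) for i in ids}  # mean_feature_i
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            i, j = ids[a], ids[b]
            d_ij = np.linalg.norm(means[i] - means[j])  # Euclidean D_ij
            if d_ij <= dis:
                parent[find(i)] = find(j)               # merge the two boxes

    merged = {}
    for i in ids:
        merged.setdefault(find(i), []).append(boxes[i])
    return {root: np.vstack(groups) for root, groups in merged.items()}
```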
Secondary cleaning of face data:
To further guarantee the cleanliness of the data set, the secondary cleaning applies a median-filter-like operation to every collection box on top of the previous steps, screening reasonably by the distance between the median feature and each sample feature.
Specifically, the secondary cleaning runs on the P face collection boxes obtained from inter-class merging; the strategy resembles median filtering, with the following algorithm (a sketch follows the steps):
(1) compute the mean feature of the face collection box using formula (7);
(2) compute the distance between each face feature in the box and the mean face feature, with a distance formula analogous to formula (8), and sort by distance;
(3) extract the face feature at the median index of the sorted list; if the count is even, take the mean of the two middle features;
(4) compute the distance between every face feature in the box and the median face feature, and remove any face directly when the distance exceeds 0.41 (an empirical value).
The above four steps are completed for every collection box.
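The sketch below implements the four-step cleaning for one collection box; feats is an (n, 128) feature array and the function returns the indices of the faces kept.

```python
import numpy as np

def secondary_clean(feats: np.ndarray, reject_dist: float = 0.41) -> np.ndarray:
    mean_feat = feats.mean(axis=0)                      # step (1): mean feature
    d_mean = np.linalg.norm(feats - mean_feat, axis=1)  # step (2): distances...
    order = np.argsort(d_mean)                          # ...sorted ascending
    n = len(order)
    if n % 2 == 1:                                      # step (3): median feature
        median_feat = feats[order[n // 2]]
    else:
        median_feat = (feats[order[n // 2 - 1]] + feats[order[n // 2]]) / 2
    d_med = np.linalg.norm(feats - median_feat, axis=1) # step (4): final screen
    return np.where(d_med <= reject_dist)[0]            # drop faces beyond 0.41
```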
Manual naming:
After all the above steps, the collation and classification of one movie's face database is basically complete, but the collection-box IDs are all virtual and need manual naming. The implemented strategy: randomly draw one picture from a collection box, run face recognition with Baidu's image recognition tool, and name the corresponding face collection box according to the specific recognition result.
The above are only preferred embodiments of the present invention; the protection scope of the present invention is not limited to the above embodiments, and all technical solutions under the idea of the present invention belong to its protection scope. It should be pointed out that, for those of ordinary skill in the art, several improvements and refinements made without departing from the principle of the present invention shall be regarded as within the protection scope of the present invention.
Claims (10)
- An intelligent method for establishing an Asian face database, characterized by comprising the following steps: selecting a data source; video decoding; face detection; removing blurred pictures; collating and classifying the clear data set.
- The method for intelligently establishing an Asian face database according to claim 1, characterized in that the data source is Asian movies.
- The method according to claim 1, characterized in that the video decoding adopts a frame-extraction method to reduce the time and space complexity of the video data; the face detection uses an improved yolov3-tiny to design the face detection model, specifically as follows: the face detection model has eight convolutional layers; in early training, the improved yolov3-tiny is trained as a classifier, i.e., softmax classification is applied after the feature layer to obtain an initialization model; finally, the trained initialization model initializes the improved yolov3-tiny for large-scale face detection training.
- The method according to claim 1, characterized in that the removal of blurred pictures uses the texture detection mechanism of the Sobel operator and a fast convolution function to perform texture detection on each face.
- The method according to claim 1, characterized in that the collation and classification of the clear data set includes designing a face feature extraction model, inter-class clustering of face data, intra-class clustering of face data, inter-class merging of face data, secondary cleaning of face data, and manual naming; the face feature extraction model is improved on the basis of the residual network ResNet-18, specifically: one block is added to conv4_x and two blocks to conv5_x; the number of filters in each layer is halved; the loss layer of the last layer adopts a Triplet loss design; the CASIA-WebFace data set is used to train the model, realizing the extraction of facial features.
- The method according to claim 5, characterized in that the inter-class clustering of face data adopts the K-Means clustering method to cluster and separate the mixed face set of the video data, finally generating K face collection boxes, where K takes the value 40.
- The method according to claim 5, characterized in that the intra-class clustering of face data is specifically: the ResNet_clustering algorithm performs subject-category screening on each of the K face collection boxes and cleans the data misclassified in the previous round, where the number of clusters is determined adaptively by the algorithm.
- The method according to claim 5, characterized in that the inter-class merging of face data is specifically: the similarity between different collection boxes is judged by computing the mean feature of the samples in each box, and boxes are merged reasonably according to a similarity threshold, so that faces of the same person in different collection boxes are merged.
- The method according to claim 5, characterized in that the secondary cleaning of face data includes the following steps: (1) compute the mean feature of the face collection box; (2) compute the distance between each face feature in the box and the mean face feature and sort by distance; (3) extract the face feature at the median index of the sorted list, taking the mean of the two middle features if the count is even; (4) compute the distance between every face feature in the box and the median face feature, and remove face data whose distance exceeds the judgment threshold.
- The method according to claim 5, characterized in that the manual naming assigns IDs to the collection boxes via Baidu recognition, so that subsequent face collection boxes can be effectively merged.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910514779.1 | 2019-06-14 | ||
CN201910514779.1A CN110287835A (zh) | 2019-06-14 | 2019-06-14 | 一种亚洲人脸库智能建立方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020248782A1 true WO2020248782A1 (zh) | 2020-12-17 |
Family
ID=68004830
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/091145 WO2020248782A1 (zh) | 2019-06-14 | 2020-05-20 | 一种亚洲人脸库智能建立方法 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110287835A (zh) |
WO (1) | WO2020248782A1 (zh) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110287835A (zh) * | 2019-06-14 | 2019-09-27 | 南京云创大数据科技股份有限公司 | 一种亚洲人脸库智能建立方法 |
CN110569827B (zh) * | 2019-09-28 | 2024-01-05 | 华南理工大学 | 一种基于卷积神经网络的人脸识别提醒系统 |
CN113469321B (zh) * | 2020-03-30 | 2023-04-18 | 聚晶半导体股份有限公司 | 基于神经网络的物件检测装置和物件检测方法 |
TWI723823B (zh) | 2020-03-30 | 2021-04-01 | 聚晶半導體股份有限公司 | 基於神經網路的物件偵測裝置和物件偵測方法 |
CN112287753B (zh) * | 2020-09-23 | 2024-09-06 | 武汉天宝莱信息技术有限公司 | 一种基于机器学习提升人脸识别精度的系统及其算法 |
CN112232410B (zh) * | 2020-10-15 | 2023-08-29 | 苏州凌图科技有限公司 | 一种面向多区域大规模特征的匹配方法 |
CN112597862B (zh) * | 2020-12-16 | 2024-07-19 | 上海芯翌智能科技有限公司 | 一种用于人脸数据清洗的方法与设备 |
CN112800840B (zh) * | 2020-12-28 | 2022-07-01 | 上海万雍科技股份有限公司 | 一种人脸识别管理系统和方法 |
CN112381077B (zh) * | 2021-01-18 | 2021-05-11 | 南京云创大数据科技股份有限公司 | 一种人脸图像信息的隐藏方法 |
CN113779290A (zh) * | 2021-09-01 | 2021-12-10 | 杭州视洞科技有限公司 | 一种摄像头人脸识别聚合优化方法 |
DE112021008566T5 (de) | 2021-12-28 | 2024-10-10 | Boe Technology Group Co., Ltd. | Computerimplementiertes verfahren, vorrichtung und computerprogrammprodukt |
CN114373212A (zh) * | 2022-01-10 | 2022-04-19 | 中国民航信息网络股份有限公司 | 人脸识别模型构建方法、人脸识别方法及相关设备 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020076088A1 (en) * | 2000-12-15 | 2002-06-20 | Kun-Cheng Tsai | Method of multi-level facial image recognition and system using the same |
CN108921875A (zh) * | 2018-07-09 | 2018-11-30 | 哈尔滨工业大学(深圳) | 一种基于航拍数据的实时车流检测与追踪方法 |
CN109684913A (zh) * | 2018-11-09 | 2019-04-26 | 长沙小钴科技有限公司 | 一种基于社区发现聚类的视频人脸标注方法和系统 |
CN109871751A (zh) * | 2019-01-04 | 2019-06-11 | 平安科技(深圳)有限公司 | 基于人脸表情识别的服务态度评估方法、装置及存储介质 |
CN110287835A (zh) * | 2019-06-14 | 2019-09-27 | 南京云创大数据科技股份有限公司 | 一种亚洲人脸库智能建立方法 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102938065B (zh) * | 2012-11-28 | 2017-10-20 | 北京旷视科技有限公司 | 基于大规模图像数据的人脸特征提取方法及人脸识别方法 |
CN109117803B (zh) * | 2018-08-21 | 2021-08-24 | 腾讯科技(深圳)有限公司 | 人脸图像的聚类方法、装置、服务器及存储介质 |
-
2019
- 2019-06-14 CN CN201910514779.1A patent/CN110287835A/zh active Pending
-
2020
- 2020-05-20 WO PCT/CN2020/091145 patent/WO2020248782A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN110287835A (zh) | 2019-09-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020248782A1 (zh) | 一种亚洲人脸库智能建立方法 | |
CN108334847B (zh) | 一种真实场景下的基于深度学习的人脸识别方法 | |
CN106845510B (zh) | 基于深度层级特征融合的中国传统视觉文化符号识别方法 | |
CN110533084A (zh) | 一种基于自注意力机制的多尺度目标检测方法 | |
CN109886161B (zh) | 一种基于可能性聚类和卷积神经网络的道路交通标识识别方法 | |
US20210034840A1 (en) | Method for Recognzing Face from Monitoring Video Data | |
CN110458038A (zh) | 基于双链深度双流网络的小数据跨域动作识别方法 | |
CN108491786B (zh) | 一种基于分级网络和聚类合并的人脸检测方法 | |
CN111460914A (zh) | 一种基于全局和局部细粒度特征的行人重识别方法 | |
CN111488917A (zh) | 一种基于增量学习的垃圾图像细粒度分类方法 | |
CN113011253A (zh) | 基于ResNeXt网络的人脸表情识别方法、装置、设备及存储介质 | |
CN111125396B (zh) | 一种单模型多分支结构的图像检索方法 | |
CN114510594A (zh) | 一种基于自注意力机制的传统纹样子图检索方法 | |
CN110929099A (zh) | 一种基于多任务学习的短视频帧语义提取方法及系统 | |
CN114049194A (zh) | 一种基于图片背景相似性的欺诈检测识别方法及设备 | |
CN110321801B (zh) | 一种基于自编码网络的换衣行人重识别方法及系统 | |
CN106649665A (zh) | 一种面向图像检索的对象级深度特征聚合方法 | |
CN111597983A (zh) | 基于深度卷积神经网络实现生成式虚假人脸图像鉴定的方法 | |
CN107133579A (zh) | 基于CSGF(2D)2PCANet卷积网络的人脸识别方法 | |
CN105844213A (zh) | 一种绿色果实识别方法 | |
CN107492084A (zh) | 基于随机性的典型成团细胞核图像合成方法 | |
CN107358172A (zh) | 一种基于人脸朝向分类的人脸特征点初始化方法 | |
CN106682691B (zh) | 基于图像的目标检测方法及装置 | |
CN111461135B (zh) | 利用卷积神经网络集成的数字图像局部滤波取证方法 | |
CN108564020A (zh) | 基于全景3d图像的微手势识别方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20821890; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20821890; Country of ref document: EP; Kind code of ref document: A1 |