
CN111274855B - Image processing method, image processing device, machine learning model training method and machine learning model training device - Google Patents


Info

Publication number
CN111274855B
CN111274855B
Authority
CN
China
Prior art keywords
image
glasses
images
sample
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811480882.0A
Other languages
Chinese (zh)
Other versions
CN111274855A (en)
Inventor
张雪
王冲
杜瑶
张彦刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Orion Star Technology Co Ltd
Original Assignee
Beijing Orion Star Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Orion Star Technology Co Ltd filed Critical Beijing Orion Star Technology Co Ltd
Priority to CN201811480882.0A priority Critical patent/CN111274855B/en
Publication of CN111274855A publication Critical patent/CN111274855A/en
Application granted granted Critical
Publication of CN111274855B publication Critical patent/CN111274855B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/273Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion removing elements interfering with the pattern to be recognised
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The specification provides an image processing method, an image processing apparatus, a machine learning model training method and a machine learning model training apparatus. The image processing method includes: acquiring an image to be processed, wherein the subject in the image to be processed wears glasses; processing the image to be processed through a pre-trained conversion network to obtain a processed image in which the glasses worn by the subject are removed; and performing face recognition on the subject using the processed image.

Description

Image processing method, image processing device, machine learning model training method and machine learning model training device
Technical Field
The present disclosure relates to the field of face recognition technologies, and in particular, to an image processing method and apparatus, and a machine learning model training method and apparatus.
Background
In face recognition, glasses worn on the face usually interfere with recognition and reduce its accuracy. Applications currently on the market are based on PCA (Principal Component Analysis) and DCNN (Deep Convolutional Neural Network) techniques, but both methods are disturbed by many factors and essentially operate on infrared images, so they do not reach a practically usable effect, whereas color images are generally used when face images are acquired. In addition, most glasses-removal techniques are not end-to-end; their processing pipelines are complex and cannot be used in real applications. As a result, accuracy is not high enough when recognizing a face wearing glasses, that is, the error between the recognized face image and the actual face image is large, and the user experience is poor.
Disclosure of Invention
In view of the foregoing, embodiments of the present disclosure provide an image processing method, an image processing apparatus, a machine learning model training method, a machine learning model training apparatus, a computing device, and a storage medium, so as to solve the technical drawbacks in the prior art.
According to a first aspect of embodiments of the present specification, there is provided an image processing method including:
acquiring an image to be processed, wherein the subject in the image to be processed wears glasses;
processing the image to be processed through a pre-trained conversion network to obtain a processed image, wherein the glasses worn by the subject are removed in the processed image;
and performing face recognition on the subject using the processed image.
Optionally, the acquiring the image to be processed includes:
acquiring an original image to be processed;
screening out an image in which the subject wears glasses from the original image by using a first classifier;
determining the image in which the subject wears glasses as the image to be processed.
According to a second aspect of embodiments of the present specification, there is provided a machine learning model training method, comprising:
marking, by using a trained first classifier, the glasses-wearing state of the subject in the original sample data;
acquiring a first data set Sn and a second data set Sp from the marked images, wherein the subject in the images in Sn does not wear glasses and the subject in the images in Sp wears glasses;
training a conversion network implemented by a machine learning model based on the first data set Sn and the second data set Sp, the conversion network associating images of the same subject wearing glasses with images of that subject not wearing glasses.
Optionally, the first classifier is trained by:
acquiring a sample image and a sample label corresponding to the sample image, wherein the sample label identifies whether the subject in the sample image wears glasses, and the sample images include images in which the subject does not wear glasses and images in which the subject wears glasses;
training a first classifier based on the sample images and the sample labels, the first classifier correlating an image of a subject with the state of wearing glasses.
Optionally, before acquiring the first data set Sn and the second data set Sp from the marked images, the method further includes:
marking, by using a second classifier, whether each marked image contains illumination, and selecting a balanced number of images without illumination information and images with illumination information as the basis for random sampling;
and/or
marking, by using a third classifier, the gender of the face in each marked image, and selecting a balanced number of face images of different genders as the basis for random sampling.
Optionally, the method further comprises:
selecting at least one sample Is from the first data set Sn and the second data set Sp;
converting each sample Is through the conversion network to generate a converted sample Ig, extracting features of Is and Ig to obtain a first feature expression of Is and a second feature expression of Ig respectively, and calculating the distance corresponding to Is according to the first feature expression and the second feature expression;
determining a first penalty function based on the distance corresponding to the at least one sample Is;
and combining the first penalty function with a second penalty function of the conversion network to optimize the conversion network.
According to a third aspect of the embodiments of the present specification, there is provided an image processing apparatus comprising:
an acquisition module configured to acquire an image to be processed, wherein the subject in the image to be processed wears glasses;
a processing module configured to process the image to be processed through a pre-trained conversion network to obtain a processed image, wherein the glasses worn by the subject are removed in the processed image;
and an identification module configured to perform face recognition on the subject using the processed image.
Optionally, the acquisition module is further configured to: acquire an original image to be processed;
screen out the image in which the subject wears glasses from the original image by using a first classifier, and determine the image in which the subject wears glasses as the image to be processed.
According to a fourth aspect of embodiments of the present specification, there is provided a machine learning model training apparatus comprising:
and a marking module: the system comprises a first classifier, a second classifier and a first image processing unit, wherein the first classifier is configured to mark the state of wearing glasses by a photographer in original sample data by using the trained first classifier;
a first acquisition module: is configured to acquire a first data set Sn and a second data set Sp from the marked images, wherein the images in Sn are that the shot person does not wear glasses, and the images in Sp are that the shot person wears glasses;
a first training module: is configured to train a conversion network implemented by a machine learning model based on the first data set Sn and the second data set Sp, the conversion network associating images of the same person wearing glasses with images of the person not wearing glasses.
Optionally, the apparatus further comprises:
and a second training module: the sample tag is used for identifying whether a photographer in the sample image wears glasses or not, and the sample image comprises an image that the photographer does not wear the glasses and an image that the photographer wears the glasses; a first classifier is trained based on the training samples and the sample tags, the first classifier correlating images of a subject with a state of wearing glasses.
Optionally, the marking module is further configured to: mark, by using a second classifier, whether each marked image contains illumination, and select a balanced number of images without illumination information and images with illumination information as the basis for random sampling; and/or mark, by using a third classifier, the gender of the face in each marked image, and select a balanced number of face images of different genders as the basis for random sampling.
Optionally, the apparatus further comprises:
an optimization module configured to select at least one sample Is from the first data set Sn and the second data set Sp; convert each sample Is through the conversion network to generate a converted sample Ig, extract features of Is and Ig to obtain a first feature expression of Is and a second feature expression of Ig respectively, and calculate the distance corresponding to Is according to the first feature expression and the second feature expression; determine a first penalty function based on the distance corresponding to the at least one sample Is; and combine the first penalty function with a second penalty function of the conversion network to optimize the conversion network.
According to a fifth aspect of embodiments of the present specification, there is provided a computing device comprising a memory, a processor and computer instructions stored on the memory and executable on the processor, wherein the processor implements the steps of the first or second aspect when executing the instructions.
According to a sixth aspect of embodiments of the present description, there is provided a computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the first or second aspects.
In the embodiments of the specification, an image to be processed is acquired, wherein the subject in the image wears glasses; the image is processed through a pre-trained conversion network to obtain a processed image in which the glasses worn by the subject are removed; and face recognition is then performed on the subject using the processed image. Because the glasses worn by the subject are removed by the trained conversion network before recognition, face recognition is performed on a glasses-free image, which reduces the interference of glasses.
Drawings
FIG. 1 is a block diagram of a computing device provided by an embodiment of the present application;
FIG. 2 is a flowchart of an image processing method provided in an embodiment of the present application;
fig. 3 is a schematic diagram of a Cycle-GAN network application of an image processing method according to an embodiment of the present application;
FIG. 4 is a flow chart of a machine learning model training method of an image processing method provided by an embodiment of the present application;
FIG. 5 is a flow chart of training and optimizing a machine learning model of an image processing method provided by an embodiment of the present application;
fig. 6 is a schematic structural diagram of an image processing apparatus provided in an embodiment of the present application;
fig. 7 is a schematic structural diagram of a machine learning model training apparatus provided in an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present specification. However, the specification can be implemented in many other ways than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the specification; therefore, the specification is not limited by the specific implementations disclosed below.
The terminology used in the one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to limit the one or more embodiments of this specification. As used in this specification, in one or more embodiments, and in the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of this specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, this information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of the present specification, a first may also be referred to as a second, and similarly, a second may also be referred to as a first. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
In the present specification, an image processing method and apparatus, a machine learning model training method and apparatus, a computing device, and a storage medium are provided, each of which is described in detail in the following embodiments.
Fig. 1 is a block diagram illustrating a configuration of a computing device 100 according to an embodiment of the present description. The components of the computing device 100 include, but are not limited to, a memory 110 and a processor 120. Processor 120 is coupled to memory 110 via bus 130 and database 150 is used to store data.
Computing device 100 also includes access device 140, which enables computing device 100 to communicate via one or more networks 160. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 140 may include one or more of any type of wired or wireless network interface (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In an embodiment of the present description, other components of computing device 100 described above and not shown in FIG. 1 may also be connected to each other, such as via a bus. It should be understood that the block diagram of the computing device shown in FIG. 1 is for exemplary purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 100 may be any type of stationary or mobile computing device including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 100 may also be a mobile or stationary server.
The processor 120 may perform the steps of the image processing method shown in fig. 2. Fig. 2 is a flowchart of an image processing method according to an embodiment of the present specification, including steps 202 to 206.
Step 202: acquire an image to be processed, wherein the subject in the image to be processed wears glasses.
In an embodiment of the present disclosure, the acquiring the image to be processed includes:
acquiring an original image to be processed;
screening out an image in which the subject wears glasses from the original image by using a first classifier;
determining the image in which the subject wears glasses as the image to be processed.
In a practical application, if face recognition is required for 30 people, face images of the 30 people are collected first; some of the subjects may wear glasses and some may not. If a subject wears glasses, the accuracy of recognition is affected. To improve the accuracy of face recognition, in the embodiment of the present disclosure the face images in which the subject wears glasses are selected from the 30 collected face images by the first classifier; for example, 10 images of subjects wearing glasses are selected, and these 10 images are the images to be processed.
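By way of illustration only, the following is a minimal sketch of this screening step in Python with PyTorch. The model file name, the 112x112 input size, and the 0.5 decision threshold are assumptions made for the sketch, not details taken from this specification.

```python
import torch
from torchvision import transforms
from PIL import Image

# Hypothetical pre-trained binary classifier (the "first classifier"):
# assumed to output a single logit scoring whether the subject wears glasses.
classifier = torch.load("glasses_classifier.pt")  # assumed artifact name
classifier.eval()

preprocess = transforms.Compose([
    transforms.Resize((112, 112)),  # assumed input size
    transforms.ToTensor(),
])

def screen_images(paths, threshold=0.5):
    """Return the paths predicted to show a subject wearing glasses."""
    to_process = []
    with torch.no_grad():
        for path in paths:
            x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
            if torch.sigmoid(classifier(x)).item() > threshold:
                to_process.append(path)
    return to_process

# e.g. images_to_process = screen_images(collected_face_image_paths)
```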
Step 204: process the image to be processed through a pre-trained conversion network to obtain a processed image, wherein the glasses worn by the subject are removed in the processed image.
In one embodiment of the present disclosure, the conversion network may be a Cycle-GAN network composed of a generative model and a discriminant model. The Cycle-GAN network can realize conversion of image styles, and enables the images in the source domain and the images in the target domain to have meaningful association relation.
Referring to fig. 3, a schematic diagram of a Cycle-GAN network application is shown, in which a horse image from the source domain is passed through the generative model in the Cycle-GAN network to generate a zebra image in the target domain.
Based on the conversion characteristics of the Cycle-GAN network, the network is trained with images of subjects wearing glasses and images of subjects not wearing glasses, so that the two kinds of images become associated. After training is completed, the generative model of the Cycle-GAN network takes an image of a subject wearing glasses as input and outputs an image of that subject not wearing glasses.
In a practical application, the images of the 10 subjects wearing glasses are processed through the generative model in the pre-trained conversion network to generate the 10 corresponding images of the subjects without glasses.
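A minimal inference sketch for this step follows, assuming the trained generative model for the glasses-to-no-glasses direction has been saved to a file; the file name, tensor layout, and the [-1, 1] input range are assumptions.

```python
import torch

# Hypothetical generator of the trained Cycle-GAN (the generative model
# mapping the "wearing glasses" domain to the "not wearing glasses" domain).
generator = torch.load("cyclegan_generator_glasses2none.pt")  # assumed name
generator.eval()

def remove_glasses(batch):
    """batch: float tensor of face images, shape (N, 3, H, W), assumed to be
    normalized to [-1, 1]. Returns the corresponding glasses-free images."""
    with torch.no_grad():
        return generator(batch)
```

The 10 screened images would then be stacked into one batch and passed through remove_glasses before recognition.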
Step 206: perform face recognition on the subject using the processed image.
In the processed image the glasses worn by the subject have been removed, and face recognition is performed on this glasses-free image. In practice, various face recognition methods may be adopted; for example, features are extracted from the processed image to obtain a feature expression, the feature expression is compared with the feature expressions of reference face images in a registry, and whether the subject is a registered user is determined from the comparison result. If the subject is a registered user, further information about the subject can be acquired.
When extracting features from the processed image, one approach is to input the processed image into a pre-trained face recognition model and take the feature expression output by a hidden layer of the model as the feature expression of the processed image.
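As a sketch of this step, hidden-layer features can be extracted and compared against the registry; the forward_features method name and the cosine-similarity threshold are assumptions, since the specification leaves the face recognition method open.

```python
import torch
import torch.nn.functional as F

# Hypothetical pre-trained face recognition model whose hidden-layer output
# serves as the feature expression described above.
face_model = torch.load("face_recognition_model.pt")  # assumed artifact name
face_model.eval()

def feature_expression(image):
    """image: (1, 3, H, W) tensor. Assumes the model exposes a
    forward_features() method returning the hidden-layer embedding."""
    with torch.no_grad():
        return face_model.forward_features(image)  # assumed method name

def is_registered(image, registry_embeddings, threshold=0.6):
    """Compare one feature expression against the registry of reference
    feature expressions; cosine similarity and the 0.6 threshold are
    assumptions, not requirements of the specification."""
    emb = feature_expression(image)                        # shape (1, D)
    sims = F.cosine_similarity(emb, registry_embeddings)   # shape (N,)
    return sims.max().item() > threshold
```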
The application does not set any limitation on the face recognition method.
In the embodiment of the specification, an image to be processed is acquired in which the subject wears glasses, and the image is processed through the generative model of a pre-trained conversion network composed of a generative model and a discriminant model, so that the glasses worn by the subject are removed in the processed image. Because the glasses are removed by the trained conversion network and face recognition is performed on the glasses-free image, the accuracy of recognizing the subject in the subsequent face recognition process is improved.
Referring to fig. 4, a flowchart of a machine learning model training method provided by an embodiment of the present disclosure includes steps 402 to 406.
Step 402: mark, by using the trained first classifier, the glasses-wearing state of the subject in the original sample data.
In an embodiment of the present disclosure, the first classifier is trained by:
acquiring a sample image and a sample label corresponding to the sample image, wherein the sample label identifies whether the subject in the sample image wears glasses, and the sample images include images in which the subject does not wear glasses and images in which the subject wears glasses;
training a first classifier based on the sample images and the sample labels, the first classifier correlating an image of a subject with the state of wearing glasses.
In a practical application, the first classifier adds a label describing the glasses-wearing state to each image in the original sample data: a wearing-glasses label is added to images in which the subject wears glasses, and a not-wearing-glasses label is added to images in which the subject does not, so that the original sample data is classified by whether the subject wears glasses.
Step 404: acquire the first data set Sn and the second data set Sp from the marked images, wherein the subject in the images in Sn does not wear glasses and the subject in the images in Sp wears glasses.
In an embodiment of the present disclosure, before acquiring the first data set Sn and the second data set Sp from the marked images, the method further includes:
marking, by using a second classifier, whether each marked image contains illumination, and selecting a balanced number of images without illumination information and images with illumination information as the basis for random sampling;
and/or
marking, by using a third classifier, the gender of the face in each marked image, and selecting a balanced number of face images of different genders as the basis for random sampling.
In a practical application, a second classifier is used to mark whether each marked image contains illumination, and a third classifier is used to mark the gender of the face in each marked image; a fourth classifier may further mark the skin color of the face, and more classifiers may be used to mark the images, which is not limited in this specification.
Further marking the images through the second classifier and the third classifier makes the objective conditions of the marked images obtained during sampling nearly consistent.
Optionally, the marked images are randomly sampled to obtain the first data set Sn and the second data set Sp. The numbers of images in Sn and Sp may be equal or different; the embodiment of the present invention does not limit how many images each data set contains. Because every image has the same chance of being drawn during sampling, subjective bias in the sampling process is eliminated and the credibility of the sampling result is improved. The sampling manner is likewise not limited in the embodiment of the present invention; for example, sequential sampling may be adopted.
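A minimal sketch of such balanced random sampling follows, assuming each marked image carries the glasses, illumination, and gender labels produced by the three classifiers; the tuple layout and the per-bucket count are assumptions for illustration.

```python
import random

def balanced_sample(marked, wears_glasses, per_bucket):
    """marked: list of (path, glasses, illumination, gender) label tuples.
    Draw per_bucket images from each (illumination, gender) bucket so the
    objective conditions of the sampled images stay nearly consistent."""
    buckets = {}
    for item in marked:
        if item[1] == wears_glasses:
            buckets.setdefault((item[2], item[3]), []).append(item)
    sample = []
    for members in buckets.values():
        sample.extend(random.sample(members, min(per_bucket, len(members))))
    return sample

# Sn: subjects without glasses; Sp: subjects with glasses
# Sn = balanced_sample(marked_images, wears_glasses=False, per_bucket=500)
# Sp = balanced_sample(marked_images, wears_glasses=True, per_bucket=500)
```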
Step 406: train a conversion network implemented by a machine learning model based on the first data set Sn and the second data set Sp, the conversion network associating images of the same subject wearing glasses with images of that subject not wearing glasses.
The conversion network may be a Cycle-GAN network; for a description of the Cycle-GAN network, refer to the corresponding description of fig. 3, which is not repeated here.
In this embodiment of the present disclosure, an equal number of images are randomly selected from the first data set Sn (images in which the subject does not wear glasses) and the second data set Sp (images in which the subject wears glasses) to train the conversion network. Through training, the conversion network associates images of a subject wearing glasses with images of the same subject not wearing glasses, so that the glasses-removed face image produced by the network is closer to the subject's real face, which improves the conversion effect.
Referring to fig. 5, a flowchart of another machine learning model training and optimization process provided by an embodiment of the present disclosure includes steps 502 through 516.
Step 502: mark, by using the trained first classifier, the glasses-wearing state of the subject in the original sample data.
Optionally, the first classifier adds a sample label describing the glasses-wearing state to each image in the original sample data, turning the glasses-wearing state of the subject into a label indicating whether the subject wears glasses, so that the sample images are classified by whether the subject wears glasses.
Step 504: acquire the first data set Sn and the second data set Sp from the marked images, wherein the subject in the images in Sn does not wear glasses and the subject in the images in Sp wears glasses.
Step 506: train a conversion network implemented by a machine learning model based on the first data set Sn and the second data set Sp, the conversion network associating images of the same subject wearing glasses with images of that subject not wearing glasses.
Step 508: select at least one sample Is from the first data set Sn and the second data set Sp.
Step 510: convert each sample Is through the conversion network to generate a converted sample Ig, extract features of Is and Ig to obtain a first feature expression of Is and a second feature expression of Ig respectively, and calculate the distance corresponding to Is according to the first feature expression and the second feature expression.
Step 512: determine a first penalty function based on the distance corresponding to the at least one sample Is.
In a practical application, when a single sample Is is selected from the first data set Sn and the second data set Sp, the converted sample Ig is generated by passing Is through the conversion network; feature extraction is performed on Is and Ig to obtain the first feature expression of Is and the second feature expression of Ig, the first distance corresponding to Is is calculated from the two feature expressions, and the first distance is determined as the first penalty function.
In a practical application, a pre-trained face recognition model can be used to extract features from an input face image: after the face image is input, the feature expression output by a hidden layer of the face recognition model is taken as the feature expression of the input image. For example, Is may be input into the face recognition model and the hidden-layer output used as the first feature expression of Is; Ig may be input into the same model and the hidden-layer output used as the second feature expression of Ig.
In the case where a plurality of samples (i.e., two or more) are selected from the first data set Sn and the second data set Sp, three samples are taken for illustration; other numbers of samples are handled similarly and are not illustrated here. Assume the selected samples are Ix, Iy and Iz. Each sample is converted through the conversion network to generate a corresponding converted sample: Ix generates Im, Iy generates In, and Iz generates Io. Features are then extracted from Ix, Im, Iy, In, Iz and Io respectively, giving feature expression 1 of Ix, feature expression 2 of Im, feature expression 3 of Iy, feature expression 4 of In, feature expression 5 of Iz, and feature expression 6 of Io.
Further, a first distance between the feature expression 1 and the feature expression 2, a second distance between the feature expression 3 and the feature expression 4, and a third distance between the feature expression 5 and the feature expression 6 are calculated, and a first penalty function is determined according to the first distance, the second distance and the third distance.
In implementation, when determining the first penalty function from the first distance, the second distance and the third distance, the mean value or the maximum value may be adopted, which is not limited in this specification.
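The per-sample distance and the first penalty function might be computed as in the following sketch; the Euclidean distance is an assumption (the specification only requires a distance between the two feature expressions), while the mean/maximum reduction mirrors the text above.

```python
import torch

def first_penalty(samples, generator, feature_fn, reduce="mean"):
    """samples: batch of images Is drawn from Sn and Sp, shape (N, 3, H, W).
    generator: the conversion network, producing Ig from Is.
    feature_fn: the hidden-layer feature extractor of the face model.
    Returns the first penalty function value (mean or max of distances)."""
    ig = generator(samples)                 # converted samples Ig
    f_is = feature_fn(samples)              # first feature expressions
    f_ig = feature_fn(ig)                   # second feature expressions
    dists = torch.norm(f_is - f_ig, dim=1)  # one distance per sample
    return dists.mean() if reduce == "mean" else dists.max()
```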
Step 514: combine the first penalty function with the second penalty function of the conversion network to optimize the conversion network.
In an embodiment of the present disclosure, the conversion network already has a second penalty function that is used to optimize it; adding the first penalty function and combining it with the second penalty function improves the optimization of the conversion network.
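One plausible way to realize the combination, reusing first_penalty from the sketch above, is a weighted sum; the lambda_id weight and the stand-in second_penalty (representing the conversion network's own Cycle-GAN penalty, whose computation is omitted) are assumptions, as the specification does not fix how the two penalties are combined.

```python
lambda_id = 0.5  # assumed weight for the first penalty function

def training_step(samples, generator, feature_fn, second_penalty, optimizer):
    """One optimization step of the conversion network. second_penalty is a
    stand-in for the conversion network's own (Cycle-GAN) penalty function."""
    loss = second_penalty(samples) + lambda_id * first_penalty(
        samples, generator, feature_fn)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```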
In an embodiment of the present disclosure, an equal number of images are randomly selected from the first data set Sn (subjects not wearing glasses) and the second data set Sp (subjects wearing glasses) to train the conversion network, which thereby associates images of a subject wearing glasses with images of the same subject not wearing glasses. Adding the first penalty function and combining it with the second penalty function in the conversion network improves the optimization of the network, so the images it produces are of better quality and the glasses-removed face images are closer to the subjects' real faces. The processing procedure is simple, the generated image quality is good, the interference of glasses with face recognition accuracy is reduced, and the accuracy of recognizing the subject in the subsequent face recognition process is improved.
Corresponding to the above method embodiments, the present disclosure further provides an image processing apparatus embodiment, and fig. 6 shows a schematic structural diagram of the image processing apparatus according to one embodiment of the present disclosure. As shown in fig. 6, the apparatus 600 includes:
an acquisition module 602 configured to acquire an image to be processed, wherein the subject in the image to be processed wears glasses;
a processing module 604 configured to process the image to be processed through a pre-trained conversion network to obtain a processed image, wherein the glasses worn by the subject are removed in the processed image;
an identification module 606 configured to perform face recognition on the subject using the processed image.
In an alternative embodiment, the acquisition module is further configured to: acquire an original image to be processed;
screen out the image in which the subject wears glasses from the original image by using a first classifier, and determine the image in which the subject wears glasses as the image to be processed.
In an alternative embodiment, an image to be processed is acquired in which the subject wears glasses; the image is processed through the generative model of a pre-trained conversion network composed of a generative model and a discriminant model, and in the resulting processed image the glasses worn by the subject are removed. Processing the image through the trained conversion network is simple, the generated image quality is good, the interference of glasses with face recognition accuracy is reduced, and the accuracy of recognizing the subject in the subsequent face recognition process is improved.
Corresponding to the above method embodiments, the present disclosure further provides an embodiment of a machine learning model training apparatus, and fig. 7 shows a schematic structural diagram of the machine learning model training apparatus according to one embodiment of the present disclosure. As shown in fig. 7, the apparatus 700 includes:
a marking module 702 configured to mark, by using the trained first classifier, the glasses-wearing state of the subject in the original sample data;
a first acquisition module 704 configured to acquire a first data set Sn and a second data set Sp from the marked images, wherein the subject in the images in Sn does not wear glasses and the subject in the images in Sp wears glasses;
a first training module 706 configured to train a conversion network implemented by a machine learning model based on the first data set Sn and the second data set Sp, the conversion network associating images of the same subject wearing glasses with images of that subject not wearing glasses.
In an alternative embodiment, machine learning model training apparatus 700 further comprises:
and a second training module: the sample tag is used for identifying whether a photographer in the sample image wears glasses or not, and the sample image comprises an image that the photographer does not wear the glasses and an image that the photographer wears the glasses; a first classifier is trained based on the training samples and the sample tags, the first classifier correlating images of a subject with a state of wearing glasses.
In an alternative embodiment, the marking module is further configured to: mark, by using a second classifier, whether each marked image contains illumination, and select a balanced number of images without illumination information and images with illumination information as the basis for random sampling; and/or mark, by using a third classifier, the gender of the face in each marked image, and select a balanced number of face images of different genders as the basis for random sampling.
In an alternative embodiment, machine learning model training apparatus 700 further comprises:
an optimization module configured to select at least one sample Is from the first data set Sn and the second data set Sp; convert each sample Is through the conversion network to generate a converted sample Ig, extract features of Is and Ig to obtain a first feature expression of Is and a second feature expression of Ig respectively, and calculate the distance corresponding to Is according to the first feature expression and the second feature expression; determine a first penalty function based on the distance corresponding to the at least one sample Is; and combine the first penalty function with a second penalty function of the conversion network to optimize the conversion network.
In an alternative embodiment, the glasses-wearing state of the subject in the original sample data is marked by using the trained first classifier, and the first data set Sn and the second data set Sp are obtained from the marked images, wherein the subject in the images in Sn does not wear glasses and the subject in the images in Sp wears glasses. A conversion network implemented through a machine learning model is trained based on Sn and Sp; the conversion network associates images of the same subject wearing glasses with images of that subject not wearing glasses, so that the glasses-removed face images it produces are closer to the subjects' real faces, which improves the conversion effect.
An embodiment of the present disclosure also provides a computing device including a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the image processing method or the machine learning model training method.
An embodiment of the present specification also provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the image processing method or the machine learning model training method.
The above is an exemplary scheme of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium and the technical solutions of the image processing method and the machine learning model training method belong to the same concept; for details not described in the technical solution of the storage medium, refer to the description of the technical solutions of the image processing method and the machine learning model training method.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer-readable medium can be appropriately adjusted according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that, for the sake of simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all necessary for the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are provided only to aid in elucidating the present application. The alternative embodiments are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and practical application of the present application, thereby enabling others skilled in the art to understand and utilize it well. The application is to be limited only by the claims and their full scope and equivalents.

Claims (12)

1. An image processing method, comprising:
acquiring an image to be processed, wherein the subject in the image to be processed wears glasses;
processing the image to be processed through a pre-trained conversion network to obtain a processed image, wherein the glasses worn by the subject are removed in the processed image;
performing face recognition on the subject using the processed image;
wherein the conversion network is trained in the following manner:
acquiring a first data set Sn and a second data set Sp, wherein the subject in the images in Sn does not wear glasses and the subject in the images in Sp wears glasses;
training a conversion network implemented by a machine learning model based on the first data set Sn and the second data set Sp, the conversion network associating images of the same subject wearing glasses with images of that subject not wearing glasses;
selecting at least one sample Is from the first data set Sn and the second data set Sp;
converting each sample Is through the conversion network to generate a converted sample Ig, extracting features of Is and Ig to obtain a first feature expression of Is and a second feature expression of Ig respectively, and calculating the distance corresponding to Is according to the first feature expression and the second feature expression;
determining a first penalty function based on the distance corresponding to the at least one sample Is;
and combining the first penalty function with a second penalty function of the conversion network to optimize the conversion network.
2. The method of claim 1, wherein the acquiring the image to be processed comprises:
acquiring an original image to be processed;
screening out an image in which the subject wears glasses from the original image by using a first classifier;
determining the image in which the subject wears glasses as the image to be processed.
3. A machine learning model training method, comprising:
marking, by using a trained first classifier, the glasses-wearing state of the subject in the original sample data;
acquiring a first data set Sn and a second data set Sp from the marked images, wherein the subject in the images in Sn does not wear glasses and the subject in the images in Sp wears glasses;
training a conversion network implemented by a machine learning model based on the first data set Sn and the second data set Sp, the conversion network associating images of the same subject wearing glasses with images of that subject not wearing glasses;
selecting at least one sample Is from the first data set Sn and the second data set Sp;
converting each sample Is through the conversion network to generate a converted sample Ig, extracting features of Is and Ig to obtain a first feature expression of Is and a second feature expression of Ig respectively, and calculating the distance corresponding to Is according to the first feature expression and the second feature expression;
determining a first penalty function based on the distance corresponding to the at least one sample Is;
and combining the first penalty function with a second penalty function of the conversion network to optimize the conversion network.
4. A method according to claim 3, wherein the first classifier is trained by:
acquiring a sample image and a sample label corresponding to the sample image, wherein the sample label identifies whether the subject in the sample image wears glasses, and the sample images include images in which the subject does not wear glasses and images in which the subject wears glasses;
training a first classifier based on the sample images and the sample labels, the first classifier correlating an image of a subject with the state of wearing glasses.
5. The method according to claim 3, wherein before acquiring the first data set Sn and the second data set Sp from the marked images, the method further comprises:
marking, by using a second classifier, whether each marked image contains illumination, and selecting a balanced number of images without illumination information and images with illumination information as the basis for random sampling;
and/or
marking, by using a third classifier, the gender of the face in each marked image, and selecting a balanced number of face images of different genders as the basis for random sampling.
6. An image processing apparatus, comprising:
an acquisition module configured to acquire an image to be processed, wherein the subject in the image to be processed wears glasses;
a processing module configured to process the image to be processed through a pre-trained conversion network to obtain a processed image, wherein the glasses worn by the subject are removed in the processed image;
an identification module configured to perform face recognition on the subject using the processed image;
wherein the conversion network is trained in the following manner:
acquiring a first data set Sn and a second data set Sp, wherein the subject in the images in Sn does not wear glasses and the subject in the images in Sp wears glasses;
training a conversion network implemented by a machine learning model based on the first data set Sn and the second data set Sp, the conversion network associating images of the same subject wearing glasses with images of that subject not wearing glasses;
selecting at least one sample Is from the first data set Sn and the second data set Sp;
converting each sample Is through the conversion network to generate a converted sample Ig, extracting features of Is and Ig to obtain a first feature expression of Is and a second feature expression of Ig respectively, and calculating the distance corresponding to Is according to the first feature expression and the second feature expression;
determining a first penalty function based on the distance corresponding to the at least one sample Is;
and combining the first penalty function with a second penalty function of the conversion network to optimize the conversion network.
7. The apparatus of claim 6, wherein the acquisition module is further configured to:
acquire an original image to be processed; screen out the image in which the subject wears glasses from the original image by using a first classifier, and determine the image in which the subject wears glasses as the image to be processed.
8. A machine learning model training apparatus, comprising:
and a marking module: the system comprises a first classifier, a second classifier and a first image processing unit, wherein the first classifier is configured to mark the state of wearing glasses by a photographer in original sample data by using the trained first classifier;
a first acquisition module: is configured to acquire a first data set Sn and a second data set Sp from the marked images, wherein the images in Sn are that the shot person does not wear glasses, and the images in Sp are that the shot person wears glasses;
a first training module: training a conversion network implemented by a machine learning model based on the first data set Sn and the second data set Sp, the conversion network associating images of the same subject wearing glasses with images of the subject not wearing glasses;
further comprising an optimization module configured to:
selecting at least one sample Is in the first data set Sn and the second data set Sp;
converting each sample Is through a conversion network to generate a converted sample Ig, extracting features of the Is and the Ig to obtain a first feature expression of the Is and a second feature expression of the Ig respectively, and calculating a distance corresponding to the Is according to the first feature expression and the second feature expression;
determining a first penalty function based on the distance corresponding to the at least one sample Is;
and combining the first penalty function with a second penalty function of the conversion network to optimize the conversion network.
9. The apparatus as recited in claim 8, further comprising:
and a second training module: the sample tag is used for identifying whether a photographer in the sample image wears glasses or not, and the sample image comprises an image that the photographer does not wear the glasses and an image that the photographer wears the glasses; a first classifier is trained based on the training samples and the sample tags, the first classifier correlating images of a subject with a state of wearing glasses.
10. The apparatus of claim 8, wherein the marking module is further configured to: mark, by using a second classifier, whether each marked image contains illumination, and select a balanced number of images without illumination information and images with illumination information as the basis for random sampling; and/or mark, by using a third classifier, the gender of the face in each marked image, and select a balanced number of face images of different genders as the basis for random sampling.
11. A computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the method of any of claims 1-5.
12. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 5.
CN201811480882.0A 2018-12-05 2018-12-05 Image processing method, image processing device, machine learning model training method and machine learning model training device Active CN111274855B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811480882.0A CN111274855B (en) 2018-12-05 2018-12-05 Image processing method, image processing device, machine learning model training method and machine learning model training device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811480882.0A CN111274855B (en) 2018-12-05 2018-12-05 Image processing method, image processing device, machine learning model training method and machine learning model training device

Publications (2)

Publication Number Publication Date
CN111274855A CN111274855A (en) 2020-06-12
CN111274855B true CN111274855B (en) 2024-03-26

Family

ID=71003186

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811480882.0A Active CN111274855B (en) 2018-12-05 2018-12-05 Image processing method, image processing device, machine learning model training method and machine learning model training device

Country Status (1)

Country Link
CN (1) CN111274855B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101866421A (en) * 2010-01-08 2010-10-20 苏州市职业大学 Method for extracting characteristic of natural image based on dispersion-constrained non-negative sparse coding
WO2018072102A1 (en) * 2016-10-18 2018-04-26 华为技术有限公司 Method and apparatus for removing spectacles in human face image
CN108280413A (en) * 2018-01-17 2018-07-13 百度在线网络技术(北京)有限公司 Face identification method and device
CN108846355A (en) * 2018-06-11 2018-11-20 腾讯科技(深圳)有限公司 Image processing method, face identification method, device and computer equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9747701B2 (en) * 2015-08-20 2017-08-29 General Electric Company Systems and methods for emission tomography quantitation

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101866421A (en) * 2010-01-08 2010-10-20 苏州市职业大学 Method for extracting characteristic of natural image based on dispersion-constrained non-negative sparse coding
WO2018072102A1 (en) * 2016-10-18 2018-04-26 华为技术有限公司 Method and apparatus for removing spectacles in human face image
CN108280413A (en) * 2018-01-17 2018-07-13 百度在线网络技术(北京)有限公司 Face identification method and device
CN108846355A (en) * 2018-06-11 2018-11-20 腾讯科技(深圳)有限公司 Image processing method, face identification method, device and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Face Detection Algorithm Based on SVM and HOG; Zhao Feng; Information Technology and Informatization; 2013-12-15 (06); full text *

Also Published As

Publication number Publication date
CN111274855A (en) 2020-06-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant