
WO2016050729A1 - Face inpainting using piece-wise affine warping and sparse coding - Google Patents

Face inpainting using piece-wise affine warping and sparse coding

Info

Publication number
WO2016050729A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
module
mask
face image
occlusion
Prior art date
Application number
PCT/EP2015/072354
Other languages
French (fr)
Inventor
Joaquin ZEPEDA SALVATIERRA
Patrick Perez
Xavier BURGOS
Original Assignee
Thomson Licensing
Priority date
Filing date
Publication date
Application filed by Thomson Licensing
Publication of WO2016050729A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/77 Retouching; Inpainting; Scratch removal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2136 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on sparsity criteria, e.g. with an overcomplete basis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/28 Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/169 Holistic features and representations, i.e. based on the facial image taken as a whole
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face


Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

A method and apparatus for performing face occlusion removal are described, as shown in Figures 5 and 6, including receiving a face image and an occlusion mask, the occlusion mask indicating missing pixels (505), receiving training images (605), performing face alignment on the received training images and the face image and the occlusion mask (510), receiving a mask (515), receiving a learned dictionary (520) and reconstructing the face image using the mask and the learned dictionary (525).

Description

FACE INPAINTING USING PIECE-WISE AFFINE WARPING AND SPARSE CODING
FIELD OF THE INVENTION
The present invention relates to the reconstruction of lost or deteriorated parts of images or videos and, in particular, to reconstruction of facial expressions of images or videos.
BACKGROUND OF THE INVENTION
This section is intended to introduce the reader to various aspects of art, which may be related to the present embodiments that are described below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be understood that these statements are to be read in this light.
Inpainting is the process of reconstructing lost or deteriorated parts of images and videos. In the case of pictures containing human faces, inpainting refers to the process of reconstructing regions of a face that were hidden due to typical occlusions such as sunglasses, hair, etc. From this initial problem formulation one can derive a wide number of similar tasks (detailed below) such as facial transfer, facial hallucination or facial expression transfer, etc.
An aspect of the present disclosure involves applying sparse coding to efficiently recover large regions of a face. Sparse coding methods have been successfully applied to a large number of image processing problems, including denoising, inpainting, compression, classification and face recognition. The aim of sparse coding is to represent each signal vector using a linear combination of a few column vectors, called atoms, from a rectangular matrix called the dictionary. A good dictionary will contain atoms including spatial patterns that occur commonly in natural images. Many off-the-shelf dictionary matrices exist, such as the DCT dictionary, but better results can be obtained by learning the dictionary from a set of training images.
The vast majority of algorithms employing sparse coding for image processing operate by first splitting the image into small image blocks of the same size (e.g., 8x8), and then rasterizing each block to obtain a signal vector (see, for example, the top of Fig. 3). There are two main reasons for this. One reason is that the complexity of sparse coding increases with the size of the signal vector. Yet recent approaches by Rubinstein, Zibulevsky and Elad and also by Zepeda successfully address this complexity issue by structuring the dictionary, making larger dictionaries suitable for large signal vectors tractable. The second reason is that, for generic natural images, spatial patterns become more diverse with increasing signal vector size. Hence, it is more difficult to represent large signal vectors extracted from generic natural images with a small number of atoms, making such vectors ill-suited for sparse coding methods.
There nonetheless exist non-generic image classes that display high spatial dependency even for large block sizes. This is the case for images of faces, particularly when pre-processed to a standard physiognomy and size via piece-wise affine warping. Indeed, two existing approaches exploit this property of face images. The first approach, proposed by Bryt and Elad, deals with compression of face images and employs piecewise-affine warping. Yet Bryt and Elad subsequently apply sparse coding using a standard per-block approach, albeit using per-block learned dictionaries. The second approach, by Wright, Yang, Ganesh, Sastry and Ma, addresses face recognition and does not employ a face warping mechanism. The dictionary in this case includes a concatenation of multiple images of each targeted subject, as opposed to being learned for a reconstruction task (or better yet for the recognition task that the authors address).
Consider a signal vector $y \in \mathbb{R}^d$ that is to be represented using a sparse selection of columns of an over-complete matrix $D \in \mathbb{R}^{d \times N}$. The columns $d_i$ of $D$ are referred to as atoms, and $D$ as the dictionary. A small number $L$ of atoms is selected so that the atoms produce the best approximation error:

$$\min_x \|y - Dx\|_2 \quad \text{s.t.} \quad \|x\|_0 \le L, \qquad (1)$$

where $\|x\|_0$ denotes the number of non-zero coefficients of the vector $x$ or, equivalently, the number of atoms selected. This problem is NP-hard, but standard algorithms exist that obtain approximate solutions using greedy methods or by convexifying the problem through substitution of the $\|x\|_0$ constraint with an additive penalty term $\|x\|_1 = \sum_i |x_i|$, as follows:

$$x^\circ(y, D) = \underset{x}{\arg\min}\; \|y - Dx\|_2^2 + \lambda \|x\|_1. \qquad (2)$$
Given the decomposition $x^\circ$ of the vector $y$, an approximation $\hat{y}$ of $y$ can be obtained using $\hat{y} = D x^\circ$.
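For concreteness, the penalized problem in equation (2) can be approximated with a basic iterative shrinkage-thresholding (ISTA) loop. The NumPy sketch below is illustrative only; the dictionary, penalty weight and iteration count are placeholder choices, not values from the patent.

```python
import numpy as np

def sparse_code_ista(y, D, lam=0.1, n_iter=200):
    """Approximate x°(y, D) = argmin_x ||y - Dx||_2^2 + lam*||x||_1 via ISTA."""
    L = 2.0 * np.linalg.norm(D, ord=2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ x - y)           # gradient of ||y - Dx||_2^2
        x = x - grad / L                          # gradient step on data fidelity
        # Soft-thresholding is the proximal operator of the (lam/L)*l1 penalty.
        x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)
    return x

# Toy usage: a random unit-norm over-complete dictionary and a 2-sparse signal.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)                   # unit-norm atoms
x_true = np.zeros(256)
x_true[[3, 40]] = [1.5, -2.0]
y = D @ x_true
x_hat = sparse_code_ista(y, D, lam=0.05)
```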
The dictionary matrix $D$ required in equation (2) needs to be chosen carefully for the task at hand. A good dictionary will contain atoms that represent commonly occurring spatial patterns. Many off-the-shelf dictionary matrices exist, such as the DCT dictionary. But better results can be obtained by learning the dictionary from a training set of vectors $\{y_t \in \mathbb{R}^d\}_t$, such as proposed by Aharon, Elad and Bruckstein and also by Mairal, Bach, Ponce and Sapiro, using the following objective:

$$\underset{D,\{x_t\}}{\arg\min}\; \sum_t \|y_t - D x_t\|_2^2 + \lambda \|x_t\|_1, \quad \|d_i\|_2 = 1 \;\; \forall i. \qquad (3)$$
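As a sketch of what minimizing an objective like (3) can look like in practice, scikit-learn's MiniBatchDictionaryLearning implements a closely related alternating scheme; the random training matrix below merely stands in for rasterized, shape-normalized face vectors.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Placeholder training set: each row stands in for one rasterized,
# shape-normalized face vector y_t (real data would come from the Fig. 4 pipeline).
rng = np.random.default_rng(0)
Y = rng.standard_normal((500, 64))

learner = MiniBatchDictionaryLearning(
    n_components=128,                 # number of atoms N (over-complete: 128 > 64)
    alpha=1.0,                        # sparsity penalty, the lambda of equation (3)
    transform_algorithm="lasso_lars",
    random_state=0,
)
learner.fit(Y)
D = learner.components_.T             # columns are unit-norm atoms, as in (3)
```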
Inpainting based on sparse coding works as follows: let $\mathcal{A}$ represent the indices of the available pixels of $y$. Letting $y_{\mathcal{A}}$ (respectively $D_{\mathcal{A}}$) denote the sub-vector (sub-matrix) obtained by retaining the coefficients (rows) at positions $\mathcal{A}$, an approximation of the whole image block can be obtained from $D\, x^\circ(y_{\mathcal{A}}, D_{\mathcal{A}})$.
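Reusing the sparse_code_ista sketch above, the masked decomposition just described might be wired up as follows; the mask and dictionary are again illustrative stand-ins.

```python
import numpy as np

def inpaint_block(y_obs, avail, D, lam=0.05):
    """Inpaint one signal: sparse-code the available rows of D, reconstruct all rows.

    y_obs : length-d signal with unreliable values at the missing positions
    avail : boolean mask of length d, True on the available set A
    D     : (d, N) dictionary
    """
    D_A = D[avail, :]                                   # sub-matrix: rows at positions A
    x0 = sparse_code_ista(y_obs[avail], D_A, lam=lam)   # x°(y_A, D_A)
    y_hat = D @ x0                                      # full-length reconstruction D x°
    out = y_obs.copy()
    out[~avail] = y_hat[~avail]                         # fill only the missing pixels
    return out
```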
Estimating the shape of human faces from photos or videos is a widely studied field in computer vision. A goal is to locate the position of a sparse set of pre-defined $P$ 2D key-point landmark locations encoding shape $S$ (commonly including, for example, the corners of the eyes, mouth, and nose):

$$S = [x, y], \quad \text{where } x, y \in \mathbb{R}^P.$$
Early work on shape estimation includes Active Contour Models by Kass, Witkin and Terzopoulos, Template Matching by Yuille, Hallinan and Cohen, Active Shape Models (ASM) by Cootes and Taylor and Active Appearance Models (AAM) by Cootes, Edwards and Taylor. Popular modern approaches such as described by Felzenszwalb, Girshick, McAllester and Ramanan involve first detecting the object parts independently and then estimating shape through flexible parts models. Another family of approaches, by Cao, Wei, Wen and Sun, by Burgos-Artizzu, Perona and Dollar and by Ren, Cao, Wei and Sun, tackles shape estimation as a regression problem, learning regressors that directly predict the object shape or the location of its parts, starting from a raw estimate of its position. These methods are extremely fast and precise, and are able to deal with large amounts of occlusion. An aspect of the present disclosure comprises using the method proposed by Burgos-Artizzu, Perona and Dollar, but any other could equally be used.
Sparse coding has been successfully applied to create face-tailored image compression schemes. An example is the work of Bryt and Elad, which also employs a piecewise-affine warping of the face to normalize physiognomy and size. However, the application targeted by Bryt and Elad is compression, not face inpainting, and their method uses the standard block-by-block rasterization approach, not a whole-image rasterization approach.
The work of Yuille, Hallinan and Cohen also uses sparse coding for face restoration, but in the context of face super-resolution, not the face-inpainting problem. Furthermore, the method of Yuille, Hallinan and Cohen does not consider piecewise-affine face alignment as described in the present disclosure, and the sparse coding stage applied subsequently uses a standard block-by-block sparse-coding approach using dictionaries of approximately 100,000 patches of size 5x5 taken from many face images. In contrast, the principles of the present disclosure involve learning the dictionary for the reconstruction task.
The work of Wright, Nowak and Figueiredo applies sparse-coding using a whole- image rasterization approach, but in the context of face recognition, not face inpainting, and does not employ a piecewise affine warping. In addition, their dictionary matrix is not learned, consisting rather of a concatenation of face examples of the subjects known to the system.
Inpainting parts of an image or video using content from other images/videos has been considered by others, but their methods suppose that the content to be replaced can be found somewhere else in the image/video. That is not the case addressed by the present disclosure, because it cannot be supposed or assumed that the user will have his face fully visible at some point (e.g., by removing his sunglasses or moving away from occluding objects).
Facial expression transfer has been studied with fully unoccluded faces, with the goal of transferring expressions across individuals, or from a video stream into a 3D animated model by estimating 3D facial landmarks. In contrast, the principles of the present disclosure allow recovery of the original expression in large occluded regions of the face.
The work of Jenatton, Obozinski and Bach considers sparse coding and dictionary learning using whole-image rasterization but in a different context. Specifically, the authors impose a constraint that forces their dictionary atoms to be localized in space (e.g., an atom might correspond only to the right eye). Such constraints are contrary to the task of spatial prediction, which requires atoms to model the spatial dependencies in face images. Their approach is tailored for the face recognition application, not face inpainting.

SUMMARY OF THE INVENTION
The proposed method applies sparse coding to inpainting of face images, particularly when large spatial regions of the face are missing. An aspect comprises applying sparse coding to the entire face image following geometrical normalization via piecewise-affine warping. This allows exploitation of subtle spatial dependencies to inpaint in an expression-coherent manner, as it is the case that expressions are manifested in all parts of the face (for example, both the eyes and the mouth take a particular form when one smiles).
The proposed method has a wide range of applications. Examples include recovering full facial expression portrayed by a subject even when large regions of his/her face are hidden, useful for video-conferencing or network social communication. Other examples include security (e.g. , removal of face masks, sunglasses etc.) and video editing (removal of glasses or jewelry, removal of face-covering hair dos). Security is of particular interest to law enforcement and anti-terrorism.
Another example involves virtual removal of head mounted displays (HMDs) such as those produced by Oculus (http://www.oculusvr.com/). A method and apparatus for performing face occlusion removal are described including receiving a face image and an occlusion mask, the occlusion mask indicating missing pixels, receiving training images, performing face alignment on the received training images and the face image and the occlusion mask, receiving a mask, receiving a learned dictionary and reconstructing the face image using the mask and the learned dictionary.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is best understood from the following detailed description when read in conjunction with the accompanying drawings. The drawings include the following figures briefly described below:
Fig. 1 is an overview of the proposed approach for expression-aware inpainting through sparse coding.
Fig. 2 is the portion of Fig. 1 that deals with face alignment.
Fig. 3 shows two different image rasterization methods.
Fig. 4 shows the dictionary learning portion of Fig. 1.
Fig. 5 is a flowchart of an exemplary implementation of the proposed method shown in Fig. 1.
Fig. 6 is a flowchart of an exemplary implementation of the offline component of the face alignment step (act) 510 of Fig. 5.
Fig. 7 is a flowchart of an exemplary implementation of the online component of the face alignment step (act) 510 of Fig. 5.
Fig. 8 is a flowchart of an exemplary implementation of the face reconstruction (inpainting) portion of the proposed method.
Fig. 9 is a block diagram of an exemplary apparatus for face occlusion removal.

It should be understood that the drawings are for purposes of illustrating the concepts of the disclosure and are not necessarily the only possible configuration for illustrating the disclosure.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present description illustrates the principles of the present disclosure. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the disclosure and are included within its scope.
All examples and conditional language recited herein are intended for educational purposes to aid the reader in understanding the principles of the disclosure and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.
Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosure, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the principles of the disclosure. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo-code, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term "processor" or "controller" should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, read only memory (ROM) for storing software, random access memory (RAM), and nonvolatile storage. Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.
In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The disclosure as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.
An overview of the proposed approach can be seen in Fig. 1. It is composed of three main steps: 1) face alignment (pre-processing), 2) offline dictionary learning, and 3) face reconstruction through sparse coding. In Fig. 1, the face alignment step is shown in a dark grey box with white letters; that is, face alignment is face-landmark-based warping. The offline dictionary learning is shown by a grey box with black lettering. The remaining boxes are the steps required for face reconstruction through sparse coding (inpainting). Each of the steps is detailed below.
Faces captured in uncontrolled conditions can present a heterogeneity of sizes and positions in the image due to 1) use of different cameras (each with a different field of view, pixel resolution, etc.), 2) the distance of the subject from the camera and 3) the subject's physiognomy.
In order for the proposed approach to be robust to these variations, the proposed method pre-processes images to align the observed face with a standard face, well centered and of a predefined fixed scale. This process is illustrated in Fig. 2. The first step is to estimate the shape of the face $S$, encoded as a sparse set of predefined $P$ 2D key-point landmark locations. This can be achieved using any state-of-the-art algorithm, such as the one proposed by Burgos-Artizzu, Perona and Dollar.
Then, from each training image a shape $S$ is extracted and its associated scale-invariant shape $S'$ is computed by removing size variations:

$$S' = [x', y'] \qquad (4)$$

[the size-normalization formula for $x'$ and $y'$ appears only as an embedded image in the source filing].
Then, the standard face $\bar{S}$ is computed as the average of all $N$ training faces after size normalization:

$$\bar{S} = \frac{1}{N} \sum_{n=1}^{N} S'_n. \qquad (5)$$

Now, given an input image and its estimated face shape $S$, a goal is to warp the current shape $S$ onto the average shape $\bar{S}$. This is achieved by performing a piecewise affine transform, as illustrated on the bottom of Fig. 2.
First, a Delaunay triangulation $DT(\bar{S})$ is computed from the set of $P$ landmark locations in the average shape $\bar{S}$. Then, the average shape is projected onto the current image by performing the inverse of equation (4), and the same Delaunay triangulation is applied to the current shape, yielding $DT(S)$. Finally, every triangle in $DT(S)$ is warped to $DT(\bar{S})$ using an affine transform $A$ such that $A \cdot DT(S) = DT(\bar{S})$ (for which there is a closed-form solution).
As a result, the face is successfully warped onto the average shape, removing variations due to differences in pixel resolutions, camera projections and to different physiognomies.
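The per-triangle affine solve mentioned above has a simple closed form: three source/destination landmark pairs give a 3x3 linear system. Below is a minimal NumPy/SciPy sketch; the landmark arrays, the 68-point shape size and the pixel-warping step are illustrative assumptions, not values from the patent.

```python
import numpy as np
from scipy.spatial import Delaunay

def affine_from_triangles(src_tri, dst_tri):
    """Closed-form 2x3 affine A mapping the 3 points src_tri onto dst_tri."""
    # Homogeneous source coordinates [x, y, 1] per landmark: a 3x3 system.
    src_h = np.hstack([src_tri, np.ones((3, 1))])
    # Solve src_h @ A.T = dst_tri for the 2x3 matrix A.
    return np.linalg.solve(src_h, dst_tri).T

# Placeholder shapes: P landmarks for the current face S and average face S_bar.
rng = np.random.default_rng(0)
S_bar = rng.uniform(0, 100, size=(68, 2))        # average shape (e.g., P = 68)
S = S_bar + rng.normal(scale=3.0, size=S_bar.shape)

tri = Delaunay(S_bar)                            # DT(S_bar), reused for DT(S)
for simplex in tri.simplices:                    # one affine map per triangle
    A = affine_from_triangles(S[simplex], S_bar[simplex])
    # ...apply A to the pixels of the current image that fall in this triangle
```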
Prior to utilization of the system, one needs to train the dictionary used to carry out sparse decompositions. To this end, it is assumed that a training set consisting of a large number of shape-normalized images without occlusion is available. Each shape-normalized image is rasterized using a mask $\mathcal{F}$ indicating the position of face pixels in the normalized image, thus producing the signal vector $y$. The mask $\mathcal{F}$ is computed from the standard face discussed in the previous section, as illustrated on the bottom of Fig. 4.
The resulting set of training vectors $\{y_t\}$ is used to learn a dictionary by minimizing equation (3). The dictionary learning portion of the proposed method is illustrated on the top of Fig. 4.
Given a face image suffering from partial occlusion, the image is first pre-processed using the face alignment method above. Then, let $\mathcal{A}$ denote the indices of available pixels inside the shape-normalized face, and let $\mathcal{M}$ denote the indices of the occluded pixels (for an illustration of these masks, see the bottom of Fig. 4). If the occlusion mask is specified in the image before shape normalization, one only needs to apply the shape normalization function computed in the first step to the occluded pixel positions. The pixels indicated by $\mathcal{A}$ are then concatenated to build the signal vector $y_{\mathcal{A}}$, which is decomposed via sparse coding using a dictionary $D_{\mathcal{A}}$ including only the rows of $D$ corresponding to the available pixels $\mathcal{A}$.
The resulting sparse code vector $x$ is used to obtain an approximation of the pixels in $\mathcal{M}$ using $D_{\mathcal{M}} x$. This estimate is substituted in place of the occlusion in the shape-normalized image, and the composite image is subsequently de-normalized to map it back to the original signal shape.
The occlusion mask $\mathcal{M}$ required in the above proposed method can be manually input by the user. Alternatively, an automatic occlusion detection system that works as follows is proposed: a large training set of two parts is required. The first part consists of shape-normalized images without occlusions. The second part includes occluded shape-normalized images with known $\mathcal{M}$. A feature vector (e.g., the well-known SIFT feature proposed by Lowe) is extracted from each pixel of every image. For each pixel, a binary classifier is learned using the occluded and non-occluded features as a training set. Standard classifier learning algorithms exist, such as the Support Vector Machine (SVM) classifier that has been used extensively in image classification, for example in the work by Chatfield, Lempitsky, Vedaldi and Zisserman.

Fig. 5 is a flowchart of an exemplary implementation of the proposed method shown in Fig. 1. At 505 a face image and occlusion mask $\mathcal{M}$ are accepted (received, input). At 510 face alignment is performed. At 515 mask $\mathcal{A}$ (warped), specifying the positions of the $n < m$ available pixels, is accepted (received, input). At 520 the learned dictionary $D$ is accepted (received, input). At 525 face reconstruction using sparse coding (inpainting) is performed.
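Returning to the automatic occlusion detector described just above: a per-pixel bank of binary classifiers could be prototyped as below. The feature extraction (e.g., dense SIFT) is abstracted away, scikit-learn's LinearSVC stands in for the SVM reference, and all array shapes and names are illustrative assumptions; the sketch also assumes both classes occur at every pixel position in the training data.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_pixel_classifiers(feats, occluded):
    """One binary occlusion classifier per pixel position.

    feats    : (n_images, n_pixels, feat_dim) dense descriptors, one per pixel
    occluded : (n_images, n_pixels) boolean labels (True = pixel occluded)
    """
    n_pixels = feats.shape[1]
    classifiers = []
    for p in range(n_pixels):
        clf = LinearSVC()                  # linear SVM, per the SVM reference
        clf.fit(feats[:, p, :], occluded[:, p])
        classifiers.append(clf)
    return classifiers

def predict_mask(classifiers, feats_one_image):
    """Predict the occlusion mask M for one shape-normalized image."""
    return np.array([clf.predict(f[None, :])[0]
                     for clf, f in zip(classifiers, feats_one_image)])
```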
There are two components to the face alignment step (act) 510 of Fig. 5. The first component is an offline component. By offline it is meant that the method of the component can be performed ahead of time, offline, on the same or another processor as any other portions of the proposed method of Fig. 5. Fig. 6 is a flowchart of an exemplary implementation of the offline portion of the face alignment step (act) 510 of Fig. 5. At 605 training images with or without occlusion are accepted (received, input). At 610 cascaded regression landmark estimation is performed on the training images. At 615 the average face shape is calculated (determined, computed). At 620 Delaunay triangulation is performed. A Delaunay triangulation for a set P of points in a plane is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P).
The second component of the face alignment step (act) 510 of Fig. 5 is an online component. Fig. 7 is a flowchart of an exemplary implementation of the online component of the face alignment step (act) 510 of Fig. 5. At 705 cascaded regression landmark estimation is performed on the face image with occlusion that was accepted (received, input) at 505. At 710 the results of the Delaunay triangulation are accepted (received, input). At 715 piece-wise affine transform estimation is performed. The piece-wise affine transform estimation yields a warped face image of standard shape and size. In geometry, an affine transformation is a function between affine spaces which preserves points, straight lines and planes. Also, sets of parallel lines remain parallel after an affine transformation. An affine transformation does not necessarily preserve angles between lines or distances between points, though it does preserve ratios of distances between points lying on a straight line.
Dictionary learning is accomplished by applying any of a number of available dictionary learning algorithms to the vectors $y$ obtained from face images without occlusion.

Fig. 8 is a flowchart of an exemplary implementation of the face reconstruction (inpainting) portion of the proposed method. At 805 the vector $y_{\mathcal{A}}$ of available pixels is extracted from the warped face image of standard shape and size. At 810 sparse coding using $D_{\mathcal{A}}$ (the $\mathcal{A}$ rows of $D$) is performed using the learned dictionary and the vector of available pixels. The result is the sparse code vector $x$. At 815 the missing pixels (indicated by the occlusion mask) are reconstructed and $D_{\mathcal{M}} x$ (the reconstructed pixel values) is substituted into positions $\mathcal{M}$ of the warped face image. At 820 the inpainted (reconstructed) face image is unwarped. The result of the unwarping is a reconstructed face (a face image with inpainted occlusion).
Fig. 9 is a block diagram of an exemplary apparatus for face occlusion removal.
The apparatus in which the proposed method is performed may be any suitable processor. Such a suitable processor will also include memory (storage), at least one communications interface, antennas if wireless communications are necessary or available, an internal communications means (such as, but not limited to, a bus or token ring) and at least one display device. Such components are standard and are not shown in Fig. 9 so as not to clutter the figure. The memory (storage) may include, but is not limited to, disks, CDs, any form of RAM, optical disks, etc. The at least one communications interface acts to accept (receive, input) the face image and occlusion mask, mask $\mathcal{A}$, and the learned dictionary (if the learned dictionary processing is performed offline in a standalone processor). The at least one communications interface also outputs the reconstructed (inpainted) face image. That output may be to a printer (for hard copy), to a removable storage device, to a display device or by a network link to another computer system for further processing or face matching. Any or all of the processors herein may be computer systems or may be partially or entirely implemented in application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), reduced instruction set computers (RISCs) or any other form that a processor may take.

The learned dictionary portion of the proposed method may be performed in the same apparatus (processor) or in a standalone processor or a co-processor of the apparatus having the face alignment module and the face reconstruction module. The face alignment module has two components. The offline component accepts (receives) training images with or without occlusion. The offline component of the face alignment module may be performed within the face occlusion removal apparatus or in a standalone processor or in a co-processor. The offline component of the face alignment module then performs cascaded regression landmark estimation on the training images. The average face shape is then calculated (determined, computed) in the offline component of the face alignment module. Delaunay triangulation is then performed in the offline component of the face alignment module. The online component of the face alignment module then accepts (receives) a face image and occlusion mask. The online component of the face alignment module then performs cascaded regression landmark estimation on the face image. The online component of the face alignment module then performs piece-wise affine transform estimation using the results of the Delaunay triangulation to yield a warped face image of standard shape and size.

The warped face image of standard shape and size is provided to the face reconstruction module, which includes an extraction module, a sparse coding module, a substitution module and an unwarping module. The extraction module also accepts mask $\mathcal{A}$ (warped) specifying the positions of the $n < m$ available pixels. The extraction module extracts the vector $y_{\mathcal{A}}$ of available pixels from the warped face image of standard shape and size. The results of the extraction module are provided to the sparse coding module. The mask $\mathcal{A}$ (warped) specifying the positions of the $n < m$ available pixels is also provided to the sparse coding module. The sparse coding module also accepts the learned dictionary. As shown in Fig. 9, the learned dictionary module is drawn in a dashed outline to indicate that it may be performed within the inpainting (face reconstruction) apparatus, and the learned dictionary is shown as input to the sparse coding module with a solid line (arrow) to indicate that the learned dictionary may instead be provided from a standalone processor. The sparse coding module uses $D_{\mathcal{A}}$ (the $\mathcal{A}$ rows of $D$), the learned dictionary and the vector of available pixels to generate (compute, determine, calculate) a sparse vector $x$. The results of the sparse coding module are provided to the substitution module. The substitution module reconstructs the missing pixels (indicated by the occlusion mask) and substitutes $D_{\mathcal{M}} x$ (the reconstructed pixel values) into positions $\mathcal{M}$ of the warped face image. The results of the substitution module are provided to the unwarping module, which unwarps the inpainted face image. The result of the unwarping module is a reconstructed face (a face image with inpainted occlusion).

The dictionary learning method described above can be specialized for the specific task addressed by the proposed method by considering the following learning problem in place of equation (3):

$$\underset{D_s, D_r}{\arg\min}\; \sum_t \left\| y_{t,\mathcal{M}} - D_r\, x^\circ\!\left(y_{t,\mathcal{A}}, D_{s,\mathcal{A}}\right) \right\|_2^2, \qquad (6)$$

where the columns of $D_s$ are constrained to be unit norm. Here the dictionary $D_s$ is a selection dictionary, and each column in $D_s$ is coupled to a column in the reconstruction dictionary $D_r$. The above problem can be solved using standard gradient-based solvers.
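As a hedged sketch of the "standard gradient-based solvers" mentioned for equation (6), one option is to unroll the inner ISTA iterations so that automatic differentiation can propagate gradients from the masked reconstruction error back to both dictionaries. The PyTorch code below is a toy realization under that assumption, not the patent's actual training procedure; all hyper-parameters are placeholders.

```python
import torch

def ista_code(y_avail, D_avail, lam=0.05, n_iter=30):
    """Unrolled ISTA for x°(y_A, D_A): gradients flow through the iterations."""
    L = 2.0 * torch.linalg.matrix_norm(D_avail, ord=2) ** 2
    x = torch.zeros(D_avail.shape[1])
    for _ in range(n_iter):
        x = x - 2.0 * (D_avail.T @ (D_avail @ x - y_avail)) / L
        x = torch.sign(x) * torch.clamp(x.abs() - lam / L, min=0.0)
    return x

def learn_task_dictionaries(Y, occ_masks, n_atoms=128, steps=100, lr=1e-2):
    """Toy gradient-based solver for the equation (6) objective.

    Y         : (n_samples, m) tensor of shape-normalized face vectors
    occ_masks : (n_samples, m) boolean tensor, True on the occluded set M
    """
    m = Y.shape[1]
    D_s = torch.randn(m, n_atoms, requires_grad=True)   # selection dictionary
    D_r = torch.randn(m, n_atoms, requires_grad=True)   # reconstruction dictionary
    opt = torch.optim.Adam([D_s, D_r], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.tensor(0.0)
        for y, occ in zip(Y, occ_masks):
            avail = ~occ
            D_s_n = D_s / D_s.norm(dim=0, keepdim=True)  # unit-norm columns of D_s
            x = ista_code(y[avail], D_s_n[avail, :])      # code from available pixels
            loss = loss + ((y[occ] - (D_r @ x)[occ]) ** 2).sum()  # error on M only
        loss.backward()
        opt.step()
    return D_s.detach(), D_r.detach()
```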
Using whole-image rasterization makes the sparse decomposition task in equation (2) computationally demanding. In order to reduce this complexity of the online inpainting stage, the learned atoms can be forced to have a compact support in the region specified by the available-pixels mask. When using equation (6), for example, the resulting learning objective is:

$$\underset{D_s, D_r}{\arg\min}\; \sum_t \left\| y_{t,\mathcal{M}} - D_r\, x^\circ\!\left(y_{t,\mathcal{A}}, D_{s,\mathcal{A}}\right) \right\|_2^2 + \lambda \left\| D_s \right\|_1, \qquad (7)$$

where the $\ell_1$ matrix norm notation is used to denote the summation of the absolute values of all entries in the matrix. Other possibilities include enforcing a minimum support per atom.
The problem in equation (6) needs to be solved individually for each mask $\mathcal{M}$. In order to avoid the extra overhead, random masks can be used that are varied for each sample in the training set. The resulting dictionary is sub-optimal for any specific mask, but performs well on average for any mask.
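A toy helper for drawing a different random mask per training sample might look as follows; the contiguous-run geometry and the occlusion fraction are arbitrary illustrative choices.

```python
import numpy as np

def random_occlusion_mask(m, rng, frac=0.3):
    """Boolean mask over m pixels: one random contiguous run is 'occluded'."""
    mask = np.zeros(m, dtype=bool)
    run = int(frac * m)
    start = rng.integers(0, m - run)
    mask[start:start + run] = True       # a different M for every sample
    return mask
```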
Rather than operating on the entire face image, one can instead define many image strips so that the entire face is covered, and execute the proposed method on a per-strip basis. Each strip will provide an inpainting prediction for a subset of the missing pixels $\mathcal{M}$. If the strips are not disjoint, the average of the available predicted pixel values is taken for each pixel.

The proposed method is applicable to a picture or video containing occluded faces that one desires to reconstruct. The proposed method attempts to preserve the true expression of the subject: even if the eyes were originally occluded, when the person smiles one can see changes in the expression of his/her eyes. This is in clear contrast with classical "static" reconstructions, which are constant regardless of facial expression.
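As a small illustration of the strip-merging rule above (averaging the available predictions wherever strips overlap), consider the following sketch; the strip index sets and predictions are assumed to come from per-strip runs of the method.

```python
import numpy as np

def merge_strip_predictions(m, strips, predictions):
    """Average per-pixel predictions from (possibly overlapping) strips.

    m           : total number of pixels in the warped face
    strips      : list of index arrays, one per strip
    predictions : list of predicted pixel-value arrays, aligned with strips
    """
    total = np.zeros(m)
    count = np.zeros(m)
    for idx, pred in zip(strips, predictions):
        total[idx] += pred
        count[idx] += 1
    out = np.full(m, np.nan)                 # NaN marks pixels no strip covered
    covered = count > 0
    out[covered] = total[covered] / count[covered]   # average where strips overlap
    return out
```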
It is to be understood that the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. Special purpose processors may include application specific integrated circuits (ASICs), reduced instruction set computers (RISCs) and/or field programmable gate arrays (FPGAs). Preferably, the present invention is implemented as a combination of hardware and software. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage device. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (CPUs), a random access memory (RAM), and input/output (I/O) interface(s). The computer platform also includes an operating system and microinstruction code. The various processes and functions described herein may either be part of the microinstruction code or part of the application program (or a combination thereof), which is executed via the operating system. In addition, various other peripheral devices may be connected to the computer platform such as an additional data storage device and a printing device.
It should be understood that the elements shown in the figures may be implemented in various forms of hardware, software or combinations thereof. Preferably, these elements are implemented in a combination of hardware and software on one or more appropriately programmed general-purpose devices, which may include a processor, memory and input/output interfaces. Herein, the phrase "coupled" is defined to mean directly connected to or indirectly connected with through one or more intermediate components. Such intermediate components may include both hardware and software based components.
It is to be further understood that, because some of the constituent system components and method steps depicted in the accompanying figures are preferably implemented in software, the actual connections between the system components (or the process steps) may differ depending upon the manner in which the present invention is programmed. Given the teachings herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present invention.

CLAIMS:
1. A method for performing face occlusion removal, said method comprising:
receiving a face image and an occlusion mask, said occlusion mask indicating missing pixels (505);
receiving training images (605);
performing face alignment on said received training images and said face image and said occlusion mask (510);
receiving a mask (515);
receiving a learned dictionary (520); and
reconstructing said face image using said mask and said learned dictionary (525).
2. The method according to claim 1, wherein said face alignment further comprises:
performing cascaded regression landmark estimation on said training images (610);
determining an average face shape using said landmark estimation of said training images (615); and
performing triangulation on said average face shape (620).
3. The method according to claim 2, wherein said triangulation is Delaunay triangulation.
4. The method according to claim 2, wherein said face alignment further comprises:
performing cascaded regression landmark estimation on said face image (705); and
performing piece-wise affine transform estimation using said landmark estimation of said face image and said occlusion mask and said triangulation of said average face shape to generate a warped face image (715).
5. The method according to claim 1, wherein said mask is a warped mask specifying positions of available pixels.
6. The method according to claim 5, wherein said reconstruction of said face image further comprises:
extracting a vector of available pixels of said warped mask (805);
performing sparse coding using said learned dictionary and said vector of available pixels to generate a sparse code vector (810);
reconstructing the missing pixels using the learned dictionary and the sparse code vector (815);
substituting the reconstructed pixels into positions of said warped face image to generate a warped inpainted face image (815); and
unwarping said warped inpainted face image to generate an unwarped inpainted face image (820).
7. The method according to claim 6, further comprising outputting said unwarped inpainted face image.
8. A face occlusion removal apparatus, comprising:
a communications interface, said communications interface receiving a face image and occlusion mask;
said communications interface receiving training images;
a face alignment module, said face alignment module performing face alignment on said received training images and said face image and said occlusion mask, said face alignment module in communication with said communications interface;
said communications interface receiving a mask;
said communications interface receiving a learned dictionary; and
a face reconstruction module, said face reconstruction module reconstructing said face image using said mask and said learned dictionary, said face reconstruction module in communication with said communications interface and said face reconstruction module in communication with said face alignment module.
9. The face alignment module according to claim 8, wherein said face alignment module comprises an offline face alignment component and an online face alignment component, wherein said offline face alignment component operates on said training images and accomplishes an offline portion of face alignment by:
performing cascaded regression landmark estimation on said training images;
determining an average face shape using said landmark estimation of said training images; and
performing triangulation on said average face shape.
10. The offline component of said face alignment module according to claim 9, wherein said triangulation is Delaunay triangulation.
11. The online face alignment component according to claim 9, wherein said online face alignment component operates on said face image and said occlusion mask and accomplishes an online portion of face alignment by:
performing cascaded regression landmark estimation on said face image; and
performing piece-wise affine transform estimation using said landmark estimation of said face image and said occlusion mask and said triangulation of said average face shape to generate a warped face image.
12. The face occlusion removal apparatus according to claim 8, wherein said mask is a warped mask specifying positions of available pixels.
13. The face occlusion removal apparatus according to claim 12, wherein said face reconstruction module further comprises:
an extraction module, said extraction module extracting a vector of available pixels of said warped mask, said extraction module in communication with said communications interface, said extraction module also in communication with said face alignment module;
a sparse coding module, said sparse coding module performing sparse coding using said learned dictionary and said vector of available pixels to generate a sparse code vector, said sparse coding module in communication with said communications interface and also in communication with said extraction module;
a substitution module, said substitution module reconstructing said missing pixels using said learned dictionary and said sparse code vector and substituting said reconstructed pixels into positions of said warped face image to generate a warped inpainted face image, said substitution module in communication with said sparse coding module; and
an unwarping module, said unwarping module unwarping said warped inpainted face image, said unwarping module in communication with said substitution module.
14. The face occlusion removal apparatus according to claim 13, further comprising said unwarping module providing an unwarped inpainted face image to said communications interface for output.