1 Introduction

Face recognition is one of the most important and influential disciplines of modern biometrics. Several factors play a fundamental role here: the noninvasive nature of data acquisition compared with fingerprinting or iris recognition; applications in forensic science, the localization of missing people, access control, and passport and driver’s license verification; and, finally, the omnipresence of computers. However, a number of evident challenges remain to be addressed in the field of computational face identification.

Capturing the way people recognize individuals, particularly their faces, is still a challenging problem, and much research has focused on describing the essence of the recognition process. Needless to say, humans are extremely efficient in recognizing people, particularly if the face to be recognized has been seen before, and in comparing specific facial parts (e.g., those belonging to the periocular region, which is significant from the recognition point of view; see Hollingsworth 2014). However, it is impossible for a person to remember and correctly recall thousands of faces and to compare them in a reasonable amount of time with acceptable results. Computers, on the other hand, are used intensively in face recognition systems and are endowed with highly sophisticated algorithms. Nevertheless, they still cannot fully cope with problems such as varying poses of the subject, illumination, noise, or the age of the depicted person. Finally, machines still do not capture the human mechanisms of describing and recognizing faces. It is worth noting that the way people describe faces and their parts in natural language is not very complicated and is largely shared across a population, a fact exploited in particular by specialists in criminology. This holds at least within one culture, given the phenomenon of the own race bias, the finding that faces of one’s own race or ethnicity are better remembered than faces of another race or ethnicity (DeLozier and Rhodes 2015). In the studies reported by O’Toole et al. (2007), it was shown that fusing the subjects’ answers with the results obtained by state-of-the-art algorithms improves the accuracy of face verification. One can make use of these facts by attempting to capitalize on both the linguistic description of the face and the efficiency of computational face recognition algorithms. 
Undoubtedly, the importance of specific facial regions may be pivotal here, for the following two reasons. First, excluding unimportant facial features from the classification process saves time and memory. Second, the amount of information contained in one facial feature can differ from that contained in another facial part. Here, obviously, one is interested in estimating the relevance of the individual features.

On the other hand, humans process faces in a holistic manner (Sinha et al. 2006) with a pivotal role of spacing between features (second-order relations, Rotshtein et al. 2007). In particular, the internal features are more important in the recognition of familiar faces than features that are external to the faces themselves (hair, face contour, etc.). The latter gain importance whenever unfamiliar faces are considered (Ellis et al. 1979; Young et al. 1985). In classic studies (Davies et al. 1977; Haig 1986; Matthews 1978), eyes/eyebrows followed by mouth and nose are found to be the most descriptive regions. Moreover, the importance of eyebrows was confirmed, for instance, by Sadr et al. (2003). Interesting research was carried out by Tome et al. (2015a), where various ways of fusing regions and their importance in the process of recognition were verified. Literature surveys on human recognition of trained and untrained (in other words, familiar and unfamiliar) faces and cue importance are given in (Johnston and Edmonds 2009; Shepherd et al. 1981; Vignolo et al. 2013). There are also results discussing the saliency of the regions in computational processes. For instance, when using the template-matching strategy, Brunelli and Poggio (1993) obtained the following ranking of importance of features: eyes, mouth, nose, and whole face template. A brief survey of the works on the saliency of facial regions in the process of human identification by people and by computers can be found in (Karczmarek et al. 2014). It is worth noting that the holistic manner of human processing aligns with the psychological Gestalt theory, where the concept of holism is one of the most important ideas (Wagemans et al. 2012). It is assumed that the whole is something more than the sum of its components since the summing is a meaningless procedure (Koffka 1935). 
Modern Gestalt theory introduces the global precedence hypothesis (Navon 1977), stating that, for a visual object that is a hierarchical structure with dependencies among its parts, processing proceeds from global structures (at the top of the hierarchy) toward the analysis of local properties (positioned at the bottom) (Wagemans et al. 2012). If we consider a face, it becomes apparent that it is defined by spatial relationships existing between its parts such as eyes, nose, and mouth, which in turn are defined by the spatial relationships between their subparts. These relationships between the components of the facial parts are more general than the detailed properties of their parts such as, for instance, the width of the nose (Wagemans et al. 2012). Here, a very interesting question arises as to how to capture the spatial relationships in an efficient manner and, what is probably more important, how to apply this knowledge to efficiently recognize people in expert systems. To answer it, first, we have to estimate the importance of the facial parts in the process of human recognition. Understanding this mechanism is clearly very difficult. However, besides the above-mentioned psychological and computational experiments, the subjective judgments of experts, namely professionals in the fields of forensics and psychology, captured in a systematic way may be helpful here. Second, after determining these estimates, one can use the results as weights at the level of classifier construction using formal tools such as aggregation operators.

Furthermore, the global precedence hypothesis corresponds to the paradigm of granular computing (Pedrycz 2013). Therefore, the facial features can be grouped into meaningful and semantically sound entities, referred to as information granules, such as internal/external facial features (e.g., eyes and nose/chin and ears), upper/lower half of a face, and eyes/nose/mouth areas (the last partition was described by Kurach et al. 2014). Of course, each of these general information granules consists of “atomic” facial parts. For instance, the upper half of the face can contain eyebrows, eyelids, forehead, etc. Finally, these groups of features make up the entire face. However, taking into account the assumption about the difference between the direct summing of the features and the whole face, the general task should be to capture the essence of the relations among the features at each level of abstraction–granulation (i.e., to work with granules such as atomic facial parts, groups (areas) of facial features, and the whole face at the highest level). Such an approach seems intuitively appealing, and the nature of the resulting feature space is linguistic and articulated in terms of granular information.

Therefore, despite the rapid growth of studies on computational facial recognition methods, the need for a linguistic description of the facial features has been constantly present in the studies on face recognition. The idea comes from the belief that the human linguistic description of the face and its features can be understood (formalized) by a computer to achieve a realistic and efficient human-machine interaction (Iwamoto and Ralescu 1992). However, the descriptions given by two different people can differ. Furthermore, they may depend upon culture, profession, or age. Nevertheless, these differences between descriptions can be corrected by appropriate modeling of the data, e.g., an adjustment procedure (Fukushima and Ralescu 1995), an application of fuzzy sets and logic, or, which seems to be the most intuitive approach, by engaging experienced specialists such as psychologists or criminologists to estimate the facial parts. Interesting approaches were presented in (Kumar et al. 2011), where labels were assigned to the images coming from a large dataset in order to describe the subjects, and in (Rahman and Beg 2015), where sketching with words using so-called f-geometry (Zadeh 2009) was used. Moreover, Conilione and Wang (2012) applied fuzzy clustering and fuzzy inference methods to obtain the membership degrees of semantic labels for images in the process of image retrieval based on a description of facial features. Finally, Tome et al. (2015b) proposed a tool automatically converting facial landmarks to a feature set. An extensive survey of the literature concerning linguistic descriptors in the context of face recognition is contained in (Karczmarek et al. 2015).

The main objective of this study is to systematically quantify the importance of the main facial features used in the process of facial recognition by humans and to describe the effectiveness of the overall process. This quantification is of utmost relevance, as the saliency of the features can subsequently be applied to the fusion or aggregation of the information about the facial parts at each stage of the recognition process, particularly when generating the final results of classification (Kwak and Pedrycz 2005). Moreover, we are interested in a sound selection of the most descriptive features and in finding their general ranking. There are various facial features used in criminology, cognitive psychology, psychology of emotion, etc., and they are discussed at each level of abstraction (regions of the face or its features such as the length of the nose). Hence, their appropriate choice becomes an essential task. The originality of our method stems from a systematic way of determining the most salient facial features through the use of the analytic hierarchy process (AHP), see (Saaty 1980; Saaty and Vargas 2012), which is based on pairwise comparisons on multilevel hierarchic structures and supports the process of decision making. In our study, we consider a three-layer processing hierarchy (general information together with the regions of the face at the top of the hierarchy and specific features at the bottom). To produce reliable results, in the experimental study the comparisons are carried out by experts who are experienced practitioners working in the fields of criminology and psychology. This way we can obtain the weights for the specific facial features, particularly for those most crucial in the process of facial recognition. Note that the novelty of our work resides in the fact that we use only linguistic information (data) rather than numeric measurements. 
Such an approach may shed new light on the essence of the recognition process, particularly in the context of utilizing the innate human ability to assess other people.

Furthermore, our ultimate goal is to develop a method of verifying the confidence of the information obtained from the AHP when applied to a real-world situation of face recognition, e.g., suspect identification. For this purpose, we apply an entropy measure with which we quantify the partial and overall results of the three-level AHP. The originality of the proposed model of identifying individuals is in the use of a collection of experts. Their evaluations of the abstract features and their weights, in general, and their opinions on the real facial features are aggregated to estimate a level of confidence regarding the identification process. The embedded entropy-based evaluation mechanism, together with the reciprocity property naturally preserved in the AHP, may help to avoid biases when the experts give their opinions on the features. Such biases can occur when an expert collaborates with technology that takes over the more predominant and significant role (Dror et al. 2012). Moreover, as recent findings show, internal feelings can significantly change the ability to recognize emotions (see Zhang et al. 2016b).

It is worth noting that the application of the expert-based approach in the criminal field cannot be overrated; however, the door is also wide open to applying the model proposed in this study to any automatic state-of-the-art content-based face recognition method such as sparse representation (Wright et al. 2009), deep learning (Sun et al. 2014), or the latest works by Cament et al. (2015), Khan et al. (2016), Moeini et al. (2016), and Zhang et al. (2016a). Moreover, such an incorporation would contribute to the models proposed so far in the forensic literature (see, for instance, Arca et al. 2011). The process of data acquisition and assessment by a group of experts can readily be supplemented by computational feature extraction algorithms followed by multicriteria decision-making theory based, for instance, on fuzzy logic. Additionally, the data collected during the experiments can be a valuable source of information for studies in other scientific domains and for applications to specific problems. If the feature set differs from the one considered here, our proposal can serve as a road map for managing such features and establishing their saliency. Since the experts estimate the features from their subjective points of view and the so-called own race bias can be a dominant factor in the assessment, various methods of adjusting the data can be supplied to the system by its developers. It is worth adding that the presence of experts can be an important factor when applying the system to retrieval tasks, where the problem of the so-called semantic gap between low-level and high-level features is present (Liu et al. 2007). Finally, the hierarchical tree structure of the AHP makes it a flexible vehicle to capture (quantify) the granularity of information being the outcome of the method. 
We can consider these granules of information at different levels of abstraction and evaluate their usability in the context of recognizing faces with regard to their uncertainty level based on the concept of entropy.

The paper is organized as follows. The analytic hierarchy process is briefly recalled in Sect. 2. The proposed method of investigating the saliency of the facial features is described in Sect. 3. Section 4 presents one application of the AHP method, cast in the context of the confidence of the information gathered from the user. In Sect. 5, the results of experiments completed for the data delivered by the experts are presented, while the last section covers several conclusions.

2 Analytic hierarchy process

In this section, we present the most important conceptual and algorithmic aspects of the AHP method as originally proposed by Saaty (1980, 1988). In essence, the method is a hierarchic approach to produce decisions about choice, ranking, prioritization, and others completed for a finite number of objects (entities). The procedure may be described as follows. First, the hierarchy of concepts present in the problem is outlined. There is a goal positioned at the top, next there are the criteria, and at the bottom of the hierarchy we have a collection of alternatives.

In our case, we are concerned with determining the importance of facial features. The goal is to find the ranking of the features that are most important, i.e., most useful for classifying a person. The criterion is the saliency of facial features. The alternatives are the specific features (to be described in detail in the next section).

At the next step, the user (or users, viz. experts in the field of interest) quantifies the judgments between the elements (i.e., alternatives) of the hierarchy based on pairwise comparisons of these elements. Given n elements of interest (alternatives, here facial features), the results of the judgments are organized in the \(n\times n\) matrix A.

The values of the pairwise comparison are produced by the experts using the following commonly used scale (Saaty 1988; Saaty and Vargas 2012):

  • equal importance (1),

  • weak importance (2),

  • moderate importance (3),

  • moderate plus (4),

  • essential/strong importance (5),

  • strong plus (6),

  • very strong/demonstrated importance (7),

  • very, very strong (8),

  • extreme importance (9).

The matrix A has the important property of reciprocity, i.e., for each element \(a_{ij}\) of the matrix we have \(a_{ij}=1/a_{ji}, i, j=1,\ldots ,n\). Obviously, \(a_{ii}=1\). To assess the consistency of the results of pairwise comparison, the so-called inconsistency index and consistency ratio are used. The inconsistency index reads \(\nu =(\lambda _\mathrm{max}-n)/ (n-1)\), where \(\lambda _\mathrm{max}\ge n\) is the maximal eigenvalue of the reciprocal matrix A. The consistency ratio is given in the form \(\mu =\nu /r\), where r is an average random inconsistency index whose values are empirically determined as \(r=0\), 0, 0.52, 0.89, 1.11, 1.25, 1.35, 1.40, 1.45, 1.49 for \(n=1,\ldots ,10\), respectively. The values were obtained as mean consistency indices of 500 randomly generated reciprocal matrices (Saaty and Mariano 1979). Various methods of obtaining these values for matrices of higher dimension were discussed, for instance, in (Saaty 2000) and (Alonso and Lamata 2006). It is commonly assumed that the consistency ratio should not exceed 0.1 for the results to be deemed satisfactory (Saaty and Vargas 2012). However, it can sometimes be difficult to obtain such a small value of the ratio, particularly when intangible features such as subjective ideas are compared. Finally, the eigenvector corresponding to the maximal eigenvalue \(\lambda _\mathrm{max}\) consists of the elements establishing the importance of the features. In our decision problem, we form the ranking of the most important facial features based on the judgments of the experts. Let us briefly describe the way we obtain the values for a particular facial feature. First, the principal eigenvectors \({{\varvec{w}}}_{i}=[w_{i,1},\ldots ,w_{i,n}]\) resulting from the AHP method are normalized, viz. \({{\varvec{y}}}_{i}={{\varvec{w}}}_{i}/\max _{j}w_{i,j}\). Here, the index \(i=1,\ldots ,p\) denotes the respective expert. 
Next, the consistency ratios \(\mu _{i}\) are calculated. The weights are defined as \(\omega _{i}=1-\mu _{i}\), and, subsequently, their values are normalized, i.e., \(u_{i}=\omega _{i}/(\omega _{1}+\ldots + \omega _{p})\). The importance describing the jth feature is expressed in the form

$$\begin{aligned} x_{j}=y_{1j}\, u_{1}+y_{2j}\, u_{2}+\ldots +y_{pj}\, u_{p}, \quad j=1,\ldots ,n. \end{aligned}$$
(1)
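The computation described above, from a reciprocal matrix to the aggregated importance of Eq. (1), can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the principal eigenpair is approximated by power iteration (one of several possible eigen-solvers), and the consistency ratio is taken to be zero for \(n\le 2\), where \(r=0\).

```python
# Average random inconsistency indices r for n = 1..10, as listed in the text.
RANDOM_INDEX = [0.0, 0.0, 0.52, 0.89, 1.11, 1.25, 1.35, 1.40, 1.45, 1.49]


def principal_eigen(a, iterations=1000, tol=1e-12):
    """Approximate (lambda_max, y) by power iteration.

    The returned eigenvector y is scaled so that its largest entry equals 1,
    which matches the normalization y_i = w_i / max_j w_ij used in the text.
    """
    n = len(a)
    w = [1.0] * n
    lam = float(n)
    for _ in range(iterations):
        aw = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
        new_lam = max(aw)  # max(w) == 1, so max(A w) estimates lambda_max
        new_w = [x / new_lam for x in aw]
        converged = max(abs(x - y) for x, y in zip(new_w, w)) < tol
        w, lam = new_w, new_lam
        if converged:
            break
    return lam, w


def consistency_ratio(a):
    """mu = nu / r with nu = (lambda_max - n) / (n - 1)."""
    n = len(a)
    if n <= 2:
        return 0.0  # assumption: r = 0 here, and such matrices are consistent
    lam, _ = principal_eigen(a)
    nu = (lam - n) / (n - 1)
    return nu / RANDOM_INDEX[n - 1]


def aggregate_importance(matrices):
    """Eq. (1): x_j = sum_i y_ij * u_i, with u_i proportional to 1 - mu_i."""
    ys = [principal_eigen(a)[1] for a in matrices]
    omegas = [1.0 - consistency_ratio(a) for a in matrices]
    total = sum(omegas)
    us = [omega / total for omega in omegas]
    n = len(ys[0])
    return [sum(y[j] * u for y, u in zip(ys, us)) for j in range(n)]


if __name__ == "__main__":
    # Two hypothetical experts comparing three features (illustrative values only).
    expert_1 = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]  # perfectly consistent
    expert_2 = [[1, 3, 5], [1 / 3, 1, 3], [1 / 5, 1 / 3, 1]]
    print(consistency_ratio(expert_1), consistency_ratio(expert_2))
    print(aggregate_importance([expert_1, expert_2]))
```

Note that with \(\omega _{i}=1-\mu _{i}\) an expert whose consistency ratio approaches 1 contributes almost nothing to the aggregate; the sketch does not guard against ratios above 1, which would need separate handling.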

It is worth noting that in the case of two or more experts the values of the final reciprocal matrix A can be obtained as the geometric mean of the corresponding elements of the matrices created separately by each expert taking part in the experiment, which preserves the property of reciprocity. However, when the individual priorities, i.e., the feature rankings obtained by the experts, are aggregated, both the geometric and the arithmetic mean can be applied. For details, one can refer to (Aczél and Roberts 1989; Aczél and Saaty 1983; Forman and Peniwati 1998).
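The reason the geometric mean is used for combining judgment matrices can be checked numerically. The sketch below uses two hypothetical \(2\times 2\) expert matrices and hypothetical helper names; it only illustrates that the element-wise geometric mean keeps \(a_{ij}a_{ji}=1\), whereas the element-wise arithmetic mean in general does not.

```python
import math


def geometric_mean_matrix(matrices):
    """Element-wise geometric mean of p reciprocal matrices of equal size."""
    p = len(matrices)
    n = len(matrices[0])
    return [
        [math.prod(m[i][j] for m in matrices) ** (1.0 / p) for j in range(n)]
        for i in range(n)
    ]


def arithmetic_mean_matrix(matrices):
    """Element-wise arithmetic mean, shown only for comparison."""
    p = len(matrices)
    n = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / p for j in range(n)]
        for i in range(n)
    ]


def is_reciprocal(a, tol=1e-9):
    """Check a_ij * a_ji == 1 for all i, j (up to a tolerance)."""
    n = len(a)
    return all(abs(a[i][j] * a[j][i] - 1.0) < tol for i in range(n) for j in range(n))


if __name__ == "__main__":
    # Hypothetical judgments of two experts on the same pair of features.
    expert_1 = [[1, 2], [1 / 2, 1]]
    expert_2 = [[1, 4], [1 / 4, 1]]
    print(is_reciprocal(geometric_mean_matrix([expert_1, expert_2])))   # True
    print(is_reciprocal(arithmetic_mean_matrix([expert_1, expert_2])))  # False
```

In the arithmetic case the off-diagonal entries become 3 and 0.375, whose product is 1.125, so the averaged matrix is no longer a valid AHP comparison matrix.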

One can note here that there have been several applications of AHP in the context of object recognition. In Cheng et al. (2005) and Chou and Cheng (2006), AHP was applied to image semantic representation in the method of image retrieval. Moreover, Cheng et al. (2007) proposed a method of facial emotions recognition.

3 Saliency of the facial features

In many experimental studies of how people or computers recognize faces, the authors assume a fixed partition of the face, e.g., the upper and lower halves of the face (Haig 1986); the forehead area (including the eyes), the nose region (including cheeks and ears), and the mouth region (including the chin, cf. Kurach et al. 2014); the areas of eyebrows, eyes, nose, mouth, and cheeks (Karczmarek et al. 2014); the regions of eyebrows, eyes, nose, mouth, chin, and hair (Matthews 1978); and other partitions.

The main task is to determine an optimal set of features that are significant in potential practical recognition problems. We are interested in determining the ranking of the most important facial features useful in the context of face recognition. This ranking and the weights related to the particular features may be used in applications including all facial recognition algorithms based either on computational operations only or on both computational operations and psychological experiments (particularly experts’ opinions). As shown in (Karczmarek et al. 2014; O’Toole et al. 2007), applying psychological results in the computational processes may considerably improve the accuracy of the algorithm.

Let us now describe in detail the proposed method of establishing the saliency of facial features. We apply the AHP method to produce (quantify) the importance of the facial cues. For this purpose, we ask experts from the fields of criminology and psychology to estimate the features, taking into account their own subjective knowledge and experience. The goal is to choose the most important facial features from a given set. The criterion is the importance of the facial feature in the process of human recognition. Finally, the alternatives are grouped into the following sets of high-level features (i.e., composed of other features): (a) general information which can be deduced while observing the entire face (e.g., age and gender), (b) eyes region (including a forehead), (c) nose region containing ears, (d) mouth region (including a chin), and (e) external features such as hair or neck. Groups (b)–(d) represent internal facial features.

The above partition of the features into subsets is motivated by the observation that during psychological examinations or forensic investigations the subjects or the witnesses of crimes are often asked to describe a given facial image, containing both external and internal facial features of an unknown criminal, using as many details of his/her appearance as possible. However, in computational applications to databases containing two-dimensional frontal pictures it is difficult to compare the images taking into account the hair or ears area, because the hair area can vary and the ears can be hidden under hair.

The experts who are the subjects in our experiment are practitioners in their areas of expertise. One of them is a police detective with over 30 years of practical experience, and two of them are psychologists with over 10 years of work experience. They are asked to complete pairwise comparisons of the facial features or regions. They have to state to what degree feature a is preferred over feature b, expressed as one of the numbers from the scale described in the previous section, i.e., from the range 1 (equal) to 9 (extreme preference).

These values are organized into the corresponding reciprocal matrix. As a result of running the AHP, we obtain the normalized eigenvector corresponding to the maximal eigenvalue along with the inconsistency index and consistency ratio. Those results offer a detailed insight into the super-features (a)–(e).
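Organizing the experts' answers into a reciprocal matrix can be sketched as below. The function name, the dictionary layout, and the judgment values are assumptions made for illustration; the only facts taken from the text are the 1 to 9 scale, the reciprocity \(a_{ij}=1/a_{ji}\), and the unit diagonal. A value \(1/k\) for the pair (i, j) is assumed to mean that element j is preferred over element i with strength k.

```python
def reciprocal_matrix(n, judgments):
    """Build an n x n reciprocal matrix from pairwise judgments.

    `judgments` maps a pair (i, j) with i < j to the strength with which
    element i is preferred over element j (1..9, or 1/k if j is preferred).
    """
    expected_pairs = n * (n - 1) // 2
    if len(judgments) != expected_pairs:
        raise ValueError(f"expected {expected_pairs} judgments, got {len(judgments)}")
    a = [[1.0] * n for _ in range(n)]  # diagonal: a_ii = 1
    for (i, j), value in judgments.items():
        if not (0 <= i < j < n):
            raise ValueError(f"invalid pair {(i, j)}")
        a[i][j] = float(value)
        a[j][i] = 1.0 / value  # reciprocity: a_ji = 1 / a_ij
    return a


if __name__ == "__main__":
    # Hypothetical answers for three alternatives (indices 0, 1, 2).
    answers = {(0, 1): 3, (0, 2): 5, (1, 2): 2}
    for row in reciprocal_matrix(3, answers):
        print(row)
```

Only \(n(n-1)/2\) answers are needed per matrix, which is why the number of questions put to each expert grows quadratically with the number of alternatives in a group.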

Next, we run the AHP for the features that are the components of the above regions (forehead, eyebrows, eyes, nose, ears, cheeks, mouth, and chin areas). Finally, we use the AHP to produce the rankings of the particular measurable parts within ten groups of facial features (or the areas containing them). Figure 1 depicts a three-level topology of the overall process.

Fig. 1 Three-level AHP realized for the selected features. For clarity of presentation, only the part regarding the hair from the first AHP level is depicted. Similarly, only a few chosen features of the second level are presented

As presented in Table 1, these atomic cues have properties that are easier to describe in linguistic terms than the areas containing a single feature with its neighborhood or several grouped features. One can more easily see a specific detail, such as the shape of the eyebrows or the length of the nose, than form a general impression of a given part of the face. On the other hand, the importance of regions containing sets of such features seems to lie in their ability to affect the user’s perception in a more general way that cannot be defined in terms of physical measures such as length and width. The details of facial description can be found, for instance, in the document reported in (FISWG 2016) and in police suspect description sheets (e.g., one can refer to the description provided by the Chicago Police Department 2016). The table comprises a collection of selected facial features and their attributes which are, in our opinion, the most descriptive and, simultaneously, relatively easy to obtain from 2D photographs of individuals.

Table 1 Selected facial features and their linguistic descriptors
Fig. 2 Entropies of the particular features with the corresponding weights and their hierarchical structure

Table 2 Hair features—experts’ reciprocal matrices
Table 3 Mouth region features—experts’ reciprocal matrix

4 Confidence of identification process realized by an observer

An observer (say, a witness of a crime) characterizes a specific individual (a suspect). The tree (the three-level hierarchy developed above) is useful in assessing the confidence of the identification process. The idea exploits the concept of weighted entropy. Suppose that the observer describes the eyebrows length using the AHP method applied to the following quantification of the attribute: short, average, long (see Table 1). For instance, one of the questions can be: To what extent are short eyebrows preferred over long eyebrows in describing the suspect’s eyebrows? Let us assume that we have obtained the values \(z_{1}, z_{2}, z_{3}\) (which correspond to the three introduced values of this particular person’s eyebrows length) as a result of running the AHP at this particular level. Using them, we can determine the entropy of the eyebrows length attribute, say H(Length), and the remaining entropies H(Direction), H(Distance between the eyebrows), H(Position), H(Shape), H(Thickness), and H(Color). Following the hierarchy shown in Fig. 2, one can estimate the entropy of the eyebrows feature in the form

$$\begin{aligned}&H(\mathrm{Eyebrows}) \nonumber \\&\quad =\, v_\mathrm{length}H(\mathrm{Length})+v_\mathrm{dir}H(\mathrm{Direction})\nonumber \\&\qquad +\, v_\mathrm{dist}H(\hbox {Distance between the eyebrows})\nonumber \\&\qquad + \,v_\mathrm{pos}H(\mathrm{Position})+ v_\mathrm{shape}H(\mathrm{Shape})\nonumber \\&\qquad + \,v_\mathrm{thick}H(\mathrm{Thickness})+v_\mathrm{color}H(\mathrm{Color}), \end{aligned}$$
(2)

where \(v_\mathrm{length}\), etc. are the weights corresponding to the attributes length, direction, etc., respectively, and, for instance, H(Length) is given as

$$\begin{aligned} H(\mathrm{Length}) = - (z_{1} \log z_{1} + z_{2}\log z_{2}+ z_{3} \log z_{3}). \end{aligned}$$
(3)

Note that a similar calculation is carried out for the other attributes. As discussed in the previous section, the values \(v_\mathrm{length}, \ldots , v_\mathrm{color}\) are the experts’ evaluations of the facial attributes and are not related to any specific face.
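Equations (2) and (3) can be illustrated with a short sketch. Two points are assumptions not fixed by the text: the values \(z_{k}\) are rescaled to sum to 1 before the entropy is taken, and the natural logarithm is used. The function names and all numeric values are hypothetical.

```python
import math


def attribute_entropy(z):
    """Eq. (3): H = -sum_k z_k log z_k.

    Assumptions: z is rescaled to sum to 1, the natural logarithm is used,
    and terms with z_k = 0 contribute 0 (the limit of z log z).
    """
    total = sum(z)
    probs = [value / total for value in z]
    return -sum(p * math.log(p) for p in probs if p > 0)


def weighted_entropy(weights, entropies):
    """Eq. (2): H(feature) = sum_a v_a * H(attribute a)."""
    if len(weights) != len(entropies):
        raise ValueError("weights and entropies must have the same length")
    return sum(v * h for v, h in zip(weights, entropies))


if __name__ == "__main__":
    # Hypothetical observer output for eyebrows length: short, average, long.
    h_length = attribute_entropy([0.7, 0.2, 0.1])
    # An undecided observer gives the maximal entropy log(3) for three options.
    h_undecided = attribute_entropy([1, 1, 1])
    print(h_length, h_undecided)
    # Hypothetical expert weights for two attributes only, for brevity.
    print(weighted_entropy([0.6, 0.4], [h_length, h_undecided]))
```

A confident description concentrates the mass on one linguistic value and drives the entropy toward 0, while an undecided description spreads it evenly and yields the maximum, \(\log 3\) for a three-valued attribute.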

At the second level of the AHP hierarchy, one can determine the entropies for each region of the face. For the eyes region, the entropy reads

$$\begin{aligned}&H(\mathrm{Eyes\,region}) \nonumber \\&\quad =\,v_\mathrm{forehead} H(\mathrm{Forehead})\nonumber \\&\qquad +\, v_\mathrm{eyebrows}H(\mathrm{Eyebrows}) + v_\mathrm{eyes} H(\mathrm{Eyes}), \end{aligned}$$
(4)

where \(v_\mathrm{forehead}, v_\mathrm{eyebrows}\), and \(v_\mathrm{eyes}\) are the weights obtained in the AHP for forehead, eyebrows, and eyes, respectively.

Finally, we produce the total entropy using the following formula

$$\begin{aligned} H_\mathrm{total}= & {} v_\mathrm{gen}H(\mathrm{General\,info}) + v_\mathrm{eyes}H(\mathrm{Eyes\,region})\nonumber \\&+\, v_\mathrm{nose}H(\mathrm{Nose\,region})+ v_\mathrm{mouth}H(\mathrm{Mouth\,region})\nonumber \\&+\, v_\mathrm{ext}H(\mathrm{External\,area}), \end{aligned}$$
(5)

where \(v_\mathrm{gen}, \ldots , v_\mathrm{ext}\) are the weights corresponding to the results of AHP produced by the experts and related to the five groups of features at the highest level of AHP. The higher the value \(H_\mathrm{total}\) is, the lower the confidence one has in the identification of a suspect realized by the observer. Subsequently, this may call for another identification process or, if possible, considering an input coming from another observer.
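The bottom-up evaluation of Eqs. (2), (4), and (5) amounts to a recursion over the hierarchy. The sketch below assumes a simple nested representation, which is not taken from the paper: an inner node is a dictionary mapping a name to a pair (expert weight, child), and a leaf is the list of values \(z_{k}\) produced by the observer. The entropy again assumes rescaling to a unit sum and the natural logarithm, and the small tree with its weights is purely hypothetical.

```python
import math


def leaf_entropy(z):
    """Entropy of the observer's values for one attribute (assumed normalized)."""
    total = sum(z)
    return -sum((v / total) * math.log(v / total) for v in z if v > 0)


def tree_entropy(node):
    """Weighted entropy of a hierarchy.

    Inner node: dict name -> (weight, child). Leaf: list of z values.
    """
    if isinstance(node, dict):
        return sum(weight * tree_entropy(child) for weight, child in node.values())
    return leaf_entropy(node)


if __name__ == "__main__":
    # Hypothetical, heavily truncated hierarchy with made-up weights.
    face = {
        "eyes region": (0.6, {
            "eyebrows": (0.3, {
                "length": (0.5, [0.7, 0.2, 0.1]),
                "color": (0.5, [0.9, 0.1]),
            }),
            "eyes": (0.7, {
                "color": (1.0, [0.8, 0.1, 0.1]),
            }),
        }),
        "mouth region": (0.4, {
            "mouth": (1.0, {
                "width": (1.0, [1, 1, 1]),  # observer undecided
            }),
        }),
    }
    print(tree_entropy(face))
```

Because the weights multiply along each path from the root, uncertainty about an attribute under a highly weighted region raises \(H_\mathrm{total}\) more than the same uncertainty under a region the experts consider less salient.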

Table 4 Most general facial features—experts’ reciprocal matrices

5 Experimental studies

As described above, each of the \(p=3\) experts was asked to make pairwise comparisons between the facial features, taking into account his/her subjective and independent opinion about the importance of the considered facial part in the context of recognizing the subject. The experts from the fields of criminology (one, with about 30 years of experience in the branch) and psychology (two, each with more than 10 years of experience) estimated these relations using the AHP method in the above-mentioned ten groups of facial features by filling in specially prepared questionnaires in a spreadsheet program. Admittedly, the number of experts may be considered low. However, in our opinion, the results obtained in the experiments are intuitively appealing and exhibit a relatively satisfactory level of consistency. In group decision making, the number of experts depends on many factors, namely their availability, the level of their heterogeneity, their experience, and their position in the group. In our opinion, the present number of experts is a good compromise between the representativeness of their preferences and the agility of the process, as it keeps the total number of pairwise comparisons in a three-level hierarchy of decisions manageable. Moreover, choosing too large a number of experts would average out their opinions, so that the differences between their preferences would vanish after the aggregation of their responses.

First, we use the AHP at level 1. Let us consider the case where the experts are asked to what extent hair length is preferred over hair texture in assessing hair. Similarly, they have to compare hair color with texture and, finally, hair length with color.

Examples of the experts’ answers are presented in Table 2, where the features of the hair area are considered. Similarly, the questionnaires concerning the mouth area are given in Table 3, while the main groups of features are shown in Table 4. All the values of the consistency ratios for the experts and the features they estimated are presented in Fig. 3.

Fig. 3
figure 3

Consistency ratios associated with opinions of experts. Note that the CRs corresponding to the ears and chin are zeros

Fig. 4
figure 4

Average importance of the features (concerning general information and upper half of the face) normalized to the sum per group equal to 1

Fig. 5 Average importance of the features (lower half of the face) normalized so that the sum per group equals 1

The final results are displayed in Figs. 4 and 5. Here, we present the values of the average weighted eigenvectors corresponding to the principal eigenvalues of the reciprocal matrices produced by the AHPs. As can be seen from Figs. 4 and 5, the most intuitive and common opinions regarding the importance of particular facial features are reflected in the ranking, e.g., that the most descriptive regions are the eyes, nose, and mouth areas. It is quite surprising that the experts do not attach importance to the eyebrows when the eyes are excluded. The importance of the eyebrows is often reported in publications devoted to computational face recognition, but it is not necessarily present in real-world scenarios, where people concentrate on the eye area rather than on the eyebrows. The low position of the origin of the subject may be influenced by the fact that all the experts work in a relatively homogeneous society. A strongly dominating feature in many of the considered classes is color (e.g., the color of the eyes or hair). It may be difficult to utilize this information in computations when the images are grayscale. On the other hand, it may be very useful when the color model is RGB. Another observation coming from the results is that the experts do not attach importance to such details as the width of the philtrum. Nevertheless, we think that the presence of such detailed features in the ranking may be helpful from a practical point of view when the images in the dataset are of good quality, e.g., high resolution.
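
The aggregation of the experts' priority vectors into one vector per feature group can be sketched as follows. This is only an illustration under stated assumptions: we use a plain component-wise arithmetic mean followed by renormalization to a unit sum per group, the function name `average_priorities` is hypothetical, and the three vectors are invented values, not the experts' results.

```python
def average_priorities(expert_vectors):
    """Component-wise arithmetic mean of the experts' priority vectors,
    renormalized so that the weights of one feature group sum to 1."""
    n = len(expert_vectors[0])
    if any(len(v) != n for v in expert_vectors):
        raise ValueError("all priority vectors must have the same length")
    mean = [sum(v[i] for v in expert_vectors) / len(expert_vectors)
            for i in range(n)]
    total = sum(mean)
    return [m / total for m in mean]


# Hypothetical priority vectors of three experts for (length, color, texture).
hair_vectors = [
    [0.24, 0.62, 0.14],
    [0.30, 0.55, 0.15],
    [0.20, 0.60, 0.20],
]
hair_average = average_priorities(hair_vectors)
print([round(x, 3) for x in hair_average])
```

Other aggregation schemes (for instance, a geometric mean of the individual judgments) are also used in group AHP; the arithmetic mean is chosen here only for simplicity.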

Fig. 6 Importance of facial features occurring at the second level of the AHP

Fig. 7 Consistency ratios obtained at the second level of the AHP

Fig. 8 Most general facial areas and their importance obtained through the AHP process at the third level

Fig. 9 AHP consistency ratios for the most general areas of consideration

Next, we proceed with the AHP realized at level 2. The experts are asked questions of the following form:

  • To what extent is hair regarded as more important than the forehead in face recognition?

Besides these two features, the general information, eyebrows, eyes, nose, ears, cheeks, mouth, and chin areas are also considered. The results are presented in Fig. 6, while the consistency ratios are shown in Fig. 7. In light of the experts’ opinions, the information about the hair and the eyes area is the most important.

In addition, we considered another arrangement of the features, i.e., the third level of the AHP, which consists of more general groups of features: general information, eyes \(+\) forehead region, nose \(+\) ears region, mouth \(+\) chin region, and external features. If we take into account only the face, it is easy to note the following dependency: the lower the feature is located, the more importance is associated with it (refer to Fig. 8 for the importance values and to Fig. 9 for the corresponding consistency ratios).

Now, let us consider the confidence of the identification process realized by an observer, as described in Sect. 4. In this series of experiments, our experts were asked to assess six real photos from the ColorFERET dataset (see Fig. 10; Phillips et al. 1998, 2000). We chose the first six images in this database that are RGB rather than grayscale and in which the photographed person does not wear glasses. Using the AHP process, the experts estimated in detail points 3, 4, and 5 from Table 1; namely, they answered questions of the form

  • To what extent is the hair of the eyebrows regarded as short rather than long?

In this way, we determine the following sum

$$\begin{aligned} H(\mathrm{Eyes\,region}) = v_\mathrm{forehead}\,H(\mathrm{Forehead}) + v_\mathrm{eyebrows}\,H(\mathrm{Eyebrows}) + v_\mathrm{eyes}\,H(\mathrm{Eyes}) \end{aligned}$$
(6)

for each expert and observe his/her confidence as an observer. In this example, we narrowed the scope of the examination to the eye and forehead area only; however, this does not cause a loss of generality.
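
The weighted sum (6) can be illustrated with a short Python sketch. We assume here that the entropy \(H\) of a feature is the Shannon entropy (base 2) of the AHP vector over its linguistic values; the exact definition is the one given in Sect. 4. The weights \(v\) and the AHP vectors below are hypothetical, as are the function names.

```python
import math


def shannon_entropy(p):
    """Shannon entropy (base 2) of a priority vector that sums to 1.
    Zero entries contribute nothing to the sum."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def eyes_region_entropy(weights, entropies):
    """Weighted sum of Eq. (6) over the forehead, eyebrows, and eyes."""
    return sum(weights[k] * entropies[k]
               for k in ("forehead", "eyebrows", "eyes"))


# Hypothetical weights v of one expert (assumed values, not from the paper).
v = {"forehead": 0.2, "eyebrows": 0.3, "eyes": 0.5}

# Hypothetical AHP vectors over two linguistic values (e.g., short vs. long).
H = {
    "forehead": shannon_entropy([0.5, 0.5]),    # maximal uncertainty
    "eyebrows": shannon_entropy([0.75, 0.25]),
    "eyes": shannon_entropy([1.0, 0.0]),        # full certainty
}

h_region = eyes_region_entropy(v, H)
print(round(h_region, 4))
```

A lower value of the weighted entropy corresponds to a more confident observer: an expert who is certain about the highly weighted features contributes little uncertainty to the sum.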

Fig. 10 Selected images from the ColorFERET dataset (Phillips et al. 1998, 2000). Let us denote them by Person 1, Person 2, Person 3, Person 4, Person 5, and Person 6

Table 5 Results of the AHP conducted by the experts taking part in the experiment for the particular facial images and facial features
Table 6 Entropies associated with the individuals (being accumulated through all the features) and the number of AHP resulting vectors containing the values higher than 0.5

Table 5 presents several estimates of the images produced by the experts. The values presented are the averages of the weights for the forehead width and height, eyebrow length and thickness, and eye length and color obtained through the pairwise comparisons carried out by the three experts. The winning values of the features are shown in bold. The average sums of the entropies of all considered features for all the persons are listed in Table 6. The entries are simple sums of the entropies divided by the number of experts taking part in the experiment. From the data, it is easy to see that Persons 1 and 2 are the most difficult to estimate, while Persons 5 and 6 are the easiest to describe. This can partially result from difficulties with estimating a forehead area covered by hair or from the pose of the subject. However, the second characteristic included in Table 6 is also of interest here: the resulting AHP vectors for the last three persons contained 50 or 51 features for which the experts’ opinions gave more than 50 % certainty of a specific linguistic value. Table 7 shows the entropies obtained when using all the features considered in the experiments. Here, we see that features such as the facial shape, the distance between the eyebrows, their shape, and, quite unexpectedly, the color of the eyes seem to be the most uncertain to estimate. Finally, Table 8 presents the average entropies obtained from the descriptions of all the faces by each expert. They were calculated in two manners. In the first one, all the entropies for all the features estimated by the expert were summed and averaged. In the second one, the entropies were accumulated using the introduced hierarchy and the weights obtained by the same expert; see formula (6). The results indicate that, when concrete and specific facial features are estimated, the most confident are the police expert (no. 1) and the first psychologist. However, if we consider the weights applied to the abstract features, we see that the most confident is expert no. 3 (the second psychologist). This suggests that this expert found the best relationships between the abstract features.
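
The two characteristics reported in Table 6 can be sketched in Python as follows. This is an illustration under assumptions: the entropy is taken to be the Shannon entropy (base 2) of each AHP vector, the count of vectors with a value above 0.5 is accumulated over all experts, and the function name `summarize_person` as well as the input vectors are hypothetical.

```python
import math


def shannon_entropy(p):
    """Shannon entropy (base 2); zero entries contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def summarize_person(vectors_by_expert):
    """vectors_by_expert holds one list per expert; each list contains that
    expert's AHP vectors (one per feature) for a single face.
    Returns (sum of entropies divided by the number of experts,
             number of vectors containing a value higher than 0.5)."""
    n_experts = len(vectors_by_expert)
    all_vectors = [vec for expert in vectors_by_expert for vec in expert]
    total_entropy = sum(shannon_entropy(vec) for vec in all_vectors)
    dominant = sum(1 for vec in all_vectors if max(vec) > 0.5)
    return total_entropy / n_experts, dominant


# Hypothetical data: two experts, two features each (not the paper's data).
person = [
    [[0.5, 0.5], [0.75, 0.25]],
    [[1.0, 0.0], [0.25, 0.25, 0.5]],
]
avg_entropy, n_dominant = summarize_person(person)
print(round(avg_entropy, 4), n_dominant)
```

A face that is easy to describe yields a low average entropy and a high number of vectors with a dominant linguistic value.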

Table 7 Average entropies of all the features considered in the experiments
Table 8 Entropies related to the experts taking part in the experiment

6 Conclusions

In this paper, we have proposed a model for estimating the weights of facial features in the process of face recognition by humans. The approach is based on the experts’ opinions and knowledge and, depending upon practical needs, can be easily adjusted. The AHP method has been proposed as a generic method for weighting the essential features used by humans in describing faces. A three-level hierarchical structure has been developed along with an entropy-based way of evaluating the relevance of the assessment of individuals. The originality and novelty of our proposal stem from the fact that only linguistic rather than numerical values were used to produce the detailed experimental results, which are interesting and intuitively appealing. The dataset containing all the experts’ pairwise comparisons can be obtained from the authors of this paper and may serve as an interesting example for other studies and applications.

Future studies may focus on tracking the evolution of the methodology by further validating the hypothesis that higher face recognition rates can be obtained by defining features derived from linguistic descriptors. Of course, this can be done through practical implementations. Other directions of development include the application of the obtained results in face recognition that uses information acquired from several sources, such as face regions, together with multicriteria decision-making algorithms (e.g., aggregation functions), or their application in other algorithms using, for instance, neural networks. Moreover, we intend to design the AHP process for particular images and to construct membership functions using fuzzy set-based representations of the facial features. Some improvements can be achieved by optimizing the AHP results with particle swarm optimization, differential evolution, genetic algorithms, etc. (Kacprzyk and Pedrycz 2015). Additionally, we are interested in automating our proposal. The method can be easily integrated with content-based face recognition systems in which the experts’ opinions are supported by computational algorithms. Moreover, the entropy determination process can be effectively applied to the evaluation of a witness of a crime. Finally, the results of this study and the method of obtaining information from both the experts and the witnesses evaluating photographs of suspects can influence the development of novel facial components software kits similar to EvoFIT (Frowd et al. 2004), FACES (IQ Biometrix 2016), Identi-Kit (Identi-Kit Solutions 2016), SketchCop (SketchCop Solutions 2016), Mac-a-Mug and Photo-Fit (Wells and Hasel 2007), etc.