1. Introduction
Modern organizations face increasingly complex competition in global markets. The future survival of an organization depends primarily on the performance of its personnel. Personnel characteristics, such as capability, conscientiousness, and achievement motivation, play a critical role in maintaining the competitive advantages of an organization. Accurate personnel performance prediction can help decision makers recruit and select the most appropriate people for each job. Consequently, predicting personnel performance with computer-aided techniques is an extraordinarily challenging research topic.
Many researchers have studied human performance modeling. The literature contains many efforts to forecast human performance using methods from psychology, management, and data mining, such as correlation analysis and rule-based methods [1]. However, rule-based methods have limited performance because domain experts must manually design large bodies of rules in advance to uncover meaningful patterns. Recently, machine-learning-based methods have attracted considerable attention and have been successfully applied to human performance modeling. Chien et al. [2] presented a data-mining framework using decision trees and association rules for human performance modeling, which extracted useful patterns and rules relating personnel characteristics to work behaviors. A self-regulating clustering algorithm was presented to analyze bank personnel performance in order to improve human resource management (HRM) [3]. Gobert et al. [4] developed an interactive environment to evaluate students’ scientific research capability via data mining technology. Li et al. [5] proposed an improved KNN algorithm for human performance prediction in a manufacturing system; this method used an entropy-based distance formula, a classification rule, and a quantitative description of human performance. Wang et al. [6] applied a hybrid feature selection method to human resource selection, which increased classification performance. Although many methods have been proposed, these traditional machine-learning-based methods depend on well-designed features and have weak generalization and learning ability. Their performance relies heavily on the prior knowledge of domain designers, which is cumbersome and time-consuming to encode and strongly affects classification accuracy. Human performance modeling research is therefore still at an early stage, and there is great potential for further improvement.
In recent years, deep learning [7,8] and artificial neural networks have exhibited outstanding performance and have become the state-of-the-art methods for many pattern recognition problems. They have been successfully applied to a range of tasks, including image classification [9], breast mass classification [10], semantic segmentation [11], object detection and recognition [12,13], emotion recognition [14], language identification [15], agricultural applications [16], drug–drug interaction extraction [17], and so on. However, few works can be found in the field of HRM.
Lately, the rapid development of deep learning has brought new inspiration to HRM tasks. Convolutional neural networks (CNNs) [18], which have made a significant breakthrough in feature learning, are a well-known deep learning method. Rather than relying on handcrafted features, a CNN can automatically learn more discriminative features for the classification task at hand. However, a CNN cannot capture dependencies between features that are semantically related but some distance apart. Therefore, we propose a hybrid CRNN model that combines a CNN with long short-term memory (LSTM) to extract the target features of given personnel performance prediction samples. To better capture global personnel performance information, we introduce self-attention into our CRNN model.
In practice, there is generally no prior knowledge about the distribution of the data. The KNN classifier is well suited to such data because it makes no assumption about the underlying distribution. In addition, KNN can capture more complicated decision information than the softmax/sigmoid activation functions usually placed at the last layer of the self-attention-based CRNN. The self-attention-based CRNN and KNN are therefore complementary: the former provides feature extraction, and the latter provides the decision rule. Consequently, we put forward a novel framework, Convolutional Hybrid Recurrent Neural Networks with a Self-Attention Mechanism (CHRNNA), to make full use of their advantages and make up for their deficiencies.
Recent advances in machine learning suggest that, given sufficient labeled data, there should be an opportunity to construct better prediction models. However, no manually labeled dataset is publicly available. In this work, we created a labeled personnel performance dataset and exploited it using deep learning methods to build an accurate model for personnel performance prediction.
Overall, our contributions in this paper are as follows:
- (1) Grounded in psychological theory and management theory of personnel performance, we constructed a high-quality dataset with performance prediction characteristics. This is key to ensuring data quantity and quality;
- (2) To the best of our knowledge, this is the first time deep learning has been applied to personnel performance prediction, which fills a gap in this field;
- (3) Instead of treating each attribute equally, a self-attention mechanism is used to automatically select the informative features of personnel performance. Our proposed CHRNNA framework can be viewed as a universal framework for personnel performance prediction, and it greatly improves classification performance.
The remainder of this paper is structured as follows: The related background is introduced in Section 2; data are characterized in detail, and our proposed method is described thoroughly in Section 3; Section 4 evaluates the effectiveness of our method in a wide range of experiments and presents experimental results as well as analysis; finally, we present general conclusions from our work in Section 5.
3. Methods
In this paper, the prediction problem is treated as a classification problem, where a record with a prediction value of 1 (0) indicates a positive (negative) instance. Public datasets and a personnel performance prediction dataset were used to train and test the proposed framework. The overview of our framework is illustrated in Figure 1. We propose a CHRNNA framework to achieve our goal. It contains the following steps.
- (1) Two kinds of neural networks, CNNs and RNNs, where the latter refers to LSTM, are employed in personnel performance prediction. This hybrid model combining CNN with LSTM is called the CRNN model;
- (2) The self-attention mechanism is introduced into the hybrid CRNN model to automatically capture the informative features;
- (3) The learned features, extracted from the last layer of the self-attention-based CRNN model, are fed directly into the KNN classifier as inputs.
Next, we thoroughly describe the data and the two stages contained in the hybrid CRNN model with a self-attention mechanism and classification.
3.1. Personnel Performance Prediction Data
3.1.1. Data Description
To further evaluate the performance of our method, we used a real dataset collected from a high-technology industry to forecast personnel performance. The characteristics of this dataset are shown in Table 1.
Next, the data collection and data preprocessing process are described in detail.
3.1.2. Data Collection
To employ deep learning for modeling personnel performance, we required a dataset with labeled performance. Because no such human-labeled dataset is publicly available, we collected a personnel performance dataset with performance labels. The experimental dataset was collected from the Human Resource Department of a high-technology industry.
The first issue to consider is which attributes should be collected in this dataset. Salgado proposed the famous Big Five Model, which includes five personality factors, i.e., conscientiousness, emotional stability, extraversion, openness, and agreeableness [41]. Güngör et al. argued that determining the most eligible person depends on factors such as work experience, foreign language, basic computer skills, personal goals, and lifelong learning [42]. Li et al. pointed out that human and task characteristics are related to a person’s performance. We adopted expert panel discussions and behavioral event interviews as the main methods of collecting data. The panel of experts conducted a content analysis of the interviews to determine the competency characteristics exhibited by the respondents. Aspects such as “self-control”, “confidence”, “initiative”, and “self-motivation” were also identified. After the evaluators reached an agreement through discussion, each respondent was scored. That is, the attribute values were determined by the expert group through analysis of the interview content. To make the collected data satisfy our demands, we created a personnel performance dataset including 5 categories and 22 attributes. These attributes were deemed to have an impact on personnel performance and are shown in detail in Table 1.
The second issue is how to describe a person’s performance in this dataset. Like many decision problems, the personnel performance problem is highly complex in real life. Because human behaviors and characteristics are complicated, it is difficult to quantify a person’s performance. People usually make inaccurate quantitative judgments but relatively accurate qualitative ones. Therefore, we used qualitative fuzzy levels to describe the attributes.
The components of the dataset are shown in Table 2. It contains 23 items: 22 attributes and 1 personnel performance class. Each feature is represented by 5 fuzzy levels. For instance, a person’s memory capability is divided into five fuzzy levels: “very poor”, “poor”, “middle”, “good”, and “very good”. A person’s confidence has five fuzzy levels: “very unconfident”, “unconfident”, “medium”, “confident”, and “very confident”. Similarly, we used five fuzzy levels to describe experience: “completely inexperienced”, “inexperienced”, “middle”, “experienced”, and “well experienced”.
For different types of tasks, various means can be used to evaluate personnel performance. Taking a high-technology industry as an example, the performance of personnel is determined by the completion of tasks and the meetings. To facilitate performance prediction with a classification algorithm, the actual performance values are transformed into 2 grades, represented by the integers 0 and 1 in the sample data. That is, if the expected personnel performance is achieved, the performance value equals 1; otherwise, it equals 0.
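The encoding described above can be sketched as follows. This is only an illustration: the 1 to 5 integer scale, the helper names `encode_level` and `encode_label`, and the two example attributes are assumptions, since the text specifies only that each attribute has five fuzzy levels and that the label is 0 or 1.

```python
# Hypothetical encoding of fuzzy levels; the 1..5 integer scale is an assumption.
MEMORY_LEVELS = ["very poor", "poor", "middle", "good", "very good"]
CONFIDENCE_LEVELS = [
    "very unconfident", "unconfident", "medium", "confident", "very confident",
]


def encode_level(value, levels):
    """Map a fuzzy level name to its 1-based rank within the ordered scale."""
    return levels.index(value) + 1


def encode_label(achieved):
    """Binary performance label: 1 if performance is achieved, otherwise 0."""
    return 1 if achieved else 0


# Example record with two of the 22 attributes (attribute names are illustrative).
record = {"memory": "good", "confidence": "very confident"}
encoded = [
    encode_level(record["memory"], MEMORY_LEVELS),
    encode_level(record["confidence"], CONFIDENCE_LEVELS),
]
label = encode_label(True)
```

An ordinal integer code keeps the ordering of the fuzzy levels, which is what a convolutional or recurrent layer would need in order to treat "good" as closer to "very good" than to "very poor".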
3.1.3. Data Preprocessing
The raw data included some samples that were not applicable. Because these samples would reduce the quality of the resulting model, we cleaned the data and removed all duplicate samples. The initial dataset has 1151 samples. After discarding anomalous samples, the number of applicable instances was reduced to 1139. Here, anomalous samples are duplicates and samples whose attribute values are all five. We then manually carried out a random inspection of 300 instances from the cleaned dataset and found no duplicates. The resulting dataset is suitable for deep learning, and we now turn to describing the deep learning method we adopted.
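A minimal sketch of this cleaning step is given below, assuming each sample is stored as an `(attributes, label)` pair with attribute values encoded as integers from 1 to 5. The function name `clean_samples` and the data layout are illustrative assumptions, not the authors' implementation.

```python
def clean_samples(samples):
    """Drop exact duplicates and rows whose attribute values are all 5.

    Each sample is an (attributes, label) pair, where attributes is a sequence
    of integers in 1..5 (an assumed encoding of the five fuzzy levels).
    """
    seen = set()
    cleaned = []
    for attributes, label in samples:
        key = (tuple(attributes), label)
        if key in seen:
            continue  # duplicate of an earlier sample
        if all(value == 5 for value in attributes):
            continue  # anomalous: every attribute rated at the top level
        seen.add(key)
        cleaned.append((list(attributes), label))
    return cleaned


raw = [([3, 4, 5], 1), ([3, 4, 5], 1), ([5, 5, 5], 1), ([1, 2, 3], 0)]
cleaned = clean_samples(raw)
```

In this toy input, the second row is removed as a duplicate and the third as an all-five sample, leaving two usable instances.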
3.2. First Stage: A Hybrid CRNN Model with Self-Attention
We refer to the hybrid model that combines CNN with LSTM as the CRNN model. Next, we describe the details of this hybrid CRNN model with a self-attention mechanism.
3.2.1. CNN for Feature Extraction
A critical problem for personnel performance prediction is feature representation, and traditional feature selection methods rely mainly on human-designed features. On the one hand, hand-crafted feature extraction is time-consuming. On the other hand, it depends on human experience and requires designers to have strong professional background knowledge. As a result, classification performance suffers.
Recently, the rapid development of deep neural networks has brought new inspiration to feature extraction. In this section, we introduce the CNN used in our model.
Our network uses two convolutional layers and a max-pooling layer, as shown in Figure 1. The convolutional layers automatically capture features, and the max-pooling layer automatically selects the features that play key roles in personnel performance prediction. The first procedure in our model is to train the network by inputting each sample together with its label. A convolutional layer contains a series of feature maps, which are formed by sliding different kernels over an input sample. A max-pooling operation then captures the most important feature by extracting the largest value from a feature map. More details about the models are described in Section 4.
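To make the two operations concrete, the following sketch slides kernels over a one-dimensional sample and then takes the largest value of each feature map. It is an illustration under stated assumptions: the input is treated as a 1-D vector, the stride is 1 with no padding, no activation function is applied, pooling is taken over the whole feature map, and the kernel values are arbitrary rather than learned.

```python
def conv1d(sample, kernel, bias=0.0):
    """Slide one kernel over a 1-D sample (valid positions only, stride 1)."""
    width = len(kernel)
    return [
        sum(k * x for k, x in zip(kernel, sample[start:start + width])) + bias
        for start in range(len(sample) - width + 1)
    ]


def max_pool(feature_map):
    """Keep the largest value of a feature map."""
    return max(feature_map)


sample = [1.0, 3.0, 2.0, 5.0, 4.0]
kernels = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]  # arbitrary, not learned

# One feature map per kernel, then one pooled value per feature map.
feature_maps = [conv1d(sample, kernel) for kernel in kernels]
pooled = [max_pool(feature_map) for feature_map in feature_maps]
```

Each kernel yields one feature map of length `len(sample) - len(kernel) + 1`, and pooling reduces every map to a single value, so the pooled vector has one entry per kernel.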
3.2.2. LSTM for Contextual Information
When people perform tasks, they need some basic qualities, such as perception, learning, creativity, memory, engagement, experience, and health. These information sequences are rich in content, and their elements are correlated with each other in complex temporal ways. For example, the creativity, perception, learning, and confidence used in a design process change constantly, so multidimensional input information must be handled at the same time. Moreover, people do not carry out their tasks from scratch: when humans do computational design, they build on previous information rather than abandoning everything they know. However, a CNN is unsuitable for modeling temporal information or dependencies. LSTM has a memory function, which makes it well suited to this problem.
In this section, for our modeling, we used long short-term memory (LSTM) [43], a variation of RNNs. For mathematical notation, we denote scalars with an italic lower case (e.g., $h$), vectors with a bold lower case (e.g., $\mathbf{h}$), and matrices with a bold upper case (e.g., $\mathbf{U}$).
LSTM is a kind of neural network architecture that is especially adapted to model sequential information. As illustrated in Figure 2, LSTM includes an input gate, a forget gate, and an output gate. LSTM networks tackle the problem of long-term dependencies of features by maintaining a memory cell at each time step $t$. We used LSTM to model the contextual dependencies and semantic relevance in our datasets, as shown in Figure 1.
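LSTM takes an input vector $\mathbf{x}_t$, a hidden vector $\mathbf{h}_{t-1}$, and a memory cell state vector $\mathbf{c}_{t-1}$ and generates $\mathbf{h}_t$ and $\mathbf{c}_t$ through the following calculations:

$$\mathbf{i}_t = \sigma(\mathbf{W}_i \mathbf{x}_t + \mathbf{U}_i \mathbf{h}_{t-1} + \mathbf{b}_i) \tag{1}$$

$$\mathbf{f}_t = \sigma(\mathbf{W}_f \mathbf{x}_t + \mathbf{U}_f \mathbf{h}_{t-1} + \mathbf{b}_f) \tag{2}$$

$$\mathbf{o}_t = \sigma(\mathbf{W}_o \mathbf{x}_t + \mathbf{U}_o \mathbf{h}_{t-1} + \mathbf{b}_o) \tag{3}$$

$$\tilde{\mathbf{c}}_t = \tanh(\mathbf{W}_c \mathbf{x}_t + \mathbf{U}_c \mathbf{h}_{t-1} + \mathbf{b}_c) \tag{4}$$

$$\mathbf{c}_t = \mathbf{f}_t \odot \mathbf{c}_{t-1} + \mathbf{i}_t \odot \tilde{\mathbf{c}}_t \tag{5}$$

$$\mathbf{h}_t = \mathbf{o}_t \odot \tanh(\mathbf{c}_t) \tag{6}$$

where $\mathbf{i}_t$, $\mathbf{f}_t$, $\mathbf{o}_t$, and $\mathbf{c}_t$ are the input gate, forget gate, output gate, and memory cell, respectively. The vector $\tilde{\mathbf{c}}_t$ represents a new memory cell vector with candidates that could be added to the state. $\sigma$ is the logistic sigmoid function, and $\tanh$ is the hyperbolic tangent function. $\odot$ refers to the element-wise multiplication operation. The LSTM parameters $\mathbf{W}$ and $\mathbf{U}$ are weights, and $\mathbf{b}$ is the bias, where $\mathbf{W}_j$, $\mathbf{U}_j$, and $\mathbf{b}_j$ are for $j \in \{i, f, o, c\}$.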
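As a hedged illustration of a single LSTM time step with an input gate, a forget gate, an output gate, and a candidate memory cell, the sketch below uses plain Python lists. The standard LSTM update is assumed; the function names `affine` and `lstm_step`, the `params` dictionary layout, and the zero-valued parameters in the example are illustrative assumptions rather than the configuration used in the paper.

```python
import math


def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, U, h, b):
    """Compute W x + U h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + b_k
        for W_row, U_row, b_k in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params[g] = (W, U, b) for g in 'i', 'f', 'o', 'c'."""
    def pre(gate):
        W, U, b = params[gate]
        return affine(W, x_t, U, h_prev, b)

    i_t = [sigmoid(z) for z in pre("i")]        # input gate
    f_t = [sigmoid(z) for z in pre("f")]        # forget gate
    o_t = [sigmoid(z) for z in pre("o")]        # output gate
    c_tilde = [math.tanh(z) for z in pre("c")]  # candidate memory cell
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    # New hidden state: gated, squashed cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Example with input size 2 and hidden size 1; all parameters are zero,
# so every gate equals 0.5 and the candidate equals 0.
zero = ([[0.0, 0.0]], [[0.0]], [0.0])
params = {gate: zero for gate in "ifoc"}
h_t, c_t = lstm_step([1.0, -1.0], [0.3], [2.0], params)
```

With all-zero parameters the forget gate halves the previous cell state, which makes the role of each gate easy to check by hand before substituting learned weights.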
3.2.3. Self-Attention
By combining convolutional layers, max-pooling layer, and recurrent neural networks, our model adopts the strength of both convolutional neural models and recurrent neural models. Moreover, we want to capture long-range dependencies from personnel performance information. Recently, a self-attention mechanism has led to new ideas for solving these problems.
In this paper, we assume that the 5 categories and 22 attributes all have an impact on personnel performance. However, different features differ in importance. Therefore, we require a strategy to discriminate the importance of the 22 attributes. When forecasting personnel performance, the self-attention mechanism can identify the significance of individual attributes rather than weighting every attribute equally. Thus, we added the self-attention mechanism to the hybrid CRNN model.
In this section, we thoroughly describe self-attention. After obtaining the contextual features of the input sample, the self-attention mechanism [44] is used to learn the weight coefficient, which reflects the importance of each feature in the sample. Suppose that for each instance, we have $N$ outputs $\mathbf{h}_1, \ldots, \mathbf{h}_N$ from $N$ LSTM cells; then the self-attention can be formulated as:

$$\mathbf{s}_i = \sum_{j=1}^{N} \alpha_{ij}\,\mathbf{h}_j \tag{7}$$

We compute the weight coefficient $\alpha_{ij}$ of each $\mathbf{h}_j$ according to the following formulation:

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{N} \exp(e_{ik})} \tag{8}$$

$$e_{ij} = F_{\mathrm{score}}(\mathbf{h}_i, \mathbf{h}_j) \tag{9}$$

where $\mathbf{s}_i$ is the representation of the sequence as the weighted sum of hidden representations, $\alpha_{ij}$ is the normalized importance, $e_{ij}$ indicates the score of the degree of dependency between $\mathbf{h}_i$ and $\mathbf{h}_j$, and $F_{\mathrm{score}}$ is a function that computes the score for $\mathbf{h}_i$ and $\mathbf{h}_j$. The score $e_{ij}$ calculated by $F_{\mathrm{score}}$ is normalized by softmax via Equation (8). The output of this self-attention mechanism is a weighted sum.
Self-attention is seen as a separate layer, mixed with the CNN and LSTM model, which is able to more fully integrate their respective strengths.
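The weighting scheme can be sketched as follows. The text does not specify the score function, so a plain dot product (`dot_score`) is used here purely as an assumption; the function names and the toy hidden vectors are likewise illustrative.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_score(h_i, h_j):
    """Assumed score function: dot product between two hidden vectors."""
    return sum(a * b for a, b in zip(h_i, h_j))


def self_attention(hidden, score=dot_score):
    """Return, for every position i, the softmax-weighted sum of all h_j."""
    outputs = []
    for h_i in hidden:
        weights = softmax([score(h_i, h_j) for h_j in hidden])
        outputs.append([
            sum(w * h_j[d] for w, h_j in zip(weights, hidden))
            for d in range(len(h_i))
        ])
    return outputs


# Two hidden vectors standing in for the outputs of N = 2 LSTM cells.
hidden = [[0.0, 0.0], [2.0, 4.0]]
attended = self_attention(hidden)
```

In this toy case the zero vector scores every position equally and therefore receives the plain average of the hidden vectors, while the second vector attends almost entirely to itself. A different score function can be passed through the `score` argument.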
3.2.4. Model Training
In recent years, the cross-entropy loss has been widely used as the loss function in model training for various tasks. Cross-entropy measures the gap between the true probability distribution and the predicted probability distribution. During model training, we expect the probability that the model predicts for an instance to be as close as possible to the true probability. The formulation is defined as:

$$L = -\frac{1}{N}\sum_{i=1}^{N} y_i \log P_i \tag{10}$$

where $i$ denotes the sample, $N$ represents the total number of samples, $y$ is the one-hot vector corresponding to the true category of the sample, and $P$ is the predicted probability. In this study, we used cross-entropy as the loss function. We minimize the cross-entropy loss because the lower the cross-entropy is, the closer the prediction distribution obtained by our model is to the true distribution.
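A minimal sketch of this loss for one-hot labels is shown below. Averaging over the N samples and clipping probabilities with a small `eps` to avoid `log(0)` are implementation assumptions, and the function name `cross_entropy` is illustrative.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy over N samples.

    y_true: list of one-hot vectors; y_pred: list of predicted probability
    vectors. eps guards against log(0) and is an implementation assumption.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        total -= sum(y_k * math.log(max(p_k, eps)) for y_k, p_k in zip(y, p))
    return total / len(y_true)


# Two samples, two classes: the first belongs to class 0, the second to class 1.
y_true = [[1, 0], [0, 1]]
y_pred = [[0.5, 0.5], [0.25, 0.75]]
loss = cross_entropy(y_true, y_pred)
```

Because the labels are one-hot, only the predicted probability of the true class contributes to each sample's term, so the loss falls toward zero as that probability approaches 1.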
3.3. Second Stage: Classification
The previous stage (neural network training) can be viewed as preprocessing because it can be executed independently before the classification stage.
Large-scale data are now common, and there is usually no prior knowledge of their distribution. The KNN classifier is well suited to such data. Further, KNN can capture more complex decision information than the activation function of the last layer of our hybrid CRNN model. The idea behind our method is that the two stages complement each other: the network supplies learned features, and KNN supplies a distribution-free decision rule over them.
To classify new samples, our solution proceeds as follows:
- (1) The raw instances are propagated through our proposed network, and their feature vectors are extracted from the last layer of the hybrid CRNN model with self-attention;
- (2) The learned features mentioned above are fed into the KNN classifier as inputs;
- (3) The distances between samples are computed, and the training samples nearest to each test sample are selected;
- (4) The conventional KNN classification is carried out within these chosen data.
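Steps (3) and (4) can be sketched as a conventional KNN vote over already extracted feature vectors. The Euclidean metric, the value `k=3`, the function name `knn_predict`, and the toy feature vectors are assumptions made for illustration; in the framework the inputs would be the features taken from the last layer of the network.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Classify one feature vector by majority vote among its k nearest
    training vectors under Euclidean distance (the metric is an assumption)."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy learned features with binary performance labels.
train_features = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [1.2, 0.8]]
train_labels = [0, 0, 1, 1, 1]

low = knn_predict(train_features, train_labels, [0.05, 0.1])
high = knn_predict(train_features, train_labels, [1.0, 0.9])
```

Choosing an odd `k` avoids tied votes in the binary setting used here.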
Next, we conducted extensive experiments on a personnel performance dataset to validate the effectiveness of our approach. More details about the experiments are given in Section 4.
5. Conclusions
High-technology industries depend on personnel performance to maintain their competitive advantages. In this paper, we presented a novel CHRNNA framework, which is used to forecast future personnel performance and help decision-makers select the most suitable talent. We designed and collected a dataset with 22 attribute items, which were used to reflect personnel performance.
The proposed framework remarkably improves prediction performance. In the first stage, we employ a hybrid CRNN model with a self-attention mechanism to automatically capture the global informative features. In the second stage, we use a KNN-based classifier to forecast future personnel performance. Experimental results demonstrate that our proposed method yields a significant improvement in performance.
In our opinion, there is still room for further improvement with minimal accuracy losses, for example by studying more advanced network architectures and designing a better loss function. These will be the focus of our future work.