Article Open Access

Binary Genetic Swarm Optimization: A Combination of GA and PSO for Feature Selection

Manosij Ghosh, Ritam Guha, Imran Alam, Priyank Lohariwal, Devesh Jalan and Ram Sarkar

Published/Copyright: September 14, 2019

Published by De Gruyter Brill

From the journal Journal of Intelligent Systems Volume 29 Issue 1

Abstract

Feature selection (FS) is a technique that helps to find an optimal feature subset for developing an efficient pattern recognition model. The use of the genetic algorithm (GA) and particle swarm optimization (PSO) in FS is widespread. In this paper, we propose a way to perform FS by aggregating information from the candidate solutions produced by GA and PSO. Our aim is to combine the complementary exploration and exploitation abilities of GA and PSO. We name this new model binary genetic swarm optimization (BGSO). The proposed method first lets GA and PSO run independently. To extract sufficient information from the feature subsets they produce, BGSO combines their results with an algorithm called the average weighted combination method to produce an intermediate solution. Thereafter, a local search called sequential one-point flipping is applied to refine the intermediate solution and generate the final solution. BGSO is applied to 20 popular UCI datasets, and the results are obtained with two classifiers, namely, k nearest neighbors (KNN) and multi-layer perceptron (MLP). The overall results and comparisons show that the proposed method outperforms the constituent algorithms in 16 and 14 datasets using KNN and MLP, respectively, whereas among the constituent algorithms, GA achieves the best classification accuracy for 2 and 7 datasets and PSO for 2 and 4 datasets, respectively, with the same classifiers. These results demonstrate the applicability and usefulness of the method in the domain of FS.

Keywords: Feature selection; binary genetic swarm optimization; genetic algorithm; particle swarm optimization; UCI dataset; average weighted combination; sequential one-point flipping

MSC classification (according to 2010 database): 68T10

1 Introduction

Every object in real life has certain features, the unique entities that define its characteristics. To identify patterns distinctively, researchers have relied on various feature extraction techniques. Many such features are chosen heuristically based on domain understanding and/or inherent properties of the object, such as statistical or morphological properties. However, the extracted features are not always capable of predicting the pattern classes with absolute accuracy. Two features may be highly correlated or similar to each other, so including both in the learning model leads to redundancy. Other features may be uncorrelated with the pattern class to be predicted, i.e. they are not useful for representing the pattern classes. To combat these issues, researchers have worked for decades on methods that select an optimal and useful set of features that performs well in a given classification scenario. However, search-based selection methods must balance exploitation (refining promising solutions) and exploration (searching new regions of the search space), and this balance is hard to achieve. As a result, it is difficult to find the optimal feature subset that maximizes the objective function (assuming that a higher value of the objective function is desired). Our proposed algorithm tries to achieve a good trade-off in this regard in order to choose an optimal feature subset.

The key purpose of feature selection (FS) methods is to maximize the classification ability of a learning model by selecting a near-optimal feature subset. To choose an effective subset of features, we therefore need a robust algorithm that correctly identifies the features required to classify the patterns under consideration. Note that the objective of FS is not to find individual features that correlate with the classification problem, but rather a combination of features that, taken together, represent the pattern well. FS also makes classification computationally efficient by reducing the cost of classifying patterns with a large feature dimension, which reduces the data storage and computational resources required for the training phase.

Different search algorithms are employed to find the optimal subset of features. Blind search (BS) [11] evaluates every possible combination of features to reach the optimal subset, but this exhaustive approach has exponential time complexity (a dataset with D features has 2^D − 1 non-empty subsets), which is not feasible when the feature dimension is large. As an improvement over BS, researchers have developed heuristic search algorithms [22], [41], which direct the search using domain knowledge. These methods move locally from one candidate solution to a neighboring one and reach a near-optimal solution within a reasonable time, which substantially reduces the time requirement. Problem-specific properties [36] and greedy approaches [29] are often used for subset selection. Meta-heuristic algorithms [13], [16], [44] are used to overcome the tendency of heuristic algorithms to get trapped in local optima, and they are applicable to a wide range of problems. They are also problem independent and explore the search space more thoroughly, which makes them robust.
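To make the cost of blind search concrete, the following is a minimal sketch of exhaustive subset search. The scoring function `toy_score` is a hypothetical stand-in for a classifier-based evaluation; the point is only that the loop visits all 2^D − 1 non-empty subsets, so for Madelon (D = 500) it would have to evaluate 2^500 − 1 subsets.

```python
from itertools import combinations
from typing import Callable, Tuple

Subset = Tuple[int, ...]


def blind_search(n_features: int,
                 score: Callable[[Subset], float]) -> Tuple[Subset, float, int]:
    """Exhaustively score every non-empty subset of features 0..n_features-1.

    Returns the best subset, its score and the number of subsets evaluated,
    which is always 2**n_features - 1.
    """
    best_subset: Subset = ()
    best_score = float("-inf")
    evaluated = 0
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            evaluated += 1
            value = score(subset)
            if value > best_score:
                best_subset, best_score = subset, value
    return best_subset, best_score, evaluated


def toy_score(subset: Subset) -> float:
    """Hypothetical score: features 0 and 2 are useful; each selected feature costs 0.1."""
    return (0 in subset) + (2 in subset) - 0.1 * len(subset)


best, value, count = blind_search(4, toy_score)
print(best, count)  # (0, 2) 15
```

Doubling the work with every added feature is exactly why the heuristic and meta-heuristic searches discussed next are used instead.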

FS methods are broadly classified into three categories, namely, filter, wrapper and embedded methods. Filter methods use the characteristics of the features to assign a score to each feature, and the classification ability of the features is then evaluated based on these scores. Because no learning algorithm is involved, the computational cost of filter methods is low. Some well-known filter methods are the chi-squared test [40], information gain [31] and Fisher score [23]. Wrapper methods, on the other hand, consult a learning algorithm to proceed toward an optimal solution. Although computationally expensive, wrapper methods more often give better results than filter methods because the search is guided by the learning algorithm. Genetic algorithm (GA) [44], particle swarm optimization (PSO) [16] and gravitational search algorithm [37] are examples of popular wrapper methods. Researchers have also incorporated the advantages of wrapper and filter methods into a single technique known as the embedded method [15], [20], [45]. These methods use both a filter-based mechanism and the supervision of a learning algorithm to test the fitness of the solutions. In this paper, we propose a new FS method named binary genetic swarm optimization (BGSO), which combines two wrapper-based evolutionary algorithms, namely, GA and PSO.
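As an illustration of a filter method, the sketch below computes the Fisher score [23] of each feature using its standard definition, the between-class scatter of the feature divided by its within-class scatter, without consulting any classifier. Keeping the top-k features and the toy data are illustrative assumptions, not part of BGSO.

```python
from collections import defaultdict
from statistics import fmean, pvariance
from typing import Dict, List, Sequence


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Fisher score of every feature: between-class scatter / within-class scatter.

    F_j = sum_k n_k (mu_kj - mu_j)^2 / sum_k n_k sigma_kj^2, where k runs over classes.
    """
    by_class: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    scores = []
    for j in range(len(X[0])):
        overall_mean = fmean(row[j] for row in X)
        between = within = 0.0
        for rows in by_class.values():
            values = [row[j] for row in rows]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        if within > 0:
            scores.append(between / within)
        else:  # no within-class spread: a perfect separator, or a constant feature
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features (the filter's selected subset)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy data: feature 0 separates the two classes, feature 1 is mostly noise.
X = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 6.0]]
y = [0, 0, 1, 1]
scores = fisher_scores(X, y)
print(top_k(scores, 1))  # [0]
```

A wrapper method would instead score each candidate subset by training and testing a classifier on it, which is more expensive but takes feature interactions into account.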

The rest of the paper is organized as follows. Section 2 gives a brief review of works based on GA and PSO, the two ancestors of our proposed FS model. Section 3 introduces GA and PSO, followed by a detailed explanation and analysis of BGSO. Section 4 presents the results obtained by BGSO on several well-known UCI datasets and their comparison with GA, PSO and histogram-based multi-objective GA (HMOGA), along with the parameter settings used in our experiments. Conclusions drawn from the experiments and the scope of future work on BGSO are reported in Section 5.

In this paper, our aim is to compensate for the limited exploitation capacity of GA and the limited exploration ability of PSO. A new algorithm is proposed to combine the results (feature subsets in the population) of the two algorithms, GA and PSO, in order to produce a near-optimal feature subset. The algorithm is tested on 20 popular UCI datasets, and the results are generated using two classifiers, namely, k nearest neighbors (KNN) and multi-layer perceptron (MLP).

2 Related Work

GA was used for FS as early as 1997 in [44], where the selected feature subsets were evaluated using a neural network as the classifier. GA has since been used for FS in a variety of domains, such as spectral datasets [32] and the Colon and Yeast datasets [17], as well as to optimize the kernel parameters of the support vector machine (SVM). The GA in [17] mimics the natural phenomena of death from old age, war and disease to shrink an exploding population. A hybrid model of GA and PSO has been proposed in which GA and PSO run in parallel and, after a specified number of iterations, the subsets in the populations of GA and PSO are interchanged. A hybrid of GA and ant colony optimization (ACO) was used for load prediction in [38] with the MLP classifier: the initial population is first enhanced using the genetic operations of GA, and the best feature subset is then passed to ACO for further refinement. The use of GA to find biomarkers in microarray data [19] is also quite prevalent. A modified version of GA named HMOGA was proposed in [18]. HMOGA divides the entire dataset into a number of smaller datasets, which are fed to the classifier individually. The outcomes for the different parts are then combined by drawing a histogram and applying a cutoff to obtain the final solution. In [24], Guha et al. proposed a hybrid GA called Deluge-based GA (DGA), which uses the Great Deluge algorithm in place of mutation to achieve significant perturbation of the population of solutions. They tested the proposed algorithm on UCI datasets, and the results showed the superiority of DGA over some well-established contemporary metaheuristic algorithms. As an improvement over HMOGA, Guha et al. proposed a memory-oriented HMOGA named M-HMOGA in [25], which uses a memory to store the best population of GA across multiple generations. Abualigah and Hanandeh applied an adaptive GA to information retrieval using the vector space model in [2].

PSO has been used heavily for FS. Binary (discrete) PSO was first proposed for FS in [30], where the position values were converted into probabilities of including the features using a sigmoid function. In some cases, continuous and discrete PSO were used together to optimize the SVM parameters and the feature subset, respectively [28]; a distributed system was adopted in which the server performed the PSO calculations and SVM training and testing were done on the client. A different approach can be found in [8], where PSO was modified into geometric PSO: new agents were generated by a crossover of the current agent, the local best and the global best, followed by mutation of the resulting agent. This algorithm was used for gene selection. Another minor modification of PSO used rough sets to perform FS [42]. A hybrid algorithm combining PSO with genetic operators was proposed by Abualigah and Khader in [3] for text clustering; it improved the performance of the k-means clustering algorithm by selecting a better set of informative features. Another approach for text document clustering was proposed in [5], which used an adaptive PSO to find a more informative subset of features and also reduced the time requirement to some extent.

ACO was adopted for FS in text classification in [7]. A graph was built with nodes representing features, and pheromone deposits were assigned to the nodes instead of the links. Each node had a pheromone deposit and a heuristic desirability, which together determined whether the node was selected. A hybrid of GA and ACO was proposed in [34], where the two algorithms ran in parallel and, in each iteration, the better result of the two was taken; the hybrid algorithm performed FS for protein function prediction. A very similar approach was adopted in [9], but for text classification. The work reported in [35] compared ACO, GA and PSO using the SVM classifier on siRNA data; an important observation there was that both GA and PSO performed better than ACO. In [21], Ghosh et al. proposed an embedded ACO named wrapper filter ACOFS, which uses a filter method to evaluate the feature subsets and thereby reduce the time requirement of the overall model. The authors also used a memory to store the best results across all generations. An innovative text clustering method based on krill herd (KH) was proposed by Abualigah et al. in [6]; the authors introduced a hybrid improved KH algorithm called MMKHA, which was applied to eight text datasets. Another approach for text document clustering, which combined objective functions with a hybrid KH algorithm, was proposed in [4]. Abualigah proposed an enhanced KH algorithm for text document clustering in [1].

3 Present Work

The proposed model BGSO is a metaheuristic that combines the strengths of GA and PSO in order to overcome the limitations of the individual algorithms. GA is weak in exploitation because its only source of exploitation is mutation, which perturbs the chromosomes only slightly. On the other hand, GA achieves notable exploration of the search space through the crossover operation. PSO, in contrast, has good local search capabilities, which enhance its exploitation ability, but it does not explore adequately: PSO frequently gets stuck in local optima [43]. These complementary exploitation-exploration trade-offs of GA and PSO motivate us to combine their results so that an optimal and useful outcome can be achieved. In this section, the constituent algorithms, GA and PSO, as well as their combination BGSO, are explained in detail.

3.1 GA

GA is a popular evolutionary computation method developed by Holland in 1975 [27] and later enhanced by Goldberg [26]. It is a global search technique that solves a given problem by mimicking the natural process of evolution. Based on Darwin’s theory, GA uses the concepts of reproduction and survival of the fittest. GA searches for new and better solutions without assumptions such as continuity or unimodality of the objective function. Owing to this flexibility, GA has been used over the years in design optimization, telecommunications, traffic and shipment routing, gaming, market and financial analysis and many other areas [10], [12], [33]. Its growing use in different sectors stems from the fact that GA can handle a large number of parameters and produces satisfactory, though not necessarily optimal, solutions.

GA maintains a set of solutions, called chromosomes or individuals, which are strings of binary values (“0”s and “1”s). In FS, each value indicates the state of the corresponding feature: “1” if the feature is selected and “0” otherwise. A set of such chromosomes is referred to as a population. Each chromosome is evaluated using a fitness function. After the chromosomes are ranked according to their fitness values, they undergo genetic operations such as crossover and mutation. For this, two chromosomes are selected on the basis of their positions on a roulette wheel biased according to each chromosome’s fitness. The two chromosomes first undergo crossover, and then mutation is applied to increase the local coverage of the search space, thereby decreasing the chance of getting stuck in a local optimum. If the evolution process generates offspring that are fitter than the previous chromosomes, the offspring replace them. The evolution process repeats until the end criteria are met.
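The genetic operations described above can be sketched as follows. The text specifies roulette-wheel selection but not the crossover or mutation variants, so one-point crossover and independent bit-flip mutation are assumed here as common choices; the population, fitness values and mutation rate are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Chromosome = List[int]  # bit j == 1 means feature j is selected


def roulette_select(population: Sequence[Chromosome], fitness: Sequence[float],
                    rng: random.Random) -> Chromosome:
    """Fitness-proportionate (roulette-wheel) selection of one parent."""
    return rng.choices(population, weights=fitness, k=1)[0]


def one_point_crossover(a: Chromosome, b: Chromosome,
                        rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Swap the tails of two parents after a random cut point (needs len >= 2)."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def bit_flip_mutation(c: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in c]


# Usage with a hypothetical population and hypothetical accuracies as fitness.
rng = random.Random(42)
population = [[1, 0, 1, 0, 1], [0, 1, 1, 0, 0], [1, 1, 0, 0, 1]]
fitness = [0.80, 0.90, 0.70]
parent1 = roulette_select(population, fitness, rng)
parent2 = roulette_select(population, fitness, rng)
child1, child2 = one_point_crossover(parent1, parent2, rng)
child1 = bit_flip_mutation(child1, rate=0.05, rng=rng)
```

Crossover only recombines existing bits, so at every position the two children carry the same pair of values as the parents; mutation is the only operator that introduces a bit value not present in either parent.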

The main steps of GA (as proposed by Holland in 1975) along with the framework are as follows:

(1) Representation of the problem data in the genetic space, where different combinations of genes form candidate solutions.
(2) Initialization of randomly generated individuals, which constitute the first generation.
(3) Evaluation of each individual to determine its fitness value.
(4) Selection of good chromosomes for breeding.
(5) Crossover and mutation to produce the offspring set.
(6) Evaluation of the new individuals to pass them to the next generation.
(7) Termination if the end criteria are met; otherwise, return to Step (3).

The end criteria for termination can be any of the following:

The highest possible accuracy for the model is reached.
The accuracy remains unchanged over consecutive generations.
A preset maximum number of generations (or time limit) is reached.
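The three end criteria above can be checked together on the history of the best accuracy per generation. In this sketch, `patience` (how many unchanged generations count as stagnation) and `target_acc` are hypothetical parameters, since the text does not give values for them.

```python
from typing import Sequence


def should_terminate(best_acc_history: Sequence[float], max_generations: int,
                     patience: int, target_acc: float = 100.0) -> bool:
    """Check the three end criteria on the best accuracy (%) of each generation so far.

    `patience` (>= 1) is the number of consecutive generations without change
    that counts as stagnation; it and `target_acc` are illustrative parameters.
    """
    if not best_acc_history:
        return False
    # Criterion 1: the highest possible accuracy has been reached.
    if best_acc_history[-1] >= target_acc:
        return True
    # Criterion 2: the best accuracy has not changed over the last `patience` generations.
    recent = best_acc_history[-(patience + 1):]
    if len(recent) == patience + 1 and len(set(recent)) == 1:
        return True
    # Criterion 3: the preset maximum number of generations has been reached.
    return len(best_acc_history) >= max_generations


print(should_terminate([90.0, 91.0, 91.0, 91.0, 91.0], max_generations=20, patience=3))  # True
```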

3.2 PSO

PSO is a population-based stochastic optimization technique inspired by the social behavior of bird flocking. It was proposed by Eberhart and Kennedy in 1995 [16]. PSO is a metaheuristic, as it can explore a search space while making few or no assumptions about the given problem, and it converges toward optimal or near-optimal solutions. The candidate solutions, referred to as particles, fly through a multi-dimensional search space to find an optimal or sub-optimal solution through competition as well as cooperation among them [39]. Like GA, PSO is initialized with a group of random particles, and it then searches for optima by moving the candidate solutions through the search space. Each particle is represented by a vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$, where $D$ is the number of features in the dataset, and each particle has a $D$-dimensional velocity $v_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$. In every iteration, the velocity of each particle is updated using three components: (1) its previous velocity, which gives the trend of the particle's flow over the search space; (2) pbest, the best position found by the particle up to the present iteration; and (3) gbest, the best position found by the whole swarm up to the present iteration. The velocity and position of the particles are updated using the following equations:

(1) $v_{id}^{k+1} = w \, v_{id}^{k} + c_1 r_1 \left(p_{id} - x_{id}^{k}\right) + c_2 r_2 \left(p_{gd} - x_{id}^{k}\right)$

(2) $x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1}$

Here, $k$ denotes the $k$th iteration and $d$ the $d$th feature (dimension) of the vector. $w$ is the inertia weight, which controls the impact of the previous velocity; $c_1$ and $c_2$ are acceleration constants; and $r_1$ and $r_2$ are random numbers in the range [0, 1]. $p_{id}$ and $p_{gd}$ denote the state of the $d$th feature in pbest (the best position found so far by particle $i$) and gbest (the best position found so far by the whole swarm), respectively.
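Eq. (2) yields real-valued positions, whereas FS needs a binary state for each feature. The text does not state how the PSO inside BGSO binarizes positions, so the sketch below assumes the sigmoid rule of binary PSO mentioned in Section 2 [30]: the velocity follows Eq. (1), and each bit is then set to 1 with probability sigmoid(velocity). Velocity clamping with `v_max = 4.0` and the parameter values in the usage line are also assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def bpso_step(x: Sequence[int], v: Sequence[float],
              pbest: Sequence[int], gbest: Sequence[int],
              w: float, c1: float, c2: float, rng: random.Random,
              v_max: float = 4.0) -> Tuple[List[int], List[float]]:
    """One update of a single particle in binary PSO (a sketch).

    The velocity follows Eq. (1). Instead of the additive position rule of
    Eq. (2), each bit is set to 1 with probability sigmoid(velocity), the
    transfer rule of binary PSO (assumed here, not stated for BGSO).
    """
    new_x: List[int] = []
    new_v: List[float] = []
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()
        vd = w * v[d] + c1 * r1 * (pbest[d] - x[d]) + c2 * r2 * (gbest[d] - x[d])
        vd = max(-v_max, min(v_max, vd))  # clamp so sigmoid stays away from 0 and 1
        new_v.append(vd)
        new_x.append(1 if rng.random() < sigmoid(vd) else 0)
    return new_x, new_v


rng = random.Random(7)
x, v = bpso_step([0, 1, 0], [0.0, 0.0, 0.0], pbest=[1, 1, 0], gbest=[1, 0, 1],
                 w=0.7, c1=2.0, c2=2.0, rng=rng)  # hypothetical parameter values
```

A large positive velocity makes a feature very likely to be selected and a large negative one very likely to be dropped, so the velocity acts as a per-feature selection tendency rather than a displacement.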

3.3 BGSO: Combination of GA and PSO

GA and PSO belong to two different families of FS algorithms, namely, evolutionary algorithms and swarm intelligence, respectively. GA is very useful for passing down useful features from one generation to the next, while PSO searches the search space thoroughly using particles that share feature information with one another. BGSO combines these advantages of GA and PSO to balance exploitation and exploration. In the first step of the proposed method, GA and PSO are run separately, each producing its final population. The two populations are then combined by evaluating the importance of every feature selected in either population, using a method called the average weighted combination method (AWCM). A new feature subset is created from the features whose importance exceeds the mean importance of the selected features (the AWCM cutoff). A local search, called sequential one-point flipping (SOPF), is then applied to further enhance the discriminative ability of this subset. AWCM and SOPF are the main characteristics of the present work.

In AWCM, the sum of the accuracies of all the particles (in PSO) and chromosomes (in GA) is calculated first. Consider a feature that is selected by one chromosome of GA with an accuracy of 85% and by two particles of PSO with accuracies of 90% and 80%. The importance of this feature is 2.55 (0.85 + 0.90 + 0.80 = 2.55). The AWCM cutoff is set as the mean importance of all the features selected by either GA or PSO, and the features whose importance exceeds the AWCM cutoff are included in the new subset. For easy understanding, the steps of AWCM, with a simplified example of the calculation of the importance of a feature, are shown in Figure 1. “F” represents the normal feature state (either 1 or 0) and “WF” represents the weighted feature state (the accuracy of the candidate selecting the feature). With a population size of n, applying AWCM to the 2n feature subsets produced by GA and PSO yields a single feature subset containing the relatively important features.
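A minimal sketch of AWCM as described above: each feature's importance is the sum of the accuracies of the candidates that select it, the cutoff is the mean importance over features selected by at least one candidate, and features strictly above the cutoff are kept. The usage reproduces the worked example (feature 0 selected by a GA chromosome at 85% and two PSO particles at 90% and 80%, giving 0.85 + 0.90 + 0.80 = 2.55); the remaining bits of the three subsets are hypothetical, and the example uses one GA and two PSO candidates rather than two populations of equal size n.

```python
from typing import List, Sequence, Tuple


def awcm(subsets: Sequence[Sequence[int]],
         accuracies: Sequence[float]) -> Tuple[List[int], List[float]]:
    """Average weighted combination of binary feature subsets (a sketch).

    importance[j] is the sum of the accuracies of all candidates (GA chromosomes
    and PSO particles) that select feature j. The cutoff is the mean importance
    over the features selected by at least one candidate; features whose
    importance exceeds the cutoff form the combined subset.
    """
    n_features = len(subsets[0])
    importance = [0.0] * n_features
    for mask, acc in zip(subsets, accuracies):
        for j, bit in enumerate(mask):
            if bit:
                importance[j] += acc  # weighted feature state ("WF")
    selected = [imp for imp in importance if imp > 0]
    cutoff = sum(selected) / len(selected) if selected else 0.0
    combined = [1 if imp > cutoff else 0 for imp in importance]
    return combined, importance


# Worked example: feature 0 is chosen by one GA chromosome (85%) and two PSO
# particles (90% and 80%); the other bits are illustrative.
ga_population = [[1, 1, 0, 0]]
pso_population = [[1, 0, 1, 0], [1, 0, 0, 1]]
subset, importance = awcm(ga_population + pso_population, [0.85, 0.90, 0.80])
print(subset)  # [1, 0, 0, 0]
print(round(importance[0], 2))  # 2.55
```

Here the importances are 2.55, 0.85, 0.90 and 0.80, so the cutoff is 5.10 / 4 = 1.275 and only feature 0 survives. If the importances are first divided by the total accuracy of all candidates, every value and the cutoff are scaled by the same positive constant, so the selected subset does not change.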

Figure 1:

A hypothetical example to illustrate the concept of AWCM in order to measure the importance of a feature to be considered in the final and optimal population.

Another significant contribution of the present work is the inclusion of a local search, SOPF, in the proposed FS model. This is a simple, lightweight approach that helps to improve the final result. SOPF goes through the feature states of a candidate solution sequentially. By flipping the state of each feature in turn, it evaluates the neighboring solution that differs from the current one in exactly that feature. After flipping a feature, the algorithm accepts the new solution only if it achieves higher accuracy than the current solution. In this way, SOPF never returns a solution that is worse than the one it started from. The SOPF algorithm is as follows.

Sequential one-point flipping algorithm:

Input : s_old (intermediate solution generated by AWCM), n (number of features in s_old)
Output : s_new (final solution)
Start
s_inter = s_old
for i = 1 to n {
    s_temp = s_inter with the state of feature i flipped
    if (accuracy(s_temp) > accuracy(s_inter)) {
        s_inter = s_temp
    }
}
s_new = s_inter
End
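The SOPF pseudocode above translates directly into the following sketch. The accuracy of the current solution is cached so that each step costs one evaluation (n + 1 in total); `toy_accuracy` and `IDEAL` are hypothetical stand-ins for the wrapper's classifier-based accuracy.

```python
from typing import Callable, List, Sequence


def sopf(s_old: Sequence[int],
         accuracy: Callable[[Sequence[int]], float]) -> List[int]:
    """Sequential one-point flipping.

    Visits features 0..n-1 once, in order; a flip is kept only if it strictly
    improves accuracy, so the result is never worse than s_old.
    """
    s_inter = list(s_old)
    current_acc = accuracy(s_inter)
    for i in range(len(s_inter)):
        s_temp = s_inter.copy()
        s_temp[i] = 1 - s_temp[i]  # flip feature i
        temp_acc = accuracy(s_temp)
        if temp_acc > current_acc:
            s_inter, current_acc = s_temp, temp_acc
    return s_inter


# Hypothetical accuracy: fraction of bits agreeing with a hidden "ideal" subset.
IDEAL = [1, 1, 0, 0]


def toy_accuracy(mask: Sequence[int]) -> float:
    return sum(a == b for a, b in zip(mask, IDEAL)) / len(IDEAL)


print(sopf([1, 0, 0, 1], toy_accuracy))  # [1, 1, 0, 0]
```

Because each flip is accepted as soon as it helps, later features are evaluated against the already improved solution; this makes a single pass cheap, but the outcome can depend on the order in which features are visited.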

The steps of BGSO are summarized in the flowchart in Figure 2. The internal details of GA and PSO are not included in the flowchart, as they have already been described earlier in this section.

Figure 2:

Flowchart of the proposed FS model called BGSO.

4 Results and Analysis

This section measures the strength of the proposed FS model by applying it to various datasets. The performance of the proposed model is compared with that of GA, PSO and HMOGA, and the details of the experimental setup are also provided.

4.1 Dataset Description

To evaluate BGSO, we selected 20 well-known datasets from the UCI repository [14]. The datasets vary in terms of dimension, number of classes and domain. Based on their size, the chosen datasets can be classified into three categories: small (number of features < 10), medium (10 ≤ number of features ≤ 100) and large (number of features > 100). To cover all of these variations, we used 5 small, 11 medium and 4 large datasets. The names of the datasets in each category are shown in Table 1, and their details (number of features, number of instances and number of classes) are given in Table 2.
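The size categories above have inclusive bounds for the medium class, which matters for datasets at the boundaries. The sketch below encodes the rule and applies it to a few feature counts taken from Table 2; Glass (10 features) falls in the medium category and Hill-valley (101 features) in the large one.

```python
def size_category(n_features: int) -> str:
    """Size tag used in this section: small (< 10), medium (10 to 100), large (> 100)."""
    if n_features < 10:
        return "small"
    if n_features <= 100:
        return "medium"
    return "large"


# Feature counts of a few datasets from Table 2, including both boundary cases.
examples = {"Monk1": 6, "BreastCancer": 9, "Glass": 10, "Sonar": 60,
            "Hill-valley": 101, "Arrhythmia": 279}
for name, n in examples.items():
    print(name, n, size_category(n))
```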

Table 1:

Category-wise Names of the Datasets.

Dataset
Small	Medium	Large
BreastCancer	Horse	Arrhythmia
Monk1	Ionosphere	Hill-valley
Monk2	Sonar	Madelon
Monk3	Soybean-small	PenglungEW
Tic-tac-toe	Wine
	Zoo
	Vowel
	Glass
	BreastEW
	CongressEW
	Exactly

Table 2:

Description of 20 UCI Datasets Used for Evaluation of the Proposed FS Method.

Datasets	No. of features	No. of instances	No. of classes
Arrhythmia	279	452	16
BreastCancer	9	699	2
Glass	10	214	7
Hill-valley	101	606	2
Horse	27	368	2
Ionosphere	34	351	2
Madelon	500	4400	2
Monk1	6	124	2
Monk2	6	124	2
Monk3	6	124	2
Sonar	60	208	2
Soybean-small	35	47	4
Vowel	10	528	11
Wine	13	178	3
Zoo	16	101	7
BreastEW	30	569	2
CongressEW	16	435	2
Exactly	13	1000	2
Tic-tac-toe	9	958	2
PenglungEW	325	73	7

4.2 Parameter Values

The proposed FS model mainly uses two parameters: population size and number of iterations. To find suitable parameter values, we first evaluated BGSO with different combinations of parameters. The changes in performance with varying population size for BreastCancer (small), BreastEW (medium) and Hill-valley (large) are shown in Figure 3, where the size is varied as 5, 10, 15, 20 and 25. Similar experiments were performed to find a suitable number of iterations. Based on these experiments, we set the parameters as follows:

Population size: 20
No. of iterations: 20

Figure 3:

Changes in classification accuracy of BGSO for different population sizes over the BreastCancer, BreastEW and Hill-valley datasets.

For the rest of the experiments, we used these parameter values.
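The parameter search described above can be sketched as a grid over (population size, number of iterations). The population sizes come from the text; the grid of iteration counts is a hypothetical choice, because the values tried are not listed, and `toy_evaluate` is a hypothetical stand-in for running BGSO and measuring its accuracy.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Setting = Tuple[int, int]  # (population size, number of iterations)


def grid_search(evaluate: Callable[[int, int], float],
                population_sizes: Sequence[int] = (5, 10, 15, 20, 25),
                iteration_counts: Sequence[int] = (5, 10, 15, 20, 25)
                ) -> Tuple[Setting, Dict[Setting, float]]:
    """Evaluate every (population size, iterations) pair and return the best one.

    Ties are broken in favour of the cheaper setting (smaller population x iterations).
    """
    results = {(p, t): evaluate(p, t)
               for p, t in product(population_sizes, iteration_counts)}
    best = max(results, key=lambda s: (results[s], -s[0] * s[1]))
    return best, results


def toy_evaluate(pop: int, iters: int) -> float:
    """Hypothetical accuracy surface that peaks at (20, 20)."""
    return 90.0 - abs(pop - 20) - abs(iters - 20)


best, results = grid_search(toy_evaluate)
print(best)  # (20, 20)
```

Since every BGSO run is itself a wrapper search, the cost of such a grid grows with population size times iterations; the tie-breaking rule prefers the cheaper setting when accuracies are equal.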

4.3 Classifiers Used

As the proposed model is a wrapper approach, it needs to consult a learning algorithm (classifier) to evaluate the generated candidate solutions. To show that BGSO does not depend on a particular classifier, we used two classifiers of differing complexity, namely, KNN and MLP. KNN is a simple classifier that assigns a point to the class receiving the majority vote among its k nearest neighbors in the feature space. MLP, on the other hand, is a more complex classifier that adjusts its network weights during training using the backpropagation algorithm. For a uniform comparison, we also evaluated the other methods using both of these classifiers and compared their results with those of BGSO.
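In a wrapper, the fitness of a candidate subset is the accuracy of the classifier trained on the selected features only. The sketch below does this with a plain KNN classifier; the value of k, the Euclidean distance and the train/test split are assumptions, since the text does not specify them, and the toy data are hypothetical. The MLP variant is not sketched.

```python
import math
from collections import Counter
from typing import Sequence


def knn_accuracy(mask: Sequence[int],
                 X_train: Sequence[Sequence[float]], y_train: Sequence[int],
                 X_test: Sequence[Sequence[float]], y_test: Sequence[int],
                 k: int = 3) -> float:
    """Wrapper fitness (a sketch): test accuracy in % of a k-NN classifier
    restricted to the features switched on in `mask` (Euclidean distance)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for x, label in zip(X_test, y_test):
        xs = [x[j] for j in cols]
        neighbours = sorted(
            (math.dist(xs, [t[j] for j in cols]), lab)
            for t, lab in zip(X_train, y_train))[:k]
        votes = Counter(lab for _, lab in neighbours)
        if votes.most_common(1)[0][0] == label:
            correct += 1
    return 100.0 * correct / len(y_test)


# Toy data: feature 0 separates the classes, feature 1 misleads a 1-NN classifier.
X_train = [[0.0, 1.0], [0.0, 9.0], [5.0, 2.0], [5.0, 8.0]]
y_train = [0, 0, 1, 1]
X_test = [[0.0, 8.2], [5.0, 1.2]]
y_test = [0, 1]
print(knn_accuracy([1, 0], X_train, y_train, X_test, y_test, k=1))  # 100.0
print(knn_accuracy([0, 1], X_train, y_train, X_test, y_test, k=1))  # 0.0
```

A function of this shape is what GA, PSO, AWCM (through the candidate accuracies) and SOPF would call whenever a subset needs to be scored.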

4.4 Analysis of Outcomes

To assess the effectiveness of the proposed method, the individual results of GA and PSO are recorded separately and compared with the final results of the proposed model. A recently proposed metaheuristic approach named HMOGA [18] is also used for comparison. Tables 3 and 4 display the results obtained using KNN and MLP, respectively. The best results for each dataset are shown in bold.

Table 3:

Comparison of BGSO with Its Constituent Algorithms (GA and PSO) and HMOGA Using KNN Classifier.

Dataset	GA: No. of features	GA: Accuracy (%)	PSO: No. of features	PSO: Accuracy (%)	HMOGA: No. of features	HMOGA: Accuracy (%)	BGSO: No. of features	BGSO: Accuracy (%)	Rank (BGSO)
Arrhythmia	210	57.89	180	58.4	195	56.58	158	61.84	1
BreastCancer	6	98.79	6	98.79	3	96.32	7	99	1
Glass	7	85.71	6	77.14	6	80.12	7	88.57	1
Hill-valley	73	54.76	57	54.76	54	51.5	59	55.68	1
Horse	22	97.06	17	97.05	14	97.05	13	100	1
Ionosphere	25	93.37	24	92.05	17	93.38	12	96.03	1
Madelon	375	57.33	277	54.17	240	60.33	309	59.67	2
Monk1	3	88.89	3	88.89	3	83.23	3	88.89	1
Monk2	6	74.77	6	74.77	2	55.09	6	74.77	1
Monk3	2	97.22	3	97.22	3	97.12	2	97.22	1
Sonar	48	56.72	34	58.21	35	68	27	79.1	1
Soybean-small	24	100	17	100	19	85.71	18	100	1
Vowel	9	89.61	9	88.74	8	87.85	7	88.53	3
Wine	9	97.87	8	100	5	70.21	6	100	1
Zoo	13	82.93	10	85.37	11	84	5	82.93	3
BreastEW	21	74.12	15	90.59	17	94.80	9	95.29	1
CongressEW	10	92.31	8	90.00	8	97.00	6	97.69	1
Exactly	9	91.50	6	69.50	7	72.00	9	89.00	2
Tic-tac-toe	7	82.77	5	73.63	6	78.00	7	82.77	1
PenglungEW	271	86.21	174	82.76	209	86.00	208	89.66	1

Highest classification accuracy for each dataset is in bold.

Table 4:

Comparison of the Performance of BGSO with Its Constituent Algorithms (GA and PSO) and HMOGA Using MLP Classifier.

Dataset	GA: No. of features	GA: Accuracy (%)	PSO: No. of features	PSO: Accuracy (%)	HMOGA: No. of features	HMOGA: Accuracy (%)	BGSO: No. of features	BGSO: Accuracy (%)	Rank (BGSO)
Arrhythmia	215	67.76	166	65.78	202	66.95	167	68.42	1
BreastCancer	7	98.32	6	98.32	6	98.32	5	98.66	1
Glass	8	84.28	7	82.86	7	81.88	5	84.28	1
Hill-valley	71	54.22	73	55.31	75	53.22	55	56.04	1
Horse	21	100	18	100	19	100	13	100	1
Ionosphere	25	97.35	20	96.12	26	96.56	19	97.35	1
Madelon	400	60.17	278	57.83	271	59.8	251	60.5	1
Monk1	4	92.59	3	97.22	4	94.54	3	100	1
Monk2	6	81.94	6	74.31	5	69.21	2	67.13	4
Monk3	3	100	3	97.22	3	97	3	97.22	2
Sonar	48	76.11	34	80.59	43	77.12	34	79.1	2
Soybean-small	25	100	14	100	19	100	18	100	1
Vowel	9	91.77	8	89.83	8	87.25	7	88.74	3
Wine	11	100	8	100	9	99.98	9	100	1
Zoo	12	85.37	8	85.37	11	81.98	6	82.93	3
BreastEW	17	74.71	13	92.35	15	93.33	13	95.29	1
CongressEW	10	89.23	10	94.62	8	96.30	7	97.69	1
Exactly	9	91.50	10	69.25	11	88.25	9	90.75	2
Tic-tac-toe	7	80.68	6	74.41	6	75.00	6	82.25	1
PenglungEW	246	86.21	200	82.76	183	83.25	177	86.21	1

Highest classification accuracy for each dataset is in bold.

AWCM combines the outcomes of GA and PSO to produce a vector containing the importance of each feature. Using the AWCM cutoff, the relatively more important features are selected and the rest are eliminated, which allows BGSO to reduce the number of features considerably. Tables 3 and 4 present a thorough comparison of the results obtained by GA, PSO, HMOGA and the proposed model. The results in Table 3 show that the proposed model generally decreases the number of features required for classification while increasing the classification accuracy. Of the 20 datasets, the proposed method achieves 100% accuracy on 3 with KNN and on 4 with MLP, which indicates that a highly discriminative feature subset has been selected for them. In many cases the proposed method also selects fewer features than the other methods, although this is not true for every dataset (e.g. BreastCancer and Exactly with KNN). With either classifier, the accuracy exceeds 80% for more than half of the datasets. Even for datasets with a large number of attributes, such as Arrhythmia, Madelon and PenglungEW, the proposed model mostly improves on the accuracy of its constituent algorithms. For some datasets the proposed method does not produce the best accuracy, but in some of these cases it still uses the fewest features (e.g. Vowel and Zoo with KNN). Hence, it can be considered a better FS model than its ancestors. With KNN as the learning algorithm, BGSO achieves the best (or joint-best) accuracy among its constituents and HMOGA on 16 of the 20 datasets, and Table 4 shows that it does so on 14 of the 20 datasets with MLP. The dominance of the proposed model on most datasets for both classifiers suggests that its benefits are not tied to a particular classifier.
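The Rank column is not defined in the text, but one reading reproduces every entry of Table 3: BGSO's rank is 1 plus the number of competing methods with strictly higher accuracy, so ties share a rank. The sketch below recomputes the KNN ranks from the table's accuracy values under that assumption and counts the first places, which include ties such as Monk1 and Soybean-small.

```python
from typing import Sequence

# (dataset, GA, PSO, HMOGA, BGSO accuracy in %, reported rank), from Table 3 (KNN)
TABLE3_KNN = [
    ("Arrhythmia", 57.89, 58.4, 56.58, 61.84, 1),
    ("BreastCancer", 98.79, 98.79, 96.32, 99.0, 1),
    ("Glass", 85.71, 77.14, 80.12, 88.57, 1),
    ("Hill-valley", 54.76, 54.76, 51.5, 55.68, 1),
    ("Horse", 97.06, 97.05, 97.05, 100.0, 1),
    ("Ionosphere", 93.37, 92.05, 93.38, 96.03, 1),
    ("Madelon", 57.33, 54.17, 60.33, 59.67, 2),
    ("Monk1", 88.89, 88.89, 83.23, 88.89, 1),
    ("Monk2", 74.77, 74.77, 55.09, 74.77, 1),
    ("Monk3", 97.22, 97.22, 97.12, 97.22, 1),
    ("Sonar", 56.72, 58.21, 68.0, 79.1, 1),
    ("Soybean-small", 100.0, 100.0, 85.71, 100.0, 1),
    ("Vowel", 89.61, 88.74, 87.85, 88.53, 3),
    ("Wine", 97.87, 100.0, 70.21, 100.0, 1),
    ("Zoo", 82.93, 85.37, 84.0, 82.93, 3),
    ("BreastEW", 74.12, 90.59, 94.80, 95.29, 1),
    ("CongressEW", 92.31, 90.00, 97.00, 97.69, 1),
    ("Exactly", 91.50, 69.50, 72.00, 89.00, 2),
    ("Tic-tac-toe", 82.77, 73.63, 78.00, 82.77, 1),
    ("PenglungEW", 86.21, 82.76, 86.00, 89.66, 1),
]


def bgso_rank(bgso_acc: float, competitor_accs: Sequence[float]) -> int:
    """1 + number of competitors with strictly higher accuracy (ties share a rank)."""
    return 1 + sum(acc > bgso_acc for acc in competitor_accs)


computed = [bgso_rank(bgso, (ga, pso, hmoga))
            for _, ga, pso, hmoga, bgso, _ in TABLE3_KNN]
wins = sum(r == 1 for r in computed)
print(wins)  # 16
```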

To examine the robustness of the proposed model, we plot the convergence graphs of BGSO, GA and PSO over the iterations in Figure 4. We select one dataset from each category (small, medium and large) to observe the convergence of the three algorithms. Figure 4A-C show the convergence graphs for the BreastCancer (small), BreastEW (medium) and Hill-valley (large) datasets, respectively. The graphs show that, starting from the same point, BGSO achieves higher accuracy than its constituents in almost every iteration, which indicates the stability of the proposed model over iterations.

Figure 4:

Convergence graphs for datasets from different categories (size-wise).

Convergence graphs representing the changes of classification accuracies over iterations for BGSO and its constituent algorithms (GA and PSO) for the BreastCancer (A), BreastEW (B) and Hill-valley (C) datasets. The three datasets used in (A), (B) and (C) belong to three different categories namely small (BreastCancer), medium (BreastEW) and large (Hill-valley) to show the differences in convergence in terms of size of the dataset.

5 Conclusion

FS has many applications in real-world scenarios, which makes it an interesting and impactful research domain. The use of GA and PSO for FS is widespread, and the literature contains several combinations of these two methods proposed by various researchers. However, most of those models build a hybrid by running GA and PSO in parallel or one after another. As an alternative, our proposed model BGSO combines the results of the two algorithms in a simple way: it assigns an importance to each feature and keeps only the features whose importance exceeds the mean importance. This is followed by a local search that allows better exploitation of the search space. The proposed FS model is applied to 20 UCI datasets, selected so that they vary in the number of features, classes and samples. Two different classifiers, KNN and MLP, are used as learning algorithms. BGSO achieves the best (or joint-best) accuracy on 16 of the 20 datasets with KNN and on 14 of the 20 datasets with MLP. In the future, this concept of combining the results of algorithms can be applied to other algorithms. Additionally, BGSO can be applied to other real-life pattern recognition problems, such as handwritten word or digit recognition.

Bibliography

[1] L. M. Q. Abualigah, Feature Selection and Enhanced Krill Herd Algorithm for Text Document Clustering, in: Studies in Computational Intelligence, vol. 816, Springer, Cham, 2019. doi:10.1007/978-3-030-10674-4

[2] L. M. Q. Abualigah and E. S. Hanandeh, Applying genetic algorithms to information retrieval using vector space model, Int. J. Comput. Sci. Eng. Appl. 5 (2015), 19. doi:10.5121/ijcsea.2015.5102

[3] L. M. Abualigah and A. T. Khader, Unsupervised text feature selection technique based on hybrid particle swarm optimization algorithm with genetic operators for the text clustering, J. Supercomput. 73 (2017), 4773–4795. doi:10.1007/s11227-017-2046-2

[4] L. M. Abualigah, A. T. Khader and E. S. Hanandeh, A combination of objective functions and hybrid Krill herd algorithm for text document clustering analysis, Eng. Appl. Artif. Intell. 73 (2018), 111–125. doi:10.1016/j.engappai.2018.05.003

[5] L. M. Abualigah, A. T. Khader and E. S. Hanandeh, A new feature selection method to improve the document clustering using particle swarm optimization algorithm, J. Comput. Sci. 25 (2018), 456–466. doi:10.1016/j.jocs.2017.07.018

[6] L. M. Abualigah, A. T. Khader and E. S. Hanandeh, Hybrid clustering analysis using improved krill herd algorithm, Appl. Intell. 48 (2018), 4047–4071. doi:10.1007/s10489-018-1190-6

[7] M. H. Aghdam, N. Ghasem-Aghaee and M. E. Basiri, Text feature selection using ant colony optimization, Expert Syst. Appl. 36 (2009), 6843–6853. doi:10.1016/j.eswa.2008.08.022

[8] E. Alba, J. Garcia-Nieto, L. Jourdan and E. Talbi, Gene selection in cancer classification using PSO/SVM and GA/SVM hybrid algorithms, in: 2007 IEEE Congress on Evolutionary Computation, Singapore, pp. 284–290, 2007. doi:10.1109/CEC.2007.4424483

[9] M. E. Basiri and S. Nemati, A novel hybrid ACO-GA algorithm for text feature selection, in: 2009 IEEE Congress on Evolutionary Computation, Trondheim, pp. 2561–2568, 2009. doi:10.1109/CEC.2009.4983263

[10] H. Ceylan and M. G. H. Bell, Traffic signal timing optimisation based on genetic algorithm approach, including drivers’ routing, Transport. Res. 38 (2004), 329–342. doi:10.1016/S0191-2615(03)00015-8

[11] J. Culberson, On the futility of blind search, Technical Report 96-19, Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada, July 1996.

[12] B. Dengiz, F. Altiparmak and A. E. Smith, Local search genetic algorithm for optimal design of reliable networks, IEEE Trans. Evol. Comput. 1 (1997), 179–188. doi:10.1109/4235.661548

[13] M. Dorigo and M. Birattari, Ant Colony Optimization, in: C. Sammut and G. I. Webb, eds., Encyclopedia of Machine Learning, Springer, Boston, MA, 2011. doi:10.1007/978-0-387-30164-8_22

[14] D. Dua and C. Graff, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, 2019. http://archive.ics.uci.edu/ml (accessed January 7, 2019).

[15] B. Duval, J.-K. Hao and J. C. Hernandez Hernandez, A memetic algorithm for gene selection and molecular classification of cancer, in: Proc. 11th Annu. Conf. Genet. Evol. Comput. – GECCO ’09, 201, 2009. doi:10.1145/1569901.1569930

[16] R. Eberhart and J. Kennedy, A new optimizer using particle swarm theory, in: Micro Mach. Hum. Sci. Proc. Sixth Int. Symp., IEEE, pp. 39–43, 1995.

[17] H. Frohlich, O. Chapelle and B. Scholkopf, Feature selection for support vector machines by means of genetic algorithm, in: Proc. 15th IEEE Int. Conf. Tools Artif. Intell., pp. 142–148, 2016.

[18] M. Ghosh, R. Guha, R. Mondal, P. K. Singh and R. Sarkar, Feature Selection Using Histogram-Based Multi-objective GA for Handwritten Devanagari Numeral Recognition, in: Intelligent Engineering Informatics. Advances in Intelligent Systems and Computing, vol. 695, Springer, Singapore, pp. 471–479, 2018. doi:10.1007/978-981-10-7566-7_46

[19] M. Ghosh, S. Adhikary, K. K. Ghosh, A. Sardar, S. Begum and R. Sarkar, Genetic algorithm based cancerous gene identification from microarray data using ensemble of filter methods, Med. Biol. Eng. Comput. 57 (2019), 159–176. doi:10.1007/s11517-018-1874-4

[20] M. Ghosh, S. Begum, R. Sarkar, D. Chakraborty and U. Maulik, Recursive memetic algorithm for gene selection in microarray data, Expert Syst. Appl. 116 (2019), 172–185. doi:10.1016/j.eswa.2018.06.057

[21] M. Ghosh, R. Guha, R. Sarkar and A. Abraham, A wrapper-filter feature selection technique based on ant colony optimization, Neural Comput. Appl. (2019), 1–19 [Online 11 April 2019]. doi:10.1007/s00521-019-04171-3

[22] F. Glover and M. Laguna, Tabu search, in: Handbook of Combinatorial Optimization, Springer, Boston, MA, 1998. doi:10.1007/978-1-4615-6089-0

[23] Q. Gu, Z. Li and J. Han, Generalized Fisher score for feature selection: a brief review of Fisher score, Ratio, p. 19, 2010.

[24] R. Guha, M. Ghosh, S. Kapri, S. Shaw, S. Mutsuddi, V. Bhateja and R. Sarkar, Deluge based genetic algorithm for feature selection, Evol. Intell. (2019), 1–11 [Online 7 March 2019]. doi:10.1007/s12065-019-00218-5

[25] R. Guha, M. Ghosh, P. K. Singh, R. Sarkar and M. Nasipuri, M-HMOGA: a new multi-objective feature selection algorithm for handwritten numeral classification, J. Intell. Syst. 29 (2020), 1453–1467. doi:10.1515/jisys-2019-0064

[26] G. R. Harik, F. G. Lobo and D. E. Goldberg, The compact genetic algorithm, IEEE Trans. Evol. Comput. 3 (1999), 287–297. doi:10.1109/4235.797971

[27] J. H. Holland, Genetic algorithms, Sci. Am. 1 (1992), 66–73. doi:10.1038/scientificamerican0792-66

[28] C. Huang and J. Dun, A distributed PSO–SVM hybrid system with feature selection and parameter optimization, Appl. Soft Comput. 8 (2008), 1381–1391. doi:10.1016/j.asoc.2007.10.007

[29] A. L. Kazakovtsev, A. N. Antamoshkin and V. V. Fedosov, Greedy heuristic algorithm for solving series of eee components classification problem, in: IOP Conf. Ser. Mater. Sci. Eng., 2016. doi:10.1088/1757-899X/122/1/012011

[30] J. Kennedy and R. C. Eberhart, A discrete binary version of the particle swarm algorithm, in: 1997 IEEE Int. Conf. Syst. Man, Cybern. Comput. Cybern. Simul., IEEE, pp. 4104–4108, 1997.

[31] J. T. Kent, Information gain and a general measure of correlation, Biometrika 70 (1983), 163–173. doi:10.1093/biomet/70.1.163

[32] R. Leardi, Application of genetic algorithm – PLS for feature selection in spectral data sets, J. Chemometr. 14 (2000), 643–655. doi:10.1002/1099-128X(200009/12)14:5/6<643::AID-CEM621>3.0.CO;2-E

[33] C. Miles, S. J. Louis, N. Cole and J. McDonnell, Learning to play like a human: case injected genetic algorithms for strategic computer gaming, in: Proc. 2004 Congr. Evol. Comput. (IEEE Cat. No. 04TH8753), vol. 2, pp. 1441–1448, IEEE, Portland, OR, USA, 2004. doi:10.1109/CEC.2004.1331066

[34] S. Nemati, M. Ehsan, N. Ghasem-aghaee and M. Hosseinzadeh, A novel ACO–GA hybrid algorithm for feature selection in protein function prediction, Expert Syst. Appl. 36 (2009), 12086–12094. doi:10.1016/j.eswa.2009.04.023

[35] Y. Prasad, K. K. Biswas and C. K. Jain, SVM classifier based feature selection using GA, ACO and PSO for siRNA design, in: Advances in Swarm Intelligence, pp. 307–314, Springer, Berlin, 2010. doi:10.1007/978-3-642-13498-2_40

[36] Problem-specific knowledge in heuristics, 2016. http://antor.uantwerpen.be/problem-specific-knowledge-in-heuristics/ (accessed January 7, 2019).

[37] E. Rashedi, H. Nezamabadi-Pour and S. Saryazdi, GSA: a gravitational search algorithm, Inf. Sci. (NY) 179 (2009), 2232–2248. doi:10.1016/j.ins.2009.03.004

[38] M. Sheikhan and N. Mohammadi, Neural-based electricity load forecasting using hybrid of GA and ACO for feature selection, Neural Comput. Appl. 21 (2012), 1961–1970. doi:10.1007/s00521-011-0599-1

[39] J. Sun, B. Feng and W. Xu, Particle swarm optimization with particles having quantum behavior, in: Proc. 2004 Congr. Evol. Comput. (IEEE Cat. No. 04TH8753), pp. 325–331, IEEE, Portland, OR, USA, 2004. doi:10.1109/CEC.2004.1330875

[40] R. J. Tallarida and R. B. Murray, Chi-square test, in: Man. Pharmacol. Calc., pp. 140–142, Springer, New York, NY, 1987. doi:10.1007/978-1-4612-4974-0_43

[41] P. J. Van Laarhoven and E. H. Aarts, Simulated annealing, in: Simulated Annealing: Theory and Applications, pp. 7–15, Springer, Dordrecht, 1987. doi:10.1007/978-94-015-7744-1_2

[42] X. Wang, J. Yang, X. Teng, W. Xia and R. Jensen, Feature selection based on rough sets and particle swarm optimization, Pattern Recognit. Lett. 28 (2007), 459–471. doi:10.1016/j.patrec.2006.09.003

[43] J. Wei, R. Zhang, Z. Yu, R. Hu, J. Tang, C. Gui and Y. Yuan, A BPSO-SVM algorithm based on memory renewal and enhanced mutation mechanisms for feature selection, Appl. Soft Comput. 58 (2017), 176–192. doi:10.1016/j.asoc.2017.04.061

[44] J. Yang and V. Honavar, Feature subset selection using a genetic algorithm, IEEE Intell. Syst. 13 (1998), 44–49. doi:10.1007/978-1-4615-5725-8_8

[45] Z. Zhu, Y. S. Ong and M. Dash, Markov blanket-embedded genetic algorithm for gene selection, Pattern Recognit. 40 (2007), 3236–3248. doi:10.1016/j.patcog.2007.02.007

Received: 2019-03-06

Published Online: 2019-09-14

This work is licensed under the Creative Commons Attribution 4.0 Public License.


https://doi.org/10.1515/jisys-2019-0062

Keywords: Feature selection; binary genetic swarm optimization; genetic algorithm; particle swarm optimization; UCI dataset; average weighted combination; sequential one-point flipping
