DOI: 10.1145/2463372.2463432
research-article

The benefits of using multi-objectivization for mining Pittsburgh partial classification rules in imbalanced and discrete data

Published: 06 July 2013

Abstract

A large number of rule interestingness measures have been used as objectives in multi-objective classification rule mining algorithms. These multiple objectives are commonly handled either by aggregation or by Pareto dominance. This paper compares the two approaches on a partial classification problem over discrete and imbalanced data. A Principal Component Analysis (PCA) is first performed to select candidate objectives and identify conflicting ones; the two approaches are then evaluated. The Pareto dominance-based approach is implemented as a dominance-based local search (DMLS) algorithm using confidence and sensitivity as objectives, while the aggregation approach is implemented as a single-objective hill-climbing algorithm using the F-Measure, which combines confidence and sensitivity, as its objective. Results show that the dominance-based approach obtains statistically better results than the single-objective approach.
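
The difference between the two ways of handling confidence and sensitivity can be illustrated with a small sketch. It is not the paper's implementation: the `RuleStats` container, the example counts, and the choice of a balanced F-Measure (beta = 1) are assumptions made here for illustration, since the abstract does not state how the F-Measure is weighted.

```python
from typing import NamedTuple


class RuleStats(NamedTuple):
    """Hypothetical confusion counts for one rule on the target class."""
    tp: int  # positive examples covered by the rule
    fp: int  # negative examples covered by the rule
    fn: int  # positive examples the rule misses


def confidence(r: RuleStats) -> float:
    """Fraction of covered examples that are positive (precision)."""
    covered = r.tp + r.fp
    return r.tp / covered if covered else 0.0


def sensitivity(r: RuleStats) -> float:
    """Fraction of positive examples that are covered (recall)."""
    positives = r.tp + r.fn
    return r.tp / positives if positives else 0.0


def f_measure(r: RuleStats, beta: float = 1.0) -> float:
    """Aggregation: a single score combining both objectives.

    beta = 1 (harmonic mean) is an assumption, not taken from the paper.
    """
    c, s = confidence(r), sensitivity(r)
    denom = beta ** 2 * c + s
    return (1 + beta ** 2) * c * s / denom if denom else 0.0


def dominates(a: RuleStats, b: RuleStats) -> bool:
    """Pareto dominance: a is no worse on both objectives and better on one."""
    oa = (confidence(a), sensitivity(a))
    ob = (confidence(b), sensitivity(b))
    no_worse = all(x >= y for x, y in zip(oa, ob))
    better = any(x > y for x, y in zip(oa, ob))
    return no_worse and better


# Illustrative counts only (100 positive examples in each case).
precise = RuleStats(tp=20, fp=5, fn=80)   # high confidence, low sensitivity
broad = RuleStats(tp=60, fp=90, fn=40)    # low confidence, high sensitivity
weak = RuleStats(tp=10, fp=10, fn=90)     # worse than `precise` on both

for name, rule in [("precise", precise), ("broad", broad), ("weak", weak)]:
    print(name, confidence(rule), sensitivity(rule), f_measure(rule))
```

With these counts, `precise` and `broad` are mutually non-dominated, so a dominance-based search would keep both, whereas the aggregated score ranks `broad` above `precise` and a single-objective search would retain only one of them.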

Published In

GECCO '13: Proceedings of the 15th annual conference on Genetic and evolutionary computation
July 2013
1672 pages
ISBN:9781450319638
DOI:10.1145/2463372
Editor: Christian Blum
General Chair: Enrique Alba
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. imbalance data
  2. multiobjective
  3. partial classification

Qualifiers

  • Research-article

Conference

GECCO '13: Genetic and Evolutionary Computation Conference
July 6 - 10, 2013
Amsterdam, The Netherlands

Acceptance Rates

GECCO '13 Paper Acceptance Rate 204 of 570 submissions, 36%;
Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

Cited By

  • (2024) Design and Maintenance Optimisation of Substation Automation Systems: A Multiobjectivisation Approach Exploration. Journal of Engineering, 2024:1. DOI: 10.1155/2024/9390545. Online publication date: 6-Jan-2024
  • (2022) Metaheuristics for data mining: survey and opportunities for big data. Annals of Operations Research, 314:1 (117-140). DOI: 10.1007/s10479-021-04496-0. Online publication date: 18-Jan-2022
  • (2020) Multi-objective Automatic Algorithm Configuration for the Classification Problem of Imbalanced Data. 2020 IEEE Congress on Evolutionary Computation (CEC) (1-8). DOI: 10.1109/CEC48606.2020.9185785. Online publication date: Jul-2020
  • (2020) Automatic Configuration of a Multi-objective Local Search for Imbalanced Classification. Parallel Problem Solving from Nature – PPSN XVI (65-77). DOI: 10.1007/978-3-030-58112-1_5. Online publication date: 31-Aug-2020
  • (2020) Impact of the Discretization of VOCs for Cancer Prediction Using a Multi-Objective Algorithm. Learning and Intelligent Optimization (151-157). DOI: 10.1007/978-3-030-53552-0_16. Online publication date: 18-Jul-2020
  • (2019) Metaheuristics for data mining. 4OR. DOI: 10.1007/s10288-019-00402-4. Online publication date: 6-Apr-2019
  • (2016) Bibliography. Metaheuristics for Big Data (161-186). DOI: 10.1002/9781119347569.biblio. Online publication date: 17-Sep-2016
  • (2015) Using multi-objective evolutionary algorithms for single-objective constrained and unconstrained optimization. Annals of Operations Research, 240:1 (217-250). DOI: 10.1007/s10479-015-2017-z. Online publication date: 22-Sep-2015
