DOI: 10.1145/3297280.3297281
research-article

An insight into biological datamining based on rarity and correlation as constraints

Published: 08 April 2019

Abstract

Association-rule mining techniques have been widely applied to identify differentially expressed genes in microarray data. Rare correlated patterns have been identified as effective for generating accurate association rules. To our knowledge, no algorithm able to perform this challenging task is currently available. Therefore, we designed CPMiner, a new generic method for mining interesting correlations from data. It extracts the sets of both frequent correlated and rare correlated patterns, as well as their associated condensed representations, according to two distinct measures: all-confidence and bond. CPMiner has been applied to biological data. To this end, we developed CoRaM, the first unified framework dedicated to the extraction of a generic basis of correlated rare association rules from gene expression data. It relies on CPMiner to extract the rare correlated patterns, together with a specific discretization method and the derivation of the generic basis of the rare correlated association rules. Our proposed approach has been successfully applied to a breast-cancer Gene Expression Matrix (GSE1379) with very promising results.
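
The two correlation measures named in the abstract have standard definitions in the pattern-mining literature: all-confidence divides the support of an itemset by the largest support of any single item in it, and bond divides the number of transactions containing all items by the number containing at least one. The sketch below illustrates those definitions on a toy dataset. It is not CPMiner: the item labels (`g1_up`, `g3_down`, ...), the thresholds, the function names, and the brute-force counting are all assumptions made for illustration.

```python
from typing import Callable, FrozenSet, List

Transaction = FrozenSet[str]


def conjunctive_support(itemset: FrozenSet[str], data: List[Transaction]) -> int:
    """Count transactions that contain every item of the itemset."""
    return sum(1 for t in data if itemset <= t)


def disjunctive_support(itemset: FrozenSet[str], data: List[Transaction]) -> int:
    """Count transactions that contain at least one item of the itemset."""
    return sum(1 for t in data if itemset & t)


def all_confidence(itemset: FrozenSet[str], data: List[Transaction]) -> float:
    """Support of the itemset divided by the largest single-item support."""
    max_item = max(conjunctive_support(frozenset([i]), data) for i in itemset)
    return conjunctive_support(itemset, data) / max_item if max_item else 0.0


def bond(itemset: FrozenSet[str], data: List[Transaction]) -> float:
    """Conjunctive support divided by disjunctive support."""
    disj = disjunctive_support(itemset, data)
    return conjunctive_support(itemset, data) / disj if disj else 0.0


def classify(
    itemset: FrozenSet[str],
    data: List[Transaction],
    min_supp: int,
    min_corr: float,
    measure: Callable[[FrozenSet[str], List[Transaction]], float] = bond,
) -> str:
    """Label an itemset using a support threshold and a correlation threshold."""
    if measure(itemset, data) < min_corr:
        return "not correlated"
    if conjunctive_support(itemset, data) >= min_supp:
        return "frequent correlated"
    return "rare correlated"


# Hypothetical discretized expression data: one transaction per sample.
samples: List[Transaction] = [
    frozenset({"g1_up", "g2_up", "g3_down"}),
    frozenset({"g1_up", "g2_up"}),
    frozenset({"g3_down"}),
    frozenset({"g3_down", "g4_up"}),
    frozenset({"g3_down"}),
]

pair = frozenset({"g1_up", "g2_up"})
print(bond(pair, samples), all_confidence(pair, samples))
print(classify(pair, samples, min_supp=3, min_corr=0.5))
```

In this toy data, `g1_up` and `g2_up` occur in only two of five samples but always together, so the pair falls below the support threshold while still being strongly correlated. That is the kind of pattern a purely frequency-based miner would discard.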

Cited By

  • (2023) A Top-K formal concepts-based algorithm for mining positive and negative correlation biclusters of DNA microarray data. International Journal of Machine Learning and Cybernetics 15:3 (941-962). DOI: 10.1007/s13042-023-01949-9. Online publication date: 12-Sep-2023
  • (2020) Pregnancy Associated Breast Cancer Gene Expressions: New Insights on Their Regulation Based on Rare Correlated Patterns. IEEE/ACM Transactions on Computational Biology and Bioinformatics 18:3 (1035-1048). DOI: 10.1109/TCBB.2020.3015236. Online publication date: 10-Aug-2020

Published In

SAC '19: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing
April 2019
2682 pages
ISBN: 9781450359337
DOI: 10.1145/3297280
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. association rule
  2. breast cancer
  3. condensed representation
  4. constraint
  5. correlation
  6. gene-expression
  7. rarity

Qualifiers

  • Research-article

Conference

SAC '19
Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
