Abstract
Modern data mining algorithms frequently need to learn from heterogeneous data and knowledge sources, including ontologies. A data mining task in which ontologies are used as background knowledge is referred to as semantic data mining. A special form of semantic data mining is semantic subgroup discovery, where ontology terms are used in subgroup-describing rules. We propose to enhance ontology-based subgroup identification with Community-Based Semantic Subgroup Discovery (CBSSD), which also takes into account the structural properties of complex networks related to the studied phenomenon. The application of the developed CBSSD approach is demonstrated on two use cases from the field of molecular biology.
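The general idea of combining network structure with ontology terms can be illustrated with a minimal sketch. This is not the CBSSD implementation: the toy network, the annotations, the is-a hierarchy, and all function names are hypothetical. Connected components stand in for a real community detection algorithm, and weighted relative accuracy (WRAcc), a common subgroup discovery quality measure, is assumed as the scoring function because the abstract does not specify one.

```python
from collections import defaultdict

# Hypothetical toy network: two groups of nodes with no edges between them.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f")]

# Hypothetical direct ontology annotations and a tiny is-a hierarchy.
ANNOTATIONS = {
    "a": {"kinase"}, "b": {"kinase"}, "c": {"phosphatase"},
    "d": {"transporter"}, "e": {"transporter"}, "f": {"kinase"},
}
PARENTS = {
    "kinase": "enzyme",
    "phosphatase": "enzyme",
    "transporter": "membrane_protein",
}


def connected_components(edges):
    """Group nodes into connected components (stand-in for community detection)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def generalize(terms, parents):
    """Return the given terms together with all their is-a ancestors."""
    closed = set()
    for term in terms:
        while term is not None and term not in closed:
            closed.add(term)
            term = parents.get(term)
    return closed


def wracc(covered, target, universe):
    """WRAcc of the rule 'term -> community': p(cond) * (p(class|cond) - p(class))."""
    if not covered:
        return 0.0
    n = len(universe)
    p_cond = len(covered) / n
    p_class = len(target) / n
    p_class_given_cond = len(covered & target) / len(covered)
    return p_cond * (p_class_given_cond - p_class)


def describe_communities(edges, annotations, parents):
    """For each community, return its members and the best-scoring ontology term."""
    communities = connected_components(edges)
    universe = set().union(*communities)
    closed = {n: generalize(annotations.get(n, set()), parents) for n in universe}
    all_terms = set().union(*closed.values())
    results = []
    for community in communities:
        scored = []
        for term in sorted(all_terms):
            covered = {n for n in universe if term in closed[n]}
            scored.append((wracc(covered, community, universe), term))
        # Highest score first; ties are broken alphabetically by term name.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        results.append((sorted(community), scored[0]))
    return results


if __name__ == "__main__":
    for members, (score, term) in describe_communities(EDGES, ANNOTATIONS, PARENTS):
        print(members, term, round(score, 3))
```

In this toy setting each community is treated as the target class, and the more general terms obtained by walking up the hierarchy can describe a community better than any single direct annotation, which is the motivation for using an ontology as background knowledge.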
Notes
1. Plotted with the Py3Plex library (https://github.com/SkBlaz/Py3Plex).
Acknowledgments
This research was funded by the Slovenian Research Agency project HinLife: Analysis of Heterogeneous Information Networks for Knowledge Discovery in Life Sciences (J7-7303), as well as the Human Brain Project (FET Flagship grant FP7-ICT-604102). The authors also gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan-XP GPU used for this research.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Škrlj, B., Kralj, J., Vavpetič, A., Lavrač, N. (2018). Community-Based Semantic Subgroup Discovery. In: Appice, A., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2017. Lecture Notes in Computer Science, vol 10785. Springer, Cham. https://doi.org/10.1007/978-3-319-78680-3_13
DOI: https://doi.org/10.1007/978-3-319-78680-3_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78679-7
Online ISBN: 978-3-319-78680-3
eBook Packages: Computer Science, Computer Science (R0)