Community-Based Semantic Subgroup Discovery

  • Conference paper
New Frontiers in Mining Complex Patterns (NFMCP 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10785)

Abstract

Modern data mining algorithms frequently need to address learning from heterogeneous data and knowledge sources, including ontologies. A data mining task in which ontologies are used as background knowledge is referred to as semantic data mining. A special form of semantic data mining is semantic subgroup discovery, where ontology terms are used in subgroup describing rules. We propose to enhance ontology-based subgroup identification by Community-Based Semantic Subgroup Discovery (CBSSD), taking into account also the structural properties of complex networks related to the studied phenomenon. The application of the developed CBSSD approach is demonstrated on two use cases from the field of molecular biology.
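The two-stage idea in the abstract (find communities in a network, then describe each community with ontology terms) can be illustrated with a minimal Python sketch. This is not the authors' CBSSD implementation: the toy label propagation stands in for a real community detection algorithm, ranking terms by lift stands in for a rule learner, and the gene names, edges, and term names are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy network: two triangles joined by one bridge edge (g3-g4).
EDGES = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g3", "g4"),
         ("g4", "g5"), ("g5", "g6"), ("g4", "g6")]

# Hypothetical ontology annotations per node (term names are made up).
ANNOTATIONS = {
    "g1": {"immune_response", "binding"},
    "g2": {"immune_response"},
    "g3": {"immune_response", "binding"},
    "g4": {"dna_repair", "binding"},
    "g5": {"dna_repair"},
    "g6": {"dna_repair"},
}

def label_propagation(edges, max_rounds=20):
    """Toy asynchronous label propagation; ties go to the largest label."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    labels = {node: node for node in neighbors}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbors):
            counts = Counter(labels[n] for n in neighbors[node])
            best = max(counts.values())
            new_label = max(l for l, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return sorted(groups.values(), key=sorted)

def describe(community, annotations):
    """Rank terms by lift: coverage inside the community / coverage overall."""
    total = len(annotations)
    scores = {}
    for term in set().union(*annotations.values()):
        inside = sum(term in annotations[n] for n in community)
        overall = sum(term in terms for terms in annotations.values())
        if inside:
            scores[term] = (inside / len(community)) / (overall / total)
    # Highest lift first; term name breaks ties for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

for community in label_propagation(EDGES):
    print(sorted(community), describe(community, ANNOTATIONS))
```

In this sketch a term with lift above 1 is over-represented in the community relative to the whole network, which is the sense in which the term "describes" the subgroup. A full system would instead learn rules over the ontology hierarchy.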


Notes

  1. Plotted with the Py3Plex library (https://github.com/SkBlaz/Py3Plex).


Acknowledgments

This research was funded by the Slovenian Research Agency project HinLife: Analysis of Heterogeneous Information Networks for Knowledge Discovery in Life Sciences (J7-7303) and by the Human Brain Project (FET Flagship grant FP7-ICT-604102). The authors also gratefully acknowledge NVIDIA Corporation for donating the Titan-XP GPU used for this research.

Corresponding author

Correspondence to Blaž Škrlj.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Škrlj, B., Kralj, J., Vavpetič, A., Lavrač, N. (2018). Community-Based Semantic Subgroup Discovery. In: Appice, A., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2017. Lecture Notes in Computer Science, vol 10785. Springer, Cham. https://doi.org/10.1007/978-3-319-78680-3_13

  • DOI: https://doi.org/10.1007/978-3-319-78680-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-78679-7

  • Online ISBN: 978-3-319-78680-3

  • eBook Packages: Computer Science, Computer Science (R0)
