Abstract
Data Mining (DM) algorithms are able to construct models from available data that can be very useful for both business and science. However, a powerful representation language is required to express the highly complex models that stem from structured data. Multirelational algorithms can then take advantage of this representation for both data and models. The drawback is that for very large or highly complex domains multirelational algorithms may require long running times. This problem can be substantially reduced using parallel implementations. In this chapter, we present a survey on parallel approaches to run Inductive Logic Programming (ILP), a flavor of multirelational algorithms. We also analyze different scheduling approaches for those implementations and describe two applications where the proposed approaches may be very useful.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
We refer to John Loyd’s book [30] for basic concepts and definitions of Logic Programming.
- 2.
A partial order is a reflexive, antisymmetric, and transitive binary relation.
- 3.
Counting the number of examples derivable from the hypothesis and the background knowledge.
- 4.
As many as the size of the sample
- 5.
Propositional level algorithms
- 6.
or a sheet of a spreadsheet
- 7.
An approach where the activity of a compound, for example, is predicted only based on its structure features.
- 8.
A detailed description of this study can be found in [66]
- 9.
Source data for both data sets is available from the Distributed Structure-Searchable Toxicity (DSSTox) Public Database Network from the US Environmental Protection Agency http://www.epa.gov/ncct/dsstox/index.html, accessed Dec 2008.
- 10.
Available from the Oxford University Machine Learning repository http://www.cs.ox.ac.uk/activities/machlearn/applications.html
- 11.
Except for the carcinogenesis data set
References
Alves, A., Camacho, R., Oliveira, E.: Discovery of functional relationships in multi-relational data using inductive logic programming. In: Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), 1-4 Nov 2004, Brighton, UK, pp. 319–322 (2004)
EC Amazon: Amazon elastic compute cloud (amazon ec2), 2010. https://aws.amazon.com/ec2/
Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., et al.: A view of cloud computing. Commun. ACM 53(4), 50–58 (2010)
Blaták J., Popelínský, L.: dRAP: A framework for distributed mining firts-order frequent patterns. In: Proceedings of the 16th Conference on Inductive Logic Programming, pp. 25–27. Springer, (2006)
Bone, P., Somogyi, Z., Schachte, P.: Estimating the overlap between dependent computations for automatic parallelization. TPLP 11(4–5), 575–591 (2011)
Bratko, I., Muggleton, S., Varsek, A.: Learning qualitative models of dynamic systems. In: Proceedings of the Eighth International Machine Learning Workshop, San Mateo, Ca, 1991. Morgan-Kaufmann
Buntine, W.: Generalised subsumption and its applications to induction and redundancy. Artif. Intell. J. 36(2):149–176 (1988). revised version of the paper that won the A.I. Best Paper Award at ECAI-86
Calheiros, R.N., Ranjan, R., Beloglazov, A., De Rose, C.A.F., Buyya, R.: Cloudsim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Softw.: Pract. Experience 41(1):23–50 (2011)
Clare, A., King, R.D.: Data mining the yeast genome in a lazy functional language. In: Proceedings of the Fifth International Symposium on Practical Aspects of Declarative Languages, pp. 19–36 (2003)
Costa, V.S., de Castro Dutra, I., Rocha, R.: Threads and or-parallelism unified. TPLP 10(4–6), 417–432 (2010)
Dasgupta, K., Mandal, B., Dutta, P., Kumar Mandal, J., Dam, S.: A genetic algorithm (ga) based load balancing strategy for cloud computing. Proc. Technol. 10, 340–347 (2013)
Jeffrey, D., Sanjay, G.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
Dehaspe, L., De Raedt, L.: Parallel inductive logic programming. In: Proceedings of the MLnet Familiarization Workshop on Statistics, Machine Learning and Knowledge Discovery in Databases (1995)
Delgado, J., Salah Eddin, A., Adjouadi, M., Sadjadi, S.M.: Paravirtualization for scientific computing: performance analysis and prediction. In: 2011 IEEE 13th International Conference on High Performance Computing and Communications (HPCC), pp. 536–543. IEEE (2011)
Fayyad, U.M., Uthurusamy, R., (eds.) In: Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95), Montreal, Canada, August 20-21, 1995. AAAI Press (1995)
Nuno, A., Ashwin Srinivasan, F., Silva, F.M.A., Camacho, R.: Parallel ilp for distributed-memory architectures. Mach. Learn. 74(3), 257–279 (2009)
Message Passing Interface Forum: MPI: A message-passing interface standard. Technical Report UT-CS-94-230, University of Tennessee, Knoxville, TN, USA (1994)
Ian Foster and Carl Kesselman: The Grid 2: Blueprint for a new computing infrastructure. Elsevier (2003)
Fox, A., Griffith, R., Joseph, A., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I.: Above the clouds: A berkeley view of cloud computing. Dept. Electrical Eng. and Comput. Sciences, University of California, Berkeley, Rep. UCB/EECS 28:13 (2009)
Garnett, M.J., Edelman, E.J., Heidorn, S.J., Greenman, C.D., Dastur, A., Lau, K.W., Patricia Greninger, I., Thompson, R., Luo, X., Soares, J., et al.: Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483(7391), 570–575 (2012)
Cloud Google: Google cloud platform. https://cloud.google.com/
Graham, J., Page, D., Kamal, A.: Accelerating the drug design process through parallel inductive logic programming data mining. In: Proceeding of the Computational Systems Bioinformatics (CSB’03). IEEE (2003)
Gupta, G., Pontelli, E., Ali, K.A.M., Carlsson, M., Hermenegildo, M.V.: Parallel execution of prolog programs: a survey. ACM Trans. Program. Lang. Syst. 23(4), 472–602 (2001)
Huang, W., Liu, J., Abali, B., Panda, D.K.: A case for high performance computing with virtual machines. In: Proceedings of the 20th annual international conference on Supercomputing, pp. 125–134. ACM (2006)
Juve, G., Deelman, E.: Scientific workflows and clouds. Crossroads 16(3), 14–18 (2010)
King, R., Muggleton S., Lewis, R., Sternberg, M.: Drug design by machine learning: The use of inductive logic programming to model the structure-activity relationships of trimethoprim analogues binding to dihydrofolate reductase. Proc. National Acad. Sci. 89(23) (1992)
King, R., Sternberg, M.J.E.: A machine learning approach for the prediction of protein secondary structure. J. Mol. Biol. 216, 441–457 (1990)
Konstantopoulos, S.K.: A data-parallel version of Aleph. In: Proceedings of the Workshop on Parallel and Distributed Computing for Machine Learning, co-located with ECML/PKDD’2003, Dubrovnik, Croatia, 2003
Krishna, P.V.: Honey bee behavior inspired load balancing of tasks in cloud computing environments. Appl. Soft Comput. 13(5), 2292–2303 (2013)
Lloyd, J.W.: Foundations of Logic Programming. Springer, New York (1984)
Martinez-Angeles, C.A., de Castro Dutra, I., Costa, V.S., Buenabad-Chavez, J.: A datalog engine for gpus. In: Declarative Programming and Knowledge Management - Declarative Programming Days, KDPD 2013, Unifying INAP, WFLP, and WLP, Kiel, Germany, September 11-13, 2013, Revised Selected Papers, vol. 8439 of Lecture Notes in Computer Science, pp. 152–168. Springer (2013)
Matsui, T., Inuzuka, N., Seki, H., Itoh, H.: Comparison of three parallel implementations of an induction algorithm. In: 8th International Parallel Computing Workshop, pp. 181–188. Singapore (1998)
Mauch, Viktor, Kunze, Marcel, Hillenbrand, Marius: High performance cloud computing. Future Gen. Comput. Syst. 29(6), 1408–1416 (2013)
Menden, M.P., Iorio, F., Garnett, M., McDermott, U., Benes, C.H., Ballester, P.J., Saez-Rodriguez, J.: Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PloS one 8(4), e61318 (2013)
Michalski, R.S.: Pattern recognition as rule-guided inductive inference. In: Proceedings of IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 349–361 (1980)
Mitchell, T.M.: Generalization as search. Artificial intell. 18(2), 203–226 (1982)
Muggleton, S.: Inductive logic programming. New Gener. Comput. 8(4), 295–317 (1991)
Muggleton, S.: Inductive logic programming: derivations, successes and shortcomings. In: Proceedings of the European Conference on Machine Learning: ECML-93, pp. 21–37, Vienna, Austria, April 1993
Muggleton, S.: Inverse entailment and progol. New Gener. Comput. Special issue on Inductive Logic Programming 13(3–4), 245–286 (1995)
Muggleton, S.: Learning from positive data. In: Inductive Logic Programming, 6th International Workshop, ILP-96, Stockholm, Sweden, August 26-28, 1996, Selected Papers, pp. 358–376 (1996)
Muggleton, S., Firth, J.: Relational rule induction with CProgol4.4: a tutorial introduction. In: Džeroski, S., Lavrač, N. (eds.) Relational Data Mining, pp. 160–188. Springer (2001)
Muggleton, S., De Raedt, L.: Inductive logic programming: theory and methods. J. Logic Program. 19(20), 629–679 (1994)
Ohwada, H., Mizoguchi, F.: Parallel execution for speeding up inductive logic programming systems. In: Proceedings of the 9th International Workshop on Inductive Logic Programming, number 1721 in LNAI, pp. 277–286. Springer (1999)
Ohwada, H., Nishiyama, H., Mizoguchi, F.: Concurrent execution of optimal hypothesis search for inverse entailment. In: Cussens, J., Frisch, A. (eds.) Proceedings of the 10th International Conference on Inductive Logic Programming, vol. 1866 of LNAI, pp. 165–173. Springer (2000)
Pacini, Elina, Mateos, Cristian, Garino, Carlos García: Distributed job scheduling based on swarm intelligence: a survey. Comput. Electr. Eng. 40(1), 252–269 (2014)
Plotkin, G.D.: A note on inductive generalisation, pp. 153–163. In: Meltzer, B., Michie, D. (eds.) Edinburgh University Press, Edinburgh (1969)
Quinlan, J.R., Cameron-Jones, R.M.: FOIL: A midterm report. In: Brazdil, P. (ed.) Proceedings of the 6th European Conference on Machine Learning, vol. 667, pp. 3–20. Springer (1993)
Ramakrishnan, L., Zbiegel, P.T., Campbell, S., Bradshaw, R., Canon, R.S., Coghlan, S., Sakrejda, I., Desai, N., Declerck, T., Liu, A.: Magellan: experiences from a science cloud. In: Proceedings of the 2nd International Workshop on Scientific Cloud Computing, pp. 49–58. ACM (2011)
Ramezani, F., Lu, J., Hussain, F.: Task based system load balancing approach in cloud environments. In: Knowledge Engineering and Management, pp. 31–42. Springer (2014)
Ramezani, F., Jie, L., Hussain, F.K.: Task-based system load balancing in cloud computing using particle swarm optimization. Int. J. Parallel Programm. 42(5), 739–754 (2014)
RD1, K., Muggleton, S.H., Srinivasan, A., Sternberg, M.J.: Structure-activity relationships derived by machine learning: the use of atoms and their bond connectivities to predict mutagenicity by inductive logic programming. Proc Natl Acad Sci USA, 9(93(1)):438–42 (1996)
Reinaldo, F., Fernandes, C., Rahman, A., Malucelli, A., Camacho, R.: Assessing the eligibility of kidney transplant donors. In: Machine Learning and Data Mining in Pattern Recognition, 6th International Conference, MLDM 2009, Leipzig, Germany, July 23–25, 2009. Proceedings, pp. 802–809 (2009)
Robinson, J.A.: A machine-oriented logic based on the resolution principle. J. ACM 12(1), 23–41 (1965)
Vítor, S.C., Ashwin, S., Rui, C., Hendrik, B., Bart, D., Gerda, J., Jan, S., Henk, V., Wim, V.L: Query transformations for improving the efficiency of ILP systems. J. Mach. Learn. Res. 4, 465–491 (2003)
Skillicorn, David B., Wang, Yu.: Parallel and sequential algorithms for data mining using inductive logic. Knowl. Inf. Syst. 3(4), 405–421 (2001)
Smith, R.G.: The contract net protocol: high-level communication and control in a distributed problem solver. IEEE Trans. Comput. 29(12), 1104–1113 (1980)
Srinivasan, A.: The Aleph Manual, 2003. http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph
Srinivasan, A., King, R.D., Muggleton, S., Sternberg, M.J.E.: Carcinogenesis predictions using ILP. In: Inductive Logic Programming, 7th International Workshop, ILP-97, Prague, Czech Republic, Sept. 17–20, 1997, Proceedings, pp. 273–287 (1997)
Fonseca, N.A., Pereira, M., Santos Costa, V., Camacho, R.: Interactive discriminative mining of chemical fragments. In: Proceedings of the 2010 International Conference on Inductive Logic Programming (ILP 2010), number 6489 in Lecture Notes in Artificial Intelligence, pp. 59–66. Springer (2011)
White, T.: Hadoop: The Definitive Guide. O’Reilly Media, Inc. (2012)
Wielemaker, J.: Native preemptive threads in SWI-Prolog. In: Palamidessi, C. (ed.) Proceedings of the 19th International Conference on Logic Programming, vol. 2916 of LNAI, pp. 331–345. Springer (2003)
Wirth., R.: Learning by failure to prove. In: Proceedings Third European Working Session on Learning, pp. 237–251. London (1988) Pitman
Woo, Y.T., Lai, D., McLain, J.L., Manibusan, M.K., Dellarco, V.: Use of mechanism-based structure-activity relationships analysis in carcinogenic potential ranking for drinking water disinfection by-products. Environ. Health Perspect 110((Suppl 1)), 75–87 (2002)
Wu, F., Wu, Q., Tan, Y.: Workflow scheduling in cloud: a survey. J. Supercomput. 1–46 (2015)
Xu, Y., Wu, L., Guo, L., Chen, Z., Yang, L., Shi, Z.: An intelligent load balancing algorithm towards efficient cloud computing. In: Workshops at the Twenty-Fifth AAAI Conference on Artificial Intelligence (2011)
Zaverucha, G., Santos Costa, V., Paes, A. (eds.) Inductive logic programming—23rd International Conference, ILP 2013, Rio de Janeiro, Brazil, August 28-30, 2013, Revised Selected Papers, volume 8812 of Lecture Notes in Computer Science. Springer (2014)
Zhan, Z.H., Fang Liu, X., Jiao Gong, Y., Zhang, J., Shu-Hung Chung, H., Li. Y.: Cloud computing resource scheduling and a survey of its evolutionary approaches. ACM Computing Surveys (CSUR) 47(4), 63 (2015)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing AG
About this chapter
Cite this chapter
Camacho, R., Barbosa, J.G., Sampaio, A., Ladeiras, J., Fonseca, N.A., Costa, V.S. (2016). Parallel Algorithms for Multirelational Data Mining: Application to Life Science Problems. In: Pop, F., Kołodziej, J., Di Martino, B. (eds) Resource Management for Big Data Platforms. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-44881-7_16
Download citation
DOI: https://doi.org/10.1007/978-3-319-44881-7_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44880-0
Online ISBN: 978-3-319-44881-7
eBook Packages: Computer ScienceComputer Science (R0)