Abstract
Relational learning algorithms learn the definition of a new relation in terms of existing relations in the database. The same database may be represented under different schemas for various reasons, such as efficiency, data quality, and usability. Unfortunately, the output of current relational learning algorithms tends to vary quite substantially over the choice of schema, both in terms of learning accuracy and efficiency. We introduce the property of schema independence of relational learning algorithms, and study both the theoretical and empirical dependence of existing algorithms on the common class of (de) composition schema transformations. We show theoretically and empirically that current relational learning algorithms are generally not schema independent. We propose Castor, a relational learning algorithm that achieves schema independence.
Similar content being viewed by others
References
Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases: The Logical Level. Addison-Wesley, Boston (1994)
Abouzied, A., Angluin, D., Papadimitriou, C., Hellerstein, J., Silberschatz, A.: Learning and verifying quantified boolean queries by example. In: PODS, pp. 49–60 (2013)
Angluin, D., Frazier, M., Pitt, L.: Learning conjunctions of Horn clauses. Mach. Learn. 9(2–3), 147–164 (1992)
Arias, M., Khardon, R., Maloberti, J.: Learning Horn expressions with LOGAN-H. J. Mach. Learn. Res. 8, 549–587 (2007)
Atzeni, P., Ausiello, G., Batini, C., Moscarini, M.: Inclusion and Equivalent Between Relational Database Schemata. TCS, Mumbai (1982)
Cate, B.T., Dalmau, V., Kolaitis, P.G.: Learning schema mappings. TODS 38(4), 28:1–28:31 (2013)
Chen, Y., Goldberg, S., Wang, D.Z., Johri, S.S.: Ontological pathfinding: mining first-order knowledge from large knowledge bases. In: SIGMOD, pp. 835–846 (2016)
Costa, V.S., Srinivasan, A., Camacho, R., Blockeel, H., Demoen, B., Janssens, G., Struyf, J., Vandecasteele, H., Laer, W.V.: Query transformations for improving the efficiency of ILP systems. J. Mach. Learn. Res. 4, 465–491 (2003)
Davis, J., Burnside, E.S., de Castro Dutra, I., Page, D., Ramakrishnan, R., Costa, V.S., Shavlik, J.W.: View learning for statistical relational learning: with an application to mammography. In: IJCAI, pp. 677–683 (2005)
De Raedt, L.: Logical and Relational Learning, 1st edn. Springer, Berlin (2010)
Fagin, R.: Inverting schema mappings. TODS 32(4), 25 (2007)
Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: semantics and query answering. In: Calvanese, D., Lenzerini, M., Motwani, R. (eds.) ICDT, pp. 207–224. Springer, Berlin, Heidelberg (2003)
Fan, W., Bohannon, P.: Information preserving xml schema embedding. TODS 33(1), 1–44 (2008)
Galárraga, L., Teflioudi, C., Hose, K., Suchanek, F.: Fast rule mining in ontological knowledge bases with AMIE+. VLDB J. 24(6), 707–730 (2015)
Getoor, L., Machanavajjhala, A.: Entity resolution in big data. In: KDD, p. 1527 (2013)
Getoor, L., Taskar, B.: Introduction to Statistical Relational Learning. MIT Press, Cambridge (2007)
Hull, R.: Relative information capacity of simple relational database schemata. In: PODS, pp. 97–109 (1984)
Khardon, R.: Learning function-free Horn expressions. Mach. Learn. 37(3), 241–275 (1999)
Kuželka, O., Železný, F.: A restarted strategy for efficient subsumption testing. Fundam. Inf. 89(1), 95–109 (2009)
Li, H., Chan, C.-Y., Maier, D.: Query from examples: an iterative, data-driven approach to query construction. PVLDB 8(13), 2158–2163 (2015)
Muggleton, S.: Inverse Entailment and Progol. New Gener. Comput. Special Issue Inductive Log. Program. 13, 245–286 (1995)
Muggleton, S., Feng, C.: Efficient induction of logic programs. In: ALT. Springer/Ohmsha, pp. 368–381 (1990)
Muggleton, S., Santos, J.C.A., Tamaddoni-Nezhad, A.: Progolem: a system based on relative minimal generalisation. In: De Raedt, L. (ed.) ILP, vol. 5989. Springer, Berlin, Heidelberg (2009)
Picado, J., Termehchy, A., Fern, A., Ataei, P.: Schema independent relational learning. arXiv:1508.03846 (2015)
Picado, J., Termehchy, A., Fern, A., Ataei, P.: Schema independent relational learning. In: SIGMOD, pp. 929–944 (2017)
Quinlan, J.R.: Learning logical definitions from relations. Mach. Learn. 5, 239–266 (1990)
Reddy, C., Tadepalli, P.: Learning Horn definitions: theory and an application to planning. New Gener. Comput. 17, 77–98 (1998)
Richardson, M., Domingos, P.: Markov logic networks. Mach. Learn. 62(1–2), 107–136 (2006)
Srinivasan, A.: The Aleph manual (2007). http://www.cs.ox.ac.uk/activities/machinelearning/Aleph/aleph. Accessed 15 Oct 2018
Termehchy, A., Winslett, M., Chodpathumwan, Y.: How schema independent are schema free query interfaces? In: ICDE, pp. 645–660 (2011)
Zeng, Q., Patel, J.M., Page, D.: QuickFOIL: scalable inductive logic programming. PVLDB 8(3), 197–208 (2014)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Picado, J., Termehchy, A., Fern, A. et al. Logical scalability and efficiency of relational learning algorithms. The VLDB Journal 28, 147–171 (2019). https://doi.org/10.1007/s00778-018-0523-8
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-018-0523-8