Abstract
The relational database model constituted a major breakthrough in database technology. It provided a conceptual model for data storage and retrieval that made querying databases much easier. A crucial aspect of this was the introduction of declarative query languages, such as SQL. Today, databases are used not only for retrieving data but also for analyzing it. While the science and technology of data analysis have made tremendous progress in recent decades, they have not made the leap that database technology made. This observation was already made in the 1990s. Two decades later, the situation has hardly improved, although progress has been made in several relevant areas. This article discusses the concept of declarative data analysis, the scientific fields relevant to it, the state of the art, and some future directions.
Notes
The term “versatile models” has elsewhere been used for models that can be used under different contexts, rather than for different tasks [1]. In this text, “versatility” consistently refers to versatility with respect to inference tasks, not contexts.
L. De Raedt, http://synth.cs.kuleuven.be/.
Project G079416N supported by Research Foundation—Flanders.
References
Al-Otaibi, R., Prudêncio, R.B.C., Kull, M., Flach, P.A.: Versatile decision trees for learning over multiple contexts. In: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2015, Proceedings, Part I, pp. 184–199. Springer (2015)
Blockeel, H.: Data mining: from procedural to declarative approaches. N. Gen. Comput. 33(2), 115–135 (2015)
Blockeel, H., Calders, T., Fromont, É., Goethals, B., Prado, A., Robardet, C.: An inductive database system based on virtual mining views. Data Min. Knowl. Discov. 24(1), 247–287 (2012)
Boulicaut, J., Jeudy, B.: Constraint-based data mining. In: The Data Mining and Knowledge Discovery Handbook, pp. 399–416. Springer (2005)
Brazdil, P., Giraud-Carrier, C.G., Soares, C., Vilalta, R.: Metalearning—Applications to Data Mining. Cognitive Technologies. Springer, New York (2009)
Campbell, S.: Flaws and Fallacies in Statistical Thinking. Prentice Hall, Upper Saddle River (1974)
Castro Sotos, A.E., Vanhoof, S., Van den Noortgate, W., Onghena, P.: Students' misconceptions of statistical inference: a review of the empirical evidence from research on statistics education. Educ. Sci. Rev. 2, 98–113 (2007)
Castro Sotos, A.E., Vanhoof, S., Van den Noortgate, W., Onghena, P.: How confident are students in their misconceptions about hypothesis tests? J. Stat. Educ. 17(2) (2009). https://doi.org/10.1080/10691898.2009.11889514
De Raedt, L.: Languages for learning and mining. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 4107–4111 (2015)
Dzeroski, S., Goethals, B., Panov, P. (eds.): Inductive Databases and Constraint-Based Data Mining. Springer, New York (2010)
Guns, T., Dries, A., Nijssen, S., Tack, G., De Raedt, L.: MiningZinc: a declarative framework for constraint-based mining. Artif. Intell. 244, 6–29 (2017)
Hand, D.J.: Expert systems in statistics. Knowl. Eng. Rev. 1(3), 2–10 (1986)
Huff, D.: How to Lie with Statistics. Norton, New York (1954)
Imielinski, T., Mannila, H.: A database perspective on knowledge discovery. Commun. ACM 39(11), 58–64 (1996)
Kolb, S., Paramonov, S., Guns, T., De Raedt, L.: Learning constraints in spreadsheets and tabular data. Mach. Learn. (2017). https://doi.org/10.1007/s10994-017-5640-x
Mansinghka, V., Shafto, P., Jonas, E., Petschulat, C., Gasner, M., Tenenbaum, J.: Crosscat: a fully bayesian nonparametric method for analyzing heterogeneous, high dimensional data. J. Mach. Learn. Res. 17, 1–49 (2016)
Meo, R., Psaila, G., Ceri, S.: A new SQL-like operator for mining association rules. In: VLDB’96, Proceedings of 22th International Conference on Very Large Data Bases, pp. 122–133 (1996)
Nuzzo, R.: Statistical errors. Nature 506, 150–152 (2014)
Siebes, A.: Data surveying: foundations of an inductive query language. In: Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95), pp. 269–274 (1995)
Siegfried, T.: Odds are, it’s wrong. Sci. News 177(7), 26 (2010)
Thornton, C., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In: 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 847–855 (2013)
Van Craenendonck, T., Blockeel, H.: Constraint-based clustering selection. Mach. Learn. (2017). https://doi.org/10.1007/s10994-017-5643-7
Vanwinckelen, G., Blockeel, H.: A declarative query language for statistical inference. In: ECML/PKDD 2013 Workshop: Languages for Data Mining and Machine Learning (2013). https://lirias.kuleuven.be/handle/123456789/417739
Wasserstein, R.L., Lazar, N.A.: The ASA’s statement on p values: context, process, and purpose. Am. Stat. 70(2), 129–133 (2016)
Wicker, J., Richter, L., Kessler, K., Kramer, S.: SINDBAD and SiQL: an inductive database and query language in the relational model. In: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2008, Proceedings, Part II, pp. 690–694. Springer (2008)
Acknowledgements
The ideas mentioned in this paper were inspired by discussions with Luc De Raedt and other members of the DTAI lab at KU Leuven. They were developed further in the course of consecutive projects funded by the Research Foundation—Flanders, in particular G068211N (Declarative experimentation for machine learning) and G079416N (MERCS: efficient modeling of big data with multi-directional ensembles of decision trees), and within the SYNTH project (ERC-ADG-2015, Project 694980) funded by the European Research Council. The author thanks the anonymous reviewers for several comments that improved the paper.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Blockeel, H. Declarative data analysis. Int J Data Sci Anal 6, 217–223 (2018). https://doi.org/10.1007/s41060-017-0081-y
DOI: https://doi.org/10.1007/s41060-017-0081-y