Declarative data analysis

Hendrik Blockeel ORCID: orcid.org/0000-0003-0378-3699¹


Abstract

The relational database model constituted a major breakthrough in database technology. It provided a conceptual model for data storage and retrieval that made querying databases much easier. A crucial aspect of this was the introduction of declarative query languages, such as SQL. Today, databases are used not only for retrieving data but also for analyzing them. While the science and technology of data analysis have made tremendous progress in the last decades, they have not made the leap that database technology has made. This observation was already made in the 1990s. Two decades later, the situation has hardly improved. Yet, progress has been made in several relevant areas. This article discusses the concept of declarative data analysis, the scientific fields that are relevant for this, the state of the art, and some future directions.


Notes

https://www.automaticstatistician.com/index/.
The term “versatile models” has elsewhere been used for models that can be used under different contexts, rather than for different tasks [1]. In this text, “versatility” consistently refers to versatility with respect to inference tasks, not contexts.
http://probcomp.csail.mit.edu/bayesdb/.
L. De Raedt, http://synth.cs.kuleuven.be/.
Project G079416N supported by Research Foundation—Flanders.

References

Al-Otaibi, R., Prudêncio, R.B.C., Kull, M., Flach, P.A.: Versatile decision trees for learning over multiple contexts. In: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2015, Proceedings, Part I, pp. 184–199. Springer (2015)
Blockeel, H.: Data mining: from procedural to declarative approaches. N. Gen. Comput. 33(2), 115–135 (2015)
Blockeel, H., Calders, T., Fromont, É., Goethals, B., Prado, A., Robardet, C.: An inductive database system based on virtual mining views. Data Min. Knowl. Discov. 24(1), 247–287 (2012)
Boulicaut, J., Jeudy, B.: Constraint-based data mining. In: The Data Mining and Knowledge Discovery Handbook, pp. 399–416. Springer (2005)
Brazdil, P., Giraud-Carrier, C.G., Soares, C., Vilalta, R.: Metalearning—Applications to Data Mining. Cognitive Technologies. Springer, New York (2009)
Campbell, S.: Flaws and Fallacies in Statistical Thinking. Prentice Hall, Upper Saddle River (1974)
Castro Sotos, A., Vanhoof, S., Van den Noortgate, W., Onghena, P.: Students' misconceptions of statistical inference: a review of the empirical evidence from research on statistics education. Educ. Sci. Rev. 2, 98–113 (2007)
Sotos, A.E.C., Vanhoof, S., Van den Noortgate, W., Onghena, P.: How confident are students in their misconceptions about hypothesis tests? J. Stat. Educ. 17(2) (2009). https://doi.org/10.1080/10691898.2009.11889514
De Raedt, L.: Languages for learning and mining. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 4107–4111 (2015)
Dzeroski, S., Goethals, B., Panov, P. (eds.): Inductive Databases and Constraint-Based Data Mining. Springer, New York (2010)
Guns, T., Dries, A., Nijssen, S., Tack, G., De Raedt, L.: MiningZinc: a declarative framework for constraint-based mining. Artif. Intell. 244, 6–29 (2017)
Hand, D.J.: Expert systems in statistics. Knowl. Eng. Rev. 1(3), 2–10 (1986)
Huff, D.: How to Lie with Statistics. Norton, New York (1954)
Imielinski, T., Mannila, H.: A database perspective on knowledge discovery. Commun. ACM 39(11), 58–64 (1996)
Kolb, S., Paramonov, S., Guns, T., De Raedt, L.: Learning constraints in spreadsheets and tabular data. Mach. Learn. (2017). https://doi.org/10.1007/s10994-017-5640-x
Mansinghka, V., Shafto, P., Jonas, E., Petschulat, C., Gasner, M., Tenenbaum, J.: Crosscat: a fully bayesian nonparametric method for analyzing heterogeneous, high dimensional data. J. Mach. Learn. Res. 17, 1–49 (2016)
Meo, R., Psaila, G., Ceri, S.: A new SQL-like operator for mining association rules. In: VLDB’96, Proceedings of 22th International Conference on Very Large Data Bases, pp. 122–133 (1996)
Nuzzo, R.: Statistical errors. Nature 506, 150–152 (2014)
Siebes, A.: Data surveying: foundations of an inductive query language. In: Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95), pp. 269–274 (1995)
Siegfried, T.: Odds are, it’s wrong. Sci. News 177(7), 26 (2010)
Thornton, C., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In: 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 847–855 (2013)
Van Craenendonck, T., Blockeel, H.: Constraint-based clustering selection. Mach. Learn. (2017). https://doi.org/10.1007/s10994-017-5643-7
Vanwinckelen, G., Blockeel, H.: A declarative query language for statistical inference. In: ECML/PKDD 2013 Workshop: Languages for Data Mining and Machine Learning (2013). https://lirias.kuleuven.be/handle/123456789/417739
Wasserstein, R.L., Lazar, N.A.: The ASA’s statement on p values: context, process, and purpose. Am. Stat. 70(2), 129–133 (2016)
Wicker, J., Richter, L., Kessler, K., Kramer, S.: SINDBAD and SiQL: an inductive database and query language in the relational model. In: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2008, Proceedings, Part II, pp. 690–694. Springer (2008)


Acknowledgements

The ideas mentioned in this paper were inspired by discussions with Luc De Raedt and other members of the DTAI lab at KU Leuven. They were developed further in the course of consecutive projects funded by the Research Foundation—Flanders, in particular G068211N (Declarative experimentation for machine learning) and G079416N (MERCS: efficient modeling of big data with multi-directional ensembles of decision trees), and within the SYNTH project (ERC-ADG-2015, Project 694980) funded by the European Research Council. The author thanks the anonymous reviewers for several comments that improved the paper.

Author information

Authors and Affiliations

Department of Computer Science, KU Leuven, Celestijnenlaan 200A, Box 2402, 3001, Leuven, Belgium
Hendrik Blockeel


Corresponding author

Correspondence to Hendrik Blockeel.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

About this article

Cite this article

Blockeel, H. Declarative data analysis. Int J Data Sci Anal 6, 217–223 (2018). https://doi.org/10.1007/s41060-017-0081-y


Received: 20 March 2017
Accepted: 01 November 2017
Published: 14 November 2017
Issue Date: November 2018
DOI: https://doi.org/10.1007/s41060-017-0081-y
