Declarative data analysis

  • Regular Paper
International Journal of Data Science and Analytics

Abstract

The relational database model constituted a major breakthrough in database technology. It provided a conceptual model for data storage and retrieval that made querying databases much easier. A crucial aspect of this was the introduction of declarative query languages, such as SQL. Today, databases are used not only for retrieving data but also for analyzing it. While the science and technology of data analysis have made tremendous progress in the last decades, they have not made the leap that database technology has made. This observation was already made in the nineties. Two decades later, the situation has hardly improved. Yet, progress has been made in several relevant areas. This article discusses the concept of declarative data analysis, the scientific fields that are relevant for this, the state of the art, and some future directions.

Notes

  1. https://www.automaticstatistician.com/index/.

  2. The term “versatile models” has elsewhere been used for models that can be used under different contexts, rather than for different tasks [1]. In this text, “versatility” consistently refers to versatility with respect to inference tasks, not contexts.

  3. http://probcomp.csail.mit.edu/bayesdb/.

  4. L. De Raedt, http://synth.cs.kuleuven.be/.

  5. Project G079416N supported by Research Foundation—Flanders.

References

  1. Al-Otaibi, R., Prudêncio, R.B.C., Kull, M., Flach, P.A.: Versatile decision trees for learning over multiple contexts. In: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2015, Proceedings, Part I, pp. 184–199. Springer (2015)

  2. Blockeel, H.: Data mining: from procedural to declarative approaches. N. Gen. Comput. 33(2), 115–135 (2015)

  3. Blockeel, H., Calders, T., Fromont, É., Goethals, B., Prado, A., Robardet, C.: An inductive database system based on virtual mining views. Data Min. Knowl. Discov. 24(1), 247–287 (2012)

  4. Boulicaut, J., Jeudy, B.: Constraint-based data mining. In: The Data Mining and Knowledge Discovery Handbook, pp. 399–416. Springer (2005)

  5. Brazdil, P., Giraud-Carrier, C.G., Soares, C., Vilalta, R.: Metalearning—Applications to Data Mining. Cognitive Technologies. Springer, New York (2009)

  6. Campbell, S.: Flaws and Fallacies in Statistical Thinking. Prentice Hall, Upper Saddle River (1974)

  7. Castro Sotos, A.E., Vanhoof, S., Van den Noortgate, W., Onghena, P.: Students' misconceptions of statistical inference: a review of the empirical evidence from research on statistics education. Educ. Sci. Rev. 2, 98–113 (2007)

  8. Castro Sotos, A.E., Vanhoof, S., Van den Noortgate, W., Onghena, P.: How confident are students in their misconceptions about hypothesis tests? J. Stat. Educ. 17(2) (2009). https://doi.org/10.1080/10691898.2009.11889514

  9. De Raedt, L.: Languages for learning and mining. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 4107–4111 (2015)

  10. Dzeroski, S., Goethals, B., Panov, P. (eds.): Inductive Databases and Constraint-Based Data Mining. Springer, New York (2010)

  11. Guns, T., Dries, A., Nijssen, S., Tack, G., De Raedt, L.: MiningZinc: a declarative framework for constraint-based mining. Artif. Intell. 244, 6–29 (2017)

  12. Hand, D.J.: Expert systems in statistics. Knowl. Eng. Rev. 1(3), 2–10 (1986)

  13. Huff, D.: How to Lie with Statistics. Norton, New York (1954)

  14. Imielinski, T., Mannila, H.: A database perspective on knowledge discovery. Commun. ACM 39(11), 58–64 (1996)

  15. Kolb, S., Paramonov, S., Guns, T., De Raedt, L.: Learning constraints in spreadsheets and tabular data. Mach. Learn. (2017). https://doi.org/10.1007/s10994-017-5640-x

  16. Mansinghka, V., Shafto, P., Jonas, E., Petschulat, C., Gasner, M., Tenenbaum, J.: Crosscat: a fully bayesian nonparametric method for analyzing heterogeneous, high dimensional data. J. Mach. Learn. Res. 17, 1–49 (2016)

  17. Meo, R., Psaila, G., Ceri, S.: A new SQL-like operator for mining association rules. In: VLDB’96, Proceedings of 22th International Conference on Very Large Data Bases, pp. 122–133 (1996)

  18. Nuzzo, R.: Statistical errors. Nature 506, 150–152 (2014)

  19. Siebes, A.: Data surveying: foundations of an inductive query language. In: Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95), pp. 269–274 (1995)

  20. Siegfried, T.: Odds are, it’s wrong. Sci. News 177(7), 26 (2010)

  21. Thornton, C., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In: 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 847–855 (2013)

  22. Van Craenendonck, T., Blockeel, H.: Constraint-based clustering selection. Mach. Learn. (2017). https://doi.org/10.1007/s10994-017-5643-7

  23. Vanwinckelen, G., Blockeel, H.: A declarative query language for statistical inference. In: ECML/PKDD 2013 Workshop: Languages for Data Mining and Machine Learning (2013). https://lirias.kuleuven.be/handle/123456789/417739

  24. Wasserstein, R.L., Lazar, N.A.: The ASA’s statement on p values: context, process, and purpose. Am. Stat. 70(2), 129–133 (2016)

  25. Wicker, J., Richter, L., Kessler, K., Kramer, S.: SINDBAD and SiQL: an inductive database and query language in the relational model. In: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2008, Proceedings, Part II, pp. 690–694. Springer (2008)

Acknowledgements

The ideas mentioned in this paper were inspired by discussions with Luc De Raedt and other members of the DTAI lab at KU Leuven. They were developed further in the course of consecutive projects funded by the Research Foundation—Flanders, in particular G068211N (Declarative experimentation for machine learning) and G079416N (MERCS: efficient modeling of big data with multi-directional ensembles of decision trees), and within the SYNTH project (ERC-ADG-2015, Project 694980) funded by the European Research Council. The author thanks the anonymous reviewers for several comments that improved the paper.

Author information

Corresponding author

Correspondence to Hendrik Blockeel.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

About this article

Cite this article

Blockeel, H. Declarative data analysis. Int J Data Sci Anal 6, 217–223 (2018). https://doi.org/10.1007/s41060-017-0081-y

  • DOI: https://doi.org/10.1007/s41060-017-0081-y