Abstract
This paper presents an exhaustive and unified dataset built from the judgments of the European Court of Human Rights since the Court's creation. The value of such a database is explained from the perspectives of the researcher, the data scientist, the citizen and the legal practitioner. Unlike many datasets, the creation process, from the collection of raw data to the feature transformation, is provided as a collection of fully automated and open-source scripts. This ensures reproducibility and a high level of confidence in the processed data, which are among the most important issues in data governance today. A first experimental campaign is performed to study some predictability properties and to establish baseline results for popular machine learning algorithms. The results are consistently good across the binary datasets, with accuracy ranging from 75.86% to 98.32%, for a micro-average accuracy of 96.44%.
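To make the distinction between per-dataset accuracy and micro-average accuracy concrete, here is a minimal Python sketch. It assumes the usual definition of micro-averaging (pooling correct predictions and instances over all datasets before dividing), which the abstract does not spell out. The function names and the counts are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Each entry is (number of correct predictions, number of test instances)
# for one binary dataset. These counts are illustrative only.
Result = Tuple[int, int]


def micro_average_accuracy(results: List[Result]) -> float:
    """Pool all instances across datasets, then compute one accuracy."""
    total_correct = sum(correct for correct, _ in results)
    total_instances = sum(n for _, n in results)
    if total_instances == 0:
        raise ValueError("no instances to average over")
    return total_correct / total_instances


def macro_average_accuracy(results: List[Result]) -> float:
    """Compute accuracy per dataset, then take the unweighted mean."""
    if not results:
        raise ValueError("no datasets to average over")
    return sum(correct / n for correct, n in results) / len(results)


# Hypothetical example: a small, harder dataset and a large, easier one.
example = [(80, 100), (950, 1000)]
micro = micro_average_accuracy(example)
macro = macro_average_accuracy(example)
print(f"micro-average: {micro:.4f}")
print(f"macro-average: {macro:.4f}")
```

Under this definition, large datasets dominate the micro-average, which is one way a pooled figure can sit close to the top of the per-dataset accuracy range.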
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Quemy, A., Wrembel, R. (2020). On Integrating and Classifying Legal Text Documents. In: Hartmann, S., Küng, J., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2020. Lecture Notes in Computer Science, vol 12391. Springer, Cham. https://doi.org/10.1007/978-3-030-59003-1_25
DOI: https://doi.org/10.1007/978-3-030-59003-1_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59002-4
Online ISBN: 978-3-030-59003-1
eBook Packages: Computer Science (R0)