research-article

Open access

Lex Rosetta: transfer of predictive models across languages, jurisdictions, and legal domains

ICAIL '21: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law

Pages 129 - 138

https://doi.org/10.1145/3462757.3466149

Published: 27 July 2021

Abstract

In this paper, we examine the use of multi-lingual sentence embeddings to transfer predictive models for functional segmentation of adjudicatory decisions across jurisdictions, legal systems (common and civil law), languages, and domains (i.e., contexts). Mechanisms for utilizing linguistic resources outside of their original context have significant potential benefits in AI & Law because differences between legal systems, languages, or traditions often block wider adoption of research outcomes. We analyze the use of Language-Agnostic Sentence Representations in sequence labeling models using Gated Recurrent Units (GRUs) that are transferable across languages. To investigate transfer between different contexts we developed an annotation scheme for functional segmentation of adjudicatory decisions. We found that models generalize beyond the contexts on which they were trained (e.g., a model trained on administrative decisions from the US can be applied to criminal law decisions from Italy). Further, we found that training the models on multiple contexts increases robustness and improves overall performance when evaluating on previously unseen contexts. Finally, we found that pooling the training data from all the contexts enhances the models' in-context performance.
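The abstract describes two evaluation settings: testing on a context the model never saw during training, and pooling training data from all contexts before testing in-context. The sketch below shows one way such splits could be constructed. It is an illustration under assumptions, not the authors' protocol: the function names, the third context, the document ids, and the 20% test fraction are all hypothetical, and no model is trained here.

```python
from typing import Dict, List, Tuple

# Toy corpus: context name -> document ids. The first two names echo the
# abstract's example; the third context and all ids are hypothetical.
CORPUS: Dict[str, List[str]] = {
    "us_administrative": ["us1", "us2", "us3", "us4", "us5"],
    "italy_criminal": ["it1", "it2", "it3", "it4", "it5"],
    "hypothetical_third": ["h1", "h2", "h3", "h4", "h5"],
}


def out_of_context_splits(corpus: Dict[str, List[str]]) -> List[Tuple[List[str], str]]:
    """Hold out each context in turn and train on all remaining contexts."""
    contexts = sorted(corpus)
    return [
        ([c for c in contexts if c != held_out], held_out)
        for held_out in contexts
    ]


def pooled_in_context_split(
    corpus: Dict[str, List[str]], target: str, test_fraction: float = 0.2
) -> Tuple[List[str], List[str]]:
    """Pool training documents from every context; test only on documents
    held back from the target context (assumed split, fraction is arbitrary)."""
    train: List[str] = []
    test: List[str] = []
    for context, docs in corpus.items():
        if context == target:
            n_test = max(1, int(len(docs) * test_fraction))
            test.extend(docs[:n_test])
            train.extend(docs[n_test:])
        else:
            train.extend(docs)
    return train, test


if __name__ == "__main__":
    for train_contexts, held_out in out_of_context_splits(CORPUS):
        print(f"train on {train_contexts}, evaluate on unseen {held_out}")
    train_docs, test_docs = pooled_in_context_split(CORPUS, "italy_criminal")
    print(f"pooled training set: {len(train_docs)} docs, in-context test: {test_docs}")
```

In a real experiment each document id would stand for a sequence of sentence embeddings with per-sentence labels, and the held-out documents would typically be rotated across folds rather than taken from the front of the list.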

Index Terms

Lex Rosetta: transfer of predictive models across languages, jurisdictions, and legal domains
1. Applied computing
  1. Document management and text processing
    1. Document preparation
      1. Annotation
  2. Law, social and behavioral sciences
    1. Law
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Document structure
    2. Specialized information retrieval
      1. Structure and multilingual text search
  2. Information systems applications
    1. Data mining

Information & Contributors

Information

Published In

ICAIL '21: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law

June 2021

319 pages

ISBN: 9781450385268

DOI: 10.1145/3462757

Conference Chair: Juliano Maranhão, University of São Paulo, Brazil

Program Chair: Adam Zachary Wyner, Swansea University, United Kingdom

Copyright © 2021 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGAI: ACM Special Interest Group on Artificial Intelligence

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICAIL '21

Sponsor:

SIGAI

ICAIL '21: Eighteenth International Conference for Artificial Intelligence and Law

June 21 - 25, 2021

São Paulo, Brazil

Acceptance Rates

Overall Acceptance Rate 69 of 169 submissions, 41%

Bibliometrics & Citations

Article Metrics

Total Citations: 10
Total Downloads: 624
Downloads (last 12 months): 163
Downloads (last 6 weeks): 16

Reflects downloads up to 22 Sep 2024

Citations

Cited By

Gray M, Savelka J, Oliver W, Ashley K. (2024). Empirical legal analysis simplified: reducing complexity through automatic identification and evaluation of legally relevant factors. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 382:2270. Online publication date: 26-Feb-2024.
https://doi.org/10.1098/rsta.2023.0155
Savelka J, Ashley K. (2023). The unreasonable effectiveness of large language models in zero-shot semantic annotation of legal texts. Frontiers in Artificial Intelligence 6. Online publication date: 17-Nov-2023.
https://doi.org/10.3389/frai.2023.1279794
Ma Y, Wu Y, Ai Q, Liu Y, Shao Y, Zhang M, Ma S. (2023). Incorporating Structural Information into Legal Case Retrieval. ACM Transactions on Information Systems 42:2 (1-28). Online publication date: 8-Nov-2023.
https://dl.acm.org/doi/10.1145/3609796
Shao Y, Wu Y, Liu Y, Mao J, Ma S. (2023). Understanding Relevance Judgments in Legal Case Retrieval. ACM Transactions on Information Systems 41:3 (1-32). Online publication date: 7-Feb-2023.
https://dl.acm.org/doi/10.1145/3569929
Mahoney C, Huber-Fliflet N, Gronvall P, Clark C, Zhang J, Wei F, Mao Q. (2023). Exploring Approaches to Optimize the Performance of Predictive Coding on Multilanguage Data Sets. 2023 IEEE International Conference on Big Data (BigData) (1774-1781). Online publication date: 15-Dec-2023.
https://doi.org/10.1109/BigData59044.2023.10386533
Greco C, Tagarelli A. (2023). Bringing order into the realm of Transformer-based language models for artificial intelligence and law. Artificial Intelligence and Law. Online publication date: 20-Nov-2023.
https://doi.org/10.1007/s10506-023-09374-7
Gyory R, Restrepo Amariles D, Lewkowicz G, Bersini H. (2023). Ant: a process aware annotation software for regulatory compliance. Artificial Intelligence and Law. Online publication date: 9-Aug-2023.
https://doi.org/10.1007/s10506-023-09372-9
GÜNEŞ PESCHKE S, PESCHKE L. (2022). ARTIFICIAL INTELLIGENCE AND THE NEW CHALLENGES FOR EU LEGISLATION. Yıldırım Beyazıt Hukuk Dergisi. Online publication date: 19-Aug-2022.
https://doi.org/10.33432/ybuhukuk.1104344
Francesconi E. (2022). The winter, the summer and the summer dream of artificial intelligence in law. Artificial Intelligence and Law 30:2 (147-161). Online publication date: 1-Jun-2022.
https://dl.acm.org/doi/10.1007/s10506-022-09309-8
Bhattacharya P, Paul S, Ghosh K, Ghosh S, Wyner A. (2021). DeepRhole: deep learning for rhetorical role labeling of sentences in legal case documents. Artificial Intelligence and Law 31:1 (53-90). Online publication date: 13-Nov-2021.
https://dl.acm.org/doi/10.1007/s10506-021-09304-5
