DOI: 10.1145/3462757.3466149
research-article
Open access

Lex Rosetta: transfer of predictive models across languages, jurisdictions, and legal domains

Published: 27 July 2021

Abstract

In this paper, we examine the use of multi-lingual sentence embeddings to transfer predictive models for functional segmentation of adjudicatory decisions across jurisdictions, legal systems (common and civil law), languages, and domains (i.e., contexts). Mechanisms for utilizing linguistic resources outside of their original context have significant potential benefits in AI & Law because differences between legal systems, languages, or traditions often block wider adoption of research outcomes. We analyze the use of Language-Agnostic Sentence Representations in sequence labeling models using Gated Recurrent Units (GRUs) that are transferable across languages. To investigate transfer between different contexts, we developed an annotation scheme for functional segmentation of adjudicatory decisions. We found that models generalize beyond the contexts on which they were trained (e.g., a model trained on administrative decisions from the US can be applied to criminal law decisions from Italy). Further, we found that training the models on multiple contexts increases robustness and improves overall performance when evaluating on previously unseen contexts. Finally, we found that pooling the training data from all the contexts enhances the models' in-context performance.
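
The three findings above correspond to three train/evaluate settings: transfer from one context to a different one, training on several contexts and evaluating on an unseen one, and pooling all contexts before evaluating in-context. The following is a minimal sketch of how such settings could be enumerated. It is an illustration only, not the authors' experimental code: the function name `transfer_splits`, the setting labels, and the context names are all hypothetical, and the abstract does not state how many contexts were used or how they were split.

```python
from itertools import permutations


def transfer_splits(contexts):
    """Enumerate (setting, train_contexts, test_context) triples.

    Hypothetical helper: the setting names are labels chosen for this
    sketch, not terminology taken from the paper.
    """
    splits = []
    # Single-context transfer: train on one context, evaluate on a different one.
    for source, target in permutations(contexts, 2):
        splits.append(("single", (source,), target))
    for target in contexts:
        others = tuple(c for c in contexts if c != target)
        # Multi-context transfer: train on every context except the unseen target.
        splits.append(("multi", others, target))
        # Pooled: train on data from all contexts (including the target's
        # training portion), then evaluate on the target's held-out portion.
        splits.append(("pooled", tuple(contexts), target))
    return splits


# Illustrative context names; only the first two echo the abstract's example.
contexts = ["us_admin", "it_criminal", "other_context"]
for setting, train, test in transfer_splits(contexts):
    print(f"{setting:6s} train={train} test={test}")
```

In the pooled setting the target context appears on both sides, so an actual experiment would need a held-out test portion within each context; that detail is assumed here rather than taken from the abstract.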



Information

Published In

ICAIL '21: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law
June 2021
319 pages
ISBN: 9781450385268
DOI: 10.1145/3462757
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. adjudicatory decisions
  2. annotation
  3. document segmentation
  4. domain adaptation
  5. multi-lingual sentence embeddings
  6. transfer learning

Conference

ICAIL '21
Acceptance Rates

Overall Acceptance Rate 69 of 169 submissions, 41%

Cited By

  • (2024) Empirical legal analysis simplified: reducing complexity through automatic identification and evaluation of legally relevant factors. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 382:2270. DOI: 10.1098/rsta.2023.0155. Online publication date: 26-Feb-2024
  • (2023) The unreasonable effectiveness of large language models in zero-shot semantic annotation of legal texts. Frontiers in Artificial Intelligence 6. DOI: 10.3389/frai.2023.1279794. Online publication date: 17-Nov-2023
  • (2023) Incorporating Structural Information into Legal Case Retrieval. ACM Transactions on Information Systems 42:2 (1-28). DOI: 10.1145/3609796. Online publication date: 8-Nov-2023
  • (2023) Understanding Relevance Judgments in Legal Case Retrieval. ACM Transactions on Information Systems 41:3 (1-32). DOI: 10.1145/3569929. Online publication date: 7-Feb-2023
  • (2023) Exploring Approaches to Optimize the Performance of Predictive Coding on Multilanguage Data Sets. 2023 IEEE International Conference on Big Data (BigData) (1774-1781). DOI: 10.1109/BigData59044.2023.10386533. Online publication date: 15-Dec-2023
  • (2023) Bringing order into the realm of Transformer-based language models for artificial intelligence and law. Artificial Intelligence and Law. DOI: 10.1007/s10506-023-09374-7. Online publication date: 20-Nov-2023
  • (2023) Ant: a process aware annotation software for regulatory compliance. Artificial Intelligence and Law. DOI: 10.1007/s10506-023-09372-9. Online publication date: 9-Aug-2023
  • (2022) ARTIFICIAL INTELLIGENCE AND THE NEW CHALLENGES FOR EU LEGISLATION. Yıldırım Beyazıt Hukuk Dergisi. DOI: 10.33432/ybuhukuk.1104344. Online publication date: 19-Aug-2022
  • (2022) The winter, the summer and the summer dream of artificial intelligence in law. Artificial Intelligence and Law 30:2 (147-161). DOI: 10.1007/s10506-022-09309-8. Online publication date: 1-Jun-2022
  • (2021) DeepRhole: deep learning for rhetorical role labeling of sentences in legal case documents. Artificial Intelligence and Law 31:1 (53-90). DOI: 10.1007/s10506-021-09304-5. Online publication date: 13-Nov-2021
