DOI: 10.1145/3462757.3466105
research-article

BERT-based ensemble methods with data augmentation for legal textual entailment in COLIEE statute law task

Published: 27 July 2021

Abstract

The Competition on Legal Information Extraction/Entailment (COLIEE) statute law legal textual entailment task (task 4) requires a system to judge whether a given question statement is true or false based on the provided articles. In COLIEE 2020, the best-performing system used bidirectional encoder representations from transformers (BERT), a deep-learning-based natural language processing tool that handles word semantics by considering context. However, problems remain because of the small amount of training data and the variability of the questions. In this paper, we propose a BERT-based ensemble method with data augmentation to address these problems. For the data augmentation, we propose a systematic method for generating training data that helps the model understand the syntactic structure of the questions and articles for entailment. In addition, because BERT fine-tuning is non-deterministic and the questions vary, we propose a method that constructs multiple fine-tuned BERT models and selects an appropriate set of them for the ensemble. The accuracy of our proposed method on task 4 was 0.7037, the best performance among all submissions.
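The abstract does not say how the set of fine-tuned models is chosen, so the following is only a minimal sketch of one plausible reading: several fine-tuning runs each produce a "yes" probability per validation question, and the subset whose averaged probabilities score best on the validation set is kept. The function names, the averaging rule, the 0.5 threshold, and the exhaustive subset search are all assumptions made for illustration, not the authors' procedure.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def ensemble_predict(probs: Sequence[Sequence[float]]) -> List[int]:
    """Average per-model 'yes' probabilities and threshold at 0.5 (assumed rule)."""
    n_items = len(probs[0])
    averaged = [sum(model[i] for model in probs) / len(probs) for i in range(n_items)]
    return [1 if p >= 0.5 else 0 for p in averaged]


def accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of questions whose predicted label matches the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def select_ensemble(
    val_probs: Dict[str, Sequence[float]],
    gold: Sequence[int],
    size: int,
) -> Tuple[Tuple[str, ...], float]:
    """Try every subset of `size` runs and keep the one with the highest
    validation accuracy; ties go to the first subset in sorted-name order."""
    best_names: Tuple[str, ...] = ()
    best_acc = -1.0
    for names in combinations(sorted(val_probs), size):
        acc = accuracy(ensemble_predict([val_probs[n] for n in names]), gold)
        if acc > best_acc:
            best_names, best_acc = names, acc
    return best_names, best_acc


if __name__ == "__main__":
    # Toy validation data (hypothetical): 1 = statement entailed, 0 = not entailed.
    gold = [1, 0, 1, 0]
    val_probs = {
        "run_a": [0.9, 0.6, 0.4, 0.2],
        "run_b": [0.6, 0.1, 0.9, 0.7],
        "run_c": [0.3, 0.2, 0.7, 0.1],
    }
    print(select_ensemble(val_probs, gold, size=2))
```

Exhaustive search is only practical for a small number of runs; with many fine-tuned models a greedy or score-ranked selection would be the usual substitute.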


      Published In

      ICAIL '21: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law
      June 2021
      319 pages
      ISBN:9781450385268
      DOI:10.1145/3462757

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. BERT
      2. data augmentation
      3. ensemble method
      4. textual entailment

      Qualifiers

      • Research-article

      Funding Sources

      • JSPS Kakenhi

      Conference

      ICAIL '21

      Acceptance Rates

      Overall Acceptance Rate 69 of 169 submissions, 41%
