Abstract
Event extraction (EE) provides essential information for understanding legal cases by identifying event types and extracting the corresponding arguments from legal case documents. In the legal field, an event is often described over a whole document, with its arguments scattered across multiple sentences, so document-level legal EE is needed to capture the complete event. However, existing legal EE datasets focus mainly on sentence-level extraction and pay little attention to the document level, which has held back the development of document-level event extraction (DEE) in the legal field. To address this gap, we propose DLEE, the first DEE dataset in the legal field, with two distinctive features: (1) document-level semi-automated annotation, which ensures effective, high-quality annotation; and (2) large-scale, fine-grained coverage, comprising 10,014 events and 99,423 arguments. Finally, we assess the performance of commonly used DEE baseline models on DLEE. The results show that DEE on DLEE remains an open problem and that further work is needed to improve model performance.
Data availability
Details of our dataset can be found online at https://anonymous.4open.science/r/DLEE-DATA/README.md. The dataset is available on request.
Funding
National Natural Science Foundation of China (No. 61877051)
Ethics declarations
Conflict of interest
The authors declare no potential conflict of interest with respect to the research, authorship, and publication of this article.
Appendix: Hyper-parameters of the models
See Table 9.
About this article
Cite this article
Xian, G., Du, S., Tang, X. et al. DLEE: a dataset for Chinese document-level legal event extraction. Neural Comput & Applic 36, 15581–15597 (2024). https://doi.org/10.1007/s00521-024-09907-4
DOI: https://doi.org/10.1007/s00521-024-09907-4