DLEE: a dataset for Chinese document-level legal event extraction

Guochuan Xian¹, Siyuan Du¹, Xi Tang¹, Yuan Shi¹, Bofang Jia¹, Banghao Tang¹, Zhefu Leng¹ & Li Li¹ (ORCID: orcid.org/0000-0003-4818-8770)

Abstract

Event extraction (EE) provides essential information for understanding legal cases by identifying event types and extracting the corresponding arguments from legal case documents. In the legal field, an event is usually described across a whole document, with its arguments scattered over multiple sentences, so legal EE must operate at the document level to capture the complete event. However, existing legal EE datasets focus mainly on sentence-level extraction and pay little attention to the document level, which has held back the development of document-level event extraction (DEE) in the legal field. To address this gap, we propose DLEE, the first DEE dataset in the legal field, with two distinctive features: (1) document-level semi-automated annotation, which yields effective, high-quality annotation; and (2) large-scale, fine-grained coverage, comprising 10,014 events and 99,423 arguments. Finally, we assess the performance of commonly used DEE baseline models on DLEE. The results indicate that DLEE remains an open challenge and that further work is needed to improve model performance.
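To make the document-level setting concrete, the following is a minimal sketch of how a single event record with arguments drawn from several sentences might be represented. It is not the DLEE schema: the class names, field names, event type, roles, and example values are all assumptions chosen for illustration. Only the two dataset totals are taken from the abstract.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    role: str          # hypothetical role name, e.g. "Defendant"
    text: str          # argument span as it appears in the document
    sentence_idx: int  # index of the sentence that contains the span


@dataclass
class EventRecord:
    event_type: str  # hypothetical event type label
    arguments: List[Argument] = field(default_factory=list)

    def sentences_spanned(self) -> int:
        """Number of distinct sentences that contribute arguments."""
        return len({arg.sentence_idx for arg in self.arguments})

    def is_cross_sentence(self) -> bool:
        """True when the arguments come from more than one sentence."""
        return self.sentences_spanned() > 1


# Illustrative record only; it is not taken from DLEE.
record = EventRecord(
    event_type="Theft",
    arguments=[
        Argument("Defendant", "Zhang", 0),
        Argument("Stolen item", "a mobile phone", 2),
        Argument("Sentence", "six months' imprisonment", 7),
    ],
)

# Dataset-level totals reported in the abstract.
NUM_EVENTS = 10_014
NUM_ARGUMENTS = 99_423
avg_args_per_event = NUM_ARGUMENTS / NUM_EVENTS
```

In this sketch a sentence-level extractor that sees one sentence at a time could recover at most one of the three arguments, which is the gap the document-level formulation targets. The reported totals work out to roughly 9.9 arguments per event on average.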

Data availability

Details of our dataset can be found online at https://anonymous.4open.science/r/DLEE-DATA/README.md. The dataset is available on request.

Notes

https://wenshu.court.gov.cn/.

Funding

National Natural Science Foundation of China (No. 61877051)

Author information

Authors and Affiliations

School of Computer and Information Science, Southwest University, Chongqing, 400715, China
Guochuan Xian, Siyuan Du, Xi Tang, Yuan Shi, Bofang Jia, Banghao Tang, Zhefu Leng & Li Li

Corresponding author

Correspondence to Li Li.

Ethics declarations

Conflict of interest

The authors declare no potential conflict of interest with respect to the research, authorship, and publication of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Hyper-parameters of the models

See Table 9.

Table 9 Hyper-parameters of baseline models

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xian, G., Du, S., Tang, X. et al. DLEE: a dataset for Chinese document-level legal event extraction. Neural Comput & Applic 36, 15581–15597 (2024). https://doi.org/10.1007/s00521-024-09907-4

Received: 22 October 2023
Accepted: 23 April 2024
Published: 16 May 2024
Issue Date: September 2024
DOI: https://doi.org/10.1007/s00521-024-09907-4
