DLEE: a dataset for Chinese document-level legal event extraction

  • Original Article
Neural Computing and Applications

Abstract

Event extraction (EE) provides essential information for understanding legal cases by identifying event types and extracting the corresponding arguments from legal case documents. In the legal field, an event is often described over a whole document, with its arguments scattered across multiple sentences, so legal EE must operate at the document level to capture the complete event. However, existing legal EE datasets focus mainly on sentence-level extraction and give little attention to the document level, which has held back the development of document-level event extraction (DEE) in the legal field. To address this gap, we propose DLEE, the first DEE dataset in the legal field, with two distinctive features: (1) document-level semi-automated annotation, ensuring effective, high-quality annotation; and (2) large-scale, fine-grained coverage, comprising 10,014 events and 99,423 arguments. Finally, we assess the performance of commonly used DEE baseline models on DLEE. The results show that DEE on DLEE remains an open problem and that further work is needed to improve model performance.
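To make the document-level setting concrete, the following is a minimal Python sketch of an event record whose arguments come from different sentences. It is an illustration only: the class names, the `collect_arguments` helper, the event type "Theft", the role names, and the toy English document are all assumptions made for this example and do not reflect the actual DLEE schema or data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Argument:
    role: str          # hypothetical role name, e.g. "Defendant"
    text: str          # argument span as it appears in the document
    sentence_idx: int  # index of the sentence that contains the span


@dataclass
class EventRecord:
    event_type: str
    arguments: list[Argument] = field(default_factory=list)

    def sentences_spanned(self) -> set[int]:
        """Return the indices of all sentences that contribute an argument."""
        return {arg.sentence_idx for arg in self.arguments}

    def is_cross_sentence(self) -> bool:
        """True when the arguments come from more than one sentence."""
        return len(self.sentences_spanned()) > 1


def collect_arguments(
    sentences: list[str], event_type: str, gold: dict[str, str]
) -> EventRecord:
    """Locate each annotated argument string and record its sentence index."""
    record = EventRecord(event_type)
    for role, text in gold.items():
        for idx, sentence in enumerate(sentences):
            if text in sentence:
                record.arguments.append(Argument(role, text, idx))
                break  # keep only the first mention of each argument
    return record


# Toy, invented document (hypothetical; not drawn from DLEE).
doc = [
    "On 3 May, Zhang entered the shop.",
    "He took a phone worth 5,000 yuan.",
    "The court sentenced him to six months in prison.",
]
event = collect_arguments(
    doc,
    "Theft",
    {"Defendant": "Zhang", "Item": "a phone", "Penalty": "six months"},
)
```

In this toy case no single sentence holds all three arguments, so a sentence-level extractor would return only a fragment of the event; `event.is_cross_sentence()` flags exactly that situation.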

Data availability

Details of our dataset can be found online at https://anonymous.4open.science/r/DLEE-DATA/README.md. The dataset is available on request.

Notes

  1. https://wenshu.court.gov.cn/.

Funding

National Natural Science Foundation of China (No. 61877051)

Author information

Corresponding author

Correspondence to Li Li.

Ethics declarations

Conflict of interest

The authors declare no potential conflict of interest with respect to the research, authorship, and publication of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Hyper-parameters of the models

See Table 9.

Table 9 Hyper-parameters of baseline models

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xian, G., Du, S., Tang, X. et al. DLEE: a dataset for Chinese document-level legal event extraction. Neural Comput & Applic 36, 15581–15597 (2024). https://doi.org/10.1007/s00521-024-09907-4

  • DOI: https://doi.org/10.1007/s00521-024-09907-4
