Abstract
The number of legislative documents produced in the past decade has risen dramatically, making it difficult for law practitioners to consult and update legislation. Named Entity Recognition (NER) systems have untapped potential to extract information from legal documents, which can improve information retrieval and decision-making processes. We introduce UlyssesNER-Br, a corpus of Brazilian legislative documents for NER with quality baselines. The corpus consists of bills and legislative consultations from the Brazilian Chamber of Deputies. We implemented Conditional Random Field (CRF) and Hidden Markov Model (HMM) models; the CRF model achieved a promising F1-score of 80.8% in the analysis by categories and 81.04% in the analysis by types. The entities with the best average F1-scores were “FUNDlei” and “DATA”, and those with the worst were “EVENTO” and “PESSOAgrupoind”. The corpus was also evaluated using the BiLSTM-CRF architecture with GloVe embeddings provided by the pioneering state-of-the-art paper, achieving an F1-score of 76.89% in the analysis by categories and 59.67% in the analysis by types.
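For readers unfamiliar with how F1-scores of this kind are typically obtained, the following is a minimal sketch of micro-averaged, entity-level F1 over BIO-tagged sequences. It is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not say whether scoring is per token or per entity, nor which tagging scheme is used. The function names `extract_spans` and `entity_f1` and the toy sentences are hypothetical; only the labels “FUNDlei” and “DATA” come from the abstract.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity label)
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence tagged in the BIO scheme (assumed)."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes an entity that runs to the end of the sentence.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and label is not None and tag[2:] == label:
            continue  # the current entity continues
        if label is not None:
            spans.add((start, i, label))  # the current entity ends before token i
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]  # a new entity begins
        # An "I-" tag with no matching open entity is ignored (strict matching).
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1: a predicted entity counts only on exact span and label match."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans = extract_spans(gold_tags)
        pred_spans = extract_spans(pred_tags)
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the prediction finds the FUNDlei entity but misses the DATA entity.
gold = [["O", "B-FUNDlei", "I-FUNDlei", "O", "B-DATA"]]
pred = [["O", "B-FUNDlei", "I-FUNDlei", "O", "O"]]
score = entity_f1(gold, pred)
```

In the toy example one of the two gold entities is recovered with no spurious predictions, so precision is 1.0 and recall is 0.5. A per-category or per-type breakdown such as the one reported above would apply the same counts separately for each label.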
Acknowledgements
This research is carried out in the context of the Ulysses Project of the Brazilian Chamber of Deputies. Ellen Souza and Nadia Félix are supported by FAPESP, through the agreement between USP and the Brazilian Chamber of Deputies. André C. P. L. F. de Carvalho and Adriano L. I. Oliveira are supported by CNPq. We express our gratitude to the Brazilian Chamber of Deputies and to the research funding agencies for supporting this research.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Albuquerque, H.O. et al. (2022). UlyssesNER-Br: A Corpus of Brazilian Legislative Documents for Named Entity Recognition. In: Pinheiro, V., et al. Computational Processing of the Portuguese Language. PROPOR 2022. Lecture Notes in Computer Science, vol 13208. Springer, Cham. https://doi.org/10.1007/978-3-030-98305-5_1
DOI: https://doi.org/10.1007/978-3-030-98305-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98304-8
Online ISBN: 978-3-030-98305-5
eBook Packages: Computer Science, Computer Science (R0)