Article

Extracting definitions from Brazilian legal texts

Authors:

Edilson Ferneda,

Hércules Antonio do Prado,

Augusto Herrmann Batista,

Marcello Sandi Pinheiro

ICCSA'12: Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part III

Pages 631 - 646

https://doi.org/10.1007/978-3-642-31137-6_48

Published: 18 June 2012

Abstract

To avoid ambiguity and to ensure, as far as possible, a strict interpretation of the law, legal texts usually define the specific lexical terms used within their discourse by means of normative rules. With an often large number of rules in effect in a given domain, extracting these definitions manually would be a costly undertaking. This paper presents an approach to this problem based on a variation of an automated natural language processing technique for Brazilian Portuguese texts. For the sake of generality, the proposed solution was developed to address the broader problem of building a glossary from domain-specific texts that contain definitions. The solution was applied to a corpus of texts from the telecommunications regulation domain, and the results are reported. The usual natural language processing pipeline was followed: preprocessing, segmentation, and part-of-speech tagging. A set of feature extraction functions is specified and used, along with reference glossary information on whether or not a text fragment is a definition, to train an SVM classifier. Finally, the definitions are extracted from the texts and evaluated on a testing corpus, which also contains the reference glossary annotations on definitions. The results are then discussed in light of other definition extraction techniques.
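The pipeline described in the abstract (segmentation, feature extraction, classifier training, extraction) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not list the feature functions, so the cue phrases in `CUES`, the copula pattern, the toy sentences in `TRAIN`, and all function names are hypothetical. Part-of-speech tagging is omitted because it needs a trained tagger, and a small linear SVM trained by subgradient descent on the hinge loss stands in for whatever SVM implementation the authors used.

```python
import re

# Hypothetical definitional cue phrases in Portuguese (illustrative only;
# the paper's actual feature set is not given in the abstract).
CUES = ("considera-se", "entende-se por", "define-se", "denomina-se")


def segment(text):
    """Naive segmentation: split after '.' or ';' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.;])\s+", text.strip()) if s]


def features(sentence):
    """Map a sentence to [has_cue, has_copula, bias]; all features are assumed."""
    s = sentence.lower()
    has_cue = 1.0 if any(cue in s for cue in CUES) else 0.0
    # Copula followed by an article, as in "X é o conjunto de ...".
    has_copula = 1.0 if re.search(r"\b(é|são)\s+(o|a|os|as|um|uma)\b", s) else 0.0
    return [has_cue, has_copula, 1.0]


def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            for j in range(len(w)):
                # The hinge term contributes only when the margin is violated.
                hinge_grad = -yi * xi[j] if margin < 1 else 0.0
                w[j] -= lr * (lam * w[j] + hinge_grad)
    return w


def is_definition(w, sentence):
    """Classify a sentence by the sign of the linear decision function."""
    return sum(wj * xj for wj, xj in zip(w, features(sentence))) > 0


# Toy training data: label +1 marks a definition, -1 a non-definition.
TRAIN = [
    ("Para os fins desta Lei, considera-se obra toda construção, reforma ou ampliação.", 1),
    ("Entende-se por usuário a pessoa que utiliza o serviço.", 1),
    ("Serviço de telecomunicações é o conjunto de atividades que possibilita a oferta de telecomunicação.", 1),
    ("Esta Resolução entra em vigor na data de sua publicação.", -1),
    ("A prestadora deve enviar o relatório anual à Agência.", -1),
    ("Revogam-se as disposições em contrário.", -1),
]

weights = train_linear_svm(
    [features(sentence) for sentence, _ in TRAIN],
    [label for _, label in TRAIN],
)


def extract_definitions(text):
    """Segment a text and keep the sentences classified as definitions."""
    return [s for s in segment(text) if is_definition(weights, s)]
```

Calling `extract_definitions` on a regulation excerpt returns the sentences the toy classifier labels as definitions. In a realistic setting the labels would come from the reference glossary annotations mentioned in the abstract, and evaluation would use a held-out testing corpus rather than the training sentences.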

Cited By

Sai, C., Damaratskaya, A., Winter, K., Rinderle-Ma, S. (2023). Identification and Visualization of Legal Definitions and Legal Term Relations. Advances in Conceptual Modeling, pp. 151-161. doi:10.1007/978-3-031-47112-4_14. Online publication date: 6-Nov-2023
https://dl.acm.org/doi/10.1007/978-3-031-47112-4_14

Index Terms

Extracting definitions from Brazilian legal texts
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Hardware
  1. Power and energy
    1. Power estimation and optimization
      1. Platform power issues

Information & Contributors

Information

Published In

ICCSA'12: Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part III

June 2012

757 pages

ISBN:9783642311369

Editors:
Beniamino Murgante, Laboratory of Urban and Territorial Systems, University of Basilicata, 10, Viale dell'Ateneo Lucano, Potenza, Italy
Osvaldo Gervasi, Department of Mathematics and Computer Science, University of Perugia, Via Vanvitelli 1, Perugia, Italy
Sanjay Misra, Department of Cyber Security Science, Federal University of Technology, Gidan Kwano Campus, Minna, Nigeria
Nadia Nedjah, Faculty of Engineering, Department of Electronics Engineering and Telecommunications, State University of Rio de Janeiro, Rua Sao Francisco Xavier, 524, 50. andar, sala 5145-F, Maracana, Rio de Janeiro, RJ, Brazil
Ana C. Rocha, Department of Production and Systems, University of Minho, Campus de Gualtar, Braga, RJ, Portugal

Sponsors

The University of Perugia
UEFS: Universidade Estadual de Feira de Santana, Brazil
UFBA: Universidade Federal da Bahia, Brazil
The University of Basilicata
UFRB: Universidade Federal do Recôncavo da Bahia, Brazil

Publisher

Springer-Verlag

Berlin, Heidelberg
