DOI: 10.1007/978-3-642-31137-6_48
Article

Extracting definitions from Brazilian legal texts

Published: 18 June 2012

Abstract

To avoid ambiguity and to ensure, as far as possible, a strict interpretation of the law, legal texts usually define the specific lexical terms used within their discourse by means of normative rules. Because a large number of rules is often in effect in a given domain, extracting these definitions manually would be costly. This paper presents an approach to this problem based on a variation of an automated natural language processing technique for Brazilian Portuguese texts. For the sake of generality, the proposed solution was developed to address the broader problem of building a glossary from domain-specific texts that contain definitions. The solution was applied to a corpus of texts from the telecommunications regulation domain, and the results are reported. The usual natural language processing pipeline was followed: preprocessing, segmentation, and part-of-speech tagging. A set of feature extraction functions is specified and used, together with reference glossary information on whether or not a text fragment is a definition, to train an SVM classifier. Finally, the definitions are extracted from the texts and evaluated on a testing corpus, which also contains the reference glossary annotations on definitions. The results are then discussed in light of other definition extraction techniques.
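The abstract does not list the feature extraction functions or the classifier settings, so the following is only a minimal sketch of the general idea: turn each text fragment into a numeric feature vector, then train a linear SVM on fragments labelled as definition or non-definition. Everything concrete here is an assumption made for illustration: the cue phrases, the two features, the toy sentences, the hyperparameters, and the names `extract_features`, `train_linear_svm` and `is_definition`. The training loop is a plain subgradient method on the regularised hinge loss, used in place of whatever SVM implementation the authors actually relied on.

```python
import re

# Illustrative cue phrases that often introduce definitions in Portuguese
# normative texts. These are assumptions, not the paper's feature set.
CUE_PATTERNS = [
    r"\bconsidera-se\b",
    r"\bentende-se por\b",
    r"\bdenomina-se\b",
    r"\bé definid[oa] como\b",
]


def extract_features(fragment):
    """Map a text fragment to the vector [has_cue_phrase, has_colon]."""
    text = fragment.lower()
    has_cue = float(any(re.search(p, text) for p in CUE_PATTERNS))
    has_colon = float(":" in text)
    return [has_cue, has_colon]


def decision_value(w, b, x):
    """Signed score of a feature vector under the linear model."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularised hinge loss (labels +1/-1)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * decision_value(w, b, xi)
            # The regularisation term shrinks the weights on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # The hinge term only contributes when the margin is violated.
            if margin < 1.0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


# Toy training data (invented sentences): +1 = definition, -1 = not.
TRAIN = [
    ("Considera-se estação de telecomunicações o conjunto de "
     "equipamentos necessários à realização de telecomunicação.", 1),
    ("A prestadora deve enviar o relatório anual à Agência.", -1),
    ("Entende-se por usuário: qualquer pessoa que se utiliza do serviço.", 1),
    ("O prazo para recurso é o seguinte: trinta dias.", -1),
    ("Esta Resolução entra em vigor na data de sua publicação.", -1),
]

X = [extract_features(text) for text, _ in TRAIN]
y = [label for _, label in TRAIN]
w, b = train_linear_svm(X, y)


def is_definition(fragment):
    """Classify a fragment with the trained toy model."""
    return decision_value(w, b, extract_features(fragment)) >= 0.0


for sentence in [
    "Denomina-se assinante a pessoa que contrata o serviço.",
    "A Agência publicará o edital no prazo de dez dias.",
]:
    print(is_definition(sentence), "-", sentence)
```

The colon feature is included deliberately as a distractor: it appears in both a definition and a non-definition, so the classifier has to learn to rely on the cue-phrase feature instead. A realistic system would add features derived from the part-of-speech tags produced earlier in the pipeline.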



Published In

ICCSA'12: Proceedings of the 12th International Conference on Computational Science and Its Applications - Volume Part III
June 2012
757 pages
ISBN: 9783642311369
Editors: Beniamino Murgante, Osvaldo Gervasi, Sanjay Misra, Nadia Nedjah, Ana C. Rocha

Sponsors

  • The University of Perugia
  • UEFS: Universidade Estadual de Feira de Santana, Brazil
  • UFBA: Universidade Federal da Bahia, Brazil
  • The University of Basilicata
  • UFRB: Universidade Federal do Recôncavo da Bahia, Brazil

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. definition extraction
  2. information extraction
  3. natural language processing
