DOI: 10.1145/3322640.3326727
research-article

Automatic Construction of a Polish Legal Dictionary with Mappings to Extra-Legal Terms Established via Word Embeddings

Published: 17 June 2019

Abstract

The primary objective of this research is to find correspondences between legal and extra-legal terms in Polish using unsupervised methods, such as statistics and word embeddings. We investigate the possibility of constructing a legal dictionary automatically: statistical methods first identify the legal terms (including multi-word entities), and word-embedding induction algorithms then map these terms to the extra-legal terminology used by laypeople. We compare two popular libraries, word2vec and GloVe, in a synthetic experiment that shows the superiority of the word2vec CBOW variant with negative sampling for the described problem.
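The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes Dunning's (1993) log-likelihood ratio (G²) as the statistic for scoring candidate two-word terms, and plain cosine similarity over pre-computed embedding vectors for mapping a legal term to its closest extra-legal neighbours. The function names (`log_likelihood_ratio`, `llr_bigrams`, `nearest_terms`), the toy corpus, and the 3-dimensional vectors are all hypothetical; the vectors merely stand in for embeddings that would, in practice, be trained with a tool such as word2vec (CBOW, negative sampling) or GloVe.

```python
import math
from collections import Counter


def _xlogx_sum(*counts):
    # Sum of k * ln(k) over the given counts, treating 0 * ln(0) as 0.
    return sum(k * math.log(k) for k in counts if k > 0)


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 statistic for a 2x2 contingency table.

    k11: (w1, w2) observed together; k12: w1 followed by another word;
    k21: another word followed by w2; k22: neither.
    """
    n = k11 + k12 + k21 + k22
    cells = _xlogx_sum(k11, k12, k21, k22)
    rows = _xlogx_sum(k11 + k12, k21 + k22)
    cols = _xlogx_sum(k11 + k21, k12 + k22)
    # G^2 = 2 * sum O * ln(O / E), expanded into entropy-like terms.
    return 2.0 * (cells - rows - cols + _xlogx_sum(n))


def llr_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by G^2 (hypothetical helper)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    first = Counter(tokens[:-1])   # how often each word starts a bigram
    second = Counter(tokens[1:])   # how often each word ends a bigram
    total = len(tokens) - 1        # number of bigram positions
    scores = {}
    for (w1, w2), k11 in bigrams.items():
        if k11 < min_count:
            continue  # skip rare pairs; G^2 is unreliable on tiny counts
        k12 = first[w1] - k11
        k21 = second[w2] - k11
        k22 = total - k11 - k12 - k21
        scores[(w1, w2)] = log_likelihood_ratio(k11, k12, k21, k22)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_terms(query, vectors, candidates, top_n=3):
    """Return the candidates whose vectors are most cosine-similar to query."""
    q = vectors[query]
    ranked = sorted(
        ((term, cosine(q, vectors[term])) for term in candidates if term != query),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top_n]


if __name__ == "__main__":
    # Invented toy corpus; real input would be a large body of legal text.
    text = ("umowa najmu lokalu jest zawarta . umowa najmu wygasa . "
            "strona umowy najmu płaci czynsz .")
    for pair, score in llr_bigrams(text.split()):
        print(pair, round(score, 2))

    # Invented vectors standing in for trained word embeddings.
    vectors = {
        "najem": [0.9, 0.1, 0.0],       # legal term for "lease"
        "wynajem": [0.85, 0.15, 0.05],  # everyday word for "renting"
        "czynsz": [0.4, 0.8, 0.1],
        "samochód": [0.0, 0.1, 0.9],
    }
    print(nearest_terms("najem", vectors, ["wynajem", "czynsz", "samochód"]))
```

Note that the sketch counts raw surface forms, so "umowy najmu" is scored separately from "umowa najmu". Because Polish is highly inflected, a practical pipeline would likely lemmatise the text before counting; that step is omitted here to keep the example self-contained.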


Cited By

  • (2025) AI-Supported Translation Tools for Legal Texts. Procedia Computer Science 246:C, 5545-5554. DOI: 10.1016/j.procs.2024.09.707. Online publication date: 30-Jan-2025
  • (2020) Space mission design ontology: extraction of domain-specific entities and concepts similarity analysis. AIAA Scitech 2020 Forum. DOI: 10.2514/6.2020-2253. Online publication date: 5-Jan-2020
  • (2020) Quantitative analysis of a private tax rulings corpus. Procedia Computer Science 176, 2445-2455. DOI: 10.1016/j.procs.2020.09.322. Online publication date: 2020
  • (2020) Impact of Text Specificity and Size on Word Embeddings Performance: An Empirical Evaluation in Brazilian Legal Domain. Intelligent Systems, 521-535. DOI: 10.1007/978-3-030-61377-8_36. Online publication date: 20-Oct-2020


Published In

ICAIL '19: Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law
June 2019
312 pages
ISBN:9781450367547
DOI:10.1145/3322640
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

In-Cooperation

  • University of Montreal
  • AAAI
  • IAAIL: International Association for Artificial Intelligence and Law

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. conversational systems
  2. legal terms
  3. multi-word entities
  4. terminology extraction
  5. word embeddings

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Narodowe Centrum Badań i Rozwoju (National Centre for Research and Development)

Conference

ICAIL '19

Acceptance Rates

Overall Acceptance Rate 69 of 169 submissions, 41%

Article Metrics

  • Downloads (last 12 months): 6
  • Downloads (last 6 weeks): 0
Figures reflect downloads up to 14 Feb 2025.
