research-article

A Two-Phase Framework for Learning Logical Structures of Paragraphs in Legal Articles

Authors:

Nguyen Le Minh,

Akira Shimazu

ACM Transactions on Asian Language Information Processing (TALIP), Volume 12, Issue 1

Article No.: 3, Pages 1 - 32

https://doi.org/10.1145/2425327.2425330

Published: 01 March 2013

Abstract

Analyzing the logical structure of text is important for natural language understanding, especially in the legal domain, where texts have their own specific characteristics. Recognizing logical structures in legal texts not only helps people understand legal documents but also supports other tasks in legal text processing.

In this article, we present a new task, learning logical structures of paragraphs in legal articles, which is studied in research on Legal Engineering. The task has two goals: recognizing the logical parts of law sentences in a paragraph, and then grouping related logical parts into logical structures of formulas, which describe the logical relations between those parts. We present a two-phase framework for learning these structures. In the first phase, we model the recognition of logical parts in law sentences as a multi-layer sequence learning problem and present a CRF-based model to solve it. In the second phase, we propose a graph-based method for grouping logical parts into logical structures: we cast the problem as finding a subset of complete subgraphs in a weighted-edge complete graph, where each node corresponds to a logical part and each complete subgraph corresponds to a logical structure. We also present an integer linear programming formulation of this optimization problem. On the Japanese National Pension Law corpus, our models achieve 74.37% in recognizing logical parts, 80.08% in recognizing logical structures, and 58.36% on the whole task. These results are a promising basis for further research on this task.
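To make the graph view of the second phase concrete, the following is a minimal sketch and not the authors' method: the abstract selects complete subgraphs through an optimization problem with an integer linear programming formulation, whereas this example simply drops low-weight edges and enumerates the maximal cliques that remain, using the basic Bron-Kerbosch recursion. The part names, the edge weights, the `threshold` and `min_size` parameters, and the function names are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def maximal_cliques(nodes, adjacent):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        # No node can extend the clique and none was skipped: it is maximal.
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adjacent[v], excluded & adjacent[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(nodes), set())
    return sorted(cliques)


def group_logical_parts(parts, weights, threshold=0.5, min_size=2):
    """Group parts into candidate structures (illustrative assumption only).

    An edge is kept when its weight is at least `threshold`; every maximal
    clique with at least `min_size` nodes is returned as one candidate
    logical structure. Missing pairs are treated as weight 0.0.
    """
    adjacent = {p: set() for p in parts}
    for a, b in combinations(parts, 2):
        w = weights.get((a, b), weights.get((b, a), 0.0))
        if w >= threshold:
            adjacent[a].add(b)
            adjacent[b].add(a)
    return [c for c in maximal_cliques(parts, adjacent) if len(c) >= min_size]


# Hypothetical logical parts and made-up pairwise scores.
parts = ["p1", "p2", "p3", "p4", "p5"]
weights = {
    ("p1", "p2"): 0.9, ("p1", "p3"): 0.8, ("p2", "p3"): 0.85,
    ("p4", "p5"): 0.7, ("p3", "p4"): 0.2, ("p1", "p5"): 0.1,
}
structures = group_logical_parts(parts, weights)
print(structures)  # [['p1', 'p2', 'p3'], ['p4', 'p5']]
```

Thresholding is the simplest possible selection rule; an optimization-based selection such as the one described in the abstract can weigh overlapping candidates against each other, which this sketch does not attempt.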

Index Terms

A Two-Phase Framework for Learning Logical Structures of Paragraphs in Legal Articles
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources

Published In

ACM Transactions on Asian Language Information Processing, Volume 12, Issue 1

March 2013

102 pages

ISSN:1530-0226

EISSN:1558-3430

DOI:10.1145/2425327

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 March 2013

Accepted: 01 March 2012

Revised: 01 January 2012

Received: 01 October 2011

Published in TALIP Volume 12, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

Japan Society for the Promotion of Science
JAIST Overseas Training Program for 3D Program Students
Grant-in-Aid for Scientific Research, Education and Research Center for Trustworthy e-Society
