research-article

A Two-Phase Framework for Learning Logical Structures of Paragraphs in Legal Articles

Published: 01 March 2013

Abstract

Analyzing the logical structures of texts is important for understanding natural language, especially in the legal domain, where texts have their own specific characteristics. Recognizing logical structures in legal texts not only helps people understand legal documents but also supports other tasks in legal text processing.
In this article, we present a new task, learning logical structures of paragraphs in legal articles, which is studied in research on Legal Engineering. The goals of this task are to recognize the logical parts of law sentences in a paragraph and then to group related logical parts into logical structures of formulas, which describe the logical relations between those parts. We present a two-phase framework for learning the logical structures of paragraphs in legal articles. In the first phase, we model the recognition of logical parts in law sentences as a multi-layer sequence learning problem and present a CRF-based model to recognize them. In the second phase, we propose a graph-based method to group logical parts into logical structures. We cast this as the problem of finding a subset of complete subgraphs in a weighted-edge complete graph, where each node corresponds to a logical part and each complete subgraph corresponds to a logical structure. We also present an integer linear programming formulation for this optimization problem. Our models achieve 74.37% in recognizing logical parts, 80.08% in recognizing logical structures, and 58.36% in the whole task on the Japanese National Pension Law corpus. These results are promising for further research on this task.
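
The graph view of the second phase can be illustrated with a small sketch. It is not the article's method: the abstract does not give the edge-weight model, the selection criterion, or the integer linear programming formulation, so the sketch substitutes a simple assumption. It treats each hypothetical edge weight as a score that two logical parts belong to the same structure, keeps edges at or above an assumed threshold of 0.5, and lists the maximal complete subgraphs with the basic Bron-Kerbosch recursion. The part names (`A1`, `C1`, `T1`, ...), the weights, and both function names are invented for illustration.

```python
def maximal_cliques(nodes, adjacent):
    """Enumerate maximal complete subgraphs (basic Bron-Kerbosch recursion)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already explored
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adjacent[v], x & adjacent[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(nodes), set())
    return cliques


def group_logical_parts(weights, threshold=0.5):
    """Group parts into candidate structures (assumed thresholding rule)."""
    nodes = sorted({n for pair in weights for n in pair})
    adjacent = {n: set() for n in nodes}
    for (a, b), w in weights.items():
        if w >= threshold:  # keep only edges the (hypothetical) scorer trusts
            adjacent[a].add(b)
            adjacent[b].add(a)
    groups = [sorted(c) for c in maximal_cliques(nodes, adjacent) if len(c) >= 2]
    return sorted(groups)


# Hypothetical weights on a complete graph over five logical parts.
example_weights = {
    ("A1", "C1"): 0.90, ("A1", "T1"): 0.80, ("C1", "T1"): 0.70,
    ("A2", "C2"): 0.85, ("A2", "T1"): 0.75, ("C2", "T1"): 0.60,
    ("A1", "A2"): 0.10, ("A1", "C2"): 0.20,
    ("C1", "A2"): 0.10, ("C1", "C2"): 0.15,
}

if __name__ == "__main__":
    for group in group_logical_parts(example_weights):
        print(group)
```

In this toy input the part `T1` ends up in two groups, which shows why the task is framed as selecting a subset of complete subgraphs rather than partitioning the nodes: one logical part may take part in more than one logical structure.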

    Published In

    ACM Transactions on Asian Language Information Processing  Volume 12, Issue 1
    March 2013
    102 pages
    ISSN:1530-0226
    EISSN:1558-3430
    DOI:10.1145/2425327

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 March 2013
    Accepted: 01 March 2012
    Revised: 01 January 2012
    Received: 01 October 2011
    Published in TALIP Volume 12, Issue 1

    Author Tags

    1. Japanese National Pension Law
    2. Legal text processing
    3. conditional random fields
    4. graph-based methods
    5. integer linear programming
    6. logical parts
    7. logical structures
    8. maximum entropy model
    9. sequence learning
    10. support vector machines

    Qualifiers

    • Research-article
    • Research
    • Refereed
