DOI: 10.1145/3472163.3472174
research-article

ICIX: A Semantic Information Extraction Architecture

Published: 07 September 2021

Abstract

Public and private organizations produce and store large numbers of documents that contain information about their domains in unstructured formats. Although end users can rely on various retrieval tools to access such data, progressively structuring these documents brings important benefits for daily operations. While many approaches exist for extracting information in open domains, we lack tools flexible enough to adapt to the particularities of different domains.
In this paper, we present the design and implementation of ICIX, an architecture for extracting structured information from text documents. ICIX aims to obtain specific information within a given domain, which is defined by an ontology that guides the extraction process. In addition, to optimize the extraction, ICIX relies on document classification and data curation adapted to the particular domain. Our proposal has been implemented and evaluated in the specific context of managing legal documents, with promising results.
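
As a rough illustration of the three stages the abstract names (document classification, ontology-guided extraction, and data curation), the following minimal Python sketch chains them together. It is not the ICIX implementation: the abstract does not specify the techniques used, so the dictionary standing in for an ontology, the keyword-overlap classifier, the regular expressions, the day/month/year date assumption, and all names (`ONTOLOGY`, `KEYWORDS`, `classify`, `extract`, `curate`, `process`) are hypothetical choices made only for this example.

```python
import re
from datetime import datetime

# Hypothetical toy "ontology": for each document type, the concepts to extract
# and a regex that locates each one. A real domain ontology would be far richer.
ONTOLOGY = {
    "deed": {
        "notary": r"notary\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)",
        "date": r"(\d{1,2}/\d{1,2}/\d{4})",
    },
    "lease": {
        "lessee": r"lessee\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)",
    },
}

# Hypothetical keyword sets used as a stand-in for a trained text classifier.
KEYWORDS = {"deed": {"deed", "notary"}, "lease": {"lease", "lessee"}}


def classify(text: str) -> str:
    """Pick the document type whose keywords overlap most with the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return max(KEYWORDS, key=lambda doc_type: len(words & KEYWORDS[doc_type]))


def extract(text: str, doc_type: str) -> dict[str, str]:
    """Extract only the concepts the ontology defines for this document type."""
    found = {}
    for concept, pattern in ONTOLOGY[doc_type].items():
        match = re.search(pattern, text)
        if match:
            found[concept] = match.group(1)
    return found


def curate(record: dict[str, str]) -> dict[str, str]:
    """Clean extracted values: collapse whitespace and normalise dates to ISO 8601."""
    cleaned = {key: " ".join(value.split()) for key, value in record.items()}
    if "date" in cleaned:
        # Assumption for this sketch: dates are written day/month/year.
        parsed = datetime.strptime(cleaned["date"], "%d/%m/%Y")
        cleaned["date"] = parsed.date().isoformat()
    return cleaned


def process(text: str) -> tuple[str, dict[str, str]]:
    """Run the three stages in order: classify, extract, curate."""
    doc_type = classify(text)
    return doc_type, curate(extract(text, doc_type))


if __name__ == "__main__":
    sample = "Deed signed before notary Ana  Lopez on 7/9/2021."
    print(process(sample))
```

Classifying first means the extractor only looks for the concepts relevant to that document type, which is one plausible reading of how classification could "optimize" extraction; the curation step then repairs noise such as the doubled space in the sample name.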

Cited By

  • (2025) Streamlining Legal Document Management: A Knowledge-Driven Service Platform. SN Computer Science 6:2. DOI: 10.1007/s42979-025-03694-y. Online publication date: 14-Feb-2025
  • (2024) Ontology-Driven Automated Reasoning About Property Crimes. Business & Information Systems Engineering. DOI: 10.1007/s12599-024-00886-3. Online publication date: 12-Aug-2024
    Published In

    IDEAS '21: Proceedings of the 25th International Database Engineering & Applications Symposium
    July 2021
    308 pages
    ISBN: 9781450389914
    DOI: 10.1145/3472163

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Information extraction
    2. ontologies
    3. text classification

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • CCYT

    Conference

    IDEAS 2021

    Acceptance Rates

    Overall Acceptance Rate 74 of 210 submissions, 35%
