Abstract
This paper presents OntoHuman, a toolchain that involves humans in the process of automatic information extraction and ontology enhancement. The Document Semantic Annotation Tool (DSAT) [13], a user interface of OntoHuman, automatically extracts information from PDF documents as key-value-unit tuples, guided by ontologies. It also lets users give feedback that improves the ontologies used. Although the ontology improves information extraction, our use cases were previously limited to space engineering. OntoHuman addresses this shortcoming by allowing users to upload their own customized ontologies. This extends its use to various domains and enables the shared knowledge to be used cooperatively. The ontologies are displayed in a node-link representation so that they are easier to understand. Another major improvement in OntoHuman is the extraction of data points from graphs, a capability still missing from existing information extraction tools. OntoHuman can be applied to documents from any engineering domain and makes working with ontologies intuitive and collaborative for users.
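To make the idea of key-value-unit tuples concrete, the following is a minimal sketch and not DSAT's actual implementation. It assumes a plain Python dictionary as a stand-in for an ontology; the names `TOY_ONTOLOGY`, `KeyValueUnit`, and `extract_tuples` are hypothetical, and the input is text that has already been extracted from a PDF.

```python
import re
from typing import NamedTuple


class KeyValueUnit(NamedTuple):
    key: str
    value: float
    unit: str


# Hypothetical stand-in for an ontology: each concept lists the labels
# (synonyms) that may appear in a document and the units it accepts.
TOY_ONTOLOGY = {
    "mass": {"labels": ["mass", "weight"], "units": ["kg", "g"]},
    "supply_voltage": {
        "labels": ["supply voltage", "input voltage"],
        "units": ["V", "mV"],
    },
}


def extract_tuples(text, ontology):
    """Return key-value-unit tuples for every '<label> <number> <unit>' match."""
    results = []
    for key, entry in ontology.items():
        # Try longer alternatives first so "kg" is preferred over "g".
        labels = "|".join(
            re.escape(s) for s in sorted(entry["labels"], key=len, reverse=True)
        )
        units = "|".join(
            re.escape(s) for s in sorted(entry["units"], key=len, reverse=True)
        )
        # Labels match case-insensitively; units stay case-sensitive
        # because "mV" and "MV" are different units.
        pattern = re.compile(
            rf"\b(?i:{labels})\b\s*[:=]?\s*(-?\d+(?:\.\d+)?)\s*({units})\b"
        )
        for match in pattern.finditer(text):
            results.append(KeyValueUnit(key, float(match.group(1)), match.group(2)))
    return results


sample = "Mass: 1.5 kg. Supply voltage 28 V."
tuples = extract_tuples(sample, TOY_ONTOLOGY)
```

In this sketch, user feedback would correspond to editing the dictionary: adding a new label to a concept lets later extractions recognise a phrasing that was missed before.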
References
Adnan, K., Akbar, R.: Limitations of information extraction methods and techniques for heterogeneous unstructured big data. Int. J. Eng. Bus. Manag. 11 (2019). https://doi.org/10.1177/1847979019890771
Anikin, A., Litovkin, D., Kultsova, M., Sarkisova, E., Petrova, T.: Ontology visualization: approaches and software tools for visual representation of large ontologies in learning. In: Kravets, A., Shcherbakov, M., Kultsova, M., Groumpos, P. (eds.) CIT&DS 2017. CCIS, vol. 754, pp. 133–149. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65551-2_10
Classifying text with AWS Textract. https://www.bakertilly.com/insights/classifying-text-with-aws-textract. Accessed 8 Apr 2022
Buey, M.G., Garrido, A.L., Bobed, C., Ilarri, S.: The AIS project: boosting information extraction from legal documents by using ontologies. In: ICAART (2016)
Camelot: PDF Table Extraction for Humans. https://camelot-py.readthedocs.io/en/master/. Accessed 8 Apr 2022
ConTrOn. Contron - spacecraft parts ontology 1.2, May 2020
Decatur, D., Krishnan, S.: Vizextract: automatic relation extraction from data visualizations. CoRR abs/2112.03485 (2021)
Dudáš, M., Lohmann, S., Svátek, V., Pavlov, D.: Ontology visualization methods and tools: a survey of the state of the art. Knowl. Eng. Rev. 33, e10 (2018)
Jusoh, S., Awajan, A., Obeid, N.: The use of ontology in clinical information extraction. J. Phys. Conf. Ser. 1529(5), 052083 (2020)
Kaló, A.Z., Sipos, M.L.: Key-value pair searching system via tesseract OCR and post processing. In: 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI), pp. 000461–000464 (2021)
Konys, A.: Towards knowledge handling in ontology-based information extraction systems. Procedia Comput. Sci. 126, 2208–2218 (2018). Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 22nd International Conference, KES-2018, Belgrade, Serbia
Luo, J., Li, Z., Wang, J., Lin, C.-Y.: Chartocr: data extraction from charts images via a deep hybrid framework. In: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1916–1924 (2021)
Opasjumruskit, K., Peters, D., Schindler, S.: DSAT: ontology-based information extraction on technical data sheets. In: SEMWEB (2020)
How to extract data out of a PDF, February 2021. https://academy.datawrapper.de/article/135-how-to-extract-data-out-of-pdfs
PDFMiner - a python package for extracting information from PDF documents. https://pdfminersix.readthedocs.io/en/latest/. Accessed 8 Apr 2022
Peters, D., Fischer, P.M., Schäfer, P.M., Opasjumruskit, K., Gerndt, A.: Digital availability of product information for collaborative engineering of spacecraft. In: Luo, Y. (ed.) CDVE 2019. LNCS, vol. 11792, pp. 74–83. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30949-7_9
Rizvi, S.T.R., Mercier, D., Agne, S., Erkel, S., Dengel, A., Ahmed, S.: Ontology-based information extraction from technical documents. In: Proceedings of the 10th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications (2018)
Tesseract Open Source OCR Engine. https://tesseract-ocr.github.io/. Accessed 13 Apr 2022
Vrandečić, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014)
Wang, Z., Zhan, M., Liu, X., Liang, D.: Docstruct: a multimodal method to extract hierarchy structure in document for general form understanding. arXiv:2010.11685 (2020)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Opasjumruskit, K., Böning, S., Schindler, S., Peters, D. (2022). OntoHuman: Ontology-Based Information Extraction Tools with Human-in-the-Loop Interaction. In: Luo, Y. (eds) Cooperative Design, Visualization, and Engineering. CDVE 2022. Lecture Notes in Computer Science, vol 13492. Springer, Cham. https://doi.org/10.1007/978-3-031-16538-2_7
DOI: https://doi.org/10.1007/978-3-031-16538-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16537-5
Online ISBN: 978-3-031-16538-2
eBook Packages: Computer Science (R0)