Nothing Special   »   [go: up one dir, main page]

Skip to main content

OntoHuman: Ontology-Based Information Extraction Tools with Human-in-the-Loop Interaction

  • Conference paper
  • First Online:
Cooperative Design, Visualization, and Engineering (CDVE 2022)

Abstract

This paper presents OntoHuman, a toolchain for involving humans in a process of automatic information extraction and ontology enhancement. Document Semantic Annotation Tool (DSAT) [13], a user interface of OntoHuman, offers an automatic function to extract information in the form of key-value-unit tuples from PDF documents based on ontologies. Additionally, it allows users to provide feedback to improve the ontologies used. Although the information extraction can be improved with the ontology, our use cases were previously limited to an area of space engineering. OntoHuman now tackles this shortcoming by allowing users to upload their customized ontologies. This entends usages to various domains and enables this shareable knowledge to be used cooperatively. Then we display the ontologies in a node-link representation so they are easier to understand. Another major improvement in OntoHuman is the graph data points extraction, which is still missing in the existing information extraction tools. The application of OntoHuman can be used for documents related to any engineering domain and makes the work with ontologies intuitive and collaborative for users.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    https://arxiv.org/pdf/1906.06752.pdf.

  2. 2.

    https://nfdi4ing.de/community-hub/community/.

References

  1. Adnan, K., Akbar, R.: Limitations of information extraction methods and techniques for heterogeneous unstructured big data. Int. J. Eng. Bus. Manag. 11 (2019). https://doi.org/10.1177/1847979019890771

  2. Anikin, A., Litovkin, D., Kultsova, M., Sarkisova, E., Petrova, T.: Ontology visualization: approaches and software tools for visual representation of large ontologies in learning. In: Kravets, A., Shcherbakov, M., Kultsova, M., Groumpos, P. (eds.) CIT &DS 2017. CCIS, vol. 754, pp. 133–149. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65551-2_10

    Chapter  Google Scholar 

  3. Classifying text with AWS Textract. https://www.bakertilly.com/insights/classifying-text-with-aws-textract. Accessed 8 Apr 2022

  4. Buey, M.G., Garrido, A.L., Bobed, C., Ilarri, S.: The AIS project: boosting information extraction from legal documents by using ontologies. In: ICAART (2016)

    Google Scholar 

  5. Camelot: PDF Table Extraction for Humans. https://camelot-py.readthedocs.io/en/master/. Accessed 8 Apr 2022

  6. ConTrOn. Contron - spacecraft parts ontology 1.2, May 2020

    Google Scholar 

  7. Decatur, D., Krishnan, S.: Vizextract: automatic relation extraction from data visualizations. CoRR abs/2112.03485 (2021)

    Google Scholar 

  8. Dudáš, M., Lohmann, S., Svátek, V., Pavlov, D.: Ontology visualization methods and tools: a survey of the state of the art. Knowl. Eng. Rev. 33, e10 (2018)

    Article  Google Scholar 

  9. Jusoh, S., Awajan, A., Obeid, N.: The use of ontology in clinical information extraction. J. Phys. Conf. Ser. 1529(5), 052083 (2020)

    Google Scholar 

  10. Kaló, A.Z., Sipos, M.L.: Key-value pair searching system via tesseract OCR and post processing. In: 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI), pp. 000461–000464 (2021)

    Google Scholar 

  11. Konys, A.: Towards knowledge handling in ontology-based information extraction systems. Procedia Comput. Sci. 126, 2208–2218 (2018). Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 22nd International Conference, KES-2018, Belgrade, Serbia

    Google Scholar 

  12. Luo, J., Li, Z., Wang, J., Lin, C.-Y.: Chartocr: data extraction from charts images via a deep hybrid framework. In: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1916–1924 (2021)

    Google Scholar 

  13. Opasjumruskit, K., Peters, D., Schindler, S.: DSAT: ontology-based information extraction on technical data sheets. In: SEMWEB (2020)

    Google Scholar 

  14. How to extract data out of a PDF, February 2021. https://academy.datawrapper.de/article/135-how-to-extract-data-out-of-pdfs

  15. PDFMiner - a python package for extracting information from PDF documents. https://pdfminersix.readthedocs.io/en/latest/. Accessed 8 Apr 2022

  16. Peters, D., Fischer, P.M., Schäfer, P.M., Opasjumruskit, K., Gerndt, A.: Digital availability of product information for collaborative engineering of spacecraft. In: Luo, Y. (ed.) CDVE 2019. LNCS, vol. 11792, pp. 74–83. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30949-7_9

    Chapter  Google Scholar 

  17. Rizvi, S.T.R., Mercier, D., Agne, S., Erkel, S., Dengel, A., Ahmed, S.: Ontology-based information extraction from technical documents. In: Proceedings of the 10th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications (2018)

    Google Scholar 

  18. Tesseract Open Source OCR Engine. https://tesseract-ocr.github.io/. Accessed 13 Apr 2022

  19. Vrandečić, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014)

    Article  Google Scholar 

  20. Wang, Z., Zhan, M., Liu, X., Liang, D.: Docstruct: a multimodal method to extract hierarchy structure in document for general form understanding. arXiv:abs/2010.11685 (2020)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Kobkaew Opasjumruskit .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Opasjumruskit, K., Böning, S., Schindler, S., Peters, D. (2022). OntoHuman: Ontology-Based Information Extraction Tools with Human-in-the-Loop Interaction. In: Luo, Y. (eds) Cooperative Design, Visualization, and Engineering. CDVE 2022. Lecture Notes in Computer Science, vol 13492. Springer, Cham. https://doi.org/10.1007/978-3-031-16538-2_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-16538-2_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16537-5

  • Online ISBN: 978-3-031-16538-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics