Keyword: Information Extraction : Search

Article

ExtractGPT: Exploring the Potential of Large Language Models for Product Attribute Value Extraction

Information Integration and Web Intelligence, Pages 38–52, https://doi.org/10.1007/978-3-031-78090-5_4

Abstract

E-commerce platforms require structured product data in the form of attribute-value pairs to offer features such as faceted product search or attribute-based product comparison. However, vendors often provide unstructured product descriptions, ...

Article

DELA: Dual Embedding Using LSTM and Attention for Asset Tag Inference in Industrial Automation Systems

AI 2024: Advances in Artificial Intelligence, Pages 3–15, https://doi.org/10.1007/978-981-96-0348-0_1

Abstract

Artificial Intelligence (AI) is a key driver of the Industry 4.0 revolution. In industrial automation systems, data points of assets are represented by globally unique identifiers known as “Tags,” which often contain abbreviated asset and ...

Article

A Decomposed-Distilled Sequential Framework for Text-to-Table Task with LLMs

PRICAI 2024: Trends in Artificial Intelligence, Pages 403–410, https://doi.org/10.1007/978-981-96-0119-6_38

Abstract

Large Language Models (LLMs) have shown their power in information extraction (IE) by leveraging the text-to-table task. However, suffering from the lost-in-the-middle problem, LLMs struggle to extract all the necessary information within longer contexts, ...

research-article

Open Access

Transforming Unstructured Sensitive Information into Structured Knowledge

ICAIF '24: Proceedings of the 5th ACM International Conference on AI in Finance, Pages 831–838, https://doi.org/10.1145/3677052.3698602

Information is crucial in today’s context, yet less than 20% of companies utilize their unstructured data due to its complexity. Information Extraction (IE) is vital for effective data use, but current IE models face four major issues. First, they often ...

Article

SciHyp: A Fine-Grained Dataset Describing Hypotheses and Their Components from Scientific Articles

The Semantic Web – ISWC 2024, Pages 134–152, https://doi.org/10.1007/978-3-031-77847-6_8

Abstract

Scientific discovery entails a detailed understanding and structuring of existing hypotheses—a challenging task due to the variety and complexity of the scientific texts. Despite efforts in domains like bio-medicine and invasion biology, there ...

Article

InstructIE: A Bilingual Instruction-based Information Extraction Dataset

The Semantic Web – ISWC 2024, Pages 59–79, https://doi.org/10.1007/978-3-031-77847-6_4

Abstract

Large language models can perform well on general natural language tasks, but their effectiveness is still suboptimal for information extraction (IE). Recent works indicate that the main reason lies in the lack of extensive data on IE ...

Article

Overview of the NLPCC 2024 Shared Task 2: Nominal Compound Chain Extraction

Natural Language Processing and Chinese Computing, Pages 463–470, https://doi.org/10.1007/978-981-97-9443-0_41

Abstract

Nominal compound chain extraction (NCCE) represents an emerging task within the domain of natural language processing, exhibiting significant potential for application in various downstream tasks, including relation extraction and summarization, ...

Article

Retrieval-Augmented Code Generation for Universal Information Extraction

Natural Language Processing and Chinese Computing, Pages 30–42, https://doi.org/10.1007/978-981-97-9434-8_3

Abstract

Information Extraction (IE) aims to extract structural knowledge (e.g., entities, relations, events) from natural language texts. Recently, Large Language Models (LLMs) with code-style prompts have demonstrated powerful capabilities in IE tasks. ...

article

Text Analysis on Green Supply Chain Practices of Electronic Companies

International Journal of Decision Support System Technology (IJDSST-IGI), Volume 16, Issue 1, Pages 1–16, https://doi.org/10.4018/IJDSST.358950

The electronics industry is one of the major regulated industries in the United States that is profoundly impacted by environmental issues. In this study, we use natural language processing (NLP) techniques to analyze reports from major electronics ...

research-article

Enhancing Keyword Extraction from Academic Articles Using Highlights

Proceedings of the Association for Information Science and Technology (PRA2), Volume 61, Issue 1, Pages 1147–1149, https://doi.org/10.1002/pra2.1213

Abstract

Keywords facilitate rapid comprehension of academic papers for scholars, enhancing research efficiency. As some papers lack author‐assigned keywords, automated keyword extraction becomes crucial. Addressing the limited utilization of external ...

research-article

Building a Multimodal Dataset of Academic Paper for Keyword Extraction

Proceedings of the Association for Information Science and Technology (PRA2), Volume 61, Issue 1, Pages 435–446, https://doi.org/10.1002/pra2.1040

Abstract

Up to this point, the keyword extraction task has typically relied solely on textual data. Neglecting visual details and audio features from image and audio modalities leads to deficiencies in information richness and overlooks potential correlations, ...

Article

Learning Reading Order via Document Layout with Layout2Pos

Linking Theory and Practice of Digital Libraries, Pages 3–19, https://doi.org/10.1007/978-3-031-72437-4_1

Abstract

Due to their remarkable performance, general-purpose multimodal pre-trained language models have gained widespread adoption for Document Understanding tasks. The majority of pre-trained language models rely on serialized text, extracted using ...

research-article

Open Access

VRDSynth: Synthesizing Programs for Multilingual Visually Rich Document Information Extraction

ISSTA 2024: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, Pages 704–716, https://doi.org/10.1145/3650212.3680314

Businesses often need to query visually rich documents (VRDs), e.g., purchase receipts, medical records, and insurance forms, among many other forms from multiple vendors, to make informed decisions. As such, several techniques have been proposed to ...

Article

Information Extraction from Visually Rich Documents Using Directed Weighted Graph Neural Network

Document Analysis and Recognition - ICDAR 2024, Pages 248–263, https://doi.org/10.1007/978-3-031-70552-6_15

Abstract

This paper presents a novel approach to information extraction (IE) from visually rich documents (VRD) by employing a directed weighted graph representation to capture relationships among various VRD components. In contrast to conventional methods ...

Article

Are Layout Analysis and OCR Still Useful for Document Information Extraction Using Foundation Models?

Document Analysis and Recognition - ICDAR 2024, Pages 175–191, https://doi.org/10.1007/978-3-031-70546-5_11

Abstract

With the advent of end-to-end models and the remarkable performance of foundation models, the question arises regarding the relevance of preliminary steps, such as layout analysis and optical character recognition (OCR), for information extraction ...

Article

Fetch-A-Set: A Large-Scale OCR-Free Benchmark for Historical Document Retrieval

Document Analysis Systems, Pages 347–362, https://doi.org/10.1007/978-3-031-70442-0_21

Abstract

This paper introduces Fetch-A-Set (FAS), a comprehensive benchmark tailored for legislative historical document analysis systems, addressing the challenges of large-scale document retrieval in historical contexts. The benchmark comprises a vast ...

Article

Embedding Layout in Text for Document Understanding Using Large Language Models

Document Analysis and Recognition - ICDAR 2024, Pages 280–293, https://doi.org/10.1007/978-3-031-70533-5_17

Abstract

In this paper, we address the challenge of effectively utilizing Large Language Models (LLMs) for Visually Rich Document Understanding (VRDU), a key part of intelligent document processing systems. While LLMs excel in various Natural Language ...

Article

Reading Order Independent Metrics for Information Extraction in Handwritten Documents

Document Analysis and Recognition - ICDAR 2024, Pages 191–215, https://doi.org/10.1007/978-3-031-70536-6_12

Abstract

Information Extraction processes in handwritten documents tend to rely on obtaining an automatic transcription and performing Named Entity Recognition (NER) over such transcription. For this reason, in publicly available datasets, the performance ...

Article

Using LLMs for the Extraction and Normalization of Product Attribute Values

Advances in Databases and Information Systems, Pages 217–230, https://doi.org/10.1007/978-3-031-70626-4_15

Abstract

Product offers on e-commerce websites often consist of a product title and a textual product description. In order to enable features such as faceted product search or to generate product comparison tables, it is necessary to extract structured ...

Article

Logit Adjustment with Normalization and Augmentation in Few-Shot Named Entity Recognition

Knowledge Science, Engineering and Management, Pages 398–410, https://doi.org/10.1007/978-981-97-5498-4_31

Abstract

We study the problem of few-shot learning in Named Entity Recognition (FS-NER). Specifically, unlike other sequence labeling-based models that mainly focus on better representations, we leverage logit adjustment technology to alleviate the problem ...
