- Article, December 2024
ExtractGPT: Exploring the Potential of Large Language Models for Product Attribute Value Extraction
Abstract: E-commerce platforms require structured product data in the form of attribute-value pairs to offer features such as faceted product search or attribute-based product comparison. However, vendors often provide unstructured product descriptions, ...
- Article, November 2024
DELA: Dual Embedding Using LSTM and Attention for Asset Tag Inference in Industrial Automation Systems
Abstract: Artificial Intelligence (AI) is a key driver of the Industry 4.0 revolution. In industrial automation systems, data points of assets are represented by globally unique identifiers known as “Tags,” which often contain abbreviated asset and ...
- Article, November 2024
A Decomposed-Distilled Sequential Framework for Text-to-Table Task with LLMs
PRICAI 2024: Trends in Artificial Intelligence, Pages 403–410, https://doi.org/10.1007/978-981-96-0119-6_38
Abstract: Large Language Models (LLMs) have shown their power in information extraction (IE) leveraging text-to-table task. However, suffering from lost-in-the-middle problem, LLMs struggle to extract all the necessary information within longer context, ...
- research-article, November 2024
Transforming Unstructured Sensitive Information into Structured Knowledge
ICAIF '24: Proceedings of the 5th ACM International Conference on AI in Finance, Pages 831–838, https://doi.org/10.1145/3677052.3698602
Information is crucial in today’s context, yet less than 20% of companies utilize their unstructured data due to its complexity. Information Extraction (IE) is vital for effective data use, but current IE models face four major issues. First, they often ...
- Article, November 2024
SciHyp: A Fine-Grained Dataset Describing Hypotheses and Their Components from Scientific Articles
Abstract: Scientific discovery entails a detailed understanding and structuring of existing hypotheses—a challenging task due to the variety and complexity of the scientific texts. Despite efforts in domains like bio-medicine and invasion biology, there ...
- Article, November 2024
InstructIE: A Bilingual Instruction-based Information Extraction Dataset
- Honghao Gui,
- Shuofei Qiao,
- Jintian Zhang,
- Hongbin Ye,
- Mengshu Sun,
- Lei Liang,
- Jeff Z. Pan,
- Huajun Chen,
- Ningyu Zhang
Abstract: Large language models can perform well on general natural language tasks, but their effectiveness is still suboptimal for information extraction (IE). Recent works indicate that the main reason lies in the lack of extensive data on IE ...
- Article, November 2024
Overview of the NLPCC 2024 Shared Task 2: Nominal Compound Chain Extraction
Natural Language Processing and Chinese Computing, Pages 463–470, https://doi.org/10.1007/978-981-97-9443-0_41
Abstract: Nominal compound chain extraction (NCCE) represents an emerging task within the domain of natural language processing, exhibiting significant potential for application in various downstream tasks, including relation extraction and summarization, ...
- Article, November 2024
Retrieval-Augmented Code Generation for Universal Information Extraction
- Yucan Guo,
- Zixuan Li,
- Xiaolong Jin,
- Yantao Liu,
- Yutao Zeng,
- Wenxuan Liu,
- Xiang Li,
- Pan Yang,
- Long Bai,
- Jiafeng Guo,
- Xueqi Cheng
Natural Language Processing and Chinese Computing, Pages 30–42, https://doi.org/10.1007/978-981-97-9434-8_3
Abstract: Information Extraction (IE) aims to extract structural knowledge (e.g., entities, relations, events) from natural language texts. Recently, Large Language Models (LLMs) with code-style prompts have demonstrated powerful capabilities in IE tasks. ...
- article, November 2024
Text Analysis on Green Supply Chain Practices of Electronic Companies
International Journal of Decision Support System Technology (IJDSST-IGI), Volume 16, Issue 1, Pages 1–16, https://doi.org/10.4018/IJDSST.358950
The electronics industry is one of the major regulated industries in the United States that is profoundly impacted by environmental issues. In this study, we use natural language processing (NLP) techniques to analyze reports from major electronics ...
- research-article, October 2024
Enhancing Keyword Extraction from Academic Articles Using Highlights
Proceedings of the Association for Information Science and Technology (PRA2), Volume 61, Issue 1, Pages 1147–1149, https://doi.org/10.1002/pra2.1213
Abstract: Keywords facilitate rapid comprehension of academic papers for scholars, enhancing research efficiency. As some papers lack author‐assigned keywords, automated keyword extraction becomes crucial. Addressing the limited utilization of external ...
- research-article, October 2024
Building a Multimodal Dataset of Academic Paper for Keyword Extraction
Proceedings of the Association for Information Science and Technology (PRA2), Volume 61, Issue 1, Pages 435–446, https://doi.org/10.1002/pra2.1040
Abstract: Up to this point, keyword extraction task typically relies solely on textual data. Neglecting visual details and audio features from image and audio modalities leads to deficiencies in information richness and overlooks potential correlations, ...
- Article, September 2024
Learning Reading Order via Document Layout with Layout2Pos
Linking Theory and Practice of Digital Libraries, Pages 3–19, https://doi.org/10.1007/978-3-031-72437-4_1
Abstract: Due to their remarkable performance, general-purpose multimodal pre-trained language models have gained widespread adoption for Document Understanding tasks. The majority of pre-trained language models rely on serialized text, extracted using ...
VRDSynth: Synthesizing Programs for Multilingual Visually Rich Document Information Extraction
ISSTA 2024: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, Pages 704–716, https://doi.org/10.1145/3650212.3680314
Businesses often need to query visually rich documents (VRDs), e.g., purchase receipts, medical records, and insurance forms, among many other forms from multiple vendors, to make informed decisions. As such, several techniques have been proposed to ...
- Article, September 2024
Information Extraction from Visually Rich Documents Using Directed Weighted Graph Neural Network
Document Analysis and Recognition - ICDAR 2024, Pages 248–263, https://doi.org/10.1007/978-3-031-70552-6_15
Abstract: This paper presents a novel approach to information extraction (IE) from visually rich documents (VRD) by employing a directed weighted graph representation to capture relationships among various VRD components. In contrast to conventional methods ...
- Article, September 2024
Are Layout Analysis and OCR Still Useful for Document Information Extraction Using Foundation Models?
Document Analysis and Recognition - ICDAR 2024, Pages 175–191, https://doi.org/10.1007/978-3-031-70546-5_11
Abstract: With the advent of end-to-end models and the remarkable performance of foundation models, the question arises regarding the relevance of preliminary steps, such as layout analysis and optical character recognition (OCR), for information extraction ...
- Article, September 2024
Fetch-A-Set: A Large-Scale OCR-Free Benchmark for Historical Document Retrieval
Abstract: This paper introduces Fetch-A-Set (FAS), a comprehensive benchmark tailored for legislative historical document analysis systems, addressing the challenges of large-scale document retrieval in historical contexts. The benchmark comprises a vast ...
- Article, September 2024
Embedding Layout in Text for Document Understanding Using Large Language Models
Document Analysis and Recognition - ICDAR 2024, Pages 280–293, https://doi.org/10.1007/978-3-031-70533-5_17
Abstract: In this paper, we address the challenge of effectively utilizing Large Language Models (LLMs) for Visually Rich Document Understanding (VRDU), a key part of intelligent document processing systems. While LLMs excel in various Natural Language ...
- Article, September 2024
Reading Order Independent Metrics for Information Extraction in Handwritten Documents
- David Villanova-Aparisi,
- Solène Tarride,
- Carlos-D. Martínez-Hinarejos,
- Verónica Romero,
- Christopher Kermorvant,
- Moisés Pastor-Gadea
Document Analysis and Recognition - ICDAR 2024, Pages 191–215, https://doi.org/10.1007/978-3-031-70536-6_12
Abstract: Information Extraction processes in handwritten documents tend to rely on obtaining an automatic transcription and performing Named Entity Recognition (NER) over such transcription. For this reason, in publicly available datasets, the performance ...
- Article, September 2024
Using LLMs for the Extraction and Normalization of Product Attribute Values
Advances in Databases and Information Systems, Pages 217–230, https://doi.org/10.1007/978-3-031-70626-4_15
Abstract: Product offers on e-commerce websites often consist of a product title and a textual product description. In order to enable features such as faceted product search or to generate product comparison tables, it is necessary to extract structured ...
- Article, August 2024
Logit Adjustment with Normalization and Augmentation in Few-Shot Named Entity Recognition
Knowledge Science, Engineering and Management, Pages 398–410, https://doi.org/10.1007/978-981-97-5498-4_31
Abstract: We study the problem of few-shot learning in Name Entity Recognition (FS-NER). Specifically, unlike other sequence labeling-based models, that mainly focus on better representations, we leverage logit adjustment technology to alleviate the problem ...