Named Entity Recognition with a decoder-only (autoregressive) LLM using HuggingFace

d-kleine/NER_decoder

Named Entity Recognition with LLaMA 3.2

Overview

This repository contains a Named Entity Recognition (NER) implementation built on the LLaMA 3.2 1B model with HuggingFace, specifically leveraging its autoregressive (decoder-only) architecture. The project demonstrates how to adapt a LLaMA model for NER by disabling causal masking to enable bidirectional attention, and by applying Low-Rank Adaptation (LoRA) for fine-tuning.
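
A minimal sketch of this idea (not the repository's exact code), assuming a recent Transformers version that provides a token-classification head for LLaMA and accepts custom 4D attention masks; the model checkpoint is Meta's gated repo, and the label count and example sentence are illustrative assumptions:

    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    model_name = "meta-llama/Llama-3.2-1B"  # gated checkpoint; requires approved access
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # 9 labels = B-/I- for PER, ORG, LOC, MISC, plus O (illustrative assumption)
    model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=9)

    inputs = tokenizer("Angela Merkel visited Geneva.", return_tensors="pt")
    pad_mask = inputs["attention_mask"]  # (batch, seq_len); 1 = real token, 0 = padding

    # Build a non-causal 4D mask so every token attends to every non-padding token.
    seq_len = pad_mask.shape[1]
    bidir = pad_mask[:, None, None, :].expand(-1, 1, seq_len, -1).float()
    # Additive form expected for 4D masks: 0.0 where allowed, large negative where masked.
    bidir = (1.0 - bidir) * torch.finfo(torch.float32).min

    # The custom 4D mask replaces the causal mask the model would otherwise build.
    outputs = model(input_ids=inputs["input_ids"], attention_mask=bidir)
    pred_label_ids = outputs.logits.argmax(dim=-1)  # one predicted label id per token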

Important

This is a showcase project and thus not fully optimized or thoroughly tested. The fine-tuned model in its current state overfits: it performs well on the training set but struggles to generalize to new, unseen data. This overfitting likely results from the model's complexity relative to the dataset size, or from insufficient regularization during fine-tuning. Further steps, such as hyperparameter tuning and more robust regularization, would be necessary to improve generalization. Additionally, the bidirectional attention mechanism, enabled by disabling causal masking, may require further refinement to balance performance across the training and evaluation datasets.

Purpose

Traditionally, encoder-only models like BERT have dominated NER tasks due to their ability to process input text bidirectionally, capturing rich contextual information. However, by removing the causal mask in LLaMA, we enable it to leverage bidirectional context while maintaining its strengths in generative tasks, making it a versatile solution for NER. Compared to such encoder-only models, LLaMA offers several advantages:

  • Scalability: LLaMA scales more effectively with larger datasets and parameter counts, enabling better performance on zero-shot and few-shot tasks.
  • Flexibility: its architecture can handle both generative and understanding tasks.

This makes LLaMA a (potentially) superior choice when both text generation and understanding are needed, especially in large-scale applications.

Features

  • Model: LLaMA 3.2, a decoder-only transformer-based large language model (LLM). The 1B variant provides enough capacity to handle complex tasks like NER while remaining computationally feasible for experimentation.
  • Attention Mechanism: Causal masking is disabled to allow bidirectional attention, suitable for NER tasks.
  • Optimization (see the configuration sketch after this list):
    • LoRA: Low-Rank Adaptation is used to efficiently fine-tune the model with fewer trainable parameters.
    • AdamW Optimizer: The AdamW optimizer is used by default, as implemented in Hugging Face's Transformers library.
    • Cosine Learning Rate Scheduler: A cosine learning rate schedule is applied for smoother convergence.
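
The optimization setup above could be configured roughly as follows; a hedged sketch assuming the peft library and reusing model from the sketch in the Overview. All hyperparameter values (rank, alpha, dropout, target modules, learning rate, epochs, batch size) are illustrative assumptions, not the repository's settings:

    from peft import LoraConfig, TaskType, get_peft_model
    from transformers import TrainingArguments

    lora_config = LoraConfig(
        task_type=TaskType.TOKEN_CLS,         # token classification (NER)
        r=8,                                  # low-rank dimension of the adapters
        lora_alpha=16,
        lora_dropout=0.1,
        target_modules=["q_proj", "v_proj"],  # LLaMA attention projections
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only the LoRA adapters are trainable

    training_args = TrainingArguments(
        output_dir="ner-llama-lora",
        learning_rate=2e-4,
        lr_scheduler_type="cosine",  # cosine learning-rate schedule
        optim="adamw_torch",         # AdamW, the Transformers default
        num_train_epochs=3,
        per_device_train_batch_size=8,
    )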

Dataset

NER Tagging Structure

The NER (Named Entity Recognition) tags used in this project follow the BIO tagging scheme, which is commonly used in NER tasks to label entities in text. Here's a breakdown of what each tag means:

Tag Structure

  • B-: Indicates the beginning of a named entity.
  • I-: Indicates that the token is inside a named entity but not at the beginning.
  • O: Represents tokens that are not part of any named entity.

Entity Types

  • PER: Person (e.g., names of individuals)
  • ORG: Organization (e.g., companies, institutions)
  • LOC: Location (e.g., cities, countries, geographical regions)
  • MISC: Miscellaneous entities that don't fall into the other categories (e.g., events, nationalities)
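
For illustration, a BIO-tagged sentence might look like this (an invented example, not drawn from the project's data):

    tokens = ["Angela", "Merkel", "visited", "the", "United", "Nations", "in", "Geneva", "."]
    tags   = ["B-PER",  "I-PER",  "O",       "O",   "B-ORG",  "I-ORG",   "O",  "B-LOC",  "O"]

Here "Angela Merkel" is a two-token person entity (B-PER, I-PER), "United Nations" a two-token organization (B-ORG, I-ORG), and "Geneva" a single-token location (B-LOC).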

Setup Instructions

  1. Ensure Python 3.11 is Installed

    • Verify that Python 3.11 is installed on your system by running:
      python3.11 --version
    • If Python 3.11 is not installed, you will need to install it first. This can typically be done through a package manager or by downloading from the official Python website.
  2. Navigate to Your Project Directory

    • Open your terminal and change to the directory where you want to set up your project:
      cd /path/to/your/project
  3. Create a Virtual Environment

    • Use the venv module to create a virtual environment:
      python3.11 -m venv .venv
    • This command creates a new directory named .venv in your project folder, which contains the virtual environment.
  4. Activate the Virtual Environment

    • Activate the virtual environment to start using it:
      • On macOS and Linux:
        source .venv/bin/activate
      • On Windows:
        .venv\Scripts\activate
  5. Upgrade pip

    • Before installing packages, ensure you have the latest version of pip:
      pip install --upgrade pip
  6. Install Packages from requirements.txt

    • Use pip to install all required packages listed in your requirements.txt file:
      pip install -r requirements.txt
  7. Deactivate the Virtual Environment (Optional)

    • Once you are done working in the virtual environment, you can deactivate it by simply running:
      deactivate
