PTS: Pivotal Token Search

A tool for discovering pivotal tokens in large language model generations and creating DPO datasets and steering vectors from them.

Features

Identifies pivotal tokens in language model generations
Supports various dataset formats including GSM8k, MATH, and custom datasets
Handles chain-of-thought reasoning output with <think></think> tags
Extracts answers from common formats like GSM8k's #### pattern and LaTeX's \boxed{} notation
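The answer-extraction step listed above can be illustrated with a minimal sketch. This is not the PTS implementation: the function names `strip_think` and `extract_answer` are hypothetical, and the sketch assumes the final answer is either the text after `####` or the contents of the last `\boxed{}` in the output.

```python
import re
from typing import Optional


def strip_think(text: str) -> str:
    """Remove <think>...</think> reasoning blocks, keeping the remaining text."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()


def extract_answer(text: str) -> Optional[str]:
    """Return the final answer from a generation, or None if no known format matches."""
    text = strip_think(text)

    # GSM8k style: the answer follows '####' on the same line.
    match = re.search(r"####\s*(.+)", text)
    if match:
        return match.group(1).strip().replace(",", "")

    # LaTeX style: take the last \boxed{...}, tracking nested braces.
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
        i += 1
    return None  # unbalanced braces: no reliable answer


print(extract_answer("<think>maybe 17?</think>The total is 18.\n#### 18"))
print(extract_answer("Therefore the result is \\boxed{\\frac{1}{2}}."))
```

Brace counting is used for `\boxed{}` because a plain regular expression would stop at the first closing brace of a nested expression such as `\frac{1}{2}`.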

What is Pivotal Token Search?

Pivotal Token Search (PTS) is a technique described in the Phi-4 Technical Report that identifies tokens in a language model's generation that significantly impact the probability of success for the task at hand. These "pivotal tokens" are decision points where the model's choice can dramatically alter the course of the solution.

Key features:

Identifies tokens that significantly increase or decrease the probability of a successful generation
Generates DPO (Direct Preference Optimization) pairs for fine-tuning
Creates steering vectors for activation-based steering during inference
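To make the idea of a pivotal token concrete, here is a minimal sketch that flags tokens whose appearance shifts an already-estimated success probability by at least a threshold. The function name `find_pivotal_tokens`, the threshold value of 0.2, and the example probabilities are all assumptions for illustration; how PTS itself estimates the probabilities and searches the sequence is not shown here.

```python
from typing import List, Tuple


def find_pivotal_tokens(
    tokens: List[str],
    success_probs: List[float],
    threshold: float = 0.2,  # assumed value, chosen only for this example
) -> List[Tuple[int, str, float]]:
    """Flag tokens that change the success probability by at least `threshold`.

    success_probs[i] is the estimated probability of success given the first
    i tokens, so it must have len(tokens) + 1 entries.
    """
    if len(success_probs) != len(tokens) + 1:
        raise ValueError("success_probs needs one entry per prefix, including the empty one")

    pivotal = []
    for i, token in enumerate(tokens):
        delta = success_probs[i + 1] - success_probs[i]
        if abs(delta) >= threshold:
            pivotal.append((i, token, delta))
    return pivotal


# Made-up numbers: " x" helps the solution, " 3" hurts it.
tokens = ["Let", " x", " =", " 3"]
probs = [0.5, 0.5, 0.9, 0.85, 0.25]
for index, token, delta in find_pivotal_tokens(tokens, probs):
    direction = "increases" if delta > 0 else "decreases"
    print(f"token {index} {token!r} {direction} success probability")
```

A positive change marks a token worth preferring and a negative change marks one worth avoiding, which is what makes these positions useful for preference pairs and steering.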

Installation

git clone https://github.com/codelion/pts.git
cd pts
pip install -e .

Quick Start

# Find pivotal tokens in a dataset and save to file
pts run --model="Qwen/Qwen3-0.6B" --dataset="codelion/optillmbench" --output-path="pivotal_tokens.jsonl"

# Generate thought anchors dataset for reasoning analysis
pts run --model="Qwen/Qwen3-0.6B" --dataset="codelion/optillmbench" --output-path="thought_anchors.jsonl" --generate-thought-anchors

# Convert pivotal tokens to DPO dataset
pts export --input-path="pivotal_tokens.jsonl" --format="dpo" --output-path="dpo_dataset.jsonl" --model="Qwen/Qwen3-0.6B" --find-rejected-tokens

# Convert pivotal tokens to steering vectors
pts export --input-path="pivotal_tokens.jsonl" --format="steering" --output-path="steering_vectors.jsonl" --model="Qwen/Qwen3-0.6B"

# Export thought anchors for inference systems
pts export --input-path="thought_anchors.jsonl" --format="thought_anchors" --output-path="thought_anchors_export.jsonl"

# Push dataset to Hugging Face (creates README by default)
pts push --input-path="dpo_dataset.jsonl" --hf-repo="codelion/pts-dpo-dataset" --model="Qwen/Qwen3-0.6B"

Try Now

Use Case	Dataset	Link
Fine-tuning the model	dpo dataset
Optimizing the inference	steering vectors	optillm

You can also check out the datasets and models created with PTS. PTS was used for the AutoThink approach in optillm, as described in this paper.

Core Concepts

Pivotal Tokens

A pivotal token significantly changes the probability of success when it appears in a model's generation. By identifying these tokens, we can:

Understand where the model makes critical decisions
Create preference pairs for DPO fine-tuning
Extract activation vectors for steering during inference
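One common way to obtain the "probability of success" for a partial generation is to sample several completions from that prefix and count how many end in a correct answer. The sketch below assumes that approach; `estimate_success_prob`, `fake_sampler`, and `ends_with_four` are hypothetical names, and the sampler is a deterministic stand-in for a real language model call.

```python
from typing import Callable


def estimate_success_prob(
    prefix: str,
    sample_completion: Callable[[str, int], str],
    is_correct: Callable[[str], bool],
    num_samples: int = 8,
) -> float:
    """Estimate P(success | prefix) as the fraction of sampled completions that succeed."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    successes = 0
    for k in range(num_samples):
        completion = sample_completion(prefix, k)
        if is_correct(prefix + completion):
            successes += 1
    return successes / num_samples


def fake_sampler(prefix: str, k: int) -> str:
    """Stand-in for a model: every fourth sample gives a wrong final answer."""
    return " #### 5" if k % 4 == 0 else " #### 4"


def ends_with_four(text: str) -> bool:
    """Toy correctness check for the question '2 + 2'."""
    return text.strip().endswith("#### 4")


print(estimate_success_prob("2 + 2 =", fake_sampler, ends_with_four, num_samples=8))
```

Comparing this estimate for the prefix just before a token and the prefix just after it gives the change in success probability attributed to that token. More samples reduce the noise of the estimate at the cost of more generations.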

DPO Datasets

PTS creates DPO datasets by isolating the specific token-level choices that lead to success or failure. This allows more targeted fine-tuning than training on entire sequences, because each preference pair focuses on a single pivotal decision.

Important: When exporting to DPO format, you must provide a model using the --model parameter and enable the --find-rejected-tokens flag. This is necessary because DPO pairs require both a chosen token (the pivotal token that increases success probability) and a rejected token (an alternative token that decreases success probability).
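The chosen/rejected pairing described above can be sketched as a small record builder. The field names `prompt`, `chosen`, and `rejected` follow a common convention for DPO training data and are an assumption here; the schema that `pts export` actually writes may differ, and `to_dpo_pair` is a hypothetical helper.

```python
import json
from typing import Dict


def to_dpo_pair(query: str, prefix: str, chosen_token: str, rejected_token: str) -> Dict[str, str]:
    """Build one token-level preference record.

    The prompt is the question plus everything generated before the pivotal
    position, so the two candidates differ only in the next token.
    """
    if chosen_token == rejected_token:
        raise ValueError("chosen and rejected tokens must differ")
    return {
        "prompt": query + prefix,
        "chosen": chosen_token,      # token that increases success probability
        "rejected": rejected_token,  # alternative that decreases it
    }


pair = to_dpo_pair(
    query="What is 12 * 12?\n",
    prefix="12 * 12 =",
    chosen_token=" 144",
    rejected_token=" 124",
)
print(json.dumps(pair))  # one JSON object per line, as in a .jsonl file
```

Keeping the shared context in `prompt` is what isolates the preference signal to the single pivotal position.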
