chess-puzzles

Overview

Environment ID: chess-puzzles
Short description: Multi-turn chess puzzle environment based on Lichess puzzles
Tags: chess, games, multi-turn

Datasets

Primary dataset(s): Lichess/chess-puzzles
Source links: Dataset

Task

Type: multi-turn
Parser: UCIParser - extracts UCI chess notation moves from model responses
Rubric overview: The reward is a weighted sum of three criteria: whether the model makes correct moves (correct_move_reward, weight 1.0), plays legal moves (legal_move_reward, weight 0.5), and completes the puzzle (completion_reward, weight 2.0)
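
To illustrate what extracting a UCI move from a model response can look like, here is a minimal sketch. It is not the environment's actual UCIParser: the regular expression, the function name `extract_uci_move`, and the choice to return the last match are all assumptions made for illustration.

```python
import re
from typing import Optional

# A UCI move is a from-square, a to-square, and an optional promotion piece,
# e.g. "e2e4" or "e7e8q". This pattern is an illustrative assumption.
UCI_PATTERN = re.compile(r"\b([a-h][1-8][a-h][1-8][qrbn]?)\b")


def extract_uci_move(response: str) -> Optional[str]:
    """Return the last UCI-looking move in the response, or None if absent.

    Taking the last match is an assumption: models often reason about
    several candidate moves before stating the one they commit to.
    """
    matches = UCI_PATTERN.findall(response.lower())
    return matches[-1] if matches else None
```

A response with no UCI-formatted move yields `None`, which a caller could treat as an invalid turn.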

Quickstart

Run an evaluation with default settings:

uv run vf-eval chess-puzzles

Configure model and sampling:

uv run vf-eval chess-puzzles \
  -m gpt-5-nano \
  -n 4 -r 4 \
  -a '{"min_rating": 500, "max_rating": 800, "themes": ["mateIn4"]}'

Environment Arguments

Arg	Type	Default	Description
`num_puzzles`	int	`100`	Number of puzzles to load from the dataset
`seed`	int	`None`	Random seed for dataset shuffling (a random seed is chosen if None)
`min_rating`	int	`400`	Minimum puzzle rating filter
`max_rating`	int	`600`	Maximum puzzle rating filter
`themes`	List[str]	`["mateIn2"]`	List of puzzle themes to filter by (e.g., "mate", "fork", "endgame")
`show_legal_moves`	bool	`True`	Whether to include legal moves in the prompt
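
The arguments above can be read as a filter, shuffle, and truncate pipeline. The sketch below shows one way that could work on plain dictionaries. The record keys (`rating`, `themes`), the helper name `select_puzzles`, the inclusive rating bounds, and the "match any requested theme" rule are assumptions, not the environment's confirmed behavior.

```python
import random
from typing import Optional, Sequence


def select_puzzles(
    puzzles: Sequence[dict],
    num_puzzles: int = 100,
    seed: Optional[int] = None,
    min_rating: int = 400,
    max_rating: int = 600,
    themes: Sequence[str] = ("mateIn2",),
) -> list:
    """Filter by rating and theme, shuffle, then keep at most num_puzzles."""
    wanted = set(themes)
    kept = [
        p for p in puzzles
        # Assumed: bounds are inclusive and any one matching theme suffices.
        if min_rating <= p["rating"] <= max_rating and wanted & set(p["themes"])
    ]
    # random.Random(None) seeds itself from system entropy.
    random.Random(seed).shuffle(kept)
    return kept[:num_puzzles]
```

Passing a fixed `seed` makes the selection reproducible across runs, which helps when comparing models on the same puzzle subset.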

Metrics

Metric	Meaning
`reward`	Main scalar reward (weighted sum of all criteria)
`correct_move_reward`	Count of correct puzzle moves made by the model
`legal_move_reward`	Ratio of legal moves to total expected moves
`completion_reward`	1.0 if puzzle is solved, else 0.0
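
As a worked example of how `reward` combines the other three metrics, the sketch below applies the weights from the rubric overview (1.0, 0.5, 2.0) to a set of per-criterion scores. The helper name `total_reward` and the example scores are illustrative, and the mapping of weights to criteria assumes they are listed in the same order as the criteria.

```python
# Weights as stated in the rubric overview, assumed to be in listed order.
WEIGHTS = {
    "correct_move_reward": 1.0,
    "legal_move_reward": 0.5,
    "completion_reward": 2.0,
}


def total_reward(scores: dict) -> float:
    """Weighted sum of the per-criterion scores."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Hypothetical mateIn2 puzzle solved perfectly: two correct moves,
# every move legal, puzzle completed.
solved = {
    "correct_move_reward": 2.0,
    "legal_move_reward": 1.0,
    "completion_reward": 1.0,
}
print(total_reward(solved))  # 1.0*2.0 + 0.5*1.0 + 2.0*1.0 = 4.5
```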
