- Environment ID: `chess-puzzles`
- Short description: Multi-turn chess puzzle environment based on Lichess puzzles
- Tags: chess, games, multi-turn
- Primary dataset(s): `Lichess/chess-puzzles`
- Source links: Dataset
- Type: multi-turn
- Parser: `UCIParser`, which extracts UCI chess notation moves from model responses
- Rubric overview: The reward combines whether the model makes correct moves (`correct_move_reward`), plays legal moves (`legal_move_reward`), and completes the puzzle (`completion_reward`), with weights [1.0, 0.5, 2.0], respectively
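The `UCIParser` itself lives in the environment's code and is not reproduced here. As a rough illustration of what extracting UCI moves involves, the sketch below matches UCI long algebraic notation (from-square, to-square, optional promotion piece) with a regular expression. The helper name `extract_uci_move` and the choice to keep the last match are assumptions for illustration, not the parser's documented behavior.

```python
import re
from typing import Optional

# UCI long algebraic notation: from-square, to-square, optional promotion
# piece (e.g. "e2e4", "e7e8q"). Castling is written as a king move ("e1g1").
UCI_MOVE = re.compile(r"\b([a-h][1-8][a-h][1-8][qrbn]?)\b")


def extract_uci_move(response: str) -> Optional[str]:
    """Hypothetical helper: return the last UCI-looking token in a response.

    Keeping the last match is an assumption; it lets a model mention
    candidate moves while reasoning before committing to a final answer.
    """
    matches = UCI_MOVE.findall(response.lower())
    return matches[-1] if matches else None


print(extract_uci_move("Candidates: d2d4 or g1f3. Final answer: g1f3"))  # g1f3
```

A regex only checks the shape of a move; whether the move is legal in the current position is a separate check, which is what `legal_move_reward` scores.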
Run an evaluation with default settings:
```bash
uv run vf-eval chess-puzzles
```
Configure model and sampling:
```bash
uv run vf-eval chess-puzzles \
  -m gpt-5-nano \
  -n 4 -r 4 \
  -a '{"min_rating": 500, "max_rating": 800, "themes": ["mateIn4"]}'
```
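The `-a` value is a quoted JSON object whose keys are the environment arguments listed in the table below. When scripting runs, one way to avoid hand-quoting mistakes is to build that string from a Python dict. This is a minimal sketch, assuming `vf-eval` parses `-a` as JSON (as its quoted-object form suggests); the variable names are illustrative only.

```python
import json
import shlex

# Environment arguments keyed by the names in the arguments table;
# the values mirror the example command above.
env_args = {
    "min_rating": 500,
    "max_rating": 800,
    "themes": ["mateIn4"],
}

# json.dumps emits valid JSON (double quotes, lowercase true/false/null);
# shlex.quote wraps it so the shell passes it through as one argument.
arg_value = shlex.quote(json.dumps(env_args))
command = f"uv run vf-eval chess-puzzles -m gpt-5-nano -n 4 -r 4 -a {arg_value}"
print(command)
```

Because the value is JSON, booleans must be lowercase: for example, `{"show_legal_moves": false}` rather than Python's `False`.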
| Arg | Type | Default | Description |
| --- | --- | --- | --- |
| `num_puzzles` | int | `100` | Number of puzzles to load from the dataset |
| `seed` | int | `None` | Random seed for dataset shuffling (if `None`, a random seed is used) |
| `min_rating` | int | `400` | Minimum puzzle rating filter |
| `max_rating` | int | `600` | Maximum puzzle rating filter |
| `themes` | List[str] | `["mateIn2"]` | List of puzzle themes to filter by (e.g., `"mate"`, `"fork"`, `"endgame"`) |
| `show_legal_moves` | bool | `True` | Whether to include legal moves in the prompt |
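To make the interaction between `num_puzzles`, `seed`, `min_rating`, `max_rating`, and `themes` concrete, here is a hypothetical sketch of the selection step. It is not the environment's implementation: it assumes each puzzle is a dict with a numeric `"rating"` and a list of `"themes"` (the real dataset's field names may differ), treats the rating bounds as inclusive, and keeps a puzzle if it carries any requested theme.

```python
import random
from typing import Optional


def select_puzzles(
    puzzles: list[dict],
    num_puzzles: int = 100,
    seed: Optional[int] = None,
    min_rating: int = 400,
    max_rating: int = 600,
    themes: Optional[list[str]] = None,
) -> list[dict]:
    """Hypothetical filter-then-shuffle sketch using the table's defaults.

    Assumptions: inclusive rating bounds, "any theme matches" semantics,
    and "rating"/"themes" keys on each puzzle dict.
    """
    wanted = set(themes if themes is not None else ["mateIn2"])
    pool = [
        p for p in puzzles
        if min_rating <= p["rating"] <= max_rating and wanted & set(p["themes"])
    ]
    # random.Random(None) seeds from system entropy, so seed=None gives a
    # different order on each run; an integer seed makes it reproducible.
    rng = random.Random(seed)
    rng.shuffle(pool)
    return pool[:num_puzzles]
```

Filtering before shuffling means `num_puzzles` caps the size of the already-filtered pool, so a narrow rating window or rare theme can yield fewer puzzles than requested.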
| Metric | Meaning |
| --- | --- |
| `reward` | Main scalar reward (weighted sum of all criteria) |
| `correct_move_reward` | Count of correct puzzle moves made by the model |
| `legal_move_reward` | Ratio of legal moves to total expected moves |
| `completion_reward` | 1.0 if the puzzle is solved, else 0.0 |
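The main `reward` is the weighted sum of the three criteria, using the weights [1.0, 0.5, 2.0] from the rubric overview. The worked example below assumes the count and ratio are computed as the table describes; the helper name `total_reward` is hypothetical.

```python
# Rubric weights as stated in the rubric overview.
WEIGHTS = {
    "correct_move_reward": 1.0,
    "legal_move_reward": 0.5,
    "completion_reward": 2.0,
}


def total_reward(metrics: dict[str, float]) -> float:
    """Weighted sum of the three criteria (hypothetical helper name)."""
    return sum(weight * metrics[name] for name, weight in WEIGHTS.items())


# Example: a mate-in-2 puzzle where the model plays both of its moves
# correctly and legally, so the puzzle is solved.
example = {"correct_move_reward": 2, "legal_move_reward": 1.0, "completion_reward": 1.0}
print(total_reward(example))  # 2 * 1.0 + 1.0 * 0.5 + 1.0 * 2.0 = 4.5
```

Since `correct_move_reward` is a count rather than a ratio, longer puzzles (for example `mateIn4`) can reach a higher maximum `reward` than shorter ones.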