bmssp benchmark game

Implementation of the algorithm in multiple languages to compare performance in a standard GitHub Actions environment.

Languages currently wired in this repo: Rust, C, C++, Nim, Crystal, Kotlin (JAR), Elixir (.exs), Erlang (.erl).

Benchmark snapshot

BMSSP 1000x Report

Environment:

Host: Linux/6.11.0-1018-azure (x86_64)
CPU cores: 4
Git commit: 460fdc9

Best rows per implementation (largest explored set)

impl	lang	graph	n	m	k	B	threads	time_ns	popped	edges_scanned	heap_pushes	B_prime	mem_bytes
c-bmssp	C	ba	200000	999995	16	400	1	1779913	8010	40045	8995	400	17599920
cpp-bmssp	C++	ba	200000	999995	16	400	1	2254705	8374	41865	9473	400	17599920
kotlin-bmssp	Kotlin	ba	200000	1999958	16	400	1	282058140	200000	1999958	402594	400	35199328
nim-bmssp	Nim	ba	200000	999995	16	400	1	2143144	7267	36330	8252	400	17599920
rust-bmssp	Rust	ba	200000	999995	16	400	1	2461474	8331	41650	9331	400	22799944
c-bmssp	C	grid	102400	408320	16	400	1	732563	4878	19511	5957	400	7352320
cpp-bmssp	C++	grid	102400	408320	16	400	1	897503	4871	19455	5984	400	7352320
crystal-bmssp	Crystal	grid	102400	408320	16	400	1	66888845	4741	18945	5797	400	7152652
elixir-bmssp	Elixir	grid	102400	408320	16	400	1	431964997	4956	19775	6125	400	8171520
erlang-bmssp	Erlang	grid	102400	408320	16	400	1	11623659	4699	18772	5748	400	8171520
kotlin-bmssp	Kotlin	grid	102400	408320	16	400	1	34700586	5306	21202	6492	400	8171520
nim-bmssp	Nim	grid	102400	408320	16	400	1	1932144	5051	20204	6228	400	7352320
rust-bmssp	Rust	grid	102400	408320	16	400	1	1045159	5091	20326	6280	400	10014744

< 8000 th>graph

impl	lang	n	m	k	B	seed	threads	time_ns	popped	edges_scanned	heap_pushes	B_prime	mem_bytes
rust-bmssp	Rust	grid	2500	9800	4	50	1	1	741251	868	3423	1047	50	241824
c-bmssp	C	grid	2500	9800	4	50	1	1	99065	1289	5119	1565	50	176800
cpp-bmssp	C++	grid	2500	9800	4	50	1	1	117480	1064	4224	1261	50	176800
kotlin-bmssp	Kotlin	grid	2500	9800	4	50	1	1	5308820	1102	4386	1309	50	196800
elixir-bmssp	Elixir	grid	2500	9800	4	50	1	1	5410039	870	3447	1047	50	196800
erlang-bmssp	Erlang	grid	2500	9800	4	50	1	1	1155739	691	2701	818	50	196800

Rust implementation of bounded multi-source shortest paths (multi-source Dijkstra cut off at B).

Run

cargo test
cargo bench -p bmssp
python3 bench/runner.py --release --out results

# fast iteration
python3 bench/runner.py --quick --out results-dev --timeout-seconds 20 --jobs 2

# 1000x larger graphs (heavy):
python3 bench/runner.py --params bench/params_1000x.yaml --release --jobs 4 --timeout-seconds 600 --out results

One-time setup scripts

Linux/macOS:
- Run scripts/install_deps.sh --yes to auto-install Rust, Python deps, build tools, Crystal/shards, Nim.
- Use --check-only to only report missing items.
Windows:
- Open PowerShell (as Administrator recommended) and run scripts/Install-Dependencies.ps1 -Yes.
- Add -CheckOnly to only report.

Complexity

Let U = { v | d(v) < B } and E(U) be edges scanned from U.

Time: O((|E(U)| + |U|) log |U|) with binary heap; worst-case O((m+n) log n).
Space: Θ(n + m) graph + Θ(n) distances/flags + heap bounded by frontier size.

Use Graph::memory_estimate_bytes() to get a byte estimate at runtime.

Here’s the no-BS, self-contained write-up you asked for. It includes the theory, proofs at the right granularity, complexity, comparisons against the usual suspects, and charts (model-based, not empirical—use them to reason about trends, not absolutes).

Bounded Multi-Source Shortest Paths (BMSSP)

What problem it solves

Given a directed graph $G=(V,E)$ with non-negative weights $w:E\to \mathbb{R}_{\ge 0}$, a set of sources $S\subseteq V$ with initial offsets $d_0(s)$ (usually 0), and a distance bound $B$, compute:

For every vertex $v$ with true shortest-path distance $d(v) < B$, the exact distance $d(v)$.
The explored set $U={v\in V\mid d(v)<B}$.
The tight boundary $B' = \min{ \hat d(x)\mid x \notin U}$, i.e., the smallest tentative label never popped (next frontier).

This is Dijkstra from multiple sources that halts when the next tentative key would be $\ge B$.

Algorithm (binary-heap version)

Same invariant as Dijkstra: a node is popped exactly once with its final shortest distance. The only change is the early-exit cut.

Initialize

$\forall v\in V:\ \text{dist}[v] \leftarrow +\infty$
For each source $s\in S$: if $d_0(s) < B$ set $\text{dist}[s]\leftarrow d_0(s)$ and push $(d_0(s),s)$ into a min-PQ.
$B' \leftarrow B$

Main loop

Pop $(d,u)$. If $d \ne \text{dist}[u]$ continue (stale).
If $d \ge B$, set $B'\gets d$ and stop.
For each edge $(u,v,w)$: $\text{nd}=d+w$.
- If $\text{nd} < \text{dist}[v]$ and $\text{nd} < B$: relax and push.
- Else if $\text{nd} \ge B$: update $B' \leftarrow \min(B', \text{nd})$ (tracks the smallest over-bound key observed).

Return $(U, B')$ where $U$ are the popped vertices.

Multi-source is trivial: seed multiple entries. All standard Dijkstra proofs still apply.

Correctness (sketch you can actually use)

Invariant (Dijkstra): When a node $u$ is popped, $\text{dist}[u]=d(u)$. Proof carries over because (1) weights $\ge 0$, (2) heap order guarantees we never pop a label larger than any alternate path to that node, (3) early exit only reduces the set of popped nodes; it never allows popping out of order.

Boundary lemma. Let $U$ be nodes popped before exit. Then $B' = \min{\hat d(x)\mid x\in V\setminus U}$ — the smallest tentative distance still in the heap (or discovered and $\ge B$). So $B' \ge B$ and is the next tight phase boundary if you continue search.

Complexity

Let $U={v\mid d(v)<B}$ and $E(U)={(u,v)\in E\mid u\in U}$.

Time (binary heap):

$$ T = \mathcal{O}\big((|E(U)| + |U|)\log |U|\big) $$

In the worst case ($B$ large), $|U|=n$, $|E(U)|=m$ → standard $\mathcal{O}((m+n)\log n)$.
Space: Graph storage $\Theta(n+m)$ + working arrays $\Theta(n)$ + heap up to $\Theta(|U|)$. Asymptotically $\Theta(n+m)$.
Multi-source impact: Doesn’t change big-O. Practically lowers the radius needed to reach a given $f = |U|/n$; you explore more shallow balls from more centers instead of a deep ball from one center.

Where BMSSP wins / loses

Wins

You only care about a radius (range search, k-hop neighborhoods, geo/proximity queries, limited-distance labeling).
Repeated queries that increase $B$ gradually; reuse $B'$ to chunk work in phases.
Graphs where $f=|U|/n \ll 1$ for your typical $B$. Cost drops roughly like $f \log (fn)$.

Loses

$B$ approaches graph diameter ⇒ you’re doing full SSSP; there’s no magic: you pay Dijkstra.
Integer weights with small range: specialized queues (Dial/radix) beat binary heaps.
Heavy parallel iron: Δ-stepping or GPU SSSP can dominate on large, sparse graphs with good partitioning.

Comparison with existing SSSP families

Method	Weights	Time (typical)	Memory	Pros	Cons
BMSSP (this) + binary heap	non-negative	$\mathcal{O}((m+n)\log n)$	$\Theta(n+m)$	Exact within bound; simple; great when $f \ll 1$.	If $f\to1$, same as Dijkstra.
Dijkstra (binary heap)	non-negative	$\mathcal{O}((m+n)\log n)$	$\Theta(n+m)$	Baseline, robust.	Slower on integer weights; no early stop unless you add a bound.
Dijkstra (Fibonacci)	non-negative	$\mathcal{O}(m + n\log n)$ (theory)	big constants	Better asymptotics.	Not worth it in practice.
Dial / bucket queues	integer weights $0..C$	$\mathcal{O}(m + nC)$	$\Theta(n+m+C)$	Blazing if $C$ small; predictable.	Explodes with large $C$ or big $B$ (buckets).
Radix / 0-1 BFS	small integer	near-linear	$\Theta(n+m)$	Ideal for tiny ranges (0-1 BFS).	Narrow use-case.
Δ-stepping (parallel)	non-negative	$\approx$ near-linear per level on PRAM	buckets + frontier	Parallel-friendly; fast in practice.	Needs tuning (δ); irregular work.
Bellman-Ford	any	$\mathcal{O}(nm)$	$\Theta(n)$	Negative edges (no cycles).	Too slow; not comparable for your case.
A*	non-negative + heuristic	like Dijkstra on set actually explored	like Dijkstra	If heuristic is good, dominates.	Needs problem-specific heuristics.

Bottom line: If you need exact distances up to a bound and you’re not on tiny integer weights, BMSSP (bounded Dijkstra) is the simplest, most dependable hammer. If weights are small integers, buckets win. If you’re on many-core/GPU, consider Δ-stepping or GPU SSSP.

Charts (model-based trends)

These plots use simple cost models (they are not runtime measurements). They illustrate how the math scales with explored fraction $f$, bound $B$, and weight range. Use your own cargo bench to pin real numbers.

Relative time vs explored fraction Bounded Dijkstra cost shrinks roughly like $f \log (fn)$ vs full Dijkstra at $f=1$.

$Relative time vs fraction$

Bounded Dijkstra vs Dial (integer weights) When the weight range $C$ is small, Dial can beat binary heaps; as $B$ grows or $C$ is large, buckets become a tax.

Extra memory (excluding adjacency) All Dijkstra variants pay $O(n)$. Bucketed queues add $O(B)$ (or $O(C)$) worth of buckets.

Want empirical plots? Run the provided Rust bench across varying $B$, $k=|S|$, $n,m$, and (optionally) a bucketed queue variant. I can script cargo bench + CSV + gnuplot for you.

Crystal implementation

Crystal port lives in impls/crystal and follows the same CLI and JSON contract.

Build locally (if Crystal is installed):

cd impls/crystal
shards build --release
./bin/bmssp_cr --graph grid --rows 20 --cols 20 --k 8 --B 50 --seed 1 --trials 2 --json

The bench runner will auto-detect Crystal (crystal + shards in PATH) and include its results.

Implementation notes that actually matter

Keep dist as u64. Use saturating_add to avoid overflow during relaxations.
Don’t attempt “decrease-key”; just push duplicates and skip stale pops. It’s faster with Rust’s BinaryHeap.
Track B' while relaxing edges that would cross the bound—this is your next-phase threshold.
Multi-source is just seeding many (node, offset) entries. If you repeat with larger $B$, reuse dist as a warm-start and continue; it preserves optimality because labels are monotone.
Memory: adjacency dominates. The rest is a few Vecs of size $n$ and the heap/frontier.

Complexity re-stated (so you don’t scroll back)

Time: $\boxed{ \mathcal{O}\big((|E(U)|+|U|)\log |U|\big) }$ Worst case: $\mathcal{O}((m+n)\log n)$.
Space: $\boxed{ \Theta(n+m) }$ total; working state $\Theta(n)$ + heap $O(|U|)$.

If you want byte-accurate numbers, the Rust crate includes a memory_estimate_bytes() helper; for real allocations, measure with heaptrack/massif or Linux cgroups.

When to prefer other queues

Small integer weights (e.g., 0–255): Dial/radix will beat binary heaps. BMSSP still applies—just stop at $B$—but use buckets.
Parallel hardware: Δ-stepping buckets plus bound $B$ works well; you get level-synchronous waves clipped at $B$.
Heuristic-rich domains: A* with an admissible heuristic can explore far less than $U$ for the same $B$.

How to produce real comparison charts (bench recipe)

Use the Rust crate I gave you (cargo bench -p bmssp) and vary:
- $n,m$ (grid, ER random, power-law)
- number of sources $k$
- bound $B$
Implement a second queue:
- Dial (if your weights are bounded ints)
- or a bucketed Δ-stepping variant (single-thread first)
Capture: push counts, relax counts, popped vertices, wall-clock (release), and peak RSS.
Plot time vs |U|, time vs B, pushes per edge; the trends will match the charts above, with real constants.

TL;DR

If $B$ is small relative to typical distances, BMSSP saves you real work, exactly as the $f\log(fn)$ model predicts.
If weights are tiny integers, use buckets (Dial) and still bound by $B$.
Otherwise, bounded Dijkstra with a binary heap is the clean default—simple, exact, and fast enough.

If you want this rewritten around the recursive/pivoted BMSSP from your screenshot (the one with FindPivots, batching, etc.), send the full section of the paper and I’ll extend the Rust crate to that variant and give you measured charts next.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
.github/workflows		.github/workflows
bench		bench
bmssp		bmssp
docs		docs
impls		impls
scripts		scripts
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE-APACHE		LICENSE-APACHE
LICENSE-MIT		LICENSE-MIT
README.md		README.md
make_report.sh		make_report.sh
run_fast.sh		run_fast.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Uh oh!

Repository files navigation

bmssp benchmark game

Benchmark snapshot

BMSSP 1000x Report

Best rows per implementation (largest explored set)

Run

One-time setup scripts

Complexity

Bounded Multi-Source Shortest Paths (BMSSP)

What problem it solves

Algorithm (binary-heap version)

Correctness (sketch you can actually use)

Complexity

Where BMSSP wins / loses

Comparison with existing SSSP families

Charts (model-based trends)

Crystal implementation

Implementation notes that actually matter

Complexity re-stated (so you don’t scroll back)

When to prefer other queues

How to produce real comparison charts (bench recipe)

TL;DR

About

Licenses found

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

License

Licenses found

openSVM/bmssp-benchmark-game

Folders and files

Latest commit

History

Repository files navigation

bmssp benchmark game

Benchmark snapshot

BMSSP 1000x Report

Best rows per implementation (largest explored set)

Run

One-time setup scripts

Complexity

Bounded Multi-Source Shortest Paths (BMSSP)

What problem it solves

Algorithm (binary-heap version)

Correctness (sketch you can actually use)

Complexity

Where BMSSP wins / loses

Comparison with existing SSSP families

Charts (model-based trends)

Crystal implementation

Implementation notes that actually matter

Complexity re-stated (so you don’t scroll back)

When to prefer other queues

How to produce real comparison charts (bench recipe)

TL;DR

About

Topics

Resources

License

Licenses found

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages