Implementation of the algorithm in multiple languages to compare performance in a standard GitHub Actions environment.
Languages currently wired in this repo: Rust, C, C++, Nim, Crystal, Kotlin (JAR), Elixir (.exs), Erlang (.erl).
Environment:
- Host: Linux/6.11.0-1018-azure (x86_64)
- CPU cores: 4
- Git commit: 460fdc9
impl | lang | graph | n | m | k | B | threads | time_ns | popped | edges_scanned | heap_pushes | B_prime | mem_bytes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
c-bmssp | C | ba | 200000 | 999995 | 16 | 400 | 1 | 1779913 | 8010 | 40045 | 8995 | 400 | 17599920 |
cpp-bmssp | C++ | ba | 200000 | 999995 | 16 | 400 | 1 | 2254705 | 8374 | 41865 | 9473 | 400 | 17599920 |
kotlin-bmssp | Kotlin | ba | 200000 | 1999958 | 16 | 400 | 1 | 282058140 | 200000 | 1999958 | 402594 | 400 | 35199328 |
nim-bmssp | Nim | ba | 200000 | 999995 | 16 | 400 | 1 | 2143144 | 7267 | 36330 | 8252 | 400 | 17599920 |
rust-bmssp | Rust | ba | 200000 | 999995 | 16 | 400 | 1 | 2461474 | 8331 | 41650 | 9331 | 400 | 22799944 |
c-bmssp | C | grid | 102400 | 408320 | 16 | 400 | 1 | 732563 | 4878 | 19511 | 5957 | 400 | 7352320 |
cpp-bmssp | C++ | grid | 102400 | 408320 | 16 | 400 | 1 | 897503 | 4871 | 19455 | 5984 | 400 | 7352320 |
crystal-bmssp | Crystal | grid | 102400 | 408320 | 16 | 400 | 1 | 66888845 | 4741 | 18945 | 5797 | 400 | 7152652 |
elixir-bmssp | Elixir | grid | 102400 | 408320 | 16 | 400 | 1 | 431964997 | 4956 | 19775 | 6125 | 400 | 8171520 |
erlang-bmssp | Erlang | grid | 102400 | 408320 | 16 | 400 | 1 | 11623659 | 4699 | 18772 | 5748 | 400 | 8171520 |
kotlin-bmssp | Kotlin | grid | 102400 | 408320 | 16 | 400 | 1 | 34700586 | 5306 | 21202 | 6492 | 400 | 8171520 |
nim-bmssp | Nim | grid | 102400 | 408320 | 16 | 400 | 1 | 1932144 | 5051 | 20204 | 6228 | 400 | 7352320 |
rust-bmssp | Rust | grid | 102400 | 408320 | 16 | 400 | 1 | 1045159 | 5091 | 20326 | 6280 | 400 | 10014744 |
impl | lang | < 8000 th>graphn | m | k | B | seed | threads | time_ns | popped | edges_scanned | heap_pushes | B_prime | mem_bytes | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
rust-bmssp | Rust | grid | 2500 | 9800 | 4 | 50 | 1 | 1 | 741251 | 868 | 3423 | 1047 | 50 | 241824 |
c-bmssp | C | grid | 2500 | 9800 | 4 | 50 | 1 | 1 | 99065 | 1289 | 5119 | 1565 | 50 | 176800 |
cpp-bmssp | C++ | grid | 2500 | 9800 | 4 | 50 | 1 | 1 | 117480 | 1064 | 4224 | 1261 | 50 | 176800 |
kotlin-bmssp | Kotlin | grid | 2500 | 9800 | 4 | 50 | 1 | 1 | 5308820 | 1102 | 4386 | 1309 | 50 | 196800 |
elixir-bmssp | Elixir | grid | 2500 | 9800 | 4 | 50 | 1 | 1 | 5410039 | 870 | 3447 | 1047 | 50 | 196800 |
erlang-bmssp | Erlang | grid | 2500 | 9800 | 4 | 50 | 1 | 1 | 1155739 | 691 | 2701 | 818 | 50 | 196800 |
Rust implementation of bounded multi-source shortest paths (multi-source Dijkstra cut off at B
).
cargo test
cargo bench -p bmssp
python3 bench/runner.py --release --out results
# fast iteration
python3 bench/runner.py --quick --out results-dev --timeout-seconds 20 --jobs 2
# 1000x larger graphs (heavy):
python3 bench/runner.py --params bench/params_1000x.yaml --release --jobs 4 --timeout-seconds 600 --out results
- Linux/macOS:
- Run
scripts/install_deps.sh --yes
to auto-install Rust, Python deps, build tools, Crystal/shards, Nim. - Use
--check-only
to only report missing items.
- Run
- Windows:
- Open PowerShell (as Administrator recommended) and run
scripts/Install-Dependencies.ps1 -Yes
. - Add
-CheckOnly
to only report.
- Open PowerShell (as Administrator recommended) and run
Let U = { v | d(v) < B }
and E(U)
be edges scanned from U
.
- Time:
O((|E(U)| + |U|) log |U|)
with binary heap; worst-caseO((m+n) log n)
. - Space:
Θ(n + m)
graph +Θ(n)
distances/flags + heap bounded by frontier size.
Use Graph::memory_estimate_bytes()
to get a byte estimate at runtime.
Here’s the no-BS, self-contained write-up you asked for. It includes the theory, proofs at the right granularity, complexity, comparisons against the usual suspects, and charts (model-based, not empirical—use them to reason about trends, not absolutes).
Given a directed graph
- For every vertex
$v$ with true shortest-path distance$d(v) < B$ , the exact distance$d(v)$ . - The explored set
$U={v\in V\mid d(v)<B}$ . - The tight boundary
$B' = \min{ \hat d(x)\mid x \notin U}$ , i.e., the smallest tentative label never popped (next frontier).
This is Dijkstra from multiple sources that halts when the next tentative key would be
Same invariant as Dijkstra: a node is popped exactly once with its final shortest distance. The only change is the early-exit cut.
Initialize
$\forall v\in V:\ \text{dist}[v] \leftarrow +\infty$ - For each source
$s\in S$ : if$d_0(s) < B$ set$\text{dist}[s]\leftarrow d_0(s)$ and push$(d_0(s),s)$ into a min-PQ. $B' \leftarrow B$
Main loop
-
Pop
$(d,u)$ . If$d \ne \text{dist}[u]$ continue (stale). -
If
$d \ge B$ , set$B'\gets d$ and stop. -
For each edge
$(u,v,w)$ :$\text{nd}=d+w$ .- If
$\text{nd} < \text{dist}[v]$ and$\text{nd} < B$ : relax and push. - Else if
$\text{nd} \ge B$ : update$B' \leftarrow \min(B', \text{nd})$ (tracks the smallest over-bound key observed).
- If
Return
Multi-source is trivial: seed multiple entries. All standard Dijkstra proofs still apply.
Invariant (Dijkstra): When a node
Boundary lemma. Let
Let
-
Time (binary heap):
$$ T = \mathcal{O}\big((|E(U)| + |U|)\log |U|\big) $$
In the worst case (
$B$ large),$|U|=n$ ,$|E(U)|=m$ → standard$\mathcal{O}((m+n)\log n)$ . -
Space: Graph storage
$\Theta(n+m)$ + working arrays$\Theta(n)$ + heap up to$\Theta(|U|)$ . Asymptotically$\Theta(n+m)$ . -
Multi-source impact: Doesn’t change big-O. Practically lowers the radius needed to reach a given
$f = |U|/n$ ; you explore more shallow balls from more centers instead of a deep ball from one center.
Wins
- You only care about a radius (range search, k-hop neighborhoods, geo/proximity queries, limited-distance labeling).
-
Repeated queries that increase
$B$ gradually; reuse$B'$ to chunk work in phases. - Graphs where
$f=|U|/n \ll 1$ for your typical$B$ . Cost drops roughly like$f \log (fn)$ .
Loses
-
$B$ approaches graph diameter ⇒ you’re doing full SSSP; there’s no magic: you pay Dijkstra. - Integer weights with small range: specialized queues (Dial/radix) beat binary heaps.
- Heavy parallel iron: Δ-stepping or GPU SSSP can dominate on large, sparse graphs with good partitioning.
Method | Weights | Time (typical) | Memory | Pros | Cons | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|
BMSSP (this) + binary heap | non-negative | Exact within bound; simple; great when |
If |
||||||||
Dijkstra (binary heap) | non-negative | Baseline, robust. | Slower on integer weights; no early stop unless you add a bound. | ||||||||
Dijkstra (Fibonacci) | non-negative |
|
big constants | Better asymptotics. | Not worth it in practice. | ||||||
Dial / bucket queues | integer weights |
Blazing if |
Explodes with large |
||||||||
Radix / 0-1 BFS | small integer | near-linear | Ideal for tiny ranges (0-1 BFS). | Narrow use-case. | |||||||
Δ-stepping (parallel) | non-negative |
|
buckets + frontier | Parallel-friendly; fast in practice. | Needs tuning (δ); irregular work. | ||||||
Bellman-Ford | any | Negative edges (no cycles). | Too slow; not comparable for your case. | ||||||||
A* | non-negative + heuristic | like Dijkstra on set actually explored | like Dijkstra | If heuristic is good, dominates. | Needs problem-specific heuristics. |
Bottom line: If you need exact distances up to a bound and you’re not on tiny integer weights, BMSSP (bounded Dijkstra) is the simplest, most dependable hammer. If weights are small integers, buckets win. If you’re on many-core/GPU, consider Δ-stepping or GPU SSSP.
These plots use simple cost models (they are not runtime measurements). They illustrate how the math scales with explored fraction cargo bench
to pin real numbers.
-
Relative time vs explored fraction
Bounded Dijkstra cost shrinks roughly like
$f \log (fn)$ vs full Dijkstra at$f=1$ .
-
Bounded Dijkstra vs Dial (integer weights)
When the weight range
$C$ is small, Dial can beat binary heaps; as$B$ grows or$C$ is large, buckets become a tax.
-
Extra memory (excluding adjacency)
All Dijkstra variants pay
$O(n)$ . Bucketed queues add$O(B)$ (or $O(C)$) worth of buckets.
Want empirical plots? Run the provided Rust bench across varying
$B$ ,$k=|S|$ ,$n,m$ , and (optionally) a bucketed queue variant. I can scriptcargo bench
+ CSV + gnuplot for you.
Crystal port lives in impls/crystal
and follows the same CLI and JSON contract.
Build locally (if Crystal is installed):
cd impls/crystal
shards build --release
./bin/bmssp_cr --graph grid --rows 20 --cols 20 --k 8 --B 50 --seed 1 --trials 2 --json
The bench runner will auto-detect Crystal (crystal
+ shards
in PATH) and include its results.
- Keep
dist
asu64
. Usesaturating_add
to avoid overflow during relaxations. - Don’t attempt “decrease-key”; just push duplicates and skip stale pops. It’s faster with Rust’s
BinaryHeap
. - Track
B'
while relaxing edges that would cross the bound—this is your next-phase threshold. - Multi-source is just seeding many
(node, offset)
entries. If you repeat with larger$B$ , reusedist
as a warm-start and continue; it preserves optimality because labels are monotone. - Memory: adjacency dominates. The rest is a few
Vec
s of size$n$ and the heap/frontier.
-
Time:
$\boxed{ \mathcal{O}\big((|E(U)|+|U|)\log |U|\big) }$ Worst case:$\mathcal{O}((m+n)\log n)$ . -
Space:
$\boxed{ \Theta(n+m) }$ total; working state$\Theta(n)$ + heap$O(|U|)$ .
If you want byte-accurate numbers, the Rust crate includes a memory_estimate_bytes()
helper; for real allocations, measure with heaptrack
/massif
or Linux cgroups.
-
Small integer weights (e.g., 0–255): Dial/radix will beat binary heaps. BMSSP still applies—just stop at
$B$ —but use buckets. -
Parallel hardware: Δ-stepping buckets plus bound
$B$ works well; you get level-synchronous waves clipped at$B$ . -
Heuristic-rich domains: A* with an admissible heuristic can explore far less than
$U$ for the same$B$ .
-
Use the Rust crate I gave you (
cargo bench -p bmssp
) and vary:-
$n,m$ (grid, ER random, power-law) - number of sources
$k$ - bound
$B$
-
-
Implement a second queue:
- Dial (if your weights are bounded ints)
- or a bucketed Δ-stepping variant (single-thread first)
-
Capture: push counts, relax counts, popped vertices, wall-clock (release), and peak RSS.
-
Plot time vs |U|, time vs B, pushes per edge; the trends will match the charts above, with real constants.
- If
$B$ is small relative to typical distances, BMSSP saves you real work, exactly as the$f\log(fn)$ model predicts. - If weights are tiny integers, use buckets (Dial) and still bound by
$B$ . - Otherwise, bounded Dijkstra with a binary heap is the clean default—simple, exact, and fast enough.
If you want this rewritten around the recursive/pivoted BMSSP from your screenshot (the one with FindPivots
, batching, etc.), send the full section of the paper and I’ll extend the Rust crate to that variant and give you measured charts next.