ashvardanian

☕

Less Slow

Ash Vardanian ashvardanian


Converting outrageous amounts of coffee into modest amounts of Assembly for Intel, Arm, and Nvidia chips. @unum-cloud, @cpp-armenia

1.2k followers · 90 following

BDFL @ Unum
London, San Francisco, Yerevan
13:52 (UTC +01:00)
ashvardanian.com
https://orcid.org/0000-0002-4882-1815
@ashvardanian
in/ashvardanian
ashvardanian
@ashvardanian


Starred repositories

bertdobbelaere / SorterHunter

An evolutionary approach to find small and low latency sorting networks

HTML · 72 stars · 6 forks · Updated Apr 21, 2025

stillwater-sc / universal

Large collection of number systems providing custom arithmetic for mixed-precision algorithm development and optimization for AI, Machine Learning, Computer Vision, Signal Processing, CAE, EDA, con…

C++ · 479 stars · 66 forks · Updated Oct 20, 2025

ashvardanian / HashEvals

Minimalistic Rust toolkit for hash function quality analysis. Tests avalanche effect, differential patterns, and statistical distribution across variable-length n-grams.

Rust · 9 stars · 1 fork · Updated Oct 6, 2025

HJLebbink / RustGPT

Forked from tekaratzas/RustGPT

A transformer-based LLM, written completely in Rust

Rust · 5 stars · Updated Sep 17, 2025

ashvardanian / USearchBench.java

Apache Spark and Unum USearch integration example benchmarking distributed Vector Search against Lucene and OpenSearch

Java · 4 stars · Updated Sep 12, 2025

ashvardanian / StringTape

Apache Arrow-compatible space-efficient "tape" class in pure Rust to be used with StringZilla for GPU, NUMA, and disk transfers of variable length strings

Rust · 27 stars · 1 fork · Updated Oct 12, 2025

mlc-ai / xgrammar

Fast, Flexible and Portable Structured Generation

C++ · 1,317 stars · 92 forks · Updated Oct 20, 2025

ashvardanian / ForkUnion

Lower-latency OpenMP-style minimalistic scoped thread-pool designed for 'Fork-Join' parallelism in Rust and C++, avoiding memory allocations, mutexes, CAS-primitives, and false-sharing on the hot p…

C++ · 276 stars · 24 forks · Updated Oct 19, 2025

zml / zml

Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild

Zig · 2,808 stars · 110 forks · Updated Oct 21, 2025

lightpanda-io / browser

Lightpanda: the headless browser designed for AI and automation

Zig · 10,054 stars · 270 forks · Updated Oct 21, 2025

HazyResearch / Megakernels

kernels, of the mega variety

Python · 587 stars · 26 forks · Updated Sep 28, 2025

MinishLab / model2vec

Fast State-of-the-Art Static Embeddings

Python · 1,870 stars · 103 forks · Updated Oct 11, 2025

ashvardanian / JaccardIndex

Optimizing bit-level Jaccard Index and Population Counts for large-scale quantized Vector Search via Harley-Seal CSA and Lookup Tables

Python · 21 stars · 2 forks · Updated May 18, 2025

tracel-ai / cubecl

Multi-platform high-performance compute language extension for Rust.

Rust · 1,741 stars · 112 forks · Updated Oct 21, 2025

scylladb / vector-store

The indexing service for ScyllaDB for vector searching functionality

Rust · 23 stars · 10 forks · Updated Oct 16, 2025

NVIDIA / nsight-vscode-edition

A Visual Studio Code extension for building and debugging CUDA applications.

TypeScript · 90 stars · 15 forks · Updated Sep 24, 2025

ai-dynamo / dynamo

A Datacenter Scale Distributed Inference Serving Framework

Rust · 5,326 stars · 647 forks · Updated Oct 21, 2025

zed-industries / zed

Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.

Rust · 68,025 stars · 5,623 forks · Updated Oct 21, 2025

microsoft / vscode-cpptools

Official repository for the Microsoft C/C++ extension for VS Code.

TypeScript · 5,926 stars · 1,678 forks · Updated Oct 16, 2025

travisdowns / uarch-bench

A benchmark for low-level CPU micro-architectural features

C++ · 748 stars · 69 forks · Updated Feb 8, 2022

travisdowns / robsize

ROB size testing utility

C++ · 158 stars · 14 forks · Updated Dec 19, 2021

ashvardanian / NetworkXternal

NetworkX-like Python experience for Postgres, SQLite, MongoDB, and Neo4J

Python · 28 stars · 3 forks · Updated Feb 28, 2025

petermattis / fastcgo

Go · 193 stars · 13 forks · Updated Aug 16, 2017

lh3 / minimap2

A versatile pairwise aligner for genomic and spliced nucleotide sequences

C · 2,053 stars · 446 forks · Updated Oct 5, 2025

capstone-engine / capstone

Capstone disassembly/disassembler framework for ARM, ARM64 (ARMv8), Alpha, BPF, Ethereum VM, HPPA, LoongArch, M68K, M680X, Mips, MOS65XX, PPC, RISC-V(rv32G/rv64G), SH, Sparc, SystemZ, TMS320C64X, T…

C · 8,313 stars · 1,621 forks · Updated Oct 20, 2025

openmm / openmm

OpenMM is a toolkit for molecular simulation using high performance GPU code.

C++ · 1,694 stars · 567 forks · Updated Oct 16, 2025

emeryberger / Heap-Layers

Heap Layers: An Extensible Memory Allocation Infrastructure

C++ · 402 stars · 59 forks · Updated Sep 22, 2025

hanickadot / compile-time-regular-expressions

Compile Time Regular Expression in C++

C++ · 3,676 stars · 202 forks · Updated Sep 12, 2025

ashvardanian / less_slow.cpp

Playing around with "Less Slow" coding practices in C++20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking and user-space IO

C++ · 1,865 stars · 75 forks · Updated Sep 10, 2025

ashvardanian / PyBindToGPUs

Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python without a single line of CMake using PyBind11

Cuda · 29 stars · 3 forks · Updated Oct 14, 2025


image-search

similarity-search

Database

Compiler

C

Algorithm

Python

C++

hpc