Stars
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrained models and a diffusion model toolbox. Equipped with high …
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
[ICLR 2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
How to optimize some common algorithms in CUDA.
Official inference repo for FLUX.1 models
📝 A simple and elegant markdown editor, available for Linux, macOS and Windows.
Aims to integrate most existing feature-caching-based diffusion acceleration schemes into a unified framework.
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
📚 Collection of awesome generation acceleration resources.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Distributed Compiler based on Triton for Parallel Systems
Efficient vision foundation models for high-resolution generation and perception.
A Unified Cache Acceleration Framework for 🤗Diffusers: Qwen-Image-Lightning, Qwen-Image, HunyuanImage, Wan, FLUX, etc.
Pruna is a model optimization framework built for developers, enabling you to deliver faster, more efficient models with minimal overhead.
💥 EasyCaching is an open source caching library that covers both basic and advanced caching usages, helping you handle caching more easily!
Faster generation with text-to-image diffusion models.
Fast and memory-efficient exact attention
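As a rough illustration of what "exact attention" means here, below is a minimal sketch of calling FlashAttention's `flash_attn_func` (assuming the `flash_attn` package is installed with CUDA support; the tensor shapes are illustrative, following the documented `(batch, seqlen, nheads, headdim)` convention):

```python
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
# FlashAttention requires fp16/bf16 inputs on a CUDA device.
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Exact (not approximate) attention, computed tile-by-tile in on-chip SRAM
# so the full seqlen x seqlen score matrix is never materialized in HBM.
out = flash_attn_func(q, k, v, causal=True)  # (batch, seqlen, nheads, headdim)
```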
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
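A minimal sketch of that Python API (assuming a recent `tensorrt_llm` release that ships the high-level `LLM` class; the model name here is illustrative, not taken from this list):

```python
from tensorrt_llm import LLM, SamplingParams

# Builds or loads a TensorRT engine for the given Hugging Face model.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
params = SamplingParams(max_tokens=64, temperature=0.8)

# generate() takes a list of prompts and returns one output per prompt.
for output in llm.generate(["What is quantization?"], params):
    print(output.outputs[0].text)
```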
PyTorch native quantization and sparsity for training and inference
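For context, a minimal sketch of torchao's one-call post-training quantization (assuming a recent torchao release; `int8_weight_only` may be named differently in other versions):

```python
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int8_weight_only

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).cuda()

# Rewrites the weights of every Linear module to int8 in place;
# inference then runs with weight-only quantized matmuls.
quantize_(model, int8_weight_only())

x = torch.randn(4, 1024, device="cuda")
with torch.no_grad():
    y = model(x)
```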
StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
[NeurIPS 2025] Radial Attention: O(nlogn) Sparse Attention with Energy Decay for Long Video Generation
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.
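Although the core API is C-based, the NVTX repository also ships Python bindings; below is a minimal sketch of range annotation with the `nvtx` package (assuming `pip install nvtx`; the range names are illustrative):

```python
import time
import nvtx

# Annotated ranges show up under these labels in profilers
# such as Nsight Systems.
with nvtx.annotate("preprocess", color="blue"):
    time.sleep(0.1)  # stand-in for real work

@nvtx.annotate("train_step", color="green")
def train_step():
    time.sleep(0.1)  # stand-in for real work

train_step()
```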