Stars
FlashMLA: Efficient Multi-head Latent Attention Kernels
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
A model compilation solution for various hardware
ncnn is a high-performance neural network inference framework optimized for the mobile platform
A curated list of awesome C++ (or C) frameworks, libraries, resources, and shiny things. Inspired by awesome-... stuff.
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
Multi-object tracking (MOT) using DeepSORT and YOLOv3 with PyTorch
PaddlePaddle GAN library, including lots of interesting applications like First-Order motion transfer, Wav2Lip, picture repair, image editing, photo2cartoon, image style transfer, GPEN, and so on.
Object detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking, and real-time multi-person keypoint detection.