Stars
An end-to-end Transformer fusion framework integrating DAG-based pipeline scheduling with whole-encoder and whole-decoder fusion.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Official repo for the paper "An Effective Training Framework for Light-Weight Automatic Speech Recognition Models" accepted at InterSpeech 2025.
Efficient vision foundation models for high-resolution generation and perception.
[PACT'24] GraNNDis: a fast and unified distributed graph neural network (GNN) training framework for both full-batch (full-graph) and mini-batch training.
A verification tool for ensuring parallelization equivalence in distributed model training.
verify-llm / TrainVerify
Forked from microsoft/TrainVerify. A verification tool for ensuring parallelization equivalence in distributed model training.
Lists of company-wise questions available on LeetCode Premium. Every CSV file in the companies directory corresponds to a list of questions on LeetCode for a specific company based on the LeetCode …
UniSparse: An Intermediate Language for General Sparse Format Customization (OOPSLA'24)
SparseTIR: Sparse Tensor Compiler for Deep Learning
This project includes a prototype implementation of BOLT—a bandwidth-optimized, lightning-fast Oblivious Map—along with benchmarking code for performance comparisons.
This was originally a collection of papers on neural network accelerators. Now it's more of a personal selection of research on deep learning and computer architecture.
Official code repository for "Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving [MICRO'25]"
[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring
PyTorch implementation of the sparse attention from the paper "Generating Long Sequences with Sparse Transformers" (a minimal block-sparse masking sketch follows this list).
Benchmark harness and baseline results for the NeuroBench algorithm track.
[TVLSI 2025] ACiM Inference Simulation Framework in "ASiM: Modeling and Analyzing Inference Accuracy of SRAM-Based Analog CiM Circuits"
Fast and memory-efficient exact attention
A sparse attention kernel supporting mixed sparse patterns
Fast and low-memory attention layer written in CUDA
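Several of the entries above concern sparse attention kernels. As a rough illustration of what a block-sparse attention pattern looks like, here is a minimal PyTorch sketch that restricts the attention matrix to its block diagonal. The helper names (`block_sparse_mask`, `sparse_attention`) and the block size are illustrative assumptions and do not correspond to the API of any repository listed here.

```python
# A minimal sketch of block-diagonal sparse attention, assuming PyTorch.
# Helper names (block_sparse_mask, sparse_attention) and the block size are
# illustrative only; real sparse kernels never materialize the dense mask.
import torch


def block_sparse_mask(seq_len: int, block_size: int) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask keeping only block-diagonal entries."""
    blocks = torch.arange(seq_len) // block_size       # block index per position
    return blocks.unsqueeze(0) == blocks.unsqueeze(1)  # True = allowed to attend


def sparse_attention(q, k, v, block_size: int = 64):
    """q, k, v: (batch, heads, seq_len, head_dim)."""
    seq_len, head_dim = q.shape[-2], q.shape[-1]
    mask = block_sparse_mask(seq_len, block_size).to(q.device)
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5  # dense logits, for clarity only
    scores = scores.masked_fill(~mask, float("-inf"))   # out-of-block entries -> 0 after softmax
    return torch.softmax(scores, dim=-1) @ v


if __name__ == "__main__":
    q = k = v = torch.randn(1, 8, 256, 64)
    out = sparse_attention(q, k, v, block_size=64)
    print(out.shape)  # torch.Size([1, 8, 256, 64])
```

Note that this sketch only shows the masking pattern; the kernels in the repositories above gain their speed and memory savings by never computing the masked-out score entries in the first place.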