- Sandia National Laboratories
- https://vlkale.github.io
- @vivek_lkale
Starred repositories
Explain Compiler Explorer output using AI
Run compilers interactively from your web browser and interact with the assembly
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
OSV-SCALIBR: A library for Software Composition Analysis
Claude Code for CUDA. Free AI assistant that actually understands GPU architecture
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
The Torch-MLIR project aims to provide first-class support from the PyTorch ecosystem to the MLIR ecosystem.
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Must-read research papers and links to tools and datasets related to using machine learning for compilers and systems optimisation
abouteiller / rocSHMEM
Forked from ROCm/rocSHMEM
rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
Python implementation of algorithms from Russell and Norvig's "Artificial Intelligence - A Modern Approach"
rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.
A tool and a library for bi-directional translation between SPIR-V and LLVM IR
NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extraction uses specialized NVIDIA NIM microservices to find, con…
vlkale / kokkos.github.io
Forked from kokkos/kokkos.github.io
Source code for kokkos.org pages - Vivek's fork
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
A Python framework for accelerated simulation, data generation and spatial computing.
OpenTofu lets you declaratively manage your cloud infrastructure.
The C++ Standard Library for Parallelism and Concurrency