March 06, 2025
Peak Performance, Minimized Memory: Optimizing torchtune’s performance with torch.compile & Liger Kernel
LinkedIn: Shivam Sahni, Byron Hsu, Yanning Chen; Meta: Ankith Gunapal, Evan Smothers
March 05, 2025
Current and New Activation Checkpointing Techniques in PyTorch
As models scale in depth, batch size, and sequence length, activation memory becomes an increasingly significant contributor to overall memory usage. To help address this, PyTorch provides utilities for activation checkpointing, which reduce the number of saved tensors by recomputing them when needed, trading additional compute for lower memory usage.
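The trade-off described above can be sketched with PyTorch's built-in `torch.utils.checkpoint` utility: the block's activations are not stored during the forward pass and are instead recomputed during backward. The specific block and tensor shapes below are illustrative, not from the post.

```python
import torch
from torch.utils.checkpoint import checkpoint

# A small block whose intermediate activations we choose not to store;
# they are recomputed on the fly during the backward pass.
def block(x):
    return torch.relu(x @ x.t())

x = torch.randn(8, 8, requires_grad=True)

# use_reentrant=False selects the recommended non-reentrant implementation.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()  # gradients flow through the recomputed activations
```

Memory saved this way scales with how much of the network is wrapped in checkpointed regions, at the cost of roughly one extra forward pass per region.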
March 04, 2025
📣 Submit to Speak at PyTorch Conference + Save on Registration
Step into the Future of AI at PyTorch Conference 2025.
February 26, 2025
Accelerating Generative AI with PyTorch: Segment Anything 2 - Fast and furious inference with low latency and fast cold starts
This post is a follow-up to the first entry in our multi-part blog series on accelerating generative AI models with pure, native PyTorch, with a focus on latency and elastic scalability. We use torch.compile and torch.export to create highly optimized, low-latency versions of SAM2 that can be quickly scaled up on new instances.
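At its simplest, the torch.compile workflow mentioned above wraps a model or function and returns an optimized callable with identical semantics. The toy function below is purely illustrative (not SAM2), and it uses the debug `"eager"` backend so the sketch runs without a full compiler toolchain; the post's production setup uses the default Inductor backend.

```python
import torch

def model(x):
    return torch.sin(x) + torch.cos(x)

# torch.compile returns a drop-in replacement for the original callable;
# backend="eager" is an assumption here to keep the sketch dependency-free.
compiled = torch.compile(model, backend="eager")

x = torch.randn(4)
out = compiled(x)  # numerically matches the eager result
```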
February 11, 2025
Unlocking the Latest Features in PyTorch 2.6 for Intel Platforms
PyTorch* 2.6 has just been released with a set of exciting new features including torch.compile compatibility with Python 3.13, new security and performance enhancements, and a change in the default parameter for torch.load. PyTorch also announced the deprecation of its official Anaconda channel.
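The torch.load default change mentioned above refers to `weights_only` now defaulting to `True`, which restricts unpickling to a safe allowlist of types. A minimal sketch of what that means in practice (saving to an in-memory buffer here just to keep the example self-contained):

```python
import io
import torch

buf = io.BytesIO()
torch.save({"w": torch.zeros(2)}, buf)
buf.seek(0)

# On PyTorch 2.6+ this is equivalent to torch.load(buf, weights_only=True):
# only plain tensors/containers load; arbitrary pickled objects are rejected.
loaded = torch.load(buf)
```

Checkpoints containing arbitrary Python objects now require an explicit `weights_only=False`, which should only be used with trusted files.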
February 05, 2025
Enabling advanced GPU features in PyTorch - Warp Specialization
Meta: Hongtao Yu, Manman Ren, Bert Maher, Shane Nay; NVIDIA: Gustav Zhu, Shuhao Jiang
January 29, 2025
PyTorch 2.6 Release Blog
We are excited to announce the release of PyTorch® 2.6 (release notes)! This release features multiple improvements for PT2: torch.compile can now be used with Python 3.13; a new performance-related knob, torch.compiler.set_stance; and several AOTInductor enhancements. Beyond the PT2 improvements, another highlight is FP16 support on X86 CPUs.