March 06, 2025

Peak Performance, Minimized Memory: Optimizing torchtune’s performance with torch.compile & Liger Kernel

LinkedIn: Shivam Sahni, Byron Hsu, Yanning Chen Meta: Ankith Gunapal, Evan Smothers

Read More

March 05, 2025

Current and New Activation Checkpointing Techniques in PyTorch

As models scale in depth, batch size, and sequence length, etc, activation memory becomes an increasingly significant contributor to the overall memory usage. To help address this, PyTorch provides utilities for activation checkpointing, which reduce the number of saved tensors by recomputing them when needed, trading off memory usage for additional compute.

Read More

March 04, 2025

📣 Submit to Speak at PyTorch Conference + Save on Registration

Step into the Future of AI at PyTorch Conference 2025.

Read More

February 26, 2025

Accelerating Generative AI with PyTorch: Segment Anything 2 - Fast and furious inference with low latency and fast cold starts

This post is a follow-up to our first entry in the multi-series blog focused on how to accelerate generative AI models with pure, native PyTorch and a focus on latency and elastic scalability. We use torch.compile and torch.export to create highly optimized low latency versions of SAM2 that can be quickly scaled up on new instances.

Read More

February 11, 2025

Unlocking the Latest Features in PyTorch 2.6 for Intel Platforms

PyTorch* 2.6 has just been released with a set of exciting new features including torch.compile compatibility with Python 3.13, new security and performance enhancements, and a change in the default parameter for torch.load. PyTorch also announced the deprecation of its official Anaconda channel.

Read More

February 05, 2025

Enabling advanced GPU features in PyTorch - Warp Specialization

Meta: Hongtao Yu, Manman Ren, Bert Maher, Shane Nay NVIDIA: Gustav Zhu, Shuhao Jiang

Read More

January 29, 2025

PyTorch 2.6 Release Blog

We are excited to announce the release of PyTorch® 2.6 (release notes)! This release features multiple improvements for PT2: torch.compile can now be used with Python 3.13; new performance-related knob torch.compiler.set_stance; several AOTInductor enhancements. Besides the PT2 improvements, another highlight is FP16 support on X86 CPUs.

Read More