Accelerate Model Training with PyTorch 2.X: Build more accurate models by boosting the model training process
About this ebook
This book, written by an HPC expert with over 25 years of experience, guides you through enhancing model training performance using PyTorch. Here you’ll learn how model complexity impacts training time and discover performance tuning levels to expedite the process, as well as utilize PyTorch features, specialized libraries, and efficient data pipelines to optimize training on CPUs and accelerators. You’ll also reduce model complexity, adopt mixed precision, and harness the power of multicore systems and multi-GPU environments for distributed training. By the end, you'll be equipped with techniques and strategies to speed up training and focus on building stunning models.
Accelerate Model Training with PyTorch 2.X - Maicon Melo Alves
Accelerate Model Training with PyTorch 2.X
Copyright © 2024 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Niranjan Naikwadi
Publishing Product Manager: Sanjana Gupta
Book Project Manager: Kirti Pisat
Content Development Editor: Manikandan Kurup
Technical Editor: Seemanjay Ameriya
Copy Editor: Safis Editing
Proofreader: Safis Editing and Manikandan Kurup
Indexer: Hemangini Bari
Production Designer: Aparna Bhagat
Senior DevRel Marketing Coordinator: Vinishka Kalra
First published: April 2024
Production reference: 1050424
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul's Square
Birmingham
B3 1RB, UK.
ISBN 978-1-80512-010-0
To my wife and best friend, Cristiane, for being my loving partner throughout our joint life journey. To my daughters, Giovana and Camila, for being my real treasure; I’m so proud of you. To my mom, Fatima, and brothers, Johny and Karoline, for being my safe harbor. Despite everything, I also dedicate this book to my (late) father Jorge.
– Maicon Melo Alves
Foreword
Accelerating model training is critical in the area of machine learning for several reasons. As datasets grow larger and models become more complex, training times can become prohibitively long, hindering research and development progress. This is where machine learning frameworks such as PyTorch come into play, providing tools and techniques to accelerate the training process.
PyTorch, with its flexibility, GPU acceleration, optimization techniques, and distributed training capabilities, plays a crucial role in this endeavor by enabling researchers and developers to iterate quickly, train complex models efficiently, and deploy solutions faster. By leveraging PyTorch’s capabilities, practitioners can push the boundaries of what is possible in artificial intelligence and drive innovation across various domains.
Since learning all of these capabilities is not a straightforward task, this book is a great resource for all students, researchers, and professionals who intend to learn how to accelerate model training with the latest release of PyTorch in a smooth way.
This very didactic book starts by introducing how the training process works and what kind of modifications can be done at the application and environment layers to accelerate the training process.
Only then do the following chapters describe methods to accelerate model training, such as the Compile API, a novel capability introduced in PyTorch 2.0 for compiling a model, and the use of specialized libraries such as OpenMP and IPEX to speed up the training process of our models even more.
It also describes how to build an efficient data pipeline that keeps your GPU working at its peak for the entire training process, how to simplify a model by reducing its number of parameters, and how to reduce the numerical precision adopted by the neural network to accelerate training and decrease the amount of memory needed to store the model.
Finally, this book also explains how to spread out the distributed training process to run on multiple CPUs and GPUs.
This book not only provides current and highly relevant content for the learning and professional development of anyone working in the field of computing but also impresses with its extremely didactic presentation of the subject. You will certainly appreciate the quiz at the end of each chapter and the way each chapter's summary connects it to the next.
Every chapter presents code and usage examples. For all these reasons, I believe the book can also be successfully adopted by undergraduate and graduate courses as a supporting bibliography.
Prof. Lúcia Maria de Assumpção Drummond
Titular professor at Fluminense Federal University, Brazil
Contributors
About the author
Dr. Maicon Melo Alves is a senior system analyst and academic professor who specializes in High-Performance Computing (HPC) systems. In the last five years, he has become interested in understanding how HPC systems have been used in AI applications. To better understand this topic, he completed an MBA in data science in 2021 at Pontifícia Universidade Católica of Rio de Janeiro (PUC-RIO). He has over 25 years of experience in IT infrastructure, and since 2006, he has worked with HPC systems at Petrobras, the Brazilian state energy company. He obtained his DSc degree in computer science from the Fluminense Federal University (UFF) in 2018 and has published three books as well as papers in international HPC journals.
About the reviewer
Dimitra Charalampopoulou is a machine learning engineer with a background in technology consulting and a strong interest in AI and machine learning. She has led numerous large-scale digital transformation engineering projects for clients across the US and EMEA and has received various awards, including recognition for her start-up at the MIT Startup Competition. Additionally, she has been a speaker at two conferences in Europe on the topic of GenAI. As an advocate for women in tech, she is the founder and managing director of an NGO that promotes gender equality in tech and has taught programming classes to female students internationally.
Table of Contents
Preface
Part 1: Paving the Way
1
Deconstructing the Training Process
Technical requirements
Remembering the training process
Dataset
The training algorithm
Understanding the computational burden of the model training phase
Hyperparameters
Operations
Parameters
Quiz time!
Summary
2
Training Models Faster
Technical requirements
What options do we have?
Modifying the software stack
Increasing computing resources
Modifying the application layer
What can we change in the application layer?
Getting hands-on
What if we change the batch size?
Modifying the environment layer
What can we change in the environment layer?
Getting hands-on
Quiz time!
Summary
Part 2: Going Faster
3
Compiling the Model
Technical requirements
What do you mean by compiling?
Execution modes
Model compiling
Using the Compile API
Basic usage
Give me a real fight – training a heavier model!
How does the Compile API work under the hood?
Compiling workflow and components
Backends
Quiz time!
Summary
4
Using Specialized Libraries
Technical requirements
Multithreading with OpenMP
What is multithreading?
Using and configuring OpenMP
Using and configuring Intel OpenMP
Optimizing Intel CPU with IPEX
Using IPEX
How does IPEX work under the hood?
Quiz time!
Summary
5
Building an Efficient Data Pipeline
Technical requirements
Why do we need an efficient data pipeline?
What is a data pipeline?
How to build a data pipeline
Data pipeline bottleneck
Accelerating data loading
Optimizing a data transfer to the GPU
Configuring data pipeline workers
Reaping the rewards
Quiz time!
Summary
6
Simplifying the Model
Technical requirements
Knowing the model simplifying process
Why simplify a model? (reason)
How to simplify a model? (process)
When do we simplify a model? (moment)
Using Microsoft NNI to simplify a model
Overview of NNI
NNI in action!
Quiz time!
Summary
7
Adopting Mixed Precision
Technical requirements
Remembering numeric precision
How do computers represent numbers?
Floating-point representation
Novel data types
A summary, please!
Understanding the mixed precision strategy
What is mixed precision?
Why use mixed precision?
How to use mixed precision
How about Tensor Cores?
Enabling AMP
Activating AMP on GPU
AMP, show us what you are capable of!
Quiz time!
Summary
Part 3: Going Distributed
8
Distributed Training at a Glance
Technical requirements
A first look at distributed training
When do we need to distribute the training process?
Where do we execute distributed training?
Learning the fundamentals of parallelism strategies
Model parallelism
Data parallelism
Distributed training on PyTorch
Basic workflow
Communication backend and program launcher
Quiz time!
Summary
9
Training with Multiple CPUs
Technical requirements
Why distribute the training on multiple CPUs?
Why not increase the number of threads?
Distributed training to the rescue
Implementing distributed training on multiple CPUs
The Gloo communication backend
Coding distributed training to run on multiple CPUs
Launching distributed training on multiple CPUs
Getting faster with Intel oneCCL
What is Intel oneCCL?
Code implementation and launching
Is oneCCL really better?
Quiz time!
Summary
10
Training with Multiple GPUs
Technical requirements
Demystifying the multi-GPU environment
The popularity of multi-GPU environments
Understanding multi-GPU interconnection
How does interconnection topology affect performance?
Discovering the interconnection topology
Setting GPU affinity
Implementing distributed training on multiple GPUs
The NCCL communication backend
Coding and launching distributed training with multiple GPUs
Experimental evaluation
Quiz time!
Summary
11
Training with Multiple Machines
Technical requirements
What is a computing cluster?
Workload manager
Understanding the high-performance network
Implementing distributed training on multiple machines
Getting introduced to Open MPI
Why use Open MPI and NCCL?
Coding and launching the distributed training on multiple machines
Experimental evaluation
Quiz time!
Summary
Index
Other Books You May Enjoy
Preface
Hello there! I’m a system analyst and academic professor specializing in High-Performance Computing (HPC). Yes, you read it right! I’m not a data scientist. So, you are probably wondering why on Earth I decided to write a book about machine learning. Don’t worry; I will explain.
HPC systems comprise powerful computing resources tightly integrated to solve complex problems. The main goal of HPC is to employ resources, techniques, and methods to accelerate the execution of highly intensive computing tasks. Traditionally, HPC environments have been used to execute scientific applications from biology, physics, chemistry, and many other areas.
But this has changed in the past few years. Nowadays, HPC systems run tasks beyond scientific applications. In fact, the most prominent non-scientific workload executed in HPC environments is precisely the subject of this book: the building process of complex neural network models.
As a data scientist, you know better than anyone else how long it can take to train complex models and how many times you need to retrain a model to evaluate different scenarios. For this reason, the use of HPC systems to accelerate Artificial Intelligence (AI) applications (not only for training but also for inference) is a rapidly growing area.
This close relationship between AI and HPC sparked my interest in diving into the fields of machine learning and AI. By doing this, I could better understand how HPC has been applied to accelerate these applications.
So, here we are. I wrote this book to share what I have learned about this topic. My mission here is to give you the necessary knowledge to train your model faster by employing optimization techniques and methods using single or multiple computing resources.
By accelerating the training process, you can concentrate on what really matters: building stunning models!
Who this book is for
This book is for intermediate-level data scientists, engineers, and developers who want to know how to use PyTorch to accelerate the training process of their machine learning models. Although they are not the primary audience for this material, system analysts responsible for administrating and providing infrastructure for AI workloads will also find valuable information in this book.
Basic knowledge of machine learning, PyTorch, and Python is required to get the most out of this material. However, there is no obligation to have a prior understanding of distributed computing, accelerators, or multicore processors.
What this book covers
Chapter 1
, Deconstructing the Training Process, provides an overview of how the training process works under the hood, describing the training algorithm and covering the phases executed by this process. This chapter also explains how factors such as hyperparameters, operations, and neural network parameters impact the training process’s computational burden.
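The training step this chapter deconstructs can be sketched in a few lines of PyTorch; the tiny model and synthetic data below are placeholders for illustration, not examples from the book:

```python
import torch
from torch import nn

# Placeholder model and synthetic data, just to illustrate one training step
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(32, 8)            # a batch of 32 samples with 8 features
targets = torch.randint(0, 2, (32,))   # synthetic class labels

optimizer.zero_grad()                  # reset gradients accumulated so far
outputs = model(inputs)                # forward phase
loss = criterion(outputs, targets)     # loss computation
loss.backward()                        # backward phase: compute gradients
optimizer.step()                       # update the parameters
```

Each of these phases contributes to the computational burden the chapter analyzes.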
Chapter 2
, Training Models Faster, provides an overview of the possible approaches to accelerate the training process. This chapter discusses how to modify the application and environment layers of the software stack to reduce the training time. Moreover, it explains vertical and horizontal scalability as another option to improve performance by increasing the number of resources.
Chapter 3
, Compiling the Model, provides an overview of the novel Compile API introduced in PyTorch 2.0. This chapter covers the differences between eager and graph modes and describes how to use the Compile API to accelerate the model-building process. It also explains the compiling workflow and the components involved in the compiling process.
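As a rough illustration of the API this chapter covers, a model can be wrapped with torch.compile as sketched below; the toy model is a placeholder, and the "eager" backend is chosen here only so the sketch runs without a compiler toolchain (in practice, the default inductor backend does the actual optimization):

```python
import torch
from torch import nn

model = nn.Linear(10, 5)  # toy model standing in for a real network

# torch.compile wraps the model; the first call triggers graph capture.
# backend="eager" skips code generation, so this sketch runs anywhere.
compiled_model = torch.compile(model, backend="eager")

x = torch.randn(4, 10)
y = compiled_model(x)  # behaves like the original model
```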
Chapter 4
, Using Specialized Libraries, provides an overview of the libraries used by PyTorch to execute specialized tasks. This chapter describes how to install and configure OpenMP to deal with multithreading and IPEX to optimize the training process on an Intel CPU.
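As a small taste of the thread-level tuning discussed here, PyTorch parallelizes CPU operators with an OpenMP-backed thread pool whose size can be inspected and adjusted at runtime; this sketch shows only those two calls, not the book's full OpenMP configuration:

```python
import torch

# Intra-op parallelism on CPU is backed by OpenMP; the OMP_NUM_THREADS
# environment variable controls it at the OpenMP level, while PyTorch
# exposes it programmatically
default_threads = torch.get_num_threads()  # current intra-op thread count

torch.set_num_threads(2)                   # cap intra-op parallelism at two threads
```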
Chapter 5
, Building an Efficient Data Pipeline, provides an overview of how to build an efficient data pipeline to keep the GPU working as much as possible. Besides explaining the steps executed on the data pipeline, this chapter describes how to accelerate the data-loading process by optimizing GPU data transfer and increasing the number of workers on the data pipeline.
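A minimal sketch of such a pipeline, using a synthetic dataset as a stand-in for real data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic dataset standing in for a real one
dataset = TensorDataset(torch.randn(256, 8), torch.randint(0, 2, (256,)))

# num_workers loads batches in background worker processes, while
# pin_memory allocates page-locked host memory to speed up GPU transfers
loader = DataLoader(dataset, batch_size=32, num_workers=2, pin_memory=True)

for features, labels in loader:
    # on a GPU system, features.to(device, non_blocking=True) would
    # overlap the host-to-device copy with computation
    pass
```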
Chapter 6
, Simplifying the Model, provides an overview of how to simplify a model by reducing the number of parameters of the neural network without sacrificing the model’s quality. This chapter describes techniques used to reduce the model complexity, such as model pruning and compression, and explains how to use the Microsoft NNI toolkit to simplify a model easily.
Chapter 7
, Adopting Mixed Precision, provides an overview of how to adopt a mixed precision strategy to boost the model training process without penalizing the model’s accuracy. This chapter briefly explains numeric representation in computer systems and describes how to employ PyTorch’s automatic mixed precision approach.
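A minimal sketch of automatic mixed precision; bfloat16 on the CPU is used here only so the sketch runs without a GPU (on CUDA, float16 together with a GradScaler is the usual combination):

```python
import torch
from torch import nn

model = nn.Linear(8, 4)
x = torch.randn(16, 8)

# autocast runs eligible operations in a lower-precision dtype while
# keeping precision-sensitive operations in float32
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)  # the linear layer runs in bfloat16 under autocast
```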
Chapter 8
, Distributed Training at a Glance, provides an overview of the basic concepts of distributed training. This chapter presents the most adopted parallel strategies and describes the basic workflow to implement distributed training on PyTorch.
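The basic workflow can be sketched in a single process; the environment variables that a launcher such as torchrun would normally provide are set by hand here for illustration:

```python
import os
import torch
import torch.distributed as dist

# Variables a launcher such as torchrun would normally set for each process
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

dist.init_process_group(backend="gloo")  # CPU-friendly communication backend

tensor = torch.ones(4)
dist.all_reduce(tensor)  # sums the tensor across all ranks (here, just one)

dist.destroy_process_group()
```

With a world size of 1, the all-reduce leaves the tensor unchanged; the same code, launched with torchrun across several processes, performs the collective across all ranks.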
Chapter 9
, Training with Multiple CPUs, provides an overview of how to code and execute distributed training on multiple CPUs on a single machine, using a general approach as well as Intel oneCCL to optimize execution on Intel platforms.
Chapter 10
, Training with Multiple GPUs, provides an overview of how to code and execute distributed training in a multi-GPU environment on a single machine. This chapter presents the main characteristics of a multi-GPU environment and explains how to code and launch distributed training on multiple GPUs using NCCL, the default communication backend for NVIDIA GPUs.
Chapter 11
, Training with Multiple Machines, provides an overview of how to code and execute distributed training on multiple GPUs across multiple machines. Besides an introductory explanation of computing clusters, this chapter shows how to code and launch distributed training among multiple machines using Open MPI as the launcher and NCCL as the communication backend.
To get the most out of this book
You will need to have an understanding of the basics of machine learning, PyTorch, and Python.
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
Download the example code files
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Accelerate-Model-Training-with-PyTorch-2.X
. If there’s an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/
. Check them out!
Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: The ipex.optimize function returns an optimized version of the model.
A block of code is set as follows:
config_list = [{
    'op_types': ['Linear'],
    'exclude_op_names': ['layer4'],
    'sparse_ratio': 0.3
}]
When we wish to draw your attention to a particular part of a code