Accelerate Model Training with PyTorch 2.X: Build more accurate models by boosting the model training process
About this ebook

This book, written by an HPC expert with over 25 years of experience, guides you through enhancing model training performance using PyTorch. Here you’ll learn how model complexity impacts training time and discover performance tuning levels to expedite the process, as well as utilize PyTorch features, specialized libraries, and efficient data pipelines to optimize training on CPUs and accelerators. You’ll also reduce model complexity, adopt mixed precision, and harness the power of multicore systems and multi-GPU environments for distributed training. By the end, you'll be equipped with techniques and strategies to speed up training and focus on building stunning models.

Language: English
Release date: April 30, 2024
ISBN: 9781805121916


    Accelerate Model Training with PyTorch 2.X - Maicon Melo Alves


    Accelerate Model Training with PyTorch 2.X

    Copyright © 2024 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Group Product Manager: Niranjan Naikwadi

    Publishing Product Manager: Sanjana Gupta

    Book Project Manager: Kirti Pisat

    Content Development Editor: Manikandan Kurup

    Technical Editor: Seemanjay Ameriya

    Copy Editor: Safis Editing

    Proofreader: Safis Editing and Manikandan Kurup

    Indexer: Hemangini Bari

    Production Designer: Aparna Bhagat

    Senior DevRel Marketing Coordinator: Vinishka Kalra

    First published: April 2024

    Production reference: 1050424

    Published by Packt Publishing Ltd.

    Grosvenor House

    11 St Paul's Square

    Birmingham

    B3 1RB, UK.

    ISBN 978-1-80512-010-0

    To my wife and best friend, Cristiane, for being my loving partner throughout our joint life journey. To my daughters, Giovana and Camila, for being my real treasure; I’m so proud of you. To my mom, Fatima, and brothers, Johny and Karoline, for being my safe harbor. Despite everything, I also dedicate this book to my (late) father Jorge.

    – Maicon Melo Alves

    Foreword

    Accelerating model training is critical in the area of machine learning for several reasons. As datasets grow larger and models become more complex, training times can become prohibitively long, hindering research and development progress. This is where machine learning frameworks such as PyTorch come into play, providing tools and techniques to accelerate the training process.

    PyTorch, with its flexibility, GPU acceleration, optimization techniques, and distributed training capabilities, plays a crucial role in this endeavor by enabling researchers and developers to iterate quickly, train complex models efficiently, and deploy solutions faster. By leveraging PyTorch’s capabilities, practitioners can push the boundaries of what is possible in artificial intelligence and drive innovation across various domains.

    Since learning all of these capabilities is not a straightforward task, this book is a great resource for all students, researchers, and professionals who intend to learn how to accelerate model training with the latest release of PyTorch in a smooth way.

    This very didactic book starts by introducing how the training process works and what kinds of modifications can be made at the application and environment layers to accelerate it.

    Only after that do the following chapters describe methods to accelerate model training, such as the Compile API, a novel capability launched in PyTorch 2.0 for compiling a model, and the use of specialized libraries such as OpenMP and IPEX to speed up the training process even further.

    It also describes how to build an efficient data pipeline that keeps the GPU working at its peak throughout training, how to simplify a model by reducing its number of parameters, and how to reduce the numerical precision adopted by the neural network to accelerate training and decrease the amount of memory needed to store the model.

    Finally, this book also explains how to distribute the training process across multiple CPUs and GPUs.

    This book not only provides current and highly relevant content for any professional working in the field of computing who wants to learn or stay up to date but also impresses with its extremely didactic presentation of the subject. You will certainly appreciate the quiz at the end of each chapter and the way each chapter's summary connects it to the next.

    Every chapter presents code and examples of use. For all these reasons, I believe that the book could also be successfully adopted by undergraduate and graduate courses as a supporting bibliography.

    Prof. Lúcia Maria de Assumpção Drummond

    Titular professor at Fluminense Federal University, Brazil

    Contributors

    About the author

    Dr. Maicon Melo Alves is a senior system analyst and academic professor who specializes in High-Performance Computing (HPC) systems. In the last five years, he has become interested in understanding how HPC systems are used in AI applications. To better understand this topic, he completed an MBA in data science in 2021 at the Pontifícia Universidade Católica of Rio de Janeiro (PUC-Rio). He has over 25 years of experience in IT infrastructure and, since 2006, has worked with HPC systems at Petrobras, the Brazilian state energy company. He obtained his DSc degree in computer science from Fluminense Federal University (UFF) in 2018 and has published three books as well as papers in international HPC journals.

    About the reviewer

    Dimitra Charalampopoulou is a machine learning engineer with a background in technology consulting and a strong interest in AI and machine learning. She has led numerous large-scale digital transformation engineering projects for clients across the US and EMEA and has received various awards, including recognition for her start-up at the MIT Startup Competition. Additionally, she has been a speaker at two conferences in Europe on the topic of GenAI. As an advocate for women in tech, she is the founder and managing director of an NGO that promotes gender equality in tech and has taught programming classes to female students internationally.

    Table of Contents

    Preface

    Part 1: Paving the Way

    1

    Deconstructing the Training Process

    Technical requirements

    Remembering the training process

    Dataset

    The training algorithm

    Understanding the computational burden of the model training phase

    Hyperparameters

    Operations

    Parameters

    Quiz time!

    Summary

    2

    Training Models Faster

    Technical requirements

    What options do we have?

    Modifying the software stack

    Increasing computing resources

    Modifying the application layer

    What can we change in the application layer?

    Getting hands-on

    What if we change the batch size?

    Modifying the environment layer

    What can we change in the environment layer?

    Getting hands-on

    Quiz time!

    Summary

    Part 2: Going Faster

    3

    Compiling the Model

    Technical requirements

    What do you mean by compiling?

    Execution modes

    Model compiling

    Using the Compile API

    Basic usage

    Give me a real fight – training a heavier model!

    How does the Compile API work under the hood?

    Compiling workflow and components

    Backends

    Quiz time!

    Summary

    4

    Using Specialized Libraries

    Technical requirements

    Multithreading with OpenMP

    What is multithreading?

    Using and configuring OpenMP

    Using and configuring Intel OpenMP

    Optimizing Intel CPU with IPEX

    Using IPEX

    How does IPEX work under the hood?

    Quiz time!

    Summary

    5

    Building an Efficient Data Pipeline

    Technical requirements

    Why do we need an efficient data pipeline?

    What is a data pipeline?

    How to build a data pipeline

    Data pipeline bottleneck

    Accelerating data loading

    Optimizing a data transfer to the GPU

    Configuring data pipeline workers

    Reaping the rewards

    Quiz time!

    Summary

    6

    Simplifying the Model

    Technical requirements

    Knowing the model simplifying process

    Why simplify a model? (reason)

    How to simplify a model? (process)

    When do we simplify a model? (moment)

    Using Microsoft NNI to simplify a model

    Overview of NNI

    NNI in action!

    Quiz time!

    Summary

    7

    Adopting Mixed Precision

    Technical requirements

    Remembering numeric precision

    How do computers represent numbers?

    Floating-point representation

    Novel data types

    A summary, please!

    Understanding the mixed precision strategy

    What is mixed precision?

    Why use mixed precision?

    How to use mixed precision

    How about Tensor Cores?

    Enabling AMP

    Activating AMP on GPU

    AMP, show us what you are capable of!

    Quiz time!

    Summary

    Part 3: Going Distributed

    8

    Distributed Training at a Glance

    Technical requirements

    A first look at distributed training

    When do we need to distribute the training process?

    Where do we execute distributed training?

    Learning the fundamentals of parallelism strategies

    Model parallelism

    Data parallelism

    Distributed training on PyTorch

    Basic workflow

    Communication backend and program launcher

    Quiz time!

    Summary

    9

    Training with Multiple CPUs

    Technical requirements

    Why distribute the training on multiple CPUs?

    Why not increase the number of threads?

    Distributed training to the rescue

    Implementing distributed training on multiple CPUs

    The Gloo communication backend

    Coding distributed training to run on multiple CPUs

    Launching distributed training on multiple CPUs

    Getting faster with Intel oneCCL

    What is Intel oneCCL?

    Code implementation and launching

    Is oneCCL really better?

    Quiz time!

    Summary

    10

    Training with Multiple GPUs

    Technical requirements

    Demystifying the multi-GPU environment

    The popularity of multi-GPU environments

    Understanding multi-GPU interconnection

    How does interconnection topology affect performance?

    Discovering the interconnection topology

    Setting GPU affinity

    Implementing distributed training on multiple GPUs

    The NCCL communication backend

    Coding and launching distributed training with multiple GPUs

    Experimental evaluation

    Quiz time!

    Summary

    11

    Training with Multiple Machines

    Technical requirements

    What is a computing cluster?

    Workload manager

    Understanding the high-performance network

    Implementing distributed training on multiple machines

    Getting introduced to Open MPI

    Why use Open MPI and NCCL?

    Coding and launching the distributed training on multiple machines

    Experimental evaluation

    Quiz time!

    Summary

    Index

    Other Books You May Enjoy

    Preface

    Hello there! I’m a system analyst and academic professor specializing in High-Performance Computing (HPC). Yes, you read it right! I’m not a data scientist. So, you are probably wondering why on Earth I decided to write a book about machine learning. Don’t worry; I will explain.

    HPC systems comprise powerful computing resources tightly integrated to solve complex problems. The main goal of HPC is to employ resources, techniques, and methods to accelerate the execution of highly intensive computing tasks. Traditionally, HPC environments have been used to execute scientific applications from biology, physics, chemistry, and many other areas.

    But this has changed in the past few years. Nowadays, HPC systems run tasks beyond scientific applications. In fact, the most prominent non-scientific workload executed in HPC environments is precisely the subject of this book: the building process of complex neural network models.

    As a data scientist, you know better than anyone else how long it can take to train complex models and how many times you need to retrain a model to evaluate different scenarios. For this reason, the usage of HPC systems to accelerate Artificial Intelligence (AI) applications (not only for training but also for inference) is an area of rapidly growing demand.

    This close relationship between AI and HPC sparked my interest in diving into the fields of machine learning and AI. By doing this, I could better understand how HPC has been applied to accelerate these applications.

    So, here we are. I wrote this book to share what I have learned about this topic. My mission here is to give you the necessary knowledge to train your model faster by employing optimization techniques and methods using single or multiple computing resources.

    By accelerating the training process, you can concentrate on what really matters: building stunning models!

    Who this book is for

    This book is for intermediate-level data scientists, engineers, and developers who want to know how to use PyTorch to accelerate the training process of their machine learning models. Although they are not the primary audience for this material, system analysts responsible for administrating and providing infrastructure for AI workloads will also find valuable information in this book.

    Basic knowledge of machine learning, PyTorch, and Python is required to get the most out of this material. However, there is no obligation to have a prior understanding of distributed computing, accelerators, or multicore processors.

    What this book covers

    Chapter 1, Deconstructing the Training Process, provides an overview of how the training process works under the hood, describing the training algorithm and covering the phases executed by this process. This chapter also explains how factors such as hyperparameters, operations, and neural network parameters impact the training process's computational burden.

    Chapter 2, Training Models Faster, provides an overview of the possible approaches to accelerate the training process. This chapter discusses how to modify the application and environment layers of the software stack to reduce the training time. Moreover, it explains vertical and horizontal scalability as another option to improve performance by increasing the number of resources.

    Chapter 3, Compiling the Model, provides an overview of the novel Compile API introduced in PyTorch 2.0. This chapter covers the differences between eager and graph modes and describes how to use the Compile API to accelerate the model-building process. This chapter also explains the compiling workflow and the components involved in the compiling process.
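
    As a small taste of what that looks like in practice, the following minimal sketch (with a toy placeholder model) shows the basic usage of torch.compile:

    import torch
    import torch.nn as nn

    # A toy placeholder model; any nn.Module can be compiled the same way.
    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

    # torch.compile wraps the model so that its computational graph is captured
    # and optimized on the first call (graph mode) instead of dispatching each
    # operation one by one (eager mode).
    compiled_model = torch.compile(model)

    x = torch.randn(32, 128)
    y = compiled_model(x)  # the first call triggers compilation; later calls reuse it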

    Chapter 4, Using Specialized Libraries, provides an overview of the libraries used by PyTorch to execute specialized tasks. This chapter describes how to install and configure OpenMP to deal with multithreading and IPEX to optimize the training process on an Intel CPU.
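
    As a rough illustration, and assuming the intel_extension_for_pytorch package is installed, typical usage looks like the sketch below (the thread count and the tiny model are placeholders):

    import os

    import torch
    import torch.nn as nn
    import intel_extension_for_pytorch as ipex

    # The number of OpenMP threads is usually set before launching the script,
    # for example: OMP_NUM_THREADS=16 python train.py
    # torch.set_num_threads has a similar effect from inside the process.
    torch.set_num_threads(int(os.environ.get("OMP_NUM_THREADS", "16")))

    model = nn.Linear(128, 10)  # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # ipex.optimize returns versions of the model and optimizer tuned for Intel CPUs.
    model, optimizer = ipex.optimize(model, optimizer=optimizer)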

    Chapter 5, Building an Efficient Data Pipeline, provides an overview of how to build an efficient data pipeline to keep the GPU working as much as possible. Besides explaining the steps executed in the data pipeline, this chapter describes how to accelerate the data-loading process by optimizing GPU data transfer and increasing the number of workers in the data pipeline.
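
    A minimal sketch of the kind of data pipeline discussed there might look as follows (the synthetic dataset and the worker count are just placeholders):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # A synthetic dataset standing in for a real one.
    dataset = TensorDataset(torch.randn(10000, 128), torch.randint(0, 10, (10000,)))

    # num_workers loads batches in parallel worker processes, and pin_memory
    # keeps them in page-locked host memory to speed up copies to the GPU.
    loader = DataLoader(dataset, batch_size=256, num_workers=4, pin_memory=True)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    for inputs, targets in loader:
        # non_blocking=True lets the transfer overlap with GPU computation.
        inputs = inputs.to(device, non_blocking=True)
        targets = targets.to(device, non_blocking=True)
        # the forward and backward passes would go here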

    Chapter 6, Simplifying the Model, provides an overview of how to simplify a model by reducing the number of parameters of the neural network without sacrificing the model's quality. This chapter describes techniques used to reduce the model complexity, such as model pruning and compression, and explains how to use the Microsoft NNI toolkit to simplify a model easily.

    Chapter 7, Adopting Mixed Precision, provides an overview of how to adopt a mixed precision strategy to boost the model training process without penalizing the model's accuracy. This chapter briefly explains numeric representation in computer systems and describes how to employ PyTorch's automatic mixed precision approach.
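
    For a rough idea of what this looks like with PyTorch's automatic mixed precision (AMP) on a GPU, consider the following sketch (the model and data are placeholders):

    import torch
    import torch.nn as nn

    device = torch.device("cuda")
    model = nn.Linear(128, 10).to(device)          # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    scaler = torch.cuda.amp.GradScaler()           # scales the loss to avoid float16 underflow

    inputs = torch.randn(32, 128, device=device)
    targets = torch.randint(0, 10, (32,), device=device)

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                # selected operations run in lower precision
        loss = loss_fn(model(inputs), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()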

    Chapter 8, Distributed Training at a Glance, provides an overview of the basic concepts of distributed training. This chapter presents the most widely adopted parallelism strategies and describes the basic workflow to implement distributed training on PyTorch.
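
    That basic workflow boils down to a few lines; a minimal sketch (assuming the processes are started by a launcher such as torchrun, which sets RANK and WORLD_SIZE, and using a tiny placeholder model) is shown below:

    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    # "gloo" is a common backend for CPUs; "nccl" is used for NVIDIA GPUs.
    dist.init_process_group(backend="gloo")

    model = nn.Linear(128, 10)   # placeholder model
    ddp_model = DDP(model)       # gradients are synchronized across processes

    # the regular training loop, using ddp_model, would go here

    dist.destroy_process_group()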

    Chapter 9, Training with Multiple CPUs, provides an overview of how to code and execute distributed training on multiple CPUs in a single machine, using a general approach as well as Intel oneCCL to optimize the execution on Intel platforms.

    Chapter 10, Training with Multiple GPUs, provides an overview of how to code and execute distributed training in a multi-GPU environment on a single machine. This chapter presents the main characteristics of a multi-GPU environment and explains how to code and launch distributed training on multiple GPUs using NCCL, the default communication backend for NVIDIA GPUs.
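
    A minimal sketch of that setup (assuming the script is started by torchrun, which sets LOCAL_RANK for each process, and using a placeholder model) could look like this:

    import os

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Example launch on a machine with 4 GPUs:
    #   torchrun --nproc_per_node=4 train_multigpu.py
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    dist.init_process_group(backend="nccl")   # NCCL is the default backend for NVIDIA GPUs

    model = nn.Linear(128, 10).to(local_rank)        # placeholder model
    ddp_model = DDP(model, device_ids=[local_rank])

    # the training loop would go here

    dist.destroy_process_group()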

    Chapter 11, Training with Multiple Machines, provides an overview of how to code and execute distributed training on multiple GPUs across multiple machines. Besides an introductory explanation of computing clusters, this chapter shows how to code and launch distributed training among multiple machines using Open MPI as the launcher and NCCL as the communication backend.
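
    As a rough sketch of that scenario (the hostnames, GPU counts, and rendezvous address below are hypothetical), Open MPI launches one process per GPU and exposes each process's rank through environment variables:

    import os

    import torch
    import torch.distributed as dist

    # Hypothetical launch across two machines with 4 GPUs each:
    #   mpirun -np 8 -H node1:4,node2:4 python train_multinode.py
    rank = int(os.environ["OMPI_COMM_WORLD_RANK"])
    world_size = int(os.environ["OMPI_COMM_WORLD_SIZE"])
    local_rank = int(os.environ["OMPI_COMM_WORLD_LOCAL_RANK"])

    # Rendezvous information for PyTorch; node1 stands for the machine running rank 0.
    os.environ.setdefault("MASTER_ADDR", "node1")
    os.environ.setdefault("MASTER_PORT", "29500")

    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)

    # build the model, wrap it in DistributedDataParallel, and train as usual

    dist.destroy_process_group()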

    To get the most out of this book

    You will need to have an understanding of the basics of machine learning, PyTorch, and Python.

    If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

    Download the example code files

    You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Accelerate-Model-Training-with-PyTorch-2.X. If there's an update to the code, it will be updated in the GitHub repository.

    We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Conventions used

    There are a number of text conventions used throughout this book.

    Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: The ipex.optimize function returns an optimized version of the model.

    A block of code is set as follows:

    config_list = [{
        'op_types': ['Linear'],
        'exclude_op_names': ['layer4'],
        'sparse_ratio': 0.3
    }]

    When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold.
