Machine Learning for Imbalanced Data: Tackle imbalanced datasets using machine learning and deep learning techniques

Ebook, 699 pages, 4 hours

By Abhishek Kumar and Dr. Mounir Abdelaziz

About this ebook

As machine learning practitioners, we often encounter imbalanced datasets in which one class has considerably fewer instances than the other. Many machine learning algorithms implicitly assume that the majority and minority classes are roughly balanced, which leads to suboptimal performance on imbalanced data. This comprehensive guide helps you address class imbalance to significantly improve model performance.

Machine Learning for Imbalanced Data begins by introducing you to the challenges posed by imbalanced datasets and the importance of addressing these issues. It then guides you through techniques that enhance the performance of classical machine learning models when using imbalanced data, including various sampling and cost-sensitive learning methods.

As you progress, you’ll delve into similar and more advanced techniques for deep learning models, employing PyTorch as the primary framework. Throughout the book, hands-on examples provide working, reproducible code that demonstrates the practical implementation of each technique.

By the end of this book, you’ll be adept at identifying and addressing class imbalances and confidently applying various techniques, including sampling, cost-sensitive techniques, and threshold adjustment, while using traditional machine learning or deep learning models.

Language: English

Publisher: Packt Publishing

Release date: Nov 30, 2023

ISBN: 9781801070881

Author

Abhishek Kumar

Dr. Abhishek Kumar is a postdoctoral fellow in computer science at Ingenium Research Group, based at Universidad de Castilla-La Mancha in Spain. He has taught in academia for more than 8 years and has published more than 50 articles in reputable, peer-reviewed national and international journals, books, and conferences. His research areas include artificial intelligence, image processing, computer vision, data mining, and machine learning.

Related to Machine Learning for Imbalanced Data

Related ebooks

Debugging Machine Learning Models with Python: Develop high-performance, low-bias, and explainable machine learning and deep learning models
by Ali Madani
Machine Learning Infrastructure and Best Practices for Software Engineers: Take your machine learning software from a prototype to a fully fledged software system
by Miroslaw Staron
Synthetic Data for Machine Learning: Revolutionize your approach to machine learning with this comprehensive conceptual guide
by Abdulrahman Kerim
Interpretable Machine Learning with Python: Learn to build interpretable high-performance models with hands-on real-world examples
by Serg Masís
Azure Machine Learning Engineering: Deploy, fine-tune, and optimize ML models using Microsoft Azure
by Sina Fakhraee
The Definitive Guide to Google Vertex AI: Accelerate your machine learning journey with Google Cloud Vertex AI and MLOps best practices
by Jasmeet Bhatia
Machine Learning with R: Learn techniques for building and improving machine learning models, from data preparation to model tuning, evaluation, and working with big data
by Brett Lantz
Getting started with Deep Learning for Natural Language Processing: Learn how to build NLP applications with Deep Learning (English Edition)
by Sunil Patel
Ultimate Machine Learning with ML.NET: Build, Optimize, and Deploy Powerful Machine Learning Models for Data-Driven Insights with ML.NET, Azure Functions, and Web API (English Edition)
by Kalicharan Mahasivabhattu
Machine Learning for Beginners - 2nd Edition: Build and deploy Machine Learning systems using Python (English Edition)
by Dr. Harsh Bhasin
Deep Learning with PyTorch: A practical approach to building neural network models using PyTorch
by Vishnu Subramanian
R Machine Learning Projects: Implement supervised, unsupervised, and reinforcement learning techniques using R 3.5
by Dr. Sunil Kumar Chinnamgari
Applied Machine Learning Explainability Techniques: Make ML models explainable and trustworthy for practical applications using LIME, SHAP, and more
by Aditya Bhattacharya
Hands-On One-shot Learning with Python: Learn to implement fast and accurate deep learning models with fewer training samples using PyTorch
by Shruti Jadon
Active Machine Learning with Python: Refine and elevate data quality over quantity with active learning
by Margaux Masson-Forsythe
Machine Learning Engineering with Python: Manage the production life cycle of machine learning models using MLOps with practical examples
by Andrew P. McMahon
Hands-On Machine Learning with Azure: Build powerful models with cognitive machine learning and artificial intelligence
by Thomas K Abraham
Deep Learning with TensorFlow: Explore neural networks with Python
by Giancarlo Zaccone
Automated Machine Learning: Hyperparameter optimization, neural architecture search, and algorithm selection with cloud platforms
by Adnan Masood
Hands-On Automated Machine Learning: A beginner's guide to building automated machine learning systems using AutoML and Python
by Sibanjan Das
A Handbook of Mathematical Models with Python: Elevate your machine learning projects with NetworkX, PuLP, and linalg
by Dr. Ranja Sarkar
Machine Learning for Emotion Analysis in Python: Build AI-powered tools for analyzing emotion using natural language processing and machine learning
by Allan Ramsay
R Machine Learning Essentials
by Usuelli Michele
Privacy-Preserving Machine Learning: A use-case-driven approach to building and protecting ML pipelines from privacy and security threats
by Srinivasa Rao Aravilli
Python Deep Learning Projects: 9 projects demystifying neural network and deep learning models for building intelligent systems
by Matthew Lamons
Deep Learning for Data Architects: Unleash the power of Python's deep learning algorithms (English Edition)
by Shekhar Khandelwal
Applied Machine Learning Solutions with Python: Production-ready ML Projects Using Cutting-edge Libraries and Powerful Statistical Techniques (English Edition)
by Siddhanta Bhatta
Modern Computer Vision with PyTorch: A practical roadmap from deep learning fundamentals to advanced applications and Generative AI
by V Kishore Ayyadevara
Practical Machine Learning and Image Processing: For Facial Recognition, Object Detection, and Pattern Recognition Using Python
by Himanshu Singh
Deep Learning with C#, .Net and Kelp.Net: The Ultimate Kelp.Net Deep Learning Guide
by Matt R. Cole

Computers For You

101 Awesome Builds: Minecraft® Secrets from the World's Greatest Crafters
by Triumph Books, rated 4/5
The Invisible Rainbow: A History of Electricity and Life
by Arthur Firstenberg, rated 5/5
Hidden Figures: The American Dream and the Untold Story of the Black Women Mathematicians Who Helped Win the Space Race
by Margot Lee Shetterly, rated 4/5
Mastering ChatGPT: 21 Prompts Templates for Effortless Writing
by Cea West, rated 5/5
Procreate for Beginners: Introduction to Procreate for Drawing and Illustrating on the iPad
by T.C. Boyle
The Innovators: How a Group of Hackers, Geniuses, and Geeks Created the Digital Revolution
by Walter Isaacson, rated 4/5
Slenderman: Online Obsession, Mental Illness, and the Violent Crime of Two Midwestern Girls
by Kathleen Hale, rated 4/5
The Simulation Hypothesis: An MIT Computer Scientist Shows Why AI, Quantum Physics and Eastern Mystics All Agree We Are In a Video Game
by Rizwan Virk, rated 5/5
Uncanny Valley: A Memoir
by Anna Wiener, rated 4/5
ChatGPT Money Machine 2024 - The Ultimate Chatbot Cheat Sheet to Go From Clueless Noob to Prompt Prodigy Fast! Complete AI Beginner’s Course to Catch the GPT Gold Rush Before It Leaves You Behind
by Alec Rowe
Elon Musk
by Walter Isaacson, rated 4/5
Data Science from Scratch: The #1 Data Science Guide for Everything A Data Scientist Needs to Know: Python, Linear Algebra, Statistics, Coding, Applications, Neural Networks, and Decision Trees
by Steven Cooper, rated 4/5
Machine Learning for Beginners: An Introduction for Beginners, Why Machine Learning Matters Today and How Machine Learning Networks, Algorithms, Concepts and Neural Networks Really Work
by Steven Cooper, rated 4/5
CompTIA IT Fundamentals (ITF+) Study Guide: Exam FC0-U61
by Quentin Docter
Alan Turing: The Enigma: The Book That Inspired the Film The Imitation Game - Updated Edition
by Andrew Hodges, rated 4/5
Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics
by Gary Smith, rated 4/5
The Professional Voiceover Handbook: Voiceover training, #1
by Peter Baker, rated 5/5
Deep Search: How to Explore the Internet More Effectively
by Alan Pearce, rated 5/5
The ChatGPT Millionaire Handbook: Make Money Online With the Power of AI Technology
by TJ Books, rated 4/5
How to Write a Book: An 11-Step Process to Build Habits, Stop Procrastinating, Fuel Self-Motivation, Quiet Your Inner Critic, Bust Through Writer's Block, & Let Your Creative Juices Flow (Short Read)
by David Kadavy, rated 5/5
SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL
by Walter Shields, rated 4/5
CompTIA Security+ Get Certified Get Ahead: SY0-701 Study Guide
by Joe Shelley, rated 5/5
Excel 101: A Beginner's & Intermediate's Guide for Mastering the Quintessence of Microsoft Excel (2010-2019 & 365) in no time!
by Johannes Wild
Python for Beginners. A Smarter Way to Learn Python in 5 Days and Remember it Longer. With Easy Step by Step Guidance and Hands on Examples. (Python Crash Course-Programming for Beginners)
by Arthur T. Brooks
The Hacker Crackdown: Law and Disorder on the Electronic Frontier
by Bruce Sterling, rated 4/5
Make Your PC Stable and Fast: What Microsoft Forgot to Tell You
by Charles Spender, rated 4/5
Dark Aeon: Transhumanism and the War Against Humanity
by Joe Allen, rated 5/5
How to Create Cpn Numbers the Right way: A Step by Step Guide to Creating cpn Numbers Legally
by Alex Parkinson, rated 4/5
Tor and the Dark Art of Anonymity
by Lance Henderson, rated 5/5
Master Builder Roblox: The Essential Guide
by Triumph Books, rated 4/5

Related podcast episodes

Breaking Down Today’s Machine Learning Technology with Christina Pawlikowski: Melissa Perri is joined by Christina Pawlikowski, a teaching fellow at Harvard and co-founder of Causal, to help demystify machine learning and AI on this episode of Product Thinking.
by Product Thinking
The Role of Infrastructure in ML // Niels Bantilan // #197
by MLOps.community
343: Forging Effective Learning with Bror Saxberg
by Leading Learning Podcast
Product Enrichment and Recommender Systems // Marc Lindner and Amr Mashlah // Coffee Sessions #114
by MLOps.community
#140 Isabelle Guyon: The Future of AI and Support Vector Machines: This episode is sponsored by MindStudio by YouAi. MindStudio is the best way to build an AI business. Start driving some serious revenue before everyone else. Mind Studio allows you to use conversational language to program incredibly powerful AI...
by Eye On A.I.
Privacy Engineering at CMU and Privacy Decision Making with Dr. Lorrie Cranor: Dr. Lorrie Cranor began her career in privacy 25 years ago and has been a professor at Carnegie Mellon University in the School of Computer Science for 19 years. Today, she serves as director and professor for the CMU privacy engineering program. In this ...
by Partially Redacted: Data, AI, Security, and Privacy
Ads Ranking Evolution at Pinterest // Aayush Mudgal // #211
by MLOps.community
#98 Interpretable Machine Learning
by DataFramed
Diving Into Machine Learning | Using AI To Enhance Education & Student Success: In this episode, we discuss machine learning research and its many benefits for students. As technology progresses, this system can be used to develop data-driven teaching strategies that may redefine the future of education… Want to find out more...
by Finding Genius Podcast
PERPLEXITY AI - The future of search.
by Machine Learning Street Talk (MLST)
MLOps - Design Thinking to Build ML Infra for ML and LLM Use Casess // Amritha Arun Babu & Abhik Choudhury // #221
by MLOps.community
Open Source Software as a Triumph of Information Hiding, Modularity, and Creating Optionality with Dr. Gail Murphy: In this newest episode of The Idealcast, Gene Kim speaks with Dr. Gail Murphy, Professor of Computer Science and Vice President of Research and Innovation at the University of British Columbia. She is also the co-founder, board member, and former Chi...
by The Idealcast with Gene Kim by IT Revolution
Declarative Machine Learning Systems: Big Tech Level ML Without a Big Tech Team // Piero Molino // MLOps Coffee Sessions #101
by MLOps.community
Trustworthy Machine Learning // Kush Varshney // Coffee Sessions #124
by MLOps.community
Evaluation Panel // Large Language Models in Production Conference Part II
by MLOps.community
Experiment Tracking in the Age of LLMs // Piotr Niedźwiedź // MLOps Podcast #168
by MLOps.community
The Three Roles of the Chief Data Officer: ADP’s Jack Berkowitz
by Me, Myself, and AI
EP 150: Navigating AI's Tsunami - Strategies for Recruitment, Retention + Growth
by Everyday AI Podcast – An AI and ChatGPT Podcast
271 AI, Machine Learning And Robotics: The future Of Dentistry With Andrew Carr: THIS EPISODE COUNTS FOR CE! - but read the disclaimers it might not count for your state. Go to take the test and get your free CE Credit! Amanda Hill joins Andrew as a co-host today for a fascinating interview with...
by A Tale of Two Hygienists Podcast
Machine Learning, Business Success – Charles Martin, PhD, Data Scientist, Machine Learning AI Consultant, and Chief Scientist at Calculation Consulting – Rapidly Evolving Opportunities For Business Via Machine Learning and Data Science: Charles Martin, PhD, data scientist, machine learning AI consultant, and chief scientist at Calculation Consulting, delivers a thorough overview of the technologies that are helping companies expand their customer base and increase revenue. Martin is...
by Finding Genius Podcast
Jeremiah Lowin – Machine Learning in Investing – [Invest Like the Best, EP.105]: My guest this week is one of my best and oldest friends, Jeremiah Lowin. Jeremiah has had a fascinating career, starting with advanced work in statistics before moving into the risk management field in the hedge fund world. Through his career he has studi
by Invest Like the Best with Patrick O'Shaughnessy
Challenges Operationalizing ML (And Some Solutions) // Nathan Ryan Frank // #199
by MLOps.community
Let Your Business Intelligence Platform Build The Models Automatically With Omni Analytics: Business intelligence has gone through many generational shifts, but each generation has largely maintained the same workflow. Data analysts create reports that are used by the business to understand and direct the business, but the process is very labor and time intensive. The team at Omni have taken a new approach by automatically building models based on the queries that are executed. In this episode Chris Merrick shares how they manage integration and automation around the modeling layer and how it improves the organizational experience of business intelligence.
by Data Engineering Podcast
275: Designing Content Scientifically with Ruth Colvin Clark and Myra Roldan
by Leading Learning Podcast
474 The AI Playbook by Eric Siegel: The AI Playbook: Mastering the Rare Art of Machine Learning Deployment by Eric Siegel ABOUT THE BOOK: In his bestselling first book, Eric Siegel explained how machine learning works. Now, in , he shows how to capitalize on it. The greatest tool...
by The Marketing Book Podcast
Some Big AI Problems: The Eliza Effect and More: Yes, everyone is talking about AI. However, how do the concerns about AI apply to our classrooms today? Tom Mullaney talks about concerns with: The Eliza effect—where people attribute human characteristics such as trust and credibility to...
by 10 Minute Teacher Podcast with Cool Cat Teacher
Eco-Friendly and Cost-Effective #EdTech - HoET221: Featured Content (2:47) Frank Bouchard is the co-founder of Wipebook. During our conversation Frank and I talk about the following: The history and technology behind Wipebook's reusable notebooks Examples of how Wipebook's notebooks are being used in...
by House of #EdTech
Working backward from winning & preparing for future tech innovation w/ Evan Welbourne #181: We discuss how to navigate the delicate balance between meeting current customer needs while also preparing for future tech trends & opportunities with Evan Welbourne, Head of AI and Data @ Samsara. Evan dissects the rapidly transforming pace of developing AI/ML products, sharing strategies for merging conversations around differing product-building processes, tips for moving seamlessly / gaining approval between product development stages, defining what customer success looks like, methods for working backward from problems, and best practices for avoiding friction throughout the product development process. He also shares frameworks for envisioning & working toward future tech possibilities while simultaneously developing hypotheses that inform future direction, creating diverse AI/ML team composition, and effectively communicating with stakeholders.
by The Engineering Leadership Podcast
AI Ingenuity – Dr. Lisa Amini, Director, MIT-IBM Watson AI Lab – The Future of Machine Learning and Natural Language Processing in AI-based Products and Structures: Dr. Lisa Amini is the director of IBM Research Cambridge, which includes the MIT-IBM Watson AI Lab. Watson is a complex question-answering computer system that is capable of providing answers to questions that are directed in natural language; it was...
Podcast episode
AI Ingenuity – Dr. Lisa Amini, Director, MIT-IBM Watson AI Lab – The Future of Machine Learning and Natural Language Processing in AI-based Products and Structures: Dr. Lisa Amini is the director of IBM Research Cambridge, which includes the MIT-IBM Watson AI Lab. Watson is a complex question-answering computer system that is capable of providing answers to questions that are directed in natural language; it was...
byFinding Genius Podcast
0 ratings
0% found this document useful
MLOps Meetup #29 // Scaling Machine Learning Capabilities in Large Organizations // Bertjan Broeksema & Axel Goblet
Podcast episode
MLOps Meetup #29 // Scaling Machine Learning Capabilities in Large Organizations // Bertjan Broeksema & Axel Goblet
byMLOps.community
0 ratings
0% found this document useful

Skip carousel

Machine Learning for Imbalanced Data - Abhishek Kumar

Machine Learning for Imbalanced Data

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Niranjan Naikwadi

Publishing Product Manager: Sanjana Gupta

Book Project Manager: Kirti Pisat

Senior Editor: Rohit Singh

Technical Editor: Rahul Limbachiya

Copy Editor: Safis Editing

Proofreader: Safis Editing

Indexer: Pratik Shirodkar

Production Designer: Nilesh Mohite

DevRel Marketing Coordinator: Vinishka Kalra

First published: November 2023

Production reference: 2221123

Published by Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB, UK.

ISBN 978-1-80107-083-6

www.packtpub.com

Contributors

About the authors

Kumar Abhishek is a seasoned senior machine learning engineer at Expedia Group, US, specializing in risk analysis and fraud detection. With over a decade of machine learning and software engineering experience, Kumar has worked for companies such as Microsoft, Amazon, and a Bay Area start-up. Kumar holds a master’s degree in computer science from the University of Florida, Gainesville.

To my incredible wife who has been my rock and constant source of inspiration, our adorable son who fills our lives with joy, my wonderful parents for their unwavering support, and my close friends. Immense thanks to Christian, who has been a pivotal mentor and guide, for his meticulous reviews. My deepest gratitude to my co-author, Mounir, and contributor, Anshul; their dedication and solid contributions were essential in shaping this book. Lastly, I extend my sincere appreciation to Abhiram and the Packt team for their unwavering support.

Dr. Mounir Abdelaziz is a deep learning researcher specializing in computer vision applications. He holds a Ph.D. in computer science and technology from Central South University, China. During his Ph.D. journey, he developed innovative algorithms to address practical computer vision challenges. He has also authored numerous research articles in the field of few-shot learning for image classification.

I would like to thank my family, especially my parents, for their support and encouragement. I also want to thank all the fantastic people I collaborated with, including my co-author, Packt editors, and reviewers. Without their help, writing this book wouldn’t have been possible.

Other contributor

Anshul Yadav is a software developer and trainer with a keen interest in machine learning, web development, and theoretical computer science. He likes to solve technical problems: the slinkier, the better. He has a B.Tech. degree in computer science and engineering from IIT Kanpur. Anshul loves to share the joy of learning with his audience.

About the reviewers

Christian Monson has nine years of industry experience working as a machine learning scientist specializing in Natural Language Processing (NLP) and speech recognition. For five of those years, he worked at Amazon improving the Alexa personal assistant. During the 2000s, he was a graduate student at Carnegie Mellon University and a postdoc at Oregon Health and Science University working on NLP. Christian completed his bachelor’s degree in computer science, with minors in math and physics, at Brigham Young University in 2000. In his free time, Christian creates video games and plays with his kids. Currently, he is a full-time tutor and mentor in machine learning. You can find Christian at or watch his videos at .

Abhiram Jagarlapudi is a principal software engineer with 10 years of experience in cloud computing and Artificial Intelligence (AI). At Amazon Web Services and Oracle Cloud, Abhiram was part of launching several public cloud services, later specializing in cloud AI services. He was part of a small team that built the software delivery infrastructure of Oracle Cloud, which started in 2016 and has since grown into a multi-billion-dollar business. He also designed and developed AI services for the Oracle Cloud and is passionate about applying that experience to improve and accelerate the delivery of machine learning.

Table of Contents

Preface

Introduction to Data Imbalance in Machine Learning

Technical requirements

Introduction to imbalanced datasets

Machine learning 101

What happens during model training?

Types of dataset and splits

Cross-validation

Common evaluation metrics

Confusion matrix

ROC

Precision-Recall curve

Relation between the ROC curve and PR curve

Challenges and considerations when dealing with imbalanced data

When can we have an imbalance in datasets?

Why can imbalanced data be a challenge?

When to not worry about data imbalance

Introduction to the imbalanced-learn library

General rules to follow

Summary

Questions

References

Oversampling Methods

Technical requirements

What is oversampling?

Random oversampling

Problems with random oversampling

SMOTE

How SMOTE works

Problems with SMOTE

SMOTE variants

Borderline-SMOTE

ADASYN

Working of ADASYN

Categorical features and SMOTE variants (SMOTE-NC and SMOTEN)

Model performance comparison of various oversampling methods

Guidance for using various oversampling techniques

When to avoid oversampling

Oversampling in multi-class classification

Summary

Exercises

References

Undersampling Methods

Technical requirements

Introducing undersampling

When to avoid undersampling the majority class

Fixed versus cleaning undersampling

Undersampling approaches

Removing examples uniformly

Random UnderSampling

ClusterCentroids

Strategies for removing noisy observations

ENN, RENN, and AllKNN

Tomek links

Neighborhood Cleaning Rule

Instance hardness threshold

Strategies for removing easy observations

Condensed Nearest Neighbors

One-sided selection

Combining undersampling and oversampling

Model performance comparison

Summary

Exercises

References

Ensemble Methods

Technical requirements

Bagging techniques for imbalanced data

UnderBagging

OverBagging

SMOTEBagging

Comparative performance of bagging methods

Boosting techniques for imbalanced data

AdaBoost

RUSBoost, SMOTEBoost, and RAMOBoost

Ensemble of ensembles

EasyEnsemble

Comparative performance of boosting methods

Model performance comparison

Summary

Questions

References

Cost-Sensitive Learning

Technical requirements

The concept of Cost-Sensitive Learning

Costs and cost functions

Types of cost-sensitive learning

Difference between CSL and resampling

Problems with rebalancing techniques

Understanding costs in practice

Cost-Sensitive Learning for logistic regression

Cost-Sensitive Learning for decision trees

Cost-Sensitive Learning using scikit-learn and XGBoost models

MetaCost – making any classification model cost-sensitive

Threshold adjustment

Methods for threshold tuning

Summary

Questions

References

Data Imbalance in Deep Learning

Technical requirements

A brief introduction to deep learning

Neural networks

Perceptron

Activation functions

Layers

Feedforward neural networks

Training neural networks

The effect of the learning rate on data imbalance

Image processing using Convolutional Neural Networks

Text analysis using Natural Language Processing

Data imbalance in deep learning

The impact of data imbalance on deep learning models

Overview of deep learning techniques to handle data imbalance

Multi-label classification

Summary

Questions

References

Data-Level Deep Learning Methods

Technical requirements

Preparing the data

Creating the training loop

Sampling techniques for deep learning models

Random oversampling

Dynamic sampling

Data augmentation techniques for vision

Data-level techniques for text classification

Dataset and baseline model

Document-level augmentation

Character and word-level augmentation

Discussion of other data-level deep learning methods and their key ideas

Two-phase learning

Expansive Over-Sampling

Using generative models for oversampling

DeepSMOTE

Neural style transfer

Summary

Questions

References

Algorithm-Level Deep Learning Techniques

Technical requirements

Motivation for algorithm-level techniques

Weighting techniques

Using PyTorch’s weight parameter

Handling textual data

Deferred re-weighting – a minor variant of the class weighting technique

Explicit loss function modification

Focal loss

Class-balanced loss

Class-dependent temperature Loss

Class-wise difficulty-balanced loss

Discussing other algorithm-based techniques

Regularization techniques

Siamese networks

Deeper neural networks

Threshold adjustment

Summary

Questions

References

Hybrid Deep Learning Methods

Technical requirements

Using graph machine learning for imbalanced data

Understanding graphs

Graph machine learning

Dealing with imbalanced data

Case study – the performance of XGBoost, MLP, and a GCN on an imbalanced dataset

Hard example mining

Online Hard Example Mining

Minority class incremental rectification

Utilizing the hard sample mining technique in minority class incremental rectification

Summary

Questions

References

Model Calibration

Technical requirements

Introduction to model calibration

Why bother with model calibration

Models with and without well-calibrated probabilities

Calibration curves or reliability plot

Brier score

Expected Calibration Error

The influence of data balancing techniques on model calibration

Plotting calibration curves for a model trained on a real-world dataset

Model calibration techniques

The calibration of model scores to account for sampling

Platt’s scaling

Isotonic regression

Choosing between Platt’s scaling and Isotonic regression

Temperature scaling

Label smoothing

The impact of calibration on a model’s performance

Summary

Questions

References

Appendix

Machine Learning Pipeline in Production

Machine learning training pipeline

Inferencing (online or batch)

Assessments

Chapter 1 – Introduction to Data Imbalance in Machine Learning

Chapter 2 – Oversampling Methods

Chapter 3 – Undersampling Methods

Chapter 4 – Ensemble Methods

Chapter 5 – Cost-Sensitive Learning

Chapter 6 – Data Imbalance in Deep Learning

Chapter 7 – Data-Level Deep Learning Methods

Chapter 8 – Algorithm-Level Deep Learning Techniques

Chapter 9 – Hybrid Deep Learning Methods

Chapter 10 – Model Calibration

Index

Other Books You May Enjoy

Preface

Hello and welcome! Machine Learning (ML) enables computers to learn from data using algorithms to make informed decisions, automate tasks, and extract valuable insights. One particular aspect that often garners attention is imbalanced data, where certain classes may have considerably fewer samples than others.

This book provides an in-depth guide to understanding and navigating the intricacies of skewed data. You will gain insights into best practices for managing imbalanced datasets in ML contexts.

While imbalanced data can present challenges, it’s important to understand that the techniques to address this imbalance are not universally applicable. Their relevance and necessity depend on various factors such as the domain, the data distribution, the performance metrics you’re optimizing, and the business objectives. Before adopting any techniques, it’s essential to establish a baseline. Even if you don’t currently face issues with imbalanced data, it can be beneficial to be aware of the challenges and solutions discussed in this book. Familiarizing yourself with these techniques will provide you with a comprehensive toolkit, preparing you for scenarios that you may not yet know you’ll encounter. If you do find that model performance is lacking, especially for underrepresented (minority) classes, the insights and strategies covered in the book can be instrumental in guiding effective improvements.

As the domains of ML and artificial intelligence continue to grow, there will be an increasing demand for professionals who can adeptly handle various data challenges, including imbalance. This book aims to equip you with the knowledge and tools to be one of those sought-after experts.

Who this book is for

This comprehensive book is thoughtfully tailored to meet the needs of a variety of professionals, including the following:

ML researchers, ML scientists, ML engineers, and students: Professionals and learners in the fields of ML and deep learning who seek to gain valuable insights and practical knowledge for tackling the challenges posed by data imbalance

Data scientists and analysts: Experienced data experts eager to expand their knowledge of handling skewed data with practical, real-world solutions

Software engineers: Software engineers who want to effectively integrate ML and deep learning solutions into their applications when dealing with imbalanced data

Practical insight seekers: Professionals and enthusiasts from various backgrounds who want to use hands-on, industry-relevant approaches for efficiently dealing with data imbalance in ML and deep learning, enabling them to excel in their respective roles

What this book covers

Chapter 1, Introduction to Data Imbalance in Machine Learning, explores data imbalance within the context of ML. This chapter explains the nature of imbalanced data and distinguishes it from other dataset types. It also introduces the essential components of ML and the model performance metrics most relevant when data is imbalanced. The chapter looks into the issues and concerns involved in dealing with imbalanced data, explaining when it can occur and why it can sometimes be a challenge. More importantly, we will go over when data imbalance is not worth worrying about at all. Finally, it introduces the imbalanced-learn library and offers general guidelines for dealing with imbalanced datasets effectively.

Chapter 2, Oversampling Methods, introduces the concept of oversampling, outlining when to employ it and when not to, and various techniques to augment imbalanced datasets. It guides you through the practical application of these techniques using the imbalanced-learn library and compares their performance across classical ML models. Practical advice on the effectiveness of these techniques in real-world scenarios concludes the chapter.

Chapter 3, Undersampling Methods, presents the concept of undersampling as an effective approach for data balancing when standard oversampling isn’t an option. This chapter covers strategies to effectively remove examples from imbalanced data, different ways of addressing noisy observations, and procedures for handling easily categorized instances. We will also discuss when to avoid undersampling of the majority class.

Chapter 4, Ensemble Methods, explores the application of ensemble techniques, including bagging and boosting, to enhance the performance of ML models. Moreover, it tackles the challenge of imbalanced datasets, where traditional ensemble methods may be ineffective, by combining the ensemble methods with the techniques introduced in previous chapters.

Chapter 5, Cost-Sensitive Learning, explores some alternatives to sampling techniques such as oversampling and undersampling. This chapter highlights the significance of cost-sensitive learning as an effective strategy to overcome the problem of imbalanced datasets. We also discuss threshold-tuning techniques, which can be very relevant in the context of data imbalance.

Chapter 6, Data Imbalance in Deep Learning, presents the core concepts of deep learning and walks through the issues posed by imbalanced datasets. You will investigate typical types of imbalanced data challenges in various deep learning applications and develop an understanding of their impact.

Chapter 7, Data-Level Deep Learning Methods, marks a transition from classical ML to deep learning, exploring the adaptation of familiar data-level sampling techniques and unveiling opportunities for enhancing these methods in the context of deep learning models. It dives into combining deep learning with oversampling and undersampling techniques, covering dynamic sampling and data augmentation for images and text. It emphasizes the fundamental differences between deep learning and classical ML, particularly the nature of the data they handle: classical ML typically works with structured, tabular data, whereas deep learning deals with unstructured data such as images, text, audio, and video. The chapter also explores techniques to address class imbalance in computer vision and their applicability to Natural Language Processing (NLP) problems.

Chapter 8, Algorithm-Level Deep Learning Techniques, expands on the concepts from Chapter 5, Cost-Sensitive Learning, and applies them to deep learning models. We adapt deep learning models through loss function modifications using the PyTorch deep learning framework, ultimately enhancing model performance and enabling more effective predictions.

Chapter 9, Hybrid Deep Learning Methods, explores innovative techniques that bridge the gap between data-level and algorithm-level methods from the previous two chapters. This chapter introduces the concept of graph ML and employs a real-world Facebook social network dataset to provide valuable insights and practical applications for addressing data imbalance in deep learning. We will also introduce the concept of hard mining loss and build upon it to explore a specialized technique called minority class incremental rectification, which combines hard mining with cross-entropy loss.

Chapter 10, Model Calibration, takes a different angle on addressing data imbalance. Rather than focusing on data preprocessing or model building, this chapter highlights the post-processing of prediction scores obtained from trained models. Such post-processing can be valuable for both real-time predictions and offline model evaluation. The chapter offers insights into measuring the calibration of a model and explains why this aspect can be indispensable when dealing with imbalanced data. This is particularly important since data balancing techniques can often lead to model miscalibration.

Appendix, Machine Learning Pipeline in Production, offers a foundational guide to constructing ML pipelines in production environments that encounter imbalanced data. This appendix provides a brief roadmap, going over the sequence and stage at which techniques for addressing data imbalance should be integrated.
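
The chapter summaries above mention three families of techniques: data-level (resampling), algorithm-level (cost-sensitive weighting), and post-processing (calibration checks). As a rough illustration only, the sketch below shows one tiny instance of each using just the Python standard library. The function names and the toy data are hypothetical and are not taken from the book, which works with imbalanced-learn, scikit-learn, and PyTorch instead.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Data-level idea: duplicate randomly chosen rows of each smaller class
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lbl in zip(X, y) if lbl == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def balanced_class_weights(y):
    """Algorithm-level idea: weight each class by
    n_samples / (n_classes * class_count), so rarer classes
    contribute more to a weighted loss."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


def brier_score(y_true, y_prob):
    """Post-processing check: mean squared difference between predicted
    probabilities and the 0/1 labels (lower is better)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy dataset: 8 majority (0) rows and 2 minority (1) rows.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_res, y_res = random_oversample(X, y)
weights = balanced_class_weights(y)
score = brier_score([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.6])

print(Counter(y_res))  # class counts after oversampling
print(weights)         # per-class weights
print(score)           # Brier score of the toy predictions
```

On this toy data, oversampling brings both classes to eight rows each, and the minority class receives a weight four times that of the majority class. Which of these approaches helps in practice depends on the dataset and metric, which is why the chapters compare them against a baseline.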

📌 Usage of techniques – In production tips

Throughout this book, you will come across In production tip boxes like the following one, highlighting real-world applications of the techniques discussed:

🚀 Class reweighting in production at OpenAI

OpenAI was trying to solve the problem of bias in training data of the image generation model DALL-E 2 [1]. DALL-E 2 is trained on a massive dataset of images from the internet, which can contain biases. For example, the dataset may contain more images of men than women or more images of people from certain racial or ethnic groups than others.

These snippets offer insights into how well-known companies grappled with data imbalance and what strategies they adopted to effectively navigate these challenges. For instance, the tip on OpenAI’s approach with DALL-E 2 sheds light on the intricate balance between filtering training data and inadvertently amplifying biases. Such examples underscore the importance of being both strategic and cautious when dealing with imbalanced data. To delve deeper into the specifics and understand the nitty-gritty of these implementations, you are encouraged to follow the company blog or paper links provided. These insights can provide a clearer understanding of how to adapt and apply techniques in varied real-world scenarios effectively.

To get the most out of this book

This book assumes some foundational knowledge of ML, deep learning, and Python programming. Some basic working knowledge of scikit-learn and PyTorch can be helpful, although they can be learned on the go.

For the software requirements, you have two options to execute the code provided in this book. You can choose to either run the code within Google Colab online at https://colab.research.google.com/ or download the code to your local computer and execute it there. Google Colab provides a hassle-free option as it comes with all the necessary libraries pre-installed, so you don’t need to install anything on your local machine. All you need is a web browser to access Google Colab and a Google account. If you prefer to work locally, ensure that you have Python (3.6 or higher) installed, as well as the specified libraries such as PyTorch, torchvision, NumPy, and scikit-learn. A list of required libraries can be found in the GitHub repository of the book. These libraries are compatible with Windows, macOS, and Linux operating systems. A modern GPU can speed up the code execution for the deep learning chapters that appear later in the book; however, it’s not mandatory.
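
If you work locally, a quick check like the following can confirm the setup before you open the notebooks. This is a minimal sketch, not part of the book's code: the module list covers only the libraries named above (using their usual import names, `torch` for PyTorch and `sklearn` for scikit-learn), and the full list in the GitHub repository may include more.

```python
import importlib.util
import sys

# Minimum Python version stated in the text above.
MIN_PYTHON = (3, 6)

# Import names of the libraries named in the text. This list is an
# assumption; the repository's requirements may include more packages.
REQUIRED_MODULES = ["torch", "torchvision", "numpy", "sklearn"]


def check_environment(min_python=MIN_PYTHON, modules=REQUIRED_MODULES):
    """Return (python_ok, missing_modules) without importing the libraries."""
    python_ok = sys.version_info >= min_python
    missing = [name for name in modules
               if importlib.util.find_spec(name) is None]
    return python_ok, missing


python_ok, missing = check_environment()
print("Python version OK:", python_ok)
print("Missing modules:", missing or "none")
```

Using `importlib.util.find_spec` only looks the modules up, so the check stays fast even when large libraries such as PyTorch are installed.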

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

References are numbered, such as [6]. To follow one up, go to the References section at the end of that chapter and retrieve the corresponding paper, blog, or article either through the link (if provided) or by searching for it on Google Scholar (https://scholar.google.com/).

At the conclusion of each chapter, you will find a set of questions designed to test your comprehension of the material covered. We strongly encourage you to engage with these questions to reinforce your learning. Solutions or answers to selected questions can be found in Assessments towards the end of this book.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Machine-Learning-for-Imbalanced-Data. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: Since it’s possible to provide a base estimator to BaggingClassifier, let’s use DecisionTreeClassifier with the maximum depth of the trees being 6.

A block of code is set as follows:

from collections import Counter
X, y = make_data(sep=2)
print(y.value_counts())
sns.scatterplot(data=X, x=feature_1, y=feature_2)
plt.title('Separation: {}'.format(separation))
plt.show()

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: True Negative Rate (TNR): TNR measures the proportion of actual negatives that are correctly identified as such.

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at customercare@packtpub.com and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you’ve read Machine Learning for Imbalanced Data, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere?

Is your eBook purchase not compatible with the device of your choice?

Don’t worry. With every Packt book, you now get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there. You can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

Scan the QR code or visit the link below

https://packt.link/free-ebook/9781801070836

Submit your proof of purchase

That’s it! We’ll send your free PDF and other benefits directly to your email.

Introduction to Data Imbalance in Machine Learning

Machine learning algorithms have helped solve real-world problems as diverse as disease prediction and online shopping. However, many problems we would like to address with machine learning involve imbalanced datasets. In this chapter, we will define imbalanced datasets and explain how they differ from other types of datasets. We will demonstrate how common imbalanced data is with examples of typical problems and scenarios. We will then go through the basics of machine learning, covering essentials such as loss functions, regularization, and feature engineering, and learn about common evaluation metrics, particularly those that can be very helpful for imbalanced datasets. Finally, we will introduce the imbalanced-learn library.

In particular, we will learn about the following topics:

Introduction to imbalanced datasets

Machine learning 101

Types of datasets and splits

Common evaluation metrics

Challenges and considerations when dealing with imbalanced data

When can we have an imbalance in datasets?

Why can imbalanced data be a challenge?

When to not worry about data imbalance

Introduction to the imbalanced-learn library

General rules to follow

Technical requirements

In this chapter, we will utilize common libraries such as numpy and scikit-learn and introduce the imbalanced-learn library. The code and notebooks for this chapter are available on GitHub at https://github.com/PacktPublishing/Machine-Learning-for-Imbalanced-Data/tree/main/chapter01. You can fire up the GitHub notebook using Google Colab by clicking on the Open in Colab icon at the top of this chapter’s notebook or by launching it from https://colab.research.google.com using the GitHub URL of the notebook.

Introduction to imbalanced datasets

Machine learning algorithms learn from collections of examples that we call datasets. These datasets contain multiple data samples or points, which we may refer to as examples, samples, or instances interchangeably throughout this book.

A dataset can be said to have a balanced distribution when all the target classes have a similar number of examples, as shown in Figure 1.1:

Figure 1.1 – Balanced distribution with an almost equal number of examples for each class
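As a minimal sketch of how this definition can be checked in practice, the snippet below counts the examples per class and reports the ratio between the largest and smallest class. The helper names class_distribution and imbalance_ratio and the toy label lists are hypothetical and used only for illustration; the text does not fix a numeric threshold for what counts as "similar", so the ratio is shown without a cutoff.

```python
from collections import Counter


def class_distribution(labels):
    """Return {class: (count, fraction of all examples)} for the labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: (n, n / total) for cls, n in counts.items()}


def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count.

    A value close to 1 suggests a balanced distribution; larger values
    indicate a more skewed one. (Hypothetical helper for illustration.)
    """
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Toy labels (illustrative only): one nearly balanced set, one skewed set.
balanced = [0] * 50 + [1] * 48
skewed = [0] * 95 + [1] * 5

for name, labels in [("balanced", balanced), ("skewed", skewed)]:
    print(name, class_distribution(labels), imbalance_ratio(labels))
```

A ratio near 1, as for the first toy list, corresponds to the kind of distribution described above, while the second list is dominated by one class.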
