Python Tools for Data Scientists Pocket Primer: A Quick Guide to Essential Python Libraries for Data Science

Ebook, 764 pages, 4 hours

By Mercury Learning and Information and Oswald Campesato

About this ebook

This book, part of the best-selling Pocket Primer series, offers a comprehensive introduction to essential Python tools for data scientists. It begins with an overview of Python basics, followed by in-depth coverage of NumPy and Pandas, focusing on their features and applications. The text also addresses the critical tasks of writing regular expressions and performing data cleaning.
Further sections delve into data visualization techniques and the use of Sklearn and SciPy, providing practical knowledge and skills for handling complex data analysis tasks. This structured approach ensures that readers gain a complete understanding of the tools and techniques necessary for effective data science.
Designed to be accessible yet thorough, this book includes numerous code samples to reinforce learning. Companion files with source code are available for download, making it an invaluable resource for anyone looking to master Python for data science and enhance their data analysis capabilities.

Language: English

Publisher: Packt Publishing

Release date: Aug 12, 2024

ISBN: 9781836643487

Author

Mercury Learning and Information

Podcast episode
Gitting After It with Katie Sylor-Miller: Katie Sylor-Miller is a frontend architect at Etsy, a company she joined in November 2015. Prior to this position, Katie worked as a senior front end developer at Constant Contact, a technical lead at EF Education, a front end web developer at Miller Syst
byScreaming in the Cloud
0 ratings
0% found this document useful
Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer: Maintaining a single source of truth for your data is the biggest challenge in data engineering. Different roles and tasks in the business need their own ways to access and analyze the data in the organization. In order to enable this use case, while maintaining a single point of access, the semantic layer has evolved as a technological solution to the problem. In this episode Artyom Keydunov, creator of Cube, discusses the evolution and applications of the semantic layer as a component of your data platform, and how Cube provides speed and cost optimization for your data consumers.
Podcast episode
Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer: Maintaining a single source of truth for your data is the biggest challenge in data engineering. Different roles and tasks in the business need their own ways to access and analyze the data in the organization. In order to enable this use case, while maintaining a single point of access, the semantic layer has evolved as a technological solution to the problem. In this episode Artyom Keydunov, creator of Cube, discusses the evolution and applications of the semantic layer as a component of your data platform, and how Cube provides speed and cost optimization for your data consumers.
byData Engineering Podcast
0 ratings
0% found this document useful
Episode 77 - Open Source
Podcast episode
Episode 77 - Open Source
byThe Structural Engineering Podcast
0 ratings
0% found this document useful
New Tools for Cloud Native Developers
Podcast episode
New Tools for Cloud Native Developers
byThe Cloudcast
0 ratings
0% found this document useful
Zenlytic Is Building You A Better Coworker With AI Agents: The purpose of business intelligence systems is to allow anyone in the business to access and decode data to help them make informed decisions. Unfortunately this often turns into an exercise in frustration for everyone involved due to complex workflows and hard-to-understand dashboards. The team at Zenlytic have leaned on the promise of large language models to build an AI agent that lets you converse with your data. In this episode they share their journey through the fast-moving landscape of generative AI and unpack the difference between an AI chatbot and an AI agent.
Podcast episode
Zenlytic Is Building You A Better Coworker With AI Agents: The purpose of business intelligence systems is to allow anyone in the business to access and decode data to help them make informed decisions. Unfortunately this often turns into an exercise in frustration for everyone involved due to complex workflows and hard-to-understand dashboards. The team at Zenlytic have leaned on the promise of large language models to build an AI agent that lets you converse with your data. In this episode they share their journey through the fast-moving landscape of generative AI and unpack the difference between an AI chatbot and an AI agent.
byData Engineering Podcast
0 ratings
0% found this document useful
Stitching Together Enterprise Analytics With Microsoft Fabric: Data lakehouse architectures have been gaining significant adoption. To accelerate adoption in the enterprise Microsoft has created the Fabric platform, based on their OneLake architecture. In this episode Dipti Borkar shares her experiences working on the product team at Fabric and explains the various use cases for the Fabric service.
Podcast episode
Stitching Together Enterprise Analytics With Microsoft Fabric: Data lakehouse architectures have been gaining significant adoption. To accelerate adoption in the enterprise Microsoft has created the Fabric platform, based on their OneLake architecture. In this episode Dipti Borkar shares her experiences working on the product team at Fabric and explains the various use cases for the Fabric service.
byData Engineering Podcast
0 ratings
0% found this document useful
Privacy is a moving target. Here’s how engineering teams can stay on track.: On this sponsored episode of the podcast, we talk with Rob Pickard and Matt Cooper of Vanta, who get that question every day. Their company makes security monitoring software that helps companies get into compliance quickly. We spoke about the shifting sands of privacy rules and regulations, tracking data flows through systems and across corporate borders, and how security automation can put up guardrails instead of gates.
Podcast episode
Privacy is a moving target. Here’s how engineering teams can stay on track.: On this sponsored episode of the podcast, we talk with Rob Pickard and Matt Cooper of Vanta, who get that question every day. Their company makes security monitoring software that helps companies get into compliance quickly. We spoke about the shifting sands of privacy rules and regulations, tracking data flows through systems and across corporate borders, and how security automation can put up guardrails instead of gates.
byThe Stack Overflow Podcast
0 ratings
0% found this document useful
Data Brew Season 2 Episode 9: Data Driven Software
Podcast episode
Data Brew Season 2 Episode 9: Data Driven Software
byData Brew by Databricks
0 ratings
0% found this document useful
Using ChatGPT to Search Enterprise Data with Pamela Fox: In this episode, Thomas Betts talks with Pamela F…
Podcast episode
Using ChatGPT to Search Enterprise Data with Pamela Fox: In this episode, Thomas Betts talks with Pamela F…
byThe InfoQ Podcast
0 ratings
0% found this document useful

Skip carousel


Python Tools for Data Scientists Pocket Primer - Mercury Learning and Information

PYTHON TOOLS

FOR

DATA SCIENTISTS

Pocket Primer

LICENSE, DISCLAIMER OF LIABILITY, AND LIMITED WARRANTY

By purchasing or using this book and companion files (the Work), you agree that this license grants permission to use the contents contained herein, including the disc, but does not give you the right of ownership to any of the textual content in the book / disc or ownership to any of the information or products contained in it. This license does not permit uploading of the Work onto the Internet or on a network (of any kind) without the written consent of the Publisher. Duplication or dissemination of any text, code, simulations, images, etc. contained herein is limited to and subject to licensing terms for the respective products, and permission must be obtained from the Publisher or the owner of the content, etc., in order to reproduce or network any portion of the textual material (in any media) that is contained in the Work.

MERCURY LEARNING AND INFORMATION (MLI or the Publisher) and anyone involved in the creation, writing, or production of the companion disc, accompanying algorithms, code, or computer programs (the software), and any accompanying Web site or software of the Work, cannot and do not warrant the performance or results that might be obtained by using the contents of the Work. The author, developers, and the Publisher have used their best efforts to ensure the accuracy and functionality of the textual material and/or programs contained in this package; we, however, make no warranty of any kind, express or implied, regarding the performance of these contents or programs. The Work is sold as is without warranty (except for defective materials used in manufacturing the book or due to faulty workmanship).

The author, developers, and the publisher of any accompanying content, and anyone involved in the composition, production, and manufacturing of this work will not be liable for damages of any kind arising out of the use of (or the inability to use) the algorithms, source code, computer programs, or textual material contained in this publication. This includes, but is not limited to, loss of revenue or profit, or other incidental, physical, or consequential damages arising out of the use of this Work.

The sole remedy in the event of a claim of any kind is expressly limited to replacement of the book and/or disc, and only at the discretion of the Publisher. The use of implied warranty and certain exclusions vary from state to state, and might not apply to the purchaser of this product.

Companion files for this title are available by writing to the publisher at info@merclearning.com.

Oswald Campesato

This publication, portions of it, or any accompanying software may not be reproduced in any way, stored in a retrieval system of any type, or transmitted by any means, media, electronic display or mechanical display, including, but not limited to, photocopy, recording, Internet postings, or scanning, without prior permission in writing from the publisher.

Publisher: David Pallai

MERCURY LEARNING AND INFORMATION

22841 Quicksilver Drive

Dulles, VA 20166

info@merclearning.com

www.merclearning.com

800-232-0223

O. Campesato. Python Tools for Data Scientists Pocket Primer.

ISBN: 978-1-68392-823-2

The publisher recognizes and respects all marks used by companies, manufacturers, and developers as a means to distinguish their products. All brand names and product names mentioned in this book are trademarks or service marks of their respective companies. Any omission or misuse (of any kind) of service marks or trademarks, etc. is not an attempt to infringe on the property of others.

Library of Congress Control Number: 2022943452

22 23 24 3 2 1 This book is printed on acid-free paper in the United States of America.

Our titles are available for adoption, license, or bulk purchase by institutions, corporations, etc. For additional information, please contact the Customer Service Dept. at 800-232-0223 (toll free).

All of our titles are available in digital format at academiccourseware.com and other digital vendors. Companion files (figures and code listings) for this title are available by contacting info@merclearning.com. The sole obligation of MERCURY LEARNING AND INFORMATION to the purchaser is to replace the disc, based on defective materials or faulty workmanship, but not based on the operation or functionality of the product.

I’d like to dedicate this book to my parents –

may this bring joy and happiness into their lives.

Preface

Chapter 1: Introduction to Python

Tools for Python

easy_install and pip

virtualenv

Python Installation

Setting the PATH Environment Variable (Windows Only)

Launching Python on Your Machine

The Python Interactive Interpreter

Python Identifiers

Lines, Indentations, and Multi-Lines

Quotation and Comments in Python

Saving Your Code in a Module

Some Standard Modules in Python

The help() and dir() Functions

Compile Time and Runtime Code Checking

Simple Data Types in Python

Working with Numbers

Working with Other Bases

The chr() Function

The round() Function in Python

Formatting Numbers in Python

Unicode and UTF-8

Working with Unicode

Listing 1.1: Unicode1.py

Working with Strings

Comparing Strings

Listing 1.2: Compare.py

Formatting Strings in Python

Uninitialized Variables and the Value None in Python

Slicing and Splicing Strings

Testing for Digits and Alphabetic Characters

Listing 1.3: CharTypes.py

Search and Replace a String in Other Strings

Listing 1.4: FindPos1.py

Listing 1.5: Replace1.py

Remove Leading and Trailing Characters

Listing 1.6: Remove1.py

Printing Text without NewLine Characters

Text Alignment

Working with Dates

Listing 1.7: Datetime2.py

Listing 1.8: datetime2.out

Converting Strings to Dates

Listing 1.9: String2Date.py

Exception Handling in Python

Listing 1.10: Exception1.py

Handling User Input

Listing 1.11: UserInput1.py

Listing 1.12: UserInput2.py

Listing 1.13: UserInput3.py

Command-Line Arguments

Listing 1.14: Hello.py

Summary

Chapter 2: Introduction to NumPy

What is NumPy?

Useful NumPy Features

What are NumPy Arrays?

Listing 2.1: nparray1.py

Working with Loops

Listing 2.2: loop1.py

Appending Elements to Arrays (1)

Listing 2.3: append1.py

Appending Elements to Arrays (2)

Listing 2.4: append2.py

Multiplying Lists and Arrays

Listing 2.5: multiply1.py

Doubling the Elements in a List

Listing 2.6: double_list1.py

Lists and Exponents

Listing 2.7: exponent_list1.py

Arrays and Exponents

Listing 2.8: exponent_array1.py

Math Operations and Arrays

Listing 2.9: mathops_array1.py

Working with −1 Sub-ranges with Vectors

Listing 2.10: npsubarray2.py

Working with −1 Sub-ranges with Arrays

Listing 2.11: np2darray2.py

Other Useful NumPy Methods

Arrays and Vector Operations

Listing 2.12: array_vector.py

NumPy and Dot Products (1)

Listing 2.13: dotproduct1.py

NumPy and Dot Products (2)

Listing 2.14: dotproduct2.py

NumPy and the Length of Vectors

Listing 2.15: array_norm.py

NumPy and Other Operations

Listing 2.16: otherops.py

NumPy and the reshape() Method

Listing 2.17: numpy_reshape.py

Calculating the Mean and Standard Deviation

Listing 2.18: sample_mean_std.py

Code Sample with Mean and Standard Deviation

Listing 2.19: stat_values.py

Trimmed Mean and Weighted Mean

Working with Lines in the Plane (Optional)

Plotting Randomized Points with NumPy and Matplotlib

Listing 2.20: np_plot.py

Plotting a Quadratic with NumPy and Matplotlib

Listing 2.21: np_plot_quadratic.py

What is Linear Regression?

What is Multivariate Analysis?

What about Non-Linear Datasets?

The MSE (Mean Squared Error) Formula

Other Error Types

Non-Linear Least Squares

Calculating the MSE Manually

Find the Best-Fitting Line in NumPy

Listing 2.22: find_best_fit.py

Calculating MSE by Successive Approximation (1)

Listing 2.23: plain_linreg1.py

Calculating MSE by Successive Approximation (2)

Listing 2.24: plain_linreg2.py

Google Colaboratory

Uploading CSV Files in Google Colaboratory

Listing 2.25: upload_csv_file.ipynb

Summary

Chapter 3: Introduction to Pandas

What is Pandas?

Pandas Options and Settings

Pandas Data Frames

Data Frames and Data Cleaning Tasks

Alternatives to Pandas

A Pandas Data Frame with a NumPy Example

Listing 3.1: pandas_df.py

Describing a Pandas Data Frame

Listing 3.2: pandas_df_describe.py

Pandas Boolean Data Frames

Listing 3.3: pandas_boolean_df.py

Transposing a Pandas Data Frame

Pandas Data Frames and Random Numbers

Listing 3.4: pandas_random_df.py

Listing 3.5: pandas_combine_df.py

Reading CSV Files in Pandas

Listing 3.6: sometext.txt

Listing 3.7: read_csv_file.py

The loc() and iloc() Methods in Pandas

Converting Categorical Data to Numeric Data

Listing 3.8: cat2numeric.py

Listing 3.9: shirts.csv

Listing 3.10: shirts.py

Matching and Splitting Strings in Pandas

Listing 3.11: shirts_str.py

Converting Strings to Dates in Pandas

Listing 3.12: string2date.py

Merging and Splitting Columns in Pandas

Listing 3.13: employees.csv

Listing 3.14: emp_merge_split.py

Combining Pandas Data Frames

Listing 3.15: concat_frames.py

Data Manipulation with Pandas Data Frames (1)

Listing 3.16: pandas_quarterly_df1.py

Data Manipulation with Pandas Data Frames (2)

Listing 3.17: pandas_quarterly_df2.py

Data Manipulation with Pandas Data Frames (3)

Listing 3.18: pandas_quarterly_df3.py

Pandas Data Frames and CSV Files

Listing 3.19: weather_data.py

Listing 3.20: people.csv

Listing 3.21: people_pandas.py

Managing Columns in Data Frames

Switching Columns

Appending Columns

Deleting Columns

Inserting Columns

Scaling Numeric Columns

Listing 3.22: numbers.csv

Listing 3.23: scale_columns.py

Managing Rows in Pandas

Selecting a Range of Rows in Pandas

Listing 3.24: duplicates.csv

Listing 3.25: row_range.py

Finding Duplicate Rows in Pandas

Listing 3.26: duplicates.py

Listing 3.27: drop_duplicates.py

Inserting New Rows in Pandas

Listing 3.28: emp_ages.csv

Listing 3.29: insert_row.py

Handling Missing Data in Pandas

Listing 3.30: employees2.csv

Listing 3.31: missing_values.py

Multiple Types of Missing Values

Listing 3.32: employees3.csv

Listing 3.33: missing_multiple_types.py

Test for Numeric Values in a Column

Listing 3.34: test_for_numeric.py

Replacing NaN Values in Pandas

Listing 3.35: missing_fill_drop.py

Sorting Data Frames in Pandas

Listing 3.36: sort_df.py

Working with groupby() in Pandas

Listing 3.37: groupby1.py

Working with apply() and mapapply() in Pandas

Listing 3.38: apply1.py

Listing 3.39: apply2.py

Listing 3.40: mapapply1.py

Listing 3.41: mapapply2.py

Handling Outliers in Pandas

Listing 3.42: outliers_zscores.py

Pandas Data Frames and Scatterplots

Listing 3.43: pandas_scatter_df.py

Pandas Data Frames and Simple Statistics

Listing 3.44: housing.csv

Listing 3.45: housing_stats.py

Aggregate Operations in Pandas Data Frames

Listing 3.46: aggregate1.py

Aggregate Operations with the titanic.csv Dataset

Listing 3.47: aggregate2.py

Save Data Frames as CSV Files and Zip Files

Listing 3.48: save2csv.py

Pandas Data Frames and Excel Spreadsheets

Listing 3.49: write_people_xlsx.py

Listing 3.50: read_people_xslx.py

Working with JSON-based Data

Python Dictionary and JSON

Listing 3.51: dict2json.py

Python, Pandas, and JSON

Listing 3.52: pd_python_json.py

Useful One-line Commands in Pandas

What is Method Chaining?

Pandas and Method Chaining

Pandas Profiling

Listing 3.53: titanic.csv

Listing 3.54: profile_titanic.py

Summary

Chapter 4: Working with Sklearn and Scipy

What is Sklearn?

Sklearn Features

The Digits Dataset in Sklearn

Listing 4.1: load_digits1.py

Listing 4.2: load_digits2.py

Listing 4.3: sklearn_digits.py

The train_test_split() Class in Sklearn

Selecting Columns for X and y

What is Feature Engineering?

The Iris Dataset in Sklearn (1)

Listing 4.4: sklearn_iris1.py

Sklearn, Pandas, and the Iris Dataset

Listing 4.5: pandas_iris.py

The Iris Dataset in Sklearn (2)

Listing 4.6: sklearn_iris2.py

The Faces Dataset in Sklearn (Optional)

Listing 4.7: sklearn_faces.py

What is SciPy?

Installing SciPy

Permutations and Combinations in SciPy

Listing 4.8: scipy_perms.py

Listing 4.9: scipy_combinatorics.py

Calculating Log Sums

Listing 4.10: scipy_matrix_inv.py

Calculating Polynomial Values

Listing 4.11: scipy_poly.py

Calculating the Determinant of a Square Matrix

Listing 4.12: scipy_determinant.py

Calculating the Inverse of a Matrix

Listing 4.13: scipy_matrix_inv.py

Calculating Eigenvalues and Eigenvectors

Listing 4.14: scipy_eigen.py

Calculating Integrals (Calculus)

Listing 4.15: scipy_integrate.py

Calculating Fourier Transforms

Listing 4.16: scipy_fourier.py

Flipping Images in SciPy

Listing 4.17: scipy_flip_image.py

Rotating Images in SciPy

Listing 4.18: scipy_rotate_image.py

Google Colaboratory

Uploading CSV Files in Google Colaboratory

Listing 4.19: upload_csv_file.ipynb

Summary

Chapter 5: Data Cleaning Tasks

What is Data Cleaning?

Data Cleaning for Personal Titles

Data Cleaning in SQL

Replace NULL with 0

Replace NULL Values with the Average Value

Listing 5.1: replace_null_values.sql

Replace Multiple Values with a Single Value

Listing 5.2: reduce_values.sql

Handle Mismatched Attribute Values

Listing 5.3: type_mismatch.sql

Convert Strings to Date Values

Listing 5.4: str_to_date.sql

Data Cleaning from the Command Line (optional)

Working with the sed Utility

Listing 5.5: delimiter1.txt

Listing 5.6: delimiter1.sh

Working with Variable Column Counts

Listing 5.7: variable_columns.csv

Listing 5.8: variable_columns.sh

Listing 5.9: variable_columns2.sh

Truncating Rows in CSV Files

Listing 5.10: variable_columns3.sh

Generating Rows with Fixed Columns with the awk Utility

Listing 5.11: FixedFieldCount1.sh

Listing 5.12: employees.txt

Listing 5.13: FixedFieldCount2.sh

Converting Phone Numbers

Listing 5.14: phone_numbers.txt

Listing 5.15: phone_numbers.sh

Converting Numeric Date Formats

Listing 5.16: dates.txt

Listing 5.17: dates.sh

Listing 5.18: dates2.sh

Converting Alphabetic Date Formats

Listing 5.19: dates2.txt

Listing 5.20: dates3.sh

Working with Date and Time Date Formats

Listing 5.21: date-times.txt

Listing 5.22: date-times-padded.sh

Working with Codes, Countries, and Cities

Listing 5.23: country_codes.csv

Listing 5.24: add_country_codes.sh

Listing 5.25: countries_cities.csv

Listing 5.26: split_countries_codes.sh

Listing 5.27: countries_cities2.csv

Listing 5.28: split_countries_codes2.sh

Data Cleaning on a Kaggle Dataset

Listing 5.29: convert_marketing.sh

Summary

Chapter 6: Data Visualization

What is Data Visualization?

Types of Data Visualization

What is Matplotlib?

Diagonal Lines in Matplotlib

Listing 6.1: diagonallines.py

A Colored Grid in Matplotlib

Listing 6.2: plotgrid2.py

Randomized Data Points in Matplotlib

Listing 6.3: lin_plot_reg.py

A Histogram in Matplotlib

Listing 6.4: histogram1.py

A Set of Line Segments in Matplotlib

Listing 6.5: line_segments.py

Plotting Multiple Lines in Matplotlib

Listing 6.6: plt_array2.py

Trigonometric Functions in Matplotlib

Listing 6.7: sincos.py

Display IQ Scores in Matplotlib

Listing 6.8: iq_scores.py

Plot a Best-Fitting Line in Matplotlib

Listing 6.9: plot_best_fit.py

The Iris Dataset in SkLearn

Listing 6.10: sklearn_iris1.py

SkLearn, Pandas, and the Iris Dataset

Listing 6.11: pandas_iris.py

Working with Seaborn

Features of Seaborn

Seaborn Built-in Datasets

Listing 6.12: seaborn_tips.py

The Iris Dataset in Seaborn

Listing 6.13: seaborn_iris.py

The Titanic Dataset in Seaborn

Listing 6.14: seaborn_titanic_plot.py

Extracting Data from the Titanic Dataset in Seaborn (1)

Listing 6.15: seaborn_titanic.py

Extracting Data from the Titanic Dataset in Seaborn (2)

Listing 6.16: seaborn_titanic2.py

Visualizing a Pandas Dataset in Seaborn

Listing 6.17: pandas_seaborn.py

Data Visualization in Pandas

Listing 6.18: pandas_viz1.py

What is Bokeh?

Listing 6.19: bokeh_trig.py

Summary

Appendix A: Working with Data

What are Datasets?

Data Preprocessing

Data Types

Preparing Datasets

Discrete Data vs. Continuous Data

Binning Continuous Data

Scaling Numeric Data via Normalization

Scaling Numeric Data via Standardization

What to Look for in Categorical Data

Mapping Categorical Data to Numeric Values

Working with Dates

Working with Currency

Missing Data, Anomalies, and Outliers

Missing Data

Anomalies and Outliers

Outlier Detection

What is Data Drift?

What is Imbalanced Classification?

What is SMOTE?

SMOTE Extensions

Analyzing Classifiers (Optional)

What is LIME?

What is ANOVA?

The Bias-Variance Trade-Off

Types of Bias in Data

Summary

Appendix B: Working with awk

The awk Command

Built-in Variables that Control awk

How Does the awk Command Work?

Aligning Text with the printf Statement

Listing B.1: columns2.txt

Listing B.2: AlignColumns1.sh

Conditional Logic and Control Statements

The while Statement

A for loop in awk

Listing B.3: Loop.sh

A for loop with a break Statement

The next and continue Statements

Deleting Alternate Lines in Datasets

Listing B.4: linepairs.csv

Listing B.5: deletelines.sh

Merging Lines in Datasets

Listing B.6: columns.txt

Listing B.7: ColumnCount1.sh

Printing File Contents as a Single Line

Joining Groups of Lines in a Text File

Listing B.8: digits.txt

Listing B.9: digits.sh

Joining Alternate Lines in a Text File

Listing B.10: columns2.txt

Listing B.11: JoinLines.sh

Listing B.12: JoinLines2.sh

Listing B.13: JoinLines2.sh

Matching with Meta Characters and Character Sets

Listing B.14: Patterns1.sh

Listing B.15: columns3.txt

Listing B.16: MatchAlpha1.sh

Printing Lines Using Conditional Logic

Listing B.17: products.txt

Splitting Filenames with awk

Listing B.18: SplitFilename2.sh

Working with Postfix Arithmetic Operators

Listing B.19: mixednumbers.txt

Listing B.20: AddSubtract1.sh

Numeric Functions in awk

One Line awk Commands

Useful Short awk Scripts

Listing B.21: data.txt

Printing the Words in a Text String in awk

Listing B.22: Fields2.sh

Count Occurrences of a String in Specific Rows

Listing B.23: data1.csv

Listing B.24: data2.csv

Listing B.25: checkrows.sh

Printing a String in a Fixed Number of Columns

Listing B.26: FixedFieldCount1.sh

Printing a Dataset in a Fixed Number of Columns

Listing B.27: VariableColumns.txt

Listing B.28: Fields3.sh

Aligning Columns in Datasets

Listing B.29: mixed-data.csv

Listing B.30: mixed-data.sh

Aligning Columns and Multiple Rows in Datasets

Listing B.31: mixed-data2.csv

Listing B.32: aligned-data2.csv

Listing B.33: mixed-data2.sh

Removing a Column from a Text File

Listing B.34: VariableColumns.txt

Listing B.35: RemoveColumn.sh

Subsets of Column-aligned Rows in Datasets

Listing B.36: sub-rows-cols.txt

Listing B.37: sub-rows-cols.sh

Counting Word Frequency in Datasets

Listing B.38: WordCounts1.sh

Listing B.39: WordCounts2.sh

Listing B.40: columns4.txt

Displaying Only Pure Words in a Dataset

Listing B.41: onlywords.sh

Working with Multi-line Records in awk

Listing B.42: employees.txt

Listing B.43: employees.sh

A Simple Use Case

Listing B.44: quotes3.csv

Listing B.45: delim1.sh

Another Use Case

Listing B.46: dates2.csv

Listing B.47: string2date2.sh

Summary

Index

PREFACE

What is the Primary Value Proposition for this Book?

This book offers a fast-paced introduction to as much relevant information about Python tools for data scientists as can reasonably fit in a book of this size. If you are a novice, this book will give you a starting point from which you can decide which Python technologies you want to explore in greater detail.

You will be exposed to features of NumPy and Pandas, how to write regular expressions, and how to perform data cleaning tasks. Some topics are presented in a cursory manner, for two main reasons. First, it’s important that you be exposed to these concepts. In some cases, a topic might pique your interest and motivate you to learn more about it through self-study; in other cases, you will probably be satisfied with a brief introduction. In other words, you decide whether to delve deeply into each of the topics in this book.

Second, a full treatment of all the topics that are covered in this book would significantly increase its size, and few people are interested in reading technical tomes with 500 or more pages.

However, it’s important for you to decide if this approach is suitable for your needs and learning style. If not, you can select one or more of the plethora of data analytics books that are available.

The Target Audience

This book is intended primarily for people who have worked with Python and are interested in learning about several important Python libraries. It is also intended to reach an international audience of readers with highly diverse backgrounds in various age groups. Consequently, this book uses standard English rather than colloquial expressions that might be confusing to those readers. As you know, many people learn by different types of imitation, which include reading, writing, or hearing new material. This book takes these points into consideration to provide a comfortable and meaningful learning experience for the intended readers.

What Will I Learn from This Book?

The first chapter contains a quick tour of basic Python, followed by a chapter that introduces you to Python data structures. Next, Chapter 3 introduces you to NumPy, followed by a chapter for Pandas. Chapter 5 provides a high-level view of Sklearn, which is an extremely powerful Python library that is central to many machine learning tasks.

Another chapter contains an assortment of data cleaning tasks that are solved via Python as well as the awk programming language. Chapter 6 delves into data visualization with Matplotlib, Seaborn, and Bokeh. Next, one appendix explores issues that can arise with data, followed by an appendix for awk.

Why is an Appendix for awk Included in This Book?

While many data cleaning tasks can be performed via Python, sometimes it’s much easier to perform data cleaning via awk. If you have not worked with awk, it’s a venerable Unix utility that was developed almost 50 years ago by Aho, Weinberger, and Kernighan (the last of whom is a coauthor of the famous K&R book for C).

Incidentally, most of the Python code samples are short (usually less than one page and sometimes less than half a page), and if need be, you can easily and quickly copy/paste the code into a new Jupyter notebook. For the Python code samples that reference a CSV file, you do not need any additional code in the corresponding Jupyter notebook to access the CSV file. Moreover, the code samples execute quickly, so you won’t need to avail yourself of the free GPU that is provided in Google Colaboratory.

If you do decide to use Google Colaboratory, you can easily copy/paste the Python code into a notebook, and also use the upload feature to upload existing Jupyter notebooks. Keep in mind the following point: if the Python code references a CSV file, make sure that you include the appropriate code snippet (as explained in Chapter 1) to access the CSV file in the corresponding Jupyter notebook in Google Colaboratory.
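The snippet from Chapter 1 is not reproduced in this preface, so the following is only a minimal sketch of one common way to make a CSV file available in either environment. The helper name `read_csv_rows` is hypothetical and is not taken from the book. It assumes that `google.colab.files.upload()` is available inside Colaboratory, and it falls back to reading from disk elsewhere.

```python
import csv
import io


def read_csv_rows(filename):
    """Return the rows of a CSV file as a list of dictionaries.

    Hypothetical helper (not from the book): inside Google Colaboratory it
    asks for an upload; anywhere else it reads the file from local disk.
    """
    try:
        # This module can only be imported inside Google Colaboratory.
        from google.colab import files
    except ImportError:
        # Local Jupyter or a plain Python session: read directly from disk.
        with open(filename, newline="") as f:
            return list(csv.DictReader(f))

    # In Colab, upload() opens a file picker and returns {name: bytes}.
    # The uploaded file must have the same name as `filename`.
    uploaded = files.upload()
    text = uploaded[filename].decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

In a local notebook, a call such as `rows = read_csv_rows("data.csv")` reads the file that sits next to the notebook. In Colaboratory, the same call prompts you to choose the file first. The filename here is a placeholder.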

Do I Need to Learn the Theory Portions of this Book?

Once again, the answer depends on the extent to which you plan to become involved in data analytics. For example, if you plan to study machine learning, then you will probably learn how to create and train a model, which is a task that is performed after data cleaning tasks. In general, you will probably need to learn everything that you encounter in this book if you are planning to become a machine learning engineer.

Why Does This Book Include Sklearn Material?

The amount of Sklearn material in this book is minimal because this book is not about machine learning. The Sklearn material is confined to a single chapter, where you will learn about some of the Sklearn built-in datasets. If you decide to delve into machine learning, you will have already been introduced to some aspects of Sklearn.

Getting the Most from This Book

Some programmers learn well from prose, while others learn well from sample code (and lots of it), which means that no single style works for everyone.

Moreover, some programmers want to run the

Python Tools for Data Scientists Pocket Primer - Mercury Learning and Information

PYTHON TOOLS FOR DATA SCIENTISTS

Pocket Primer

CONTENTS

PREFACE

What is the Primary Value Proposition for this Book?

The Target Audience

What Will I Learn from This Book?

Why is an Appendix for awk Included in This Book?

Do I Need to Learn the Theory Portions of this Book?

Why Does This Book Include Sklearn Material?

Getting the Most from This Book