DSGO 2018 - Machine Learning With R + H2O Workshop

Get ready to learn how to predict credit defaults with R + H2O!

Program

The data is credit loan applications to a bank.
The objective is to assess risk of default, prevent bad loans, and save the bank lots of $$$
The best Kagglers got 0.80 AUC with hundreds of hours of work, feature engineering, and combining more data sets
We'll get 0.74 AUC in 30 minutes of coding (+1.5 hours of explaining)
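
For anyone new to the metric: AUC can be read as the probability that a randomly chosen defaulter receives a higher predicted score than a randomly chosen non-defaulter. The base-R sketch below computes it with the rank-sum identity. The labels and scores are made up for illustration (they are not workshop data), and `auc_rank` is a hypothetical helper name, not a function from h2o.

```r
# Toy labels (1 = default, 0 = repaid) and predicted default scores.
# These values are invented for illustration only.
actual <- c(0, 0, 1, 0, 1, 1, 0, 0)
score  <- c(0.10, 0.35, 0.40, 0.20, 0.80, 0.30, 0.50, 0.05)

# AUC via the rank-sum (Mann-Whitney) identity.
# rank() assigns average ranks to ties, which matches the usual AUC convention.
auc_rank <- function(actual, score) {
  r     <- rank(score)
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc_value <- auc_rank(actual, score)
print(auc_value)
```

On this toy data, 12 of the 15 defaulter/non-defaulter pairs are ordered correctly, so the result is 0.8.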

Data

Kaggle Competition: Home Credit Default Risk
The data is large (166MB unzipped, 308K rows, 122 columns)
We will work with a 20% sample of the data to keep it manageable
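
A minimal sketch of drawing a 20% random sample in base R. The data frame here is a small stand-in, not the Kaggle file; the names `applications`, `id`, and `TARGET` and the seed value are assumptions for illustration, and the workshop scripts may sample differently (for example with rsample).

```r
# Fix the random seed so the sample is reproducible (the seed value is arbitrary).
set.seed(123)

# Stand-in for the loan application table: 1000 rows with a binary target.
applications <- data.frame(
  id     = 1:1000,
  TARGET = rbinom(1000, size = 1, prob = 0.08)
)

# Draw 20% of the row indices without replacement, then subset the rows.
n_keep     <- floor(0.20 * nrow(applications))
sample_idx <- sample(nrow(applications), size = n_keep)
applications_sample <- applications[sample_idx, ]

nrow(applications_sample)
```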

Machine Learning With H2O

The goal of Machine Learning with H2O is to get you experience with:

The R programming language
h2o for machine learning
lime for feature explanation
recipes for preprocessing

Becoming A Data Science Rockstar

This 3-hour workshop will teach you some of the latest tools & techniques for Machine Learning in business.
That said, in a real data science role you will spend 5% of your time on modeling (machine learning) & 95% of your time:
- Managing projects
- Collecting & working with data (manipulating, combining, cleaning)
- Visualizing information - showing the size of problems and what is likely contributing
- Communicating results in terms the business cares about
- Recommending actions that improve the business
Further, your organization will be keenly aware of what you contribute financially. You need to show them Return on Investment (ROI). They are making an investment in having a data science team. They expect tangible results.
Important Actions:
- Attend my talk on the Business Science Problem Framework tomorrow. The BSPF is the essential system that enables driving ROI with data science.
- Take my DS4B 201-R course. This teaches you a 10-Week Program that has cut data science project time in half for consultants and has progressed data scientists more than any other course they've taken. You will get 20% OFF (expires after DSGO conference).

Installation Instructions

Option 1: RStudio IDE Desktop + Install R Packages

Step 1: Install R and RStudio IDE

Step 2: Open RStudio and run the following script

pkgs <- c("h2o", "tidyverse", "rsample", "recipes", "lime")
install.packages(pkgs)

Test H2O - You may need the Java Development Kit (JDK)

library(h2o)
h2o.init()

If H2O cannot connect, you probably need to install Java.
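
A quick way to check for Java from the R console before retrying, sketched in base R. This only looks for a `java` executable on the PATH, which is an assumption about how your system is set up; Java can also be installed somewhere that is not on the PATH.

```r
# Look up the java executable on the system PATH.
# Sys.which() returns an empty string when nothing is found.
java_path <- Sys.which("java")

if (nzchar(java_path)) {
  message("Java found at: ", java_path)
} else {
  message("No java executable on the PATH. Install Java, restart R, then rerun h2o.init().")
}
```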

Step 3: Load the Project From GitHub

Wait for instructions from Matt.

The URL for the GitHub project is:

https://github.com/business-science/workshop_2018_dsgo

Option 2: If You Have Docker Installed

Step 0: Docker Installation (Takes Time)

Skip this step if you already have Docker Community Edition installed

Docker Community Edition Installation Instructions

Step 1: Run the DSGO Workshop Docker Image

In a terminal / command line, run the following command to download and install the workshop container. This will take a few minutes to load.

docker run -d -p 8787:8787 -v "`pwd`":/home/rstudio/working -e PASSWORD=rstudio -e ROOT=TRUE mdancho/workshop_2018_dsgo

Step 2: Fire Up RStudio IDE in your Browser

Open your favorite browser (I'll be using Chrome) and enter the following in the web address field.

localhost:8787

Step 3: Log into RStudio Server

Use the following credentials.

User Name: rstudio
Password: rstudio

Step 4: Load the Project From GitHub

Wait for instructions from Matt.

The URL for the GitHub project is:

https://github.com/business-science/workshop_2018_dsgo

Further Resources

tidyverse: A meta-package for data wrangling and visualization. Loads dplyr, ggplot2, and a number of essential packages for working with data. Documentation: https://www.tidyverse.org/
recipes: A preprocessing package that includes many standard preprocessing steps. Documentation: https://tidymodels.github.io/recipes/
h2o: A high-performance machine learning library that is scalable and optimized for performance. Documentation: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/index.html
- GLM: Elastic Net (Generalized Linear Model with L1 + L2 Regularization)
- GBM: Gradient Boosting Machine (Tree-Based + Boosting)
- Random Forest: Tree Based + Bagging
- Deep Learning: Neural Network
- Automated Machine Learning: Stacked Ensemble, All Models and Best of Family
lime: A package for explaining black-box models. LIME Tutorial: https://www.business-science.io/business/2018/06/25/lime-local-feature-interpretation.html
