business-science/workshop_2018_dsgo

DSGO 2018 - Machine Learning With R + H2O Workshop

Get ready to learn how to predict credit defaults with R + H2O!

Program

  • The data is credit loan applications to a bank.

  • The objective is to assess risk of default, prevent bad loans, and save the bank lots of $$$

  • The best Kagglers got 0.80 AUC with hundreds of hours of work, feature engineering, and combining more data sets

  • We'll get 0.74 AUC in 30 minutes of coding (+1.5 hours of explaining)
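
For readers new to the metric: AUC is the share of (defaulter, non-defaulter) pairs in which the model scores the defaulter higher, so 0.5 is random guessing and 1.0 is a perfect ranking. Here is a minimal base-R sketch of that idea. The helper name `auc_rank` and the toy labels and scores are made up for illustration and are not part of the workshop code.

```r
# Hypothetical helper: AUC via the rank-sum (Mann-Whitney) identity.
# `actual` is 0/1 (1 = default), `score` is the predicted risk.
auc_rank <- function(actual, score) {
  stopifnot(length(actual) == length(score), all(actual %in% c(0, 1)))
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  r <- rank(score)  # average ranks, so tied scores count as half a win
  (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data, invented for this example only.
actual <- c(0, 0, 1, 0, 1, 1)
score  <- c(0.10, 0.40, 0.35, 0.20, 0.80, 0.70)

auc_toy <- auc_rank(actual, score)
print(auc_toy)
```

In the toy data, 8 of the 9 defaulter/non-defaulter pairs are ordered correctly, so the result is 8/9 (about 0.89).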

Data

  • Kaggle Competition: Home Credit Default Risk

  • The data is large (166 MB unzipped, 308K rows, 122 columns)

  • We will work with a 20% sample of the data to keep it manageable
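
One way to draw a 20% row sample in base R is sketched below. It uses a small stand-in data frame rather than the real Kaggle file. The seed, the stand-in columns, and the 8% default rate are assumptions for illustration, and the workshop scripts may sample differently.

```r
set.seed(123)  # assumed seed, only so the sample is reproducible

# Stand-in for the full application data; the real file is not loaded here.
full_tbl <- data.frame(
  id     = 1:1000,
  TARGET = rbinom(1000, size = 1, prob = 0.08)  # made-up default rate
)

# Keep a random 20% of the rows, without replacement.
keep_rows   <- sample(nrow(full_tbl), size = floor(0.20 * nrow(full_tbl)))
sampled_tbl <- full_tbl[keep_rows, ]

nrow(sampled_tbl)
```

Sampling rows at random keeps the default rate roughly intact on average. A stratified split on the target would hold it fixed more tightly.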

Machine Learning With H2O

The goal of Machine Learning with H2O is to give you experience with:

  1. The R programming language

  2. h2o for machine learning

  3. lime for feature explanation

  4. recipes for preprocessing
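
To give a feel for the h2o part of that list, here is a rough sketch of a modeling step. It assumes AutoML is used, which this README does not state. The helper name `fit_automl_auc` and its arguments are hypothetical. It needs the h2o package and Java, and it leaves out the recipes and lime steps.

```r
# Hypothetical helper: fit H2O AutoML on a data frame and return the test AUC.
# `target` must name a two-level factor column (binary classification).
fit_automl_auc <- function(train_df, test_df, target, max_secs = 30) {
  h2o::h2o.init()  # starts or connects to a local H2O cluster (needs Java)

  train_h2o  <- h2o::as.h2o(train_df)
  test_h2o   <- h2o::as.h2o(test_df)
  predictors <- setdiff(names(train_df), target)

  automl <- h2o::h2o.automl(
    x                = predictors,
    y                = target,
    training_frame   = train_h2o,
    max_runtime_secs = max_secs  # assumed time budget, kept short for a demo
  )

  # Score the best model from the leaderboard on held-out data.
  perf <- h2o::h2o.performance(automl@leader, newdata = test_h2o)
  h2o::h2o.auc(perf)
}

# Example call (not run here; `train_tbl` and `test_tbl` are placeholders):
# fit_automl_auc(train_tbl, test_tbl, target = "TARGET")
```

The function is only defined here, not called, so the block can be sourced on a machine without H2O running.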

Becoming A Data Science Rockstar

  • This 3-hour workshop will teach you some of the latest tools & techniques for Machine Learning in business

  • That said, on the job you will spend 5% of your time on modeling (machine learning) & 95% of your time:

    • Managing projects
    • Collecting & working with data (manipulating, combining, cleaning)
    • Visualizing information - showing the size of problems and what is likely contributing
    • Communicating results in terms the business cares about
    • Recommending actions that improve the business
  • Further, your organization will be keenly aware of what you contribute financially. You need to show them Return on Investment (ROI). They are making an investment in having a data science team. They expect tangible results.

  • Important Actions:

    • Attend my talk on the Business Science Problem Framework tomorrow. The BSPF is the essential system that enables driving ROI with data science.

    • Take my DS4B 201-R course. It teaches a 10-Week Program that has cut data science project times in half for consultants and has progressed data scientists more than any other course they've taken. You will get 20% OFF (expires after the DSGO conference).


Installation Instructions

Option 1: RStudio IDE Desktop + Install R Packages

Step 1: Install R and RStudio IDE
Step 2: Open RStudio and run the following script
pkgs <- c("h2o", "tidyverse", "rsample", "recipes", "lime")
install.packages(pkgs)
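
If some of these packages are already installed, a small check like the one below avoids reinstalling them. This is an optional sketch and not part of the workshop instructions. The helper name `missing_pkgs` is made up for illustration.

```r
pkgs <- c("h2o", "tidyverse", "rsample", "recipes", "lime")

# Hypothetical helper: names in `pkgs` that are not installed yet.
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_pkgs(pkgs)

if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install only what is missing
} else {
  message("All workshop packages are installed.")
}
```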

Test H2O - You may need the Java Development Kit (JDK)

library(h2o)
h2o.init()

If H2O cannot connect, you probably need to install Java.
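
A quick way to see whether R can find Java at all is sketched below, using base R only. It is a hint rather than a full diagnosis: it only checks the PATH, and H2O may also need a particular Java version, which this check does not verify.

```r
# Look for a `java` executable on the PATH; the result is "" when none is found.
java_path <- unname(Sys.which("java"))

if (nzchar(java_path)) {
  message("Found Java at: ", java_path)
  # Ask Java to report its version (printed to the console).
  system2(java_path, "-version")
} else {
  message("No Java found on the PATH. Install a JDK, restart R, then retry h2o.init().")
}
```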

Step 3: Load the Project From GitHub

Wait for instructions from Matt.

The URL for the GitHub project is:

https://github.com/business-science/workshop_2018_dsgo

Option 2: If You Have Docker Installed

Step 0: Docker Installation (Takes Time)

Skip this step if you already have Docker Community Edition installed

Docker Community Edition Installation Instructions

Step 1: Run the DSGO Workshop Docker Image

In a terminal / command line, run the following command to download and install the workshop container. This will take a few minutes to load.

docker run -d -p 8787:8787 -v "`pwd`":/home/rstudio/working -e PASSWORD=rstudio -e ROOT=TRUE mdancho/workshop_2018_dsgo

Step 2: Fire Up RStudio IDE in Your Browser

Open your favorite browser (I'll be using Chrome), and enter the following in the address bar.

localhost:8787

Step 3: Log into RStudio Server

Use the following credentials.

  • User Name: rstudio
  • Password: rstudio

Step 4: Load the Project From GitHub

Wait for instructions from Matt.

The URL for the GitHub project is:

https://github.com/business-science/workshop_2018_dsgo


Further Resources

About

DataScienceGO 2018 - Machine Learning Workshop