Data Science For Business (DS4B 201 / HR 201)

Solve a real-world employee turnover (churn) problem with H2O automated machine learning & LIME black-box model explanations

Please note that the Shiny Web Application is built in DS4B 301: Building A Shiny Web Application (Coming Soon!)

DS4B 201 teaches you the tools and frameworks for ROI-driven data science using the R programming language.

Over the course of 10 weeks you'll dive in-depth into an Employee Attrition (Churn) problem, learning & applying a systematic process, cutting-edge tools, and R code.

At the end of the course, you'll be able to confidently apply data science within a business.

The difference with the DS4B 201 program: You get results!

Lecture Samples

Sample Lecture from Chapter 1, Business Understanding: BSPF & Code Workflows

Sample Lecture from Chapter 6, Modeling Churn: Explaining Black-Box Models With LIME

There are 100+ coding lectures like these that walk you through the process of applying data science to the business problem!

About The Program

The course takes about 10 weeks to complete. It's an in-depth study of one churn / binary classification problem that goes into every facet of how to solve it. Here's the basic structure of DS4B 201:

  • It begins with the problem overview and tool introduction covering objectives, tools, and setup. We introduce the Business Science Problem Framework, which is our step-by-step roadmap for data science project success used in every chapter as you progress through the course.

  • It progresses into coding by sizing the problem and developing skills we use throughout the course in dplyr and ggplot2, and by introducing you to a new metaprogramming framework called Tidy Eval for programming with dplyr. You'll use Tidy Eval for the attrition workflow, and you'll build a customizable plotting function to show execs which departments and roles are costing the organization the most due to attrition. The attrition workflow and plotting functions built with Tidy Eval are used in a future course for creating R Packages (DS4B 303).

  • It then goes through two chapters on foundational EDA / Pre-Modeling tools (tidyverse, dplyr, ggplot2, purrr, skimr, GGally, recipes) to work with data, explore data, prepare data, analyze data, and do everything needed prior to modeling. The goal is not to waste time modeling until the problem is well understood and the data is likely to be a good set for modeling. We teach you how to do this at length because it saves a lot of time.

  • Next, there are two chapters on H2O. The first covers generating models: you'll gain an understanding of the primary H2O functions for automated machine learning, prediction, and performance, and you'll create a visualization that examines the 30+ models you build. There's also a bonus lecture on H2O grid search. The second chapter is an in-depth performance analysis. You'll learn about ROC plots, Precision vs Recall plots, and Gain and Lift plots (which are for executive communication). You will build a custom plotting function that is the "ultimate performance dashboard", combining the 4 plots using cowplot.

  • Then you learn about LIME and how to perform local interpretability. You'll learn how to create explanations. You'll also have a cool challenge where you recreate the plots with a business-ready theme.

  • Next is an in-depth chapter on Expected Value. We start with a basic case of making a "No Overtime" policy change, simply toggling Overtime from Yes to No, and calculate the expected value of this decision. We then go through the Expected Value Framework, a tool that enables targeting high-risk churners and accounts for the costs associated with false negatives / false positives. We then teach how to optimize the threshold, using purrr for iteration, to maximize the expected savings of a targeted policy. Finally, we teach sensitivity analysis, again using purrr, to build a heatmap that covers confidence ranges you can explain to executives.

  • The last major chapter is on a recommendation algorithm that you build, which makes employee-level recommendations based on inputs. The recommendation algorithm and LIME analysis are used in a future course on building a Shiny App (DS4B 301).
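
To give a feel for the threshold optimization idea in the Expected Value chapter, here is a minimal sketch in base R. It is not the course's code: the toy predictions, the two cost figures, the `expected_cost()` helper, and the simplifying assumption that a targeted employee is always retained are all hypothetical. It uses `vapply()` for the iteration; `purrr::map_dbl()` would play the same role.

```r
# Hypothetical toy data: predicted probability of leaving and the true outcome.
predictions <- data.frame(
  prob_leave = c(0.92, 0.81, 0.63, 0.42, 0.33, 0.21, 0.12, 0.04),
  left       = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)
)

# Assumed costs (illustrative only, not figures from the course).
cost_of_attrition    <- 50000  # cost incurred when an employee leaves
cost_of_intervention <- 10000  # cost of applying the policy to one employee

# Total cost at a given threshold. Simplifying assumption: the policy always
# retains a targeted employee who would otherwise have left.
expected_cost <- function(threshold, data) {
  targeted        <- data$prob_leave >= threshold
  false_negatives <- sum(!targeted & data$left)  # leavers we failed to target
  sum(targeted) * cost_of_intervention + false_negatives * cost_of_attrition
}

# Sweep candidate thresholds and compare each against doing nothing.
thresholds    <- seq(0, 1, by = 0.05)
costs         <- vapply(thresholds, expected_cost, numeric(1), data = predictions)
baseline_cost <- sum(predictions$left) * cost_of_attrition
savings       <- baseline_cost - costs

best_threshold <- thresholds[which.max(savings)]
best_threshold
max(savings)
```

Targeting too few employees lets costly false negatives through, while targeting too many pays for interventions on employees who would have stayed; the sweep finds the balance point for the assumed costs.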

Class Curriculum

  Chapter 0: Getting Started
  0.4 Frameworks
  Chapter 1, Business Understanding: BSPF & Code Workflows
  Aside: Intro To Tidy Eval
  1.6 Chapter Code
  Chapter 2, Data Understanding: By Data Type & Feature-Target Interactions
  2.1 Setting Up For Data Understanding
  2.4 Challenge #2: Assessing Feature Pairs
  2.5 Chapter Code
  Course Survey #1: Your Feedback Is Important!
  Chapter 3, Data Preparation: Getting Data Ready For People & Machines
  3.1 Data Preparation Setup
  3.5 Challenge #3: Correlation Analysis
  3.6 Chapter Code
  Chapter 4, Modeling Churn: Using Automated Machine Learning With H2O
  4.6 Chapter Code
  Chapter 5, Modeling Churn: Assessing H2O Performance
  5.1 Performance Overview & Setup
  5.3 Performance Charts For Data Scientists
  5.6 Chapter Code
  Chapter 6, Modeling Churn: Explaining Black-Box Models With LIME
  6.1 Chapter Overview & Setup
  6.4 Chapter Code
  Chapter 7, Evaluation: Calculating The Expected ROI (Savings) Of A Policy Change
  7.5 Chapter Code
  Chapter 8, Evaluation: Maximizing ROI (Savings) With Threshold Optimization & Sensitivity Analysis
  Chapter 9, Evaluation: Creating A Recommendation Algorithm
  Chapter 10, Conclusion: Next Steps

Your Instructor

Matt Dancho

Founder of Business Science and general business & finance guru, Matt has worked with many clients from Fortune 500 to high-octane startups! Matt loves educating data scientists on how to apply powerful tools within their organization to yield ROI. Matt doesn't rest until he gets results (literally, he doesn't sleep, so don't be surprised if he responds to your email at 4AM)!

Get started now!

Main Features & Benefits

The feedback provided in the initial student survey has a few consistent themes for what the students love:

  • The BSPF framework - I show you how to implement the steps in your business to size the problem, understand the drivers, work with key decision makers & develop an ROI-driven solution for your company. This is your guide, and I show it step-by-step.

  • The tools - You'll learn how to use H2O Automated Machine Learning for a binary classification problem (think predicting customer churn, detecting whether or not a customer will make a purchase from an advertisement, and so on). You'll learn feature explanation with LIME to explain the key features. You'll learn threshold optimization, which is a critical step in targeting key customers (for example, those most likely to purchase) by maximizing expected profit. You'll learn sensitivity analysis to take into account variability in your model parameters.

  • The code skills & R packages - You will learn 95% of the most common code techniques, R functions, and R packages that I use on a daily basis. You will learn ggplot2 (visualization), dplyr (data wrangling), purrr (iteration), recipes (preprocessing for ML), fs (working with the file system), skimr & GGally (exploring data) and more - IN-DEPTH. By following along in the hours of lectures, you will leave the course confident in how to use all of the packages knowing the most important functions like purrr::map(). You will develop 10+ visualizations creating several custom plotting functions. You'll be a data science rockstar just with these techniques.

  • The tidyverse programming (Tidy Eval) - Watch Hadley's video; this is what we teach. You will create a bunch of functions using Tidy Eval. It's technically an advanced metaprogramming framework implemented in the rlang package, but it's super important when you start building your own functions. We teach it.

  • It's Integrated - We have future courses coming using the same problem that teach Shiny Web App Development, R Package Development, & Rmarkdown reporting and interactive visualization. Students have not seen this yet, but are excited about the idea of integration.

Private Slack Channel Community

Engage with the instructor (Matt) and other students in the course via our private Slack Workspace.

Increase confidence, build critical thinking skills, & take your data science to the next level

Employee Attrition: A High-Impact Problem

Employee turnover (attrition) can be a $15M/YEAR COST to an organization that loses on average 200 high-performing employees per year. Predicting turnover is at the forefront of Human Resources (HR) needs in many organizations. Further, HR departments typically have historical data on employees, making this a perfect problem for DATA SCIENCE FOR BUSINESS.

Until now the mainstream approach has been to use logistic regression or survival curves to model employee attrition. However, with advancements in machine learning (ML), we can now get both better predictive performance and better explanations of what critical features are linked to employee attrition.

In Data Science For Business (HR 201), you'll learn how to:

  • Use People Analytics (Human Resources) data to predict and explain employee turnover
  • Implement the Business Science Problem Framework and CRISP-DM to tackle any organizational data science problem
  • Perform automated machine learning with H2O
  • Explain complex, black-box machine learning models with LIME

The Ultimate Machine Learning Course For Business

Learn how to apply the Business Science Problem Framework & CRISP-DM through an end-to-end data science for business project:

As a data scientist you need to be able to build custom functions to get things done. Learn Tidy Eval, a new programming framework for dplyr and other tidyverse packages, to build reusable functions that the data science team can scale.

Learn H2O, a high-end machine learning library, by building an extremely predictive machine learning employee turnover classifier.

In business, it's more important to know why something happened rather than the prediction itself. We learn LIME to develop these explanations and interpret complex models.


Turn raw data into ML-ready datasets with recipes.

Understand data by data type using skimr, and visualize interactions with GGally.

Apply Correlation Analysis to understand feature importance prior to modeling.

After taking this course, you'll have an excellent understanding of the data science process and how you can immediately apply within a business context to yield positive ROI for your organization.


Lifetime Access Gets You

  • A complete walk-through of an end-to-end data science project by solving a real-world problem
  • A play-by-play strategy to yield Return-On-Investment (ROI) for your company
  • Hours of expert instruction in how to apply data science for business from the Founder of Business Science
  • PDF Frameworks & Excel Calculators that gain buy-in from Executives when pitching your Data Science Project
  • Access to our Slack Channel Community for asking data science questions & discussing the course!

Course Satisfaction Results

As of July 9, 2018, we are currently getting an average Course Satisfaction rating from students of

9.0 / 10


We think it's great, but don't just listen to us. Here's what other students have to say about Data Science For Business (HR 201).

"Business Science University gives a solid approach to understanding what a Data Scientist needs to do to transform an idea into a full solution, also taking into account that this process must return the investment for the company and add value. Mixing both theory and programming you’ll learn with real-world examples the bulletproof workflow that the successful company founded by Matt Dancho use to do Data Science. This is not another course, this is the ultimate ecosystem for you to develop and improve as a data scientist for your organization."

- Favio Vázquez, Principal Data Scientist, OXXO

"I have been going through books & MOOC's to skill-up my data science game. HR 201 is the first course that gives me a CLEAR FRAMEWORK to apply data science to Business Intelligence! It gives me the opportunity to bring data science to my organization and clearly articulate the business value proposition throughout the process. All that with the help of bleeding-edge open source tools (H2O, LIME, RStudio)"

- Renaud Liber, Business/Data Analyst - BI, Napoleon Games NV

"Business Science University is an excellent resource for learning data science. The HR 201 course does a great job of teaching how to communicate a business problem, how to execute investigative thinking to solve the problem, and properly structuring code for collaboration and reusability. Most importantly, I took away a repeatable methodology and project structure that can be used to solve future business problems using data science. This was well worth the investment."

- David Curry, CTO, Africa Talent Management

Sunita Kenner, Senior Manager: Data/Business Analytics at Extensis.

Feedback provided in... R (Awesome!!)

DS4B 201 / HR 201 Content Release Schedule

The content for this course is being released following a drip schedule. The first 50% of the course is available at launch. Subsequent lessons are to be released following the schedule outlined in the Class Curriculum.


Refer to the free Test Your Baseline Knowledge Check in the Class Curriculum to determine your fitness for this course. As a prerequisite, the learner is expected to:

  • Be familiar with the R statistical programming language (e.g. have R setup on computer, have RStudio IDE working, have basic familiarity with R programming language)
  • Be familiar with the tidyverse (e.g. basic knowledge of dplyr and ggplot2)

Everything else will be taken care of!

Business Discounts

Please contact Business Science to find rates for multiple users & organizations.

How The 4-Course Virtual Workshop System Works

We use a hub-and-spokes model. DS4B 201 / HR 201 (200-series course) is the hub that serves as the base for each extension (300-series courses). This maintains a consistent theme across multiple courses by using the same business problem while focusing on the tools that data scientists need to use in their day-to-day work.

There are several advantages to the hub-and-spokes model:

  • It is focused on solving a problem
  • It simulates the real-world
  • Each course stands alone so you can take what you are interested in
  • Courses can be combined, which exponentially magnifies your learning

HR 201 is the first course in the 4-Course Virtual Workshop, and HR 201 is what you get when you purchase this course. The release schedule for the others is TBA (to be announced). More information is coming!

Next Steps: Take The Rest Of The Virtual Workshop Courses!

A data scientist can never stop learning. When learning stops, a plateau sets in, which is exactly what you and your organization cannot afford! (This is why Business Science provides data science coaching as a service!)

Don't plateau!

Continue with the rest of the Virtual Workshop to exponentially multiply your learning!

HR 301 (COMING SOON): Building A Shiny App (Employee Smart Scorecard)

The most effective means of improving your organization is by helping others make data-driven decisions.

A Machine Learning-Powered Web Application is 100% the best way to do this. (Trust us, we've seen the change it makes in an organization.) Building a Machine Learning-Powered Web Application is easier than you think with Shiny!

You can further your capabilities by taking our integrated HR 301 course, which implements our H2O model in a Shiny Web App for interactive employee attrition prevention recommendations. We call it the Employee Smart Scorecard!

HR 302 (COMING SOON): Communicating With RMarkdown Reports

Executive communication makes or breaks a data science project. Further, data science can be extremely valuable in customer communication.

In HR 302, you'll use RMarkdown to communicate the story through reports and presentations designed for your target audiences: executives (global decision makers), managers (local decision makers), and data science peers (reproducers / reviewers). Additionally, you'll learn about parameterized RMarkdown reports, which are perfect for automated reporting.

HR 303 (COMING SOON): Building An R Package

Data scientists need to be able to create packages to simplify workflows and to keep the Data Science Team's analyses consistent.

Build an R package, tidyattrition, in HR 303. The tidyattrition package follows the workflow developed in the Business Understanding phase. Learn to turn custom tidyeval functions such as assess_attrition(), calculate_attrition_cost(), and plot_attrition() into an R package that others can use!
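
As a rough illustration of the kind of Tidy Eval function described here, the sketch below passes bare column names into a dplyr pipeline with the `{{ }}` (embrace) operator. The helper name `summarize_attrition()`, its arguments, and the toy data are hypothetical and are not the course's `assess_attrition()`. It assumes a recent dplyr (1.0 or later).

```r
library(dplyr)

# Hypothetical helper (not the course's assess_attrition()): counts employees
# per group and computes the share who left. The {{ }} operator lets callers
# pass bare column names, just like in dplyr's own verbs.
summarize_attrition <- function(data, group_col, attrition_col) {
  data %>%
    group_by({{ group_col }}) %>%
    summarise(
      n        = n(),
      pct_left = mean({{ attrition_col }} == "Yes"),
      .groups  = "drop"
    )
}

# Toy data for illustration only.
employees <- tibble(
  Department = c("Sales", "Sales", "R&D", "R&D", "R&D"),
  Attrition  = c("Yes", "No", "No", "No", "Yes")
)

result <- summarize_attrition(employees, Department, Attrition)
result
```

Because the grouping and attrition columns are arguments, the same function can be reused on other columns (for example a job role column) without rewriting the pipeline, which is what makes functions like this natural candidates for an R package.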

Frequently Asked Questions

Who is this course for?
This course is for anyone with R programming experience seeking to apply data science for business (DS4B). It's not for complete beginners! With that said, a basic understanding of R, dplyr, and ggplot2 will be sufficient to complete the course. Although the concepts are advanced, the hard stuff is explained such that a novice/intermediate learner will pick it up!
This course is part of a "4-Course Virtual Workshop". What are the other courses?
We are currently working on 3 other courses. These are extensions of HR 201:
  • HR 301: Shiny App - We build this using the H2O and LIME model / output and the recommendation algorithm we build.
  • HR 302: Reporting / Communication - We go through communication and reporting for executives and the organization. We focus on Rmarkdown and also parameterized reports that can be deployed.
  • HR 303: R Package, tidyattrition - We create an R Package called tidyattrition that contains streamlined attrition workflow functions for creating the project directory, assessing attrition, and plotting attrition cost.
Does the current outline cover all 4-courses or just the HR 201 Course?
The current outline is just for the first. In the first course, we go through the BSPF problem framework, understand the business, understand the data, preprocess the data and perform a pre-modeling correlation analysis, develop H2O models, evaluate H2O modeling performance, use LIME to understand why the black-box model selects what it selects, develop recommendation logic, and evaluate the business value. That's what DS4B HR 201 is about.
Is the price listed just for the first of four courses or for the entire four part workshop?
The price listed is for the first course, which is the flagship (hub in hub-and-spokes). Other courses will be offered at an additional charge. These are all standalone courses, so you can select which ones you want a la carte and you will receive all materials to complete that course independent of the rest. However, they all use the same data and theme for solving a singular problem, which is beneficial for practitioners in the real world.
Should I take this course even though it deals with an Employee Attrition Problem, which is not in my domain?
Absolutely. The course uses a real-world example of an HR problem, which may not be specific to all Data Science For Business (DS4B) needs. However, the system and tools used are applicable to ANY BINARY CLASSIFICATION PROBLEM (for example, customer churn, fraud detection, any yes/no problem!). The real value is in the tools and techniques used - You will learn our process along with advanced tools!
Will this course be beneficial if I have a non-traditional background (e.g. Sales, Finance, Sociology, Marketing, Operations, Classical Music)?
Look, my background is non-traditional (mechanical engineering). If I can do it, you can do it. As long as you are (1) interested in data science, and (2) interested in applying it to business, then you are the right candidate. You should however take a basic course that teaches R, dplyr, and ggplot2 so you have the minimum skillset. Refer to the quiz: Test Your Baseline.
What will I learn beyond the basics?
You will learn a ton: H2O, LIME, recipes, and much much more! In addition to the course overview, the course has free-previews. Take a peek and see if you like the content.
I am finishing my degree. Will this course help me?
Yes. The course bridges the gap between academic data science and real-world data science in a business context. This makes it excellent, if not essential, to your ability to hit the ground running when you transition into an organization.
When does the course start and finish?
The course starts now and never ends! It is a completely self-paced online course - you decide when you start and when you finish.
Are these courses self-paced, asynchronous?
All courses are completely self-paced. You can take them on your own schedule. The content uses pre-recorded video, and we will handle comments as we receive them. The courses can be taken asynchronously. However, most will want to take HR 201 first before proceeding into the 300-series courses since this provides a foundation of the business problem and exposure to H2O and LIME.
How long do I have access to the course?
How does lifetime access sound? After enrolling, you have unlimited access to this course for as long as you like - across any and all devices you own.
Will the course continue to be updated with new content?
Yes - I have a number of new sections planned, and I will be actively improving to make sure the content is perfect! Your membership includes lifetime access as the course evolves.
What is the geographic availability for the course? Will it work outside US?
Yes - The course can be taken from virtually any geographic location that is permitted to trade with the US. This should cover 99.9% of the world population!
I love the course, but I can't afford it. What options do I have?
The value the course will deliver is exponentially more than its price. You have two options. First, your employer may offer education assistance. This is highly recommended because the education will ultimately benefit them... financially! (See the $15M/year problem lecture). Second, if self-funded, view it as an investment. What you are getting will help you get a job, develop a portfolio of experience using cutting-edge tools and processes, and manage a data science project the way we do!
What if I am unhappy with the course?
We would never want you to be unhappy! If you are unsatisfied with your purchase, contact us in the first 30 days and we will give you a full refund. Signup is risk-free!
Do you offer a payment plan?
Yes - We have a 3-payment monthly plan at a slightly higher rate. This option spreads the payments over three months.
