Machine Learning with Python - Bespoke

Course Code: pymlbspk

Duration: 21 hours

Prerequisites:

Basic knowledge of statistical concepts is desirable, though not compulsory.

Prior programming experience (preferably in Python) is highly recommended.

Overview:

In this training, participants will learn how to apply machine learning techniques and tools for solving real-world problems in the banking industry. Python will be used as the programming language.

Participants first learn the key principles, then put their knowledge into practice by building their own machine learning models and using them to complete a number of team projects.

This course aims to provide basic proficiency in applying Machine Learning methods in practice. Using the Python programming language and its libraries, and drawing on many practical examples, it teaches how to use the most important building blocks of Machine Learning, make data modeling decisions, interpret the outputs of the algorithms, and validate the results.

Our goal is to give you the skills to understand and confidently use the most fundamental tools from the Machine Learning toolbox, and to avoid the common pitfalls of data science applications.

Audience

Developers
Data scientists

Format of the course

Part lecture, part discussion, exercises and heavy hands-on practice

Course Outline:

Quick Introduction to Python

A quick walkthrough of Python

Getting Started with Python Libraries for Machine Learning

Introduction to Machine Learning

This section gives a general introduction to when to use machine learning, what to consider, and what it all means, including the pros and cons. Topics covered: data types (structured, unstructured, static, streamed); data validity and volume; data-driven vs. user-driven analytics; statistical models vs. machine learning models; challenges of unsupervised learning; the bias-variance trade-off; iteration and evaluation; cross-validation approaches; supervised, unsupervised, and reinforcement learning.

MAJOR TOPICS

1. Understanding naive Bayes

Basic concepts of Bayesian methods
Probability
Joint probability
Conditional probability with Bayes' theorem
The naive Bayes algorithm
The naive Bayes classification
The Laplace estimator
Using numeric features with naive Bayes
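
As a rough illustration of the naive Bayes topics above, the sketch below trains a tiny word-count classifier in plain Python and applies the Laplace estimator so that unseen words never produce a zero probability. The messages, labels, and function names are invented for this example and are not course material.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels, laplace=1):
    """Count documents per class and words per class (toy categorical model)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in zip(docs, labels):
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab, laplace

def predict(model, words):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab, laplace = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(class)
        total_words = sum(word_counts[label].values())
        for w in words:
            # Laplace estimator: add `laplace` to every count
            p = (word_counts[label][w] + laplace) / (total_words + laplace * len(vocab))
            score += math.log(p)  # "naive" step: treat words as independent
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training data
docs = [["free", "prize", "now"], ["free", "offer"],
        ["meeting", "tomorrow"], ["project", "meeting", "notes"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(docs, labels)

print(predict(model, ["free", "meeting", "prize"]))
print(predict(model, ["meeting", "notes"]))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied.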

2. Understanding decision trees

Divide and conquer
The C5.0 decision tree algorithm
Choosing the best split
Pruning the decision tree
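
To make "choosing the best split" concrete, here is a minimal sketch of entropy-based information gain, the kind of measure that C4.5/C5.0-style trees build on. It is not the C5.0 algorithm itself; the loan data and feature names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy before the split minus the weighted entropy after it."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    weighted = sum(len(part) / len(labels) * entropy(part)
                   for part in partitions.values())
    return entropy(labels) - weighted

# Hypothetical loan applications
rows = [
    {"has_default": "yes", "owns_home": "yes"},
    {"has_default": "yes", "owns_home": "no"},
    {"has_default": "no", "owns_home": "yes"},
    {"has_default": "no", "owns_home": "no"},
]
labels = ["reject", "reject", "approve", "approve"]

# Divide and conquer: split first on the feature with the highest gain
best_feature = max(["has_default", "owns_home"],
                   key=lambda f: information_gain(rows, labels, f))
print(best_feature)
```

In this toy data, splitting on `has_default` yields pure partitions, so it is the preferred split.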

3. Understanding neural networks

From biological to artificial neurons
Activation functions
Network topology
The number of layers
The direction of information travel
The number of nodes in each layer
Training neural networks with backpropagation
Deep Learning
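
The topology topics above (layers, nodes per layer, direction of information travel) can be sketched as a forward pass through a small feed-forward network with a sigmoid activation. The weights below are arbitrary illustrative values, and training with backpropagation is deliberately left out.

```python
import math

def sigmoid(z):
    """Logistic activation function, squashing any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """One fully connected layer: each row of `weights` feeds one node."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def network_forward(inputs, layers):
    """Feed-forward pass: information travels from input to output only."""
    activation = inputs
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation

# Hypothetical 2-2-1 topology: 2 inputs, 2 hidden nodes, 1 output node
layers = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.0]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                     # output layer
]
output = network_forward([0.0, 0.0], layers)
print(output)
```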

4. Understanding Support Vector Machines

Classification with hyperplanes
Finding the maximum margin
The case of linearly separable data
The case of non-linearly separable data
Using kernels for non-linear spaces
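
The idea behind kernels can be illustrated without training an SVM. In the sketch below, one-dimensional points that no single threshold can separate become linearly separable after mapping x to (x, x^2). The data, the mapping, and the hand-picked threshold are assumptions for illustration; a real SVM would learn the maximum-margin boundary rather than use a fixed one.

```python
def feature_map(x):
    """Map a 1-D point into 2-D space: (x, x^2)."""
    return (x, x * x)

def kernel(a, b):
    """Dot product of feature_map(a) and feature_map(b),
    computed directly from a and b (the 'kernel trick')."""
    return a * b + (a * b) ** 2

# Hypothetical data: the outer points are class 1, the inner points class -1
points = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
labels = [1, 1, -1, -1, -1, 1, 1]

# In the mapped space, the horizontal line x^2 = 2 separates the classes
predictions = [1 if feature_map(x)[1] > 2.0 else -1 for x in points]
print(predictions == labels)

# The kernel gives the same value as an explicit dot product in mapped space
a, b = 2.0, 3.0
explicit = sum(u * v for u, v in zip(feature_map(a), feature_map(b)))
print(kernel(a, b), explicit)
```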

5. Understanding clustering

Clustering as a machine learning task
The k-means algorithm for clustering
Using distance to assign and update clusters
Choosing the appropriate number of clusters
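
The assign-and-update loop of k-means can be sketched in a few lines of plain Python. The points and starting centroids are made up for this example, and the sketch assumes Python 3.8 or later for `math.dist`.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its cluster."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                # Mean of each coordinate across the cluster
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable, so stop early
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D data with two visible groups
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
print(centroids)
```

Here k is fixed at 2 by the number of starting centroids; choosing k for real data is a separate decision.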

6. Measuring performance for classification

Working with classification prediction data
A closer look at confusion matrices
Using confusion matrices to measure performance
Beyond accuracy – other measures of performance
The kappa statistic
Sensitivity and specificity
Precision and recall
The F-measure
Visualizing performance tradeoffs
ROC curves
Estimating future performance
The holdout method
Cross-validation
Bootstrap sampling
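
As a sketch of the measures listed above, the following plain-Python example derives confusion-matrix counts from predictions and computes accuracy, sensitivity, specificity, precision, the F-measure, and the kappa statistic. The fraud labels are hypothetical.

```python
def confusion_counts(actual, predicted, positive):
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # Kappa: agreement beyond what chance alone would produce
    expected = (((tp + fp) / total) * ((tp + fn) / total)
                + ((tn + fn) / total) * ((tn + fp) / total))
    kappa = (accuracy - expected) / (1 - expected)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision,
            "f_measure": f_measure, "kappa": kappa}

# Hypothetical labels: 4 fraud cases and 6 legitimate ones
actual = ["fraud"] * 4 + ["ok"] * 6
predicted = ["fraud", "fraud", "fraud", "ok",
             "ok", "ok", "ok", "ok", "ok", "fraud"]
results = metrics(*confusion_counts(actual, predicted, positive="fraud"))
print(results)
```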

7. Tuning stock models for better performance

Using caret for automated parameter tuning
Creating a simple tuned model
Customizing the tuning process
Improving model performance with meta-learning
Understanding ensembles
Bagging
Boosting
Random forests
Training random forests
Evaluating random forest performance
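
Note that caret is an R package, so the sketch below does not use it. It only illustrates the general idea of automated parameter tuning in plain Python: try each candidate value from a grid, score it with leave-one-out cross-validation, and keep the best. The one-dimensional kNN model, the data, and the grid are assumptions chosen for brevity.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    neighbors = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out cross-validation accuracy for a given k."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]  # hold out one observation
        correct += int(knn_predict(train, x, k) == y)
    return correct / len(data)

def tune_k(data, grid):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: loo_accuracy(data, k) for k in grid}
    return max(scores, key=scores.get), scores

# Hypothetical data with one noisy "high" observation at 1.75
data = [(1.0, "low"), (1.4, "low"), (1.75, "high"), (2.0, "low"), (2.5, "low"),
        (6.0, "high"), (6.5, "high"), (7.0, "high"), (7.6, "high")]
best_k, scores = tune_k(data, grid=[1, 3, 5])
print(best_k, scores)
```

With k = 1 the noisy point misleads its neighbors; larger values of k smooth it out, which is why the tuned value is greater than 1 on this data.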

MINOR TOPICS

8. Understanding classification using nearest neighbors

The kNN algorithm
Calculating distance
Choosing an appropriate k
Preparing data for use with kNN
Why is the kNN algorithm lazy?
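
The sketch below illustrates why preparing data matters for kNN: with raw values, a large-scale feature such as income dominates the Euclidean distance, while min-max normalization puts both features on a 0 to 1 scale. The customer data is invented, and `math.dist` assumes Python 3.8 or later.

```python
import math

def min_max_normalize(column):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def nearest(points, query_index):
    """Index of the point closest (Euclidean distance) to the query point."""
    query = points[query_index]
    others = [i for i in range(len(points)) if i != query_index]
    return min(others, key=lambda i: math.dist(points[i], query))

# Hypothetical customers: (age in years, yearly income)
ages = [25, 60, 26, 45]
incomes = [50000, 50500, 52000, 60000]

raw = list(zip(ages, incomes))
scaled = list(zip(min_max_normalize(ages), min_max_normalize(incomes)))

raw_neighbor = nearest(raw, 0)        # decided almost entirely by income
scaled_neighbor = nearest(scaled, 0)  # age and income both contribute
print(raw_neighbor, scaled_neighbor)
```

On raw values the 25-year-old's nearest neighbor is the 60-year-old with a similar income; after normalization it is the 26-year-old.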

9. Understanding classification rules

Separate and conquer
The One Rule algorithm
The RIPPER algorithm
Rules from decision trees

10. Understanding regression

Simple linear regression
Ordinary least squares estimation
Correlations
Multiple linear regression
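
For simple linear regression, ordinary least squares has a closed-form solution, sketched below together with the Pearson correlation. The data points are made up and lie exactly on the line y = 2x + 1, so the expected values are easy to check by hand.

```python
import math

def ols_fit(x, y):
    """Slope and intercept minimizing the sum of squared residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def pearson(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data on the line y = 2x + 1
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
slope, intercept = ols_fit(x, y)
r = pearson(x, y)
print(slope, intercept, r)
```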

11. Understanding regression trees and model trees

Adding regression to trees

12. Understanding association rules

The Apriori algorithm for association rule learning
Measuring rule interest – support and confidence
Building a set of rules with the Apriori principle
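
Support and confidence, the two rule-interest measures named above, can be computed directly from a list of transactions. The product baskets below are hypothetical, and the sketch covers only the two measures, not the full Apriori search.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent): support of both over support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Hypothetical baskets of banking products held by four customers
transactions = [
    {"savings", "card"},
    {"savings", "loan"},
    {"savings", "card", "loan"},
    {"card"},
]

rule_support = support({"savings", "card"}, transactions)
rule_confidence = confidence({"savings"}, {"card"}, transactions)
print(rule_support, rule_confidence)
```

The Apriori principle builds on the same counts: an itemset can only be frequent if all of its subsets are frequent, which lets the algorithm skip many candidates.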