Course Code: eirmlpy
Duration: 49 hours
Prerequisites:

n/a

Overview:

EIR require a tailor-made training/workshop programme for seven members of their data analytics team. The programme runs for 7 days, delivered over several weeks.

The programme will consist of:

  • 1-day workshop
  • 2-day Python course
  • 3-day Machine Learning course
  • 1-day support workshop

Course Outline:

Workshop (1 day)

  • To be confirmed. It may be based on a case study showing how to solve a relevant company problem using data analytics and machine learning algorithms, with a focus on exploring the data points and business requirements.

Python Programming (2 days)

  • Introduction to Python and IPython
  • Basics of Python language
  • Data structures
  • Operators
  • Program control – loops, conditionals
  • Debugging and managing programs
  • Functions
  • Python Modules
  • Import/Export data files
  • NumPy – tables and vectorization
  • Pandas – data manipulation, import/export
  • Data cleaning, transformation and manipulation
  • Data Visualization
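
To illustrate the vectorization topic above, here is a minimal sketch comparing an element-by-element loop with a single NumPy array expression. The usage figures and the flat price are made-up values for illustration only and are not course material.

```python
import numpy as np

# Hypothetical data: monthly usage in GB for five customers (illustrative only).
usage_gb = np.array([12.5, 40.0, 7.25, 88.0, 23.0])
price_per_gb = 0.5  # assumed flat rate, not a real tariff

# Loop version: one Python-level multiplication per element.
charges_loop = np.array([u * price_per_gb for u in usage_gb])

# Vectorized version: one expression applied to the whole array at once.
charges_vec = usage_gb * price_per_gb

# Boolean masks are vectorized as well: keep customers above 20 GB.
heavy_users = usage_gb[usage_gb > 20]
```

Both versions produce the same charges; the vectorized form is shorter and avoids the per-element Python loop.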

Machine Learning (3 days)

Introduction to Applied Machine Learning

  • Statistical learning vs. Machine learning
  • Supervised vs. Unsupervised learning
  • Machine learning algorithms
  • Classification, Prediction, Association, Segmentation, Outlier detection
  • Train and Test samples
  • Model Validation
  • Bias-Variance trade-off
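
As a rough illustration of the train and test samples topic, the sketch below shuffles a dataset and holds back a fraction of it for testing. The function name `train_test_split` and its parameters are chosen here for illustration; this is a plain-Python stand-in, not the implementation of any particular library.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset: 20 record identifiers.
records = list(range(20))
train, test = train_test_split(records, test_fraction=0.25, seed=42)
```

The model is fitted on `train` only, and `test` is kept aside to estimate how well it generalizes to unseen data.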

Machine Learning with Python

  • Choice of libraries
  • Add-on tools
  • IPython Notebooks

Data preparation

  • Missing observations
  • Outlier observations
  • Standardization and normalization
  • Binarization
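
The four preparation steps listed above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: missing observations are marked with `None` and filled with the mean of the observed values, and the helper names are hypothetical.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - m) / s for v in values]


def min_max_normalize(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant column")
    return [(v - lo) / (hi - lo) for v in values]


def binarize(values, threshold):
    """Map each value to 1 if it exceeds the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]


# Hypothetical column with one missing observation.
raw = [2.0, 4.0, None, 6.0]
filled = impute_mean(raw)
z_scores = standardize(filled)
scaled = min_max_normalize(filled)
flags = binarize(filled, threshold=3.0)
```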

Forecasting continuous data

  • Linear regression
  • Generalizations, Nonlinearity and Regularization
  • Regression trees
  • Model validation
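
As an illustration of linear regression and regularization, the sketch below fits a straight line with one predictor by least squares, with an optional ridge (L2) penalty on the slope. The function name `fit_line` and the sample data are assumptions made for this example.

```python
def fit_line(xs, ys, ridge_lambda=0.0):
    """Fit y = intercept + slope * x by least squares.

    ridge_lambda > 0 adds an L2 penalty on the slope only (the intercept
    is left unpenalized), which shrinks the slope towards zero.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx + ridge_lambda == 0:
        raise ValueError("predictor has no variance")
    slope = s_xy / (s_xx + ridge_lambda)
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

ols_intercept, ols_slope = fit_line(xs, ys)
ridge_intercept, ridge_slope = fit_line(xs, ys, ridge_lambda=5.0)
```

With no penalty the fit recovers the line exactly; with the penalty the slope is pulled towards zero, which trades a little bias for lower variance on noisy data.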

Classification

  • Bayesian refresher
  • Naive Bayes
  • Logistic regression
  • K-Nearest neighbours
  • Decision trees
  • Support Vector Machines
  • Neural networks
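
Of the classifiers listed above, k-nearest neighbours is the simplest to sketch without a library. The following is a minimal illustration using Euclidean distance and a majority vote; the points, the labels "low" and "high", and the function name are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of query by majority vote among its k nearest points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training points by Euclidean distance to the query.
    neighbours = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set with two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["low", "low", "low", "high", "high", "high"]

near_low = knn_predict(points, labels, query=(2, 2), k=3)
near_high = knn_predict(points, labels, query=(7, 9), k=3)
```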

Validation of Machine learning algorithms

  • Cross-validation approaches
  • Bootstrap
  • Tuning with grid search approach
  • Ensemble learning
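
To make the cross-validation topic concrete, the sketch below generates the train and test index sets for k-fold cross-validation. It is a simplified illustration: the folds are contiguous and unshuffled, and `k_fold_indices` is a name chosen for this example. In practice the data would normally be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# 10 samples split into 3 folds: each sample is tested exactly once.
folds = list(k_fold_indices(10, 3))
```

A model is trained and scored once per fold, and the scores are averaged. Grid search repeats this procedure for each candidate parameter setting and keeps the best-scoring one.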

Unsupervised Learning

  • Clustering
  • Associations
  • Dimensionality reduction with PCA
  • Factor Analysis
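
As an illustration of dimensionality reduction with PCA, here is a minimal sketch that centres the data and takes a singular value decomposition with NumPy. The function name `pca` and the synthetic two-column dataset are assumptions made for this example.

```python
import numpy as np


def pca(X, n_components):
    """Return (scores, components, explained_variance_ratio) via SVD."""
    X = np.asarray(X, dtype=float)
    # PCA works on mean-centred columns.
    X_centered = X - X.mean(axis=0)
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]  # rows are principal directions
    explained_variance = singular_values ** 2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    scores = X_centered @ components.T  # data projected onto the components
    return scores, components, ratio[:n_components]


# Synthetic data: the second column is roughly twice the first plus noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + rng.normal(scale=0.1, size=200)])

scores, components, ratio = pca(X, n_components=1)
```

Because the two columns are almost perfectly correlated, a single component captures nearly all of the variance, so the data can be reduced from two dimensions to one with little loss.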

Post Training Support Workshop (1 day)

  • To be announced