Course Code: dsmlpython
Duration: 28 hours
Prerequisites:
  • Basic programming experience in any language (Python preferred)

Audience

  • Anyone who wants to learn Data Science

Course Outline:

Python

  • Intro to Python
  • Jupyter Notebooks
  • NumPy
  • Pandas
  • Matplotlib, Seaborn, Plotly, Visdom

Data Science Lifecycle

  • Discovery
  • Data Preparation
  • Model Planning
  • Model Building
  • Operationalization

Machine Learning

  • Inferential and Descriptive Statistics
  • Regression
  • Classification
  • Using the scikit-learn library
  • Supervised and Unsupervised Learning Algorithms
    • Naive Bayes
    • K-Means
    • Logistic Regression
    • Support Vector Machines
    • Neural Networks
    • Decision Trees
    • Random Forest
    • Ensemble Methods
  • Building, Training and Deploying Models
  • Inference
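As a rough illustration of the build, train and inference steps listed above, the following sketch fits several of the named supervised algorithms with scikit-learn and compares their accuracy on held-out data. The Iris dataset, the 75/25 split and the hyperparameters are assumptions chosen for illustration, not course material.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Small built-in dataset (assumption: Iris is used purely for illustration)
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows so accuracy is measured on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# A few of the supervised algorithms named in the outline
models = {
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)          # build and train
    predictions = model.predict(X_test)  # inference on the held-out rows
    scores[name] = accuracy_score(y_test, predictions)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Every estimator shares the same `fit`/`predict` interface, which is what makes swapping algorithms in a loop like this straightforward.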

KNIME

  • Installation
  • Starting and customizing the KNIME Analytics Platform
  • Nodes, Data and Workflows
  • The Data Science Cycle
  • Hands-on Examples
    • Disease tagging
    • Risk Information Extraction

Introduction to AWS and Hadoop

Examples:

There will be examples covering all of the machine learning models above, as well as practice questions for NumPy and pandas. Example projects:
- Titanic Survival Exploration: covers NumPy, pandas, Matplotlib and scikit-learn
- Spam Email Classifier: Naive Bayes algorithm
- Bike Share Analysis: ML-based project
- K-Means Clustering Project
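The Spam Email Classifier project pairs text features with Naive Bayes. A minimal sketch of that idea with scikit-learn, assuming a bag-of-words representation and a multinomial Naive Bayes model; the six messages below are hypothetical toy data, not the course dataset:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical hand-written corpus, for illustration only
messages = [
    "win a free prize now",
    "claim your free money today",
    "cheap loans win cash now",
    "meeting moved to monday",
    "lunch with the team tomorrow",
    "please review the project report",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Word counts feed a multinomial Naive Bayes model (Laplace smoothing by default)
classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(messages, labels)

new_messages = ["free cash prize", "team meeting tomorrow"]
predicted = classifier.predict(new_messages)
print(list(zip(new_messages, predicted)))
```

A real project would train on a labeled email corpus and evaluate on a held-out split rather than on two hand-picked messages.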

There will be two case studies, on day 2 and day 3 respectively.

Case studies:
- Drug Property Prediction: predicting drug properties with machine learning
- A study tailored to the client's requirements