Any student with an interest in data analytics may undertake the online training. Students who complete the online training are entitled to attend the two-day weekend workshop.
It is understood that most of the audience may have no prior experience in data analytics.
A bespoke Data Analytics workshop for up to 40 MBA students at Cambridge University.
Pre-workshop preparatory materials
Students will be given access to a guided tutorial notebook to complete before the two-day workshop.
Importing data
- pandas and numpy
- reading in data from files and the web
- reading in financial data from online sources
Data wrangling
- numpy arrays
- common matrix operations
- optimizing performance by avoiding loops
- sorting
- filtering
- aggregating
- working with time series data
- imputation of missing data
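As a flavour of the data wrangling topics above, the following is a minimal sketch, assuming a small made-up price series (the values and dates are illustrative only and are not taken from the course materials). It shows forward-fill imputation, a vectorised calculation alongside its loop equivalent, and filtering with sorting.

```python
import numpy as np
import pandas as pd

# Hypothetical daily prices with one missing value (illustrative data only).
prices = pd.Series(
    [100.0, 102.0, np.nan, 105.0, 103.0],
    index=pd.date_range("2020-01-01", periods=5, freq="D"),
)

# Imputation: carry the last known value forward into the gap.
filled = prices.ffill()

# Vectorised daily returns: one array expression instead of a Python loop.
values = filled.to_numpy()
returns = values[1:] / values[:-1] - 1.0

# The same calculation as an explicit loop, shown for comparison only.
loop_returns = [values[i] / values[i - 1] - 1.0 for i in range(1, len(values))]

# Filtering and sorting: keep days priced above 101, highest first.
top_days = filled[filled > 101].sort_values(ascending=False)
```

Both approaches give the same returns; the vectorised form is usually shorter and faster on large arrays.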
Exploratory data analysis
- cross-tabs
- graphics in matplotlib and seaborn
Model development and evaluation
- scikit-learn for modelling
- regression models
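A regression model in scikit-learn might look like the sketch below. It assumes synthetic data generated from a known straight line plus noise (the slope, intercept, and sample size are arbitrary choices for illustration), so the fitted coefficients can be checked against the truth.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 3x + 2 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=200)

# Hold back a quarter of the data to evaluate the fitted model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# R^2 on unseen data: the share of variance the model explains.
r2 = model.score(X_test, y_test)
print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}, R^2={r2:.3f}")
```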
Workshop outline
A two-day workshop (2-3 May 2020) delivered on site at Cambridge University.
Day 1
- Data visualization
- Supervised learning: classification and regression
- Trade-off between good model fit and overfitting
- Logistic regression as a classifier
- The support vector machine
- K-nearest neighbours
- Evaluating model performance
- Introduction to neural networks in Keras
- Case study: use of classification methods in fraud detection
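The classification and evaluation topics for Day 1 can be sketched as follows. This is a minimal illustration, not the case study itself: it assumes a synthetic, imbalanced dataset from scikit-learn's `make_classification` standing in for fraud data, with roughly 10% of rows in the positive class.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for fraud data (an assumption): about 10% positives.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)

# Stratify so the rare class appears in both the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = clf.predict(X_test)

# Accuracy alone can mislead on imbalanced data, so also inspect the
# confusion matrix and the recall on the rare (positive) class.
accuracy = accuracy_score(y_test, pred)
recall = recall_score(y_test, pred)
matrix = confusion_matrix(y_test, pred)
print(accuracy, recall)
print(matrix)
```

With about 90% of rows in the majority class, a model that always predicts "not fraud" would already score close to 90% accuracy, which is why recall on the rare class is reported as well.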
Day 2
- Decision trees
- Random forest
- Unsupervised learning
- Principal components analysis
- K-means clustering
- Case study: market segmentation
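The unsupervised learning topics for Day 2 can be combined in a short sketch. It assumes three made-up customer groups described by five numeric features (the group centres and sizes are hypothetical, chosen only so the clusters are easy to recover), then applies standardisation, principal components analysis, and k-means clustering in sequence.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Three hypothetical customer groups, 50 customers each, five features.
rng = np.random.default_rng(0)
centres = np.array(
    [[0, 0, 0, 0, 0], [6, 6, 0, 0, 0], [0, 6, 6, 0, 0]], dtype=float
)
X = np.vstack([c + rng.normal(0, 1.0, size=(50, 5)) for c in centres])

# Standardise first so no single feature dominates the components.
X_scaled = StandardScaler().fit_transform(X)

# Reduce five features to two principal components.
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# Group customers into three segments in the reduced space.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
segment_sizes = np.bincount(labels)
print(segment_sizes)
```

In practice the number of segments is not known in advance; here it is fixed at three only because the synthetic data was built that way.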