Course Code:
bsprsa
Duration:
21 hours
Course Outline:
Day One
Introduction to R & RStudio
- A first R program
- RStudio
- Other editors
- Getting Help in R
Importing/Exporting Data
- Flat files – txt, csv
- Spreadsheet files – xls, xlsx
- Data in SPSS, SAS and other formats
- Accessing data from SQL data sources
- SQL database connectivity and operations
Organising Data
- Data types and classes
- Data storage in R – RData format
- Object structure
- Numbers and vectors
- Matrix and table
- Factors
- Lists
- Data Frames
- Date and time
Tabular Representation
- Overview of packages for data tables – dplyr, tidyr, data.table
- Indexes and subscripts
- Selecting, subsetting observations and variables
- Filtering, grouping
- Recoding transformations
- Reshaping data
- Merging data
- Character manipulation, stringr package
- Regular expressions
Day Two
R and Statistics
- Probability and Normal Distribution
- Random numbers
- Descriptive Statistics
- Standardization and Normalization
- t-distribution
- Chi-Square Distribution
- Confidence Intervals
- Hypothesis Testing, parametric vs. non-parametric tests
- t-tests
- F-tests
- ANOVA
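As a rough illustration of several of the topics above (random numbers, standardization, confidence intervals, parametric vs. non-parametric tests), a minimal sketch in base R might look as follows. The samples x and y, their sizes and their parameters are hypothetical and chosen only for demonstration; they are not taken from the course material.

```r
# Hypothetical data: two simulated samples (assumed values, for illustration only)
set.seed(42)                        # make the random numbers reproducible
x <- rnorm(30, mean = 10, sd = 2)   # sample A
y <- rnorm(30, mean = 11, sd = 2)   # sample B

# Standardization: rescale x to mean 0 and standard deviation 1
z <- as.numeric(scale(x))

# 95% confidence interval for the mean of x (based on the t-distribution)
ci <- t.test(x)$conf.int

# Parametric test: Welch two-sample t-test for a difference in means
tt <- t.test(x, y)

# Non-parametric alternative: Wilcoxon rank-sum test
wx <- wilcox.test(x, y)

print(ci)
print(tt$p.value)
print(wx$p.value)
```

Both tests address the same question here; the t-test assumes approximately normal data, while the rank-based test does not.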
Linear Regression
- Correlation coefficient and interpretation
- Wilkinson-Rogers formula notation
- Simple and multiple linear regression
- Estimation methods – Least squares
- Model validation – tests for violation of assumptions
- Logistic regression
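To give a flavour of the formula notation listed above, the following sketch fits simple, multiple and logistic regression models in base R. It assumes the built-in mtcars data set purely as an example; the choice of variables is illustrative and not part of the course material.

```r
# Wilkinson-Rogers notation: response ~ predictor1 + predictor2
# mtcars ships with R; the variable choices below are for illustration only.

# Simple linear regression: fuel consumption explained by weight
fit_simple <- lm(mpg ~ wt, data = mtcars)

# Multiple linear regression: add horsepower as a second predictor
fit_multi <- lm(mpg ~ wt + hp, data = mtcars)

# Pearson correlation coefficient between mpg and wt
r <- cor(mtcars$mpg, mtcars$wt)

# In simple regression, R-squared equals the squared correlation
r2 <- summary(fit_simple)$r.squared

# Logistic regression: binary outcome am (transmission type) modelled by weight
fit_logit <- glm(am ~ wt, data = mtcars, family = binomial)

print(coef(fit_simple))
print(coef(fit_multi))
print(coef(fit_logit))
```

lm() estimates the coefficients by least squares; calling plot(fit_simple) produces the standard diagnostic plots commonly used to check model assumptions.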
Graphical Procedures
- Plots for 1, 2 and more variables
- QQ-Plots
- Exporting plots to png, pdf and jpeg files
- ggplot2
Project Organisation
- Data & other Artefacts
- Folder-Structure
- Versioning
Day Three
- ANOVA revisited
- Normality Tests
- Mixed-Effects Models & Nested Analysis
- Variance Component Analysis
- R package VCA
- Visualization of Variability
- Outlier Detection
- R package STB
- VCA-Models
- ANOVA, MINQUE
- REML, ML
- VCA Inference
- Confidence intervals for Variance Components