Course Code: rprogbspk
Duration: 12 hours
Overview:

General objective of the training:

Enabling participants to use R for data analysis (data loading, data quality checks, descriptive statistics) and data visualisation. The training also includes an introduction to code structure and version control, including functions, interaction with folder structures and GitHub.

Course Outline:

Data analysis:

  • [1hr] Two common data formats: tibble data frame and tidy data (tibble and tidyr packages):
    • two formats of tabular data: long format and wide format and conversion between these formats,
    • representing multi-dimensional user data in two-dimensional data structures, data dimension reduction, grouped data and nesting.
  • [2hrs] Data manipulation with dplyr:
    • filter, mutate, select,
    • aggregation and summarizing functions,
    • window functions: ranking and ordering functions, offsets, cumulative aggregates.
  • [45mins] Best practices for data quality checks:
    • identifying missing values, outliers, duplicates and inconsistencies.
  • [1hr] Basic descriptive statistical analysis:
    • central tendency measures, dispersion measures, correlation, frequency tables.
  • [45mins] Working with time-series data in R (hourly data, panel data).
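
As an illustration of the data analysis topics above, the following is a minimal sketch rather than course material. The sensor readings, column names and object names are all hypothetical. It assumes the dplyr and tidyr packages are installed, and shows a wide-to-long conversion, two simple quality checks, a grouped summary and a few window functions.

```r
library(dplyr)
library(tidyr)

# Hypothetical hourly readings for two sensors, stored in wide format
wide <- tibble(
  hour     = 1:4,
  sensor_a = c(10, 12, NA, 15),
  sensor_b = c(20, 18, 19, 19)
)

# Wide -> long: one row per (hour, sensor) observation
long <- wide %>%
  pivot_longer(
    cols      = c(sensor_a, sensor_b),
    names_to  = "sensor",
    values_to = "value"
  )

# Simple data quality checks: missing values and fully duplicated rows
n_missing    <- sum(is.na(long$value))
n_duplicates <- sum(duplicated(long))

# Grouped descriptive statistics (central tendency and dispersion)
summary_tbl <- long %>%
  group_by(sensor) %>%
  summarise(
    mean_value = mean(value, na.rm = TRUE),
    sd_value   = sd(value, na.rm = TRUE),
    .groups    = "drop"
  )

# Window functions within each sensor: offset, cumulative aggregate, ranking
windowed <- long %>%
  filter(!is.na(value)) %>%
  group_by(sensor) %>%
  arrange(hour, .by_group = TRUE) %>%
  mutate(
    previous_value = dplyr::lag(value),
    running_total  = cumsum(value),
    value_rank     = min_rank(desc(value))
  ) %>%
  ungroup()

# Long -> wide: back to one column per sensor
wide_again <- long %>%
  pivot_wider(names_from = sensor, values_from = value)
```

Dropping the missing reading before computing the window functions is one possible choice; depending on the analysis, imputing or flagging the value may be more appropriate.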

Data visualisation (mostly for descriptive statistical analysis):

  • [2hrs] ggplot2 R package:
    • layers and selected basic functions,
    • plotting single and multiple series,
    • plotting categorical data.
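
A minimal sketch of the layered approach in ggplot2, assuming the package is installed. The data frames and their column names are hypothetical and only serve to show one plot with multiple series and one plot of categorical data.

```r
library(ggplot2)

# Hypothetical hourly readings for two sensors, in long format
series <- data.frame(
  hour   = rep(1:6, times = 2),
  sensor = rep(c("a", "b"), each = 6),
  value  = c(10, 12, 11, 15, 14, 16, 20, 18, 19, 19, 21, 22)
)

# Multiple series: a line layer and a point layer, one colour per sensor
line_plot <- ggplot(series, aes(x = hour, y = value, colour = sensor)) +
  geom_line() +
  geom_point() +
  labs(title = "Hourly readings by sensor", x = "Hour", y = "Value")

# Hypothetical categorical data: survey answers
survey <- data.frame(
  answer = c("yes", "no", "yes", "maybe", "yes", "no")
)

# Categorical data: geom_bar() counts the rows in each category
bar_plot <- ggplot(survey, aes(x = answer)) +
  geom_bar() +
  labs(x = "Answer", y = "Count")
```

Each `+` adds a layer or a setting to the plot object; printing `line_plot` or `bar_plot` in an interactive session draws the figure.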

Code management and automating analysis:

  • [1hr] Introduction to functions in R
  • [1hr] Best practices for automating figure creation, e.g. defining where to get the data, selecting a certain subcategory, calling the relevant functions, and storing the data and plots.
  • [30mins] Code styling and documentation.
  • [1hr] Introduction to GitHub and version management.
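
The automation workflow described above could be sketched as follows. This is an illustrative example, not a prescribed implementation: the function `make_category_figure()`, the file names and the columns `hour`, `category` and `value` are hypothetical, and ggplot2 is assumed to be installed. The function is documented with roxygen-style comments, one common convention for documenting R code.

```r
library(ggplot2)

#' Save the data subset and a line plot for one category (hypothetical helper).
#'
#' @param data_path Path to a CSV file with columns hour, category and value.
#' @param category  Value of the category column to keep.
#' @param out_dir   Folder where the subset and the figure are written.
#' @return Invisibly, a list with the paths of the written files.
make_category_figure <- function(data_path, category, out_dir) {
  data <- read.csv(data_path, stringsAsFactors = FALSE)

  # Select the requested subcategory and fail early if it is absent
  subset_data <- data[data$category == category, ]
  if (nrow(subset_data) == 0) {
    stop("No rows found for category: ", category)
  }

  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

  fig <- ggplot(subset_data, aes(x = hour, y = value)) +
    geom_line() +
    labs(title = paste("Category:", category), x = "Hour", y = "Value")

  # Store both the data behind the figure and the figure itself
  csv_path <- file.path(out_dir, paste0(category, ".csv"))
  pdf_path <- file.path(out_dir, paste0(category, ".pdf"))
  write.csv(subset_data, csv_path, row.names = FALSE)
  ggsave(pdf_path, plot = fig, width = 6, height = 4)

  invisible(list(data = csv_path, figure = pdf_path))
}

# Usage with hypothetical data written to a temporary folder
input_path <- file.path(tempdir(), "readings.csv")
write.csv(
  data.frame(
    hour     = rep(1:3, times = 2),
    category = rep(c("north", "south"), each = 3),
    value    = c(1, 2, 3, 4, 5, 6)
  ),
  input_path,
  row.names = FALSE
)

figure_dir <- file.path(tempdir(), "figures")
outputs <- lapply(
  c("north", "south"),
  function(cat_name) make_category_figure(input_path, cat_name, figure_dir)
)
```

Keeping the input path, the category and the output folder as arguments means the same function can be reused for every subcategory, and the script can be tracked in version control without hard-coded paths.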

Time left for Q&A