Introduction to R with RMarkdown and Quarto ( rrmarkquabesp | 21 hours )

Prerequisites:

There are no prerequisites for this course.

You don’t need to have used R or any programming language before.

Overview:

R is a tremendously popular and powerful scripting language with a continuously growing user base in the data science community. This course will give you a comprehensive introduction to how to use R in RStudio to read, wrangle and visualise datasets.

RMarkdown and Quarto are tools that allow us to write reports, slide decks and entire websites with the R language. You can create interactive HTML outputs and reports designed for print. This course will introduce you to how to use RMarkdown and Quarto for exploratory data analysis, writing technical reports and the basics of programmatically generated parameterised reports.

This course has been taught since 2018 to both completely new R users and experienced R users. Even advanced users have appreciated the unique way this course introduces and interrogates R syntax and its peculiarities.

The course uses live coding and group activities where we collectively build data visualisations and reports that are uniquely designed for each course intake. You will also be provided with the tutor’s slide deck and all code they write during the course.

Course Outline:

Introduction to R and RStudio

  • What is base R and what are (), [] and {} for in the R language?

  • Getting comfortable with using RStudio for writing and running code

  • How to use RStudio projects to improve the reproducibility and transportability of your code
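
As a small taste of the first question above, here is a minimal sketch in base R of the three kinds of brackets. The object names (scores, double_it and so on) are made up for illustration and are not taken from the course materials.

```r
# () calls a function and passes it arguments.
scores <- c(12, 7, 21)
total <- sum(scores)

# [] extracts elements from an object, by position or by condition.
first_score <- scores[1]
high_scores <- scores[scores > 10]

# {} groups several expressions into one block, such as a function body.
double_it <- function(x) {
  doubled <- x * 2
  doubled
}

double_it(scores)
```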

Introduction to wrangling with the tidyverse

  • How does the tidyverse benefit us?

  • Using the {readr}, {dplyr} and {tidyr} packages for data wrangling

  • Working with Excel workbooks via {readxl}

  • Cleaning data with the {janitor}, {lubridate} and {stringr} packages

Introduction to data visualisation with {ggplot2}

  • What are the purposes of aesthetics, geoms, scales and themes in the {ggplot2} grammar of graphics?

  • Using {ggplot2} for quick EDA visualisations

  • Using annotations and {ggtext} to add richness to the stories your dataviz tells

  • Creating custom themes for {ggplot2} to mirror your brand identities

  • How to create pixel perfect {ggplot2} charts for inclusion in your RMarkdown and Quarto reports

What are RMarkdown and Quarto?

RMarkdown has been an established tool in the R community since 2016. It’s used to create reports, presentations, entire websites and books. One example is the book Text Mining with R.

Quarto is the future of RMarkdown. It expands the reach of RMarkdown to include users of Python, Julia and JavaScript. This means that projects can be shared between users of multiple programming languages, and likely between multiple teams in your organisation.

Quarto is considered stable as of 2022 but still has some room to grow for full feature parity with RMarkdown. This section of the course will explain the benefits and differences between the two tools. You can assume that everything you learn about RMarkdown you will also learn how to replicate in Quarto, or an alternative will be provided.

Doing exploratory data analysis with RMarkdown

  • Using RMarkdown as a literate programming environment (like Jupyter notebooks)

  • Basics of Markdown syntax for building up the story of your analysis

  • Adding code chunks to RMarkdown documents and running them

Creating interactive HTML content with RMarkdown

  • Creating simple HTML reports

  • Creating HTML slides with {xaringan} in RMarkdown

  • Creating HTML slides with revealjs in Quarto

  • Customising the appearance of HTML output types with CSS

Creating print quality reports with RMarkdown

  • Using {pagedown} to create paginated reports

  • Using CSS to control page breaking in {pagedown}

  • Using RMarkdown to generate MS Word documents

Making many reports with RMarkdown

In this section of the course we will look at how to programmatically generate multiple reports with RMarkdown and Quarto. Here are two use cases:

  • You have data files about several regions (e.g. all US states) and need to create a report for each region. Your report template would pull in the relevant data for each region and generate a PDF or HTML file for each one.

  • You need to generate a report on a schedule, e.g. every quarter. The data is added to Excel files, Google Sheets or databases. Your report template will be designed to read in data from the appropriate time period. This could be programmatically scheduled with several different tools.

In order to achieve this we will also introduce the {purrr} package to aid the process of programmatically doing many things in R. These skills are transferable to other tasks, for instance saving {ggplot2} charts for each region/year or reading in many data files.
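
To give a flavour of the first use case, here is a minimal sketch rather than the course's actual code. It assumes a hypothetical parameterised template called region-report.Rmd that declares a region parameter; the region list and output file names are also made up for illustration. The sketch only renders anything if that template exists in the working directory.

```r
# Hypothetical inputs: three example regions and a template file name.
regions <- c("Alabama", "Alaska", "Arizona")
template <- "region-report.Rmd"  # assumed to declare a `region` parameter

# Build one output file name per region.
output_files <- paste0("report-", tolower(regions), ".html")

# Render the template once, passing the region in as a parameter.
render_region <- function(region, output_file) {
  rmarkdown::render(
    input = template,
    params = list(region = region),
    output_file = output_file
  )
}

# purrr::walk2() calls render_region() for each region/file pair.
# The guard skips rendering when the hypothetical template is missing.
if (file.exists(template)) {
  purrr::walk2(regions, output_files, render_region)
}
```

walk2() is used here instead of a map function because rendering is done for its side effect (writing a file) rather than for a return value. The same pattern applies to saving one {ggplot2} chart per region.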