1 Introduction

This is the material for a 9-hour R workshop appended to ISDS7510. Most R trainings start by building a solid foundation in base R before moving forward, and they do so for many good reasons. Instead, we cover only the handful of base R concepts that are strictly necessary to create basic reports in R, without going too deep into R's caveats. Nevertheless, understanding the “boring” stuff is a necessary and inevitable pain that we need to face. The first chapter, Programming Concepts, provides an overview of the base R concepts that you will need for this workshop. If you want to know more on the topic, check out Advanced R by Hadley Wickham. The second topic we cover is Data Visualization. Because we focus on the tidyverse suite for data science, we skip the plotting functions in base R and move directly to ggplot2. We will learn that ggplot2 works best on normalized datasets, but datasets do not always come in such a tidy/normalized format. Thus, the second package we will look at is tidyr, for data wrangling. Finally, we will look at dplyr, a package for data manipulation on both local and remote machines. First we familiarize ourselves with the dplyr functions, and then we see how to set up a connection for querying a MySQL database.

To delve further into specific topics in R, visit the RBookdown library.

1.1 Install the packages for this workshop

All the packages you need for this workshop (ggplot2, tidyr, and dplyr) are included, together with others, in the tidyverse suite for data science. You can either install them one by one or install the whole tidyverse meta-package:

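# Install the tidyverse meta-package (needed only once per machine)
install.packages('tidyverse')
# Load the core tidyverse packages (needed in every new R session)
library(tidyverse)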
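If you prefer to install packages one by one, the following is a minimal sketch of how you might check which ones are still missing before installing anything. The package names are an assumption taken from the introduction, and `missing_packages()` is a hypothetical helper written for this illustration, not part of any package.

```r
# Assumption: these are the packages you want; edit the vector as needed.
workshop_pkgs <- c("ggplot2", "tidyr", "dplyr")

# Hypothetical helper: return the entries of `pkgs` that are not installed
# (or that cannot be loaded) on this machine.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(workshop_pkgs)

if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # Uncomment the next line to download and install them from CRAN.
  # install.packages(to_install)
} else {
  message("All workshop packages are already installed.")
}
```

The install call is left commented out so that running the snippet only reports what is missing. Remove the comment marker once you are happy with the list.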

1.2 Managing your work with Projects and R Markdown

Organizing your work in R requires attention to data management and reporting. Fortunately, the RStudio IDE (Integrated Development Environment) provides a set of tools for managing projects in a cohesive fashion. In this section we discuss how Projects work in RStudio and introduce the R Markdown authoring format for creating reports. Chapter 2 describes the fundamentals of R Markdown in more detail, while the rest of this chapter describes how to start your first project in RStudio.

1.3 Creating a new project in RStudio

RStudio implements the project structure to help you keep separate analyses apart. On the surface, a project looks like a directory on your local machine. But a project is really “dividing your work into multiple contexts, each with their own working directory, workspace, history, and source documents”1. Creating a project is straightforward: find the button in the upper right corner, select New Project, and follow the prompts. This arrangement enables you to switch between projects and continue working from where you left off each time, as if that were the only project you had going in RStudio.
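One practical consequence of each project having its own working directory is that you can refer to files with paths relative to the project folder. The sketch below illustrates the idea with base R only. The `data` folder and the file name `raw_scores.csv` are hypothetical and stand in for whatever your project contains.

```r
# Inside an RStudio project, the working directory is normally the project folder.
project_dir <- getwd()
print(project_dir)

# Hypothetical layout: a "data" subfolder holding a CSV file.
# file.path() builds the path with "/" as the separator on every platform.
data_path <- file.path("data", "raw_scores.csv")

# The same file expressed as an absolute path, for comparison.
full_path <- file.path(project_dir, data_path)

# Check whether the file exists before trying to read it.
if (file.exists(data_path)) {
  scores <- read.csv(data_path)
} else {
  message("No file found at ", full_path)
}
```

Because the code uses only the relative path, it keeps working when the project folder is moved or shared, as long as the internal layout stays the same.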

1.4 Where to seek help

The main tool available for seeking help from other R users is stackoverflow.com. Before posting on Stack Overflow, make sure your question is framed appropriately as a reproducible example, and use relevant tags when posting.
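As a rough illustration of what a reproducible example can look like, the sketch below uses only base R and the built-in `mtcars` dataset. The choice of dataset, columns, and the `aggregate()` call are assumptions made for this illustration; in a real question you would substitute the smallest piece of your own code that shows the problem.

```r
# 1. Use a small dataset that anyone can recreate; here, three rows of mtcars.
small <- head(mtcars[, c("mpg", "cyl", "wt")], 3)

# 2. dput() prints R code that rebuilds the object exactly.
#    Paste its output into your question so others can copy the data.
dput(small)

# 3. Include the smallest piece of code that shows the behaviour you are
#    asking about (a placeholder computation in this sketch).
result <- aggregate(mpg ~ cyl, data = small, FUN = mean)
print(result)

# 4. Report your R version and loaded packages.
sessionInfo()
```

A question that contains the data, the code, and the session information lets other users run it as-is, which makes a useful answer far more likely.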