Chapter 5 R Basics

This unit introduces you to the R programming language, and the tools we use to program in R. We will explore:

  1. What is R?, a brief introduction to the R language;
  2. R Studio, a tour of the Interactive Development Environment RStudio;
  3. R Packages, extra tools and functionalities;
  4. R Markdown, a type of R script file we’ll be working with in this class.

5.1 What is R?

R is a versatile, open source programming and scripting language that’s useful both for statistics and data science. It’s inspired by the programming language S. Some of its best features are:

  • It’s free, open source, and available on every major platform. As a result, if you do your analysis in R, most people can easily replicate it.

  • It contains massive set of packages for statistical modelling, machine learning, visualisation, and importing and manipulating data. Over 14,000 packages are available as of August 2019. Whatever model or graphic you’re trying to do, chances are that someone has already tried to do it (and a package for it exists).

  • It’s designed for statistics and data analysis, but also general-purpose programming.

  • It’s an Interactive Development Environment tailored to the needs of interactive data analysis and statistical programming.

  • It has powerful tools for communicating your results. R packages make it easy to produce html or pdf reports, or create interactive websites.

  • A large and growing community of peers.

R also has a number of shortcomings:

  • It has a steeper learning curve than SPSS or Stata.

  • R is not a particularly fast programming language, and poorly written R code can be terribly slow. R is also a profligate user of memory.

  • Much of the R code you’ll see in the wild is written in haste to solve a pressing problem. As a result, code is not very elegant, fast, or easy to understand. Most users do not revise their code to address these shortcomings.

  • Inconsistency is rife across contributed packages, even within base R. You are confronted with over 20 years of evolution every time you use R. Learning R can be tough because there are many special cases to remember.

5.2 R Studio

Throughout this class, we will assume that you are using R via RStudio. First time users often confuse the two. At its simplest R is like a car’s engine while RStudio is like a car’s dashboard.

More precisely, R is a programming language that runs computations, while RStudio is an integrated development environment (IDE) that provides an interface with many convenient features and tools. So just as the way of having access to a speedometer, rear-view mirrors, and a navigation system makes driving much easier, using RStudio’s interface makes using R much easier as well.

RStudio includes a console, syntax-highlighting code editor, as well as tools for plotting, history, debugging and workspace management. It’s also free and open-source. Yay!

NB: We don’t have to use RStudio to use R. For example, we write R code in a plain text editor (like textedit or notepad), and then execute the sript using the shell (e.g. terminal in Mac). But this is not ideal.

After you install R and RStudio on your computer, you’ll have two new applications you can open. We’ll always work in RStudio– not in the R application.

After you open RStudio, you should see something similar to this:

5.2.1 Console

There are two main ways of interacting with R: using the console or by using the script editor.

The console window (in RStudio, the bottom left panel) is the place where R is waiting for you to tell it what to do, and where it will show the results of a command.

You can type commands directly into the console, but they will be forgotten when you close the session. Try it out now.

> 2 + 2

If R is ready to accept commands, the R console shows a > prompt. If it receives a command (by typing, copy-pasting or sent from the script editor using Ctrl-Enter), R will try to execute it, and when ready, show the results and come back with a new >-prompt to wait for new commands.

If R is still waiting for you to enter more data because it isn’t complete yet, the console will show a + prompt. It means that you haven’t finished entering a complete command. This happens when you have not ‘closed’ a parenthesis or quotation. If you’re in RStudio and this happens, click inside the console window and press Esc; this should help get you out of trouble.

> "This is an incomplete quote
+

More Console Features

  1. Retrieving previous commands: As you work with R you’ll often want to re-execute a command which you previously entered. Recall previous commands using the up and down arrow keys.

  2. Console title bar: This screenshot illustrates a few additional capabilities provided by the Console title bar:

    • Display of the current working directory.
    • The ability to interrupt R during a long computation.
    • Minimizing and maximizing the Console in relation to the Source pane (using the buttons at the top-right or by double-clicking the title bar).

5.2.2 Scripts

It’s better practice to enter commands in the script editor, and save the script. This way, you have a complete record of what you did, you can easily show others how you did it and you can do it again later on if needed. Open it up either by clicking the File menu, and selecting New File, then R script, or using the keyboard shortcut Cmd/Ctrl + Shift + N. Now you’ll see four panes.

The script editor is a great place to put code you care about. Keep experimenting in the console, but once you have written code that works and does what you want, put it in the script editor.

RStudio will automatically save the contents of the editor when you quit RStudio, and will automatically load it when you re-open. Nevertheless, it’s a good idea to save your scripts regularly and to back them up.

5.2.3 Running code

While you certainly can copy-paste code you’d like to run from the editor into the console, this workflow is pretty inefficient. The key to using the script editor effectively is to memorize one of the most important keyboard shortcuts in RStudio: Cmd/Ctrl + Enter. This executes the current R expression from the script editor in the console.

For example, take the code below. If your cursor is somewhere on the first line, pressing Cmd/Ctrl + Enter will run the complete command that generates dems. It will also move the cursor to the next statement (beginning with reps). That makes it easy to run your complete script by repeatedly pressing Cmd/Ctrl + Enter.

dems <- (55 + 70) * 1.3

reps <- (20 - 1) / 2

Instead of running expression-by-expression, you can also execute the complete script in one step: Cmd/Ctrl + Shift + S. Doing this regularly is a great way to check that you’ve captured all the important parts of your code in the script.

5.2.4 Comments

Use # signs to add comments within your code chunks, and you are encouraged to regularly comment within your code. Anything to the right of a # is ignored by R. Each line of a comment needs to begin with a #.

# This is a comment.
# This is another line of comments.

5.2.5 Diagnostics and errors

The script editor will also highlight syntax errors with a red squiggly line and a cross in the sidebar:

You can hover over the cross to see what the problem is:

If you try to execute the code, you’ll see an error in the console.

When errors happen, your code is haulted – meaning it’s never executed. Errors can be frustrating in R, but with practice you’ll be able to debug your code quickly.

5.2.6 Errors, Messages, and Warnings

One thing that intimidates new R and RStudio users is how it reports errors, warnings, and messages. R reports errors, warnings, and messages in a glaring font, which makes it seem like it is scolding you. However, seeing red text in the console is not always bad.

  1. Errors: When the text is a legitimate error, it will be prefaced with “Error:” and try to explain what went wrong. Generally when there’s an error, the code will not run. Think of errors as a red traffic light: something is wrong!

  2. Warnings: When the text is a warning, it will be prefaced with “Warning:” and R will try to explain why there’s a warning. Generally your code will still work, but perhaps not in the way you would expect. Think of warnings as a yellow traffic light: everything is working fine, but watch out/pay attention.

  3. Messages: When the text doesn’t start with either “Error” or “Warning”, it’s just a friendly message. These are helpful diagnostic messages and they don’t stop your code from working. Think of messages as a green traffic light: everything is working fine and keep on going!

5.2.7 R Environment

Turn your attention to the upper right pane. This pane displays your “global environment”, and it contains the data objects you have saved in your current session. Notice that we have the two objects created earlier, dems, and reps, along with their values.

You can list all objects in your current environment by running:

ls()
#> [1] "dems" "reps"

Sometimes we want to remove objects that we no longer need.

x <- 5
rm(x)

If you want to remove all objects from your current environment, you can run:

rm(list = ls())

5.3 R Packages

The best part about R are its user-contributed packages (also called “libraries”). A package is a collection of functions (and sometimes data) that can be used by other programers.

A good analogy for R packages is they are like apps you can download onto a mobile phone:

So R is like a new mobile phone: while it has a certain amount of features when you use it for the first time, it doesn’t have everything. R packages are like the apps you can download onto your phone from Apple’s App Store or Android’s Google Play.

Let’s continue this analogy by considering the Instagram app for editing and sharing pictures. Say you have purchased a new phone and you would like to share a photo you have just taken with friends and family on Instagram. You need to:

  1. Install the app: Since your phone is new and does not include the Instagram app, you need to download the app from either the App Store or Google Play. You do this once and you’re set for the time being. You might need to do this again in the future when there is an update to the app.
  2. Open the app: After you’ve installed Instagram, you need to open the app.

The process is very similar for using an R package. You need to:

  1. Install the package: This is like installing an app on your phone. Most packages are not installed by default when you install R and RStudio. Thus if you want to use a package for the first time, you need to install it first. Once you’ve installed a package, you likely won’t install it again unless you want to update it to a newer version.
  2. “Load” the package: Loading a package is like opening an app on your phone. Packages are not loaded by default when you start RStudio on your computer; you need to load each package you want to use every time you start RStudio.

5.3.1 Installing Packages

First, we download the package from one of the CRAN mirrors onto our computer. For this we use install.packages("package-name"). If you have not set a preferred CRAN mirror in your options(), a menu will pop up asking you to choose a location.

Let’s download the package dplyr.

install.packages("dplyr")

If you run into errors later in the course about a function or package not being found, run the install.packages function to make sure the package is actually installed.

Important: Once we download the package, we never need to run install.packages again (unless we get a new computer.)

5.3.2 Loading Packages

Once we download the package, we need to load it into our session to use it. This is required at the beginning of each R session. This step is necessary because if we automatically loaded every package we have ever downloaded, our computer would fry.

library(dplyr)

The message tells you which functions from dplyr conflict with functions in base R (or from other packages you might have loaded).

5.3.3 Challenges

Let’s go ahead and download some core, important packages we’ll use for the rest of the course. Download (if you haven’t done so already) and load the following packages:

  • tidyverse
  • rmarkdown
  • knitr
  • gapminder
  • devtools
  • stargazer
  • rtweet
  • kableExtra

5.4 R Markdown

Throughout this course, we’ll be using R Markdown for lecture notes and homework assignments. R Markdown documents combine executable code, results, and prose commentary into one document. Think of an R Markdown files as a modern day lab notebook, where you can capture not only what you did, but also what you were thinking.

The filename of an R Markdown document should end in .Rmd or .rmd. They can also be converted to an output format, like PDF, HTML, slideshows, Word files, and more.

R Markdown documents contain three important types of content:

  1. An (optional) YAML header surrounded by ---s.
  2. Chunks of R code surrounded by ```.
  3. Text mixed with simple text formatting like # heading and _italics_.

5.4.1 YAML header

YAML stands for “yet another markup language”. R Markdown uses it to control many details of the output.

---
title: "Homework 1"
author: "Rochelle Terman"
date: "Fall 2019"
output: html_document
---

In this example, we specified the document’s title, author, and date; we also specified that we want it to eventually convert it into an HTML document.

5.4.2 Markdown

Prose in .Rmd files is written in Markdown, a lightweight set of conventions for formatting plain text files. Markdown is designed to be easy to read and easy to write. It is also very easy to learn. The guide below shows how to use Pandoc’s Markdown, a slightly extended version of Markdown that R Markdown understands.

Text formatting 
------------------------------------------------------------

*italic*  or _italic_
**bold**   __bold__
`code`
superscript^2^ and subscript~2~

Headings
------------------------------------------------------------

# 1st Level Header

## 2nd Level Header

### 3rd Level Header

Lists
------------------------------------------------------------

*   Bulleted list item 1

*   Item 2

    * Item 2a

    * Item 2b

1.  Numbered list item 1

1.  Item 2. The numbers are incremented automatically in the output.

Links and images
------------------------------------------------------------

<http://example.com>

[linked phrase](http://example.com)

![optional caption text](path/to/img.png)

Tables 
------------------------------------------------------------

First Header  | Second Header
------------- | -------------
Content Cell  | Content Cell
Content Cell  | Content Cell

The best way to learn these is simply to try them out. It will take a few days, but soon they will become second nature, and you won’t need to think about them. If you forget, you can get to a handy reference sheet with Help > Markdown Quick Reference.

5.4.3 Code Chunks

To run code inside an R Markdown document, you do it inside a “chunk”. Think of a chunk like a step in a larger process. A chunk should be relatively self-contained, and focused around a single task.

Chunks begin with a header which consists of ```{r, followed by an optional chunk name, followed by comma separated options, followed by }. Next comes your R code and the chunk end is indicated by a final ```.

You can continue to run the code using the keyboard shortcut that we learned earlier: Cmd/Ctrl + Enter. You can also run the entire chunk by clicking the Run icon (it looks like a play button at the top of the chunk), or by pressing Cmd/Ctrl + Shift + Enter.

RStudio executes the code and displays the results inline with the code:

5.4.4 Knitting

To produce a complete report containing all text, code, and results, click the “Knit” button at the top of the script editor (looks like a ball of yarn) or press Cmd/Ctrl + Shift + K. This will display the report in the viewer pane, and create a self-contained HTML file that you can share with others. The .html file is written in the same directory as your .Rmd file.

5.4.5 Cheatsheets and Other Resources

When working in RStudio, you can find an R Markdown cheatsheet by going to Help > Cheatsheets > R Markdown Cheat Sheet.

A helpful overview of R Markdown can also be found in R for Data Science

A deep dive into R Markdown can be found here

5.4.6 Challenges

Challenge 1.

Create a new R Markdown document with File > New File > R Markdown… Read the instructions. Practice running the chunks.

Now add some new markdown. Try adding some first-, second-, and third-level headers. Insert a link to a website.

Challenge 2.

In the first code chunk, modify cars to mtcars. Re-run the chunk, and see modified output.

Challenge 3.

Knit the document into an PDF file. Verify that you can modify the input and see the output update.

Acknowledgments

This page is in part derived from the following sources:

  1. R for Data Science licensed under Creative Commons Attribution-NonCommercial-NoDerivs 3.0