Chapter 7 Data Classes and Structures

To make the best use of the R language, you’ll need a strong understanding of basic data structures, and how to operate on them.

This is critical to understand because these are the objects you will manipulate on a day-to-day basis in R. But they are not always as easy to work with as they seem at the outset. Dealing with object types and conversions is one of the most common sources of frustration for beginners.

R’s base data structures can be organised by their dimensionality (1d, 2d, or nd) and whether they’re homogeneous (all contents must be of the same type) or heterogeneous (the contents can be of different types). This gives rise to the five data types most often used in data analysis:

      Homogeneous     Heterogeneous
1d    Atomic vector   List
2d    Matrix          Dataframe
nd    Array

Each data structure has its own specifications and behavior. In the rest of this chapter, we will cover the types of data objects that exist in R and their attributes.

  1. Vectors
  2. Lists
  3. Factors
  4. Matrices
  5. Dataframes

7.1 Vectors

Let’s start with one-dimensional (1d) objects. There are two kinds:

  1. Atomic vectors, also called simply vectors.
  2. Lists, which are distinct from atomic vectors because they can contain other lists.

We’ll discuss atomic vectors first.

7.1.1 Creating Vectors

Vectors are 1-dimensional chains of values. We call each value an element of a vector.

Atomic vectors are usually created with c(), which is short for ‘combine’:

x <- c(1, 2, 3)
x
#> [1] 1 2 3
length(x)
#> [1] 3

We can also add elements to the end of a vector by passing the original vector into the c function, like so:

z <- c("Beyonce", "Kelly", "Michelle", "LeToya")
z <- c(z, "Farrah")
z
#> [1] "Beyonce"  "Kelly"    "Michelle" "LeToya"   "Farrah"

Notice that vectors are always flat, even if you nest c()’s:

# these are equivalent
c(1, c(2, c(3, 4)))
#> [1] 1 2 3 4
c(1, 2, 3, 4)
#> [1] 1 2 3 4

7.1.2 Naming a Vector

We can also attach names to our vector. This helps us understand what each element refers to.

You can give a name to the elements of a vector with the names() function. Have a look at this example:

days_month <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
names(days_month) <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

days_month
#> Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
#>  31  28  31  30  31  30  31  31  30  31  30  31

You can name a vector when you create it:

some_vector <- c(name = "Rochelle Terman", profession = "Professor Extraordinaire")
some_vector
#>                       name                 profession 
#>          "Rochelle Terman" "Professor Extraordinaire"

Notice that in the first case, we surrounded each name with quotation marks. But we don’t have to do this when creating a named vector.

Names don’t have to be unique, and not all values need to have a name associated. However, names are most useful for subsetting, described in the next chapter. When subsetting, it is most useful when the names are unique.
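As a small illustration of that point (the vector below is made up for this example), names can be repeated or left out. Unnamed elements get an empty string as their name:

```r
# Hypothetical vector: two elements share the name "a", one has no name
v <- c(a = 1, 2, a = 3)
names(v)
#> [1] "a" ""  "a"
```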

7.1.3 Calculations on Vectors

One of the most powerful things about vectors is that we can perform arithmetic calculations on them.

For example, we can sum up all the values in a numerical vector using sum:

a <- c(1, -2, 3)
sum(a)
#> [1] 2

We can also sum two vectors. It is important to know that if you sum two vectors in R, it takes the element-wise sum. For example, the following three statements are completely equivalent:

c(1, 2, 3) + c(4, 5, 6)
c(1 + 4, 2 + 5, 3 + 6)
c(5, 7, 9)
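The same element-wise behaviour should apply to the other arithmetic operators. A short sketch, using two made-up vectors x and y:

```r
x <- c(1, 2, 3)
y <- c(4, 5, 6)

# Multiplication pairs up elements, just like addition
x * y
#> [1]  4 10 18

# So does subtraction
y - x
#> [1] 3 3 3

# An operation with a single number is applied to every element
x^2
#> [1] 1 4 9
```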

7.1.4 Types of Vectors

There are four common types of vectors, depending on the class:

  • logical
  • integer
  • numeric (same as double)
  • character

Logical Vectors

Logical vectors take on one of three possible values:

  1. TRUE
  2. FALSE
  3. NA (missing value)

c(TRUE, TRUE, FALSE, NA)
#> [1]  TRUE  TRUE FALSE    NA

Numeric Vectors

Numeric vectors contain numbers. They can be stored as integers (whole numbers) or doubles (numbers with decimal points). In practice, you rarely need to concern yourself with this difference, but just know that they are different but related things.

c(1, 2, 335)
#> [1]   1   2 335
c(4.2, 4, 6, 53.2)
#> [1]  4.2  4.0  6.0 53.2
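If you ever do need to tell integers and doubles apart, typeof() reports how the values are stored. The sketch below uses made-up vector names; the L suffix asks R to store a number as an integer:

```r
int_vec <- c(1L, 2L, 335L)   # the L suffix asks for integers
dbl_vec <- c(1, 2, 335)      # plain numbers are stored as doubles

typeof(int_vec)
#> [1] "integer"
typeof(dbl_vec)
#> [1] "double"

# Both count as numeric
is.numeric(int_vec)
#> [1] TRUE
is.numeric(dbl_vec)
#> [1] TRUE
```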

Character Vectors

Character vectors contain character (or ‘string’) values. Note that each value has to be surrounded by quotation marks, and values are separated by commas.

c("Beyonce", "Kelly", "Michelle", "LeToya")
#> [1] "Beyonce"  "Kelly"    "Michelle" "LeToya"

7.1.5 Coercion

We can change or convert a vector’s type using the as.*() family of functions, such as as.character(), as.numeric(), and as.logical():

num_var <- c(1, 2.5, 4.5)
class(num_var)
#> [1] "numeric"
as.character(num_var)
#> [1] "1"   "2.5" "4.5"

Remember that all elements of a vector must be the same type. So when you attempt to combine different types, they will be coerced to the most “flexible” type.

For example, combining a character and a number yields a character:

c("a", 1)
#> [1] "a" "1"

Guess what the following do without running them first:

c(1.7, "a") 
c(TRUE, 2) 
c("a", TRUE) 

TRUE == 1 and FALSE == 0.

Notice that when a logical vector is coerced to an integer or double, TRUE becomes 1 and FALSE becomes 0. This is very useful in conjunction with sum() and mean():

x <- c(FALSE, FALSE, TRUE)
as.numeric(x)
#> [1] 0 0 1

# Total number of TRUEs
sum(x)
#> [1] 1

# Proportion that are TRUE
mean(x)
#> [1] 0.333

Coercion often happens automatically.

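This is called implicit coercion. Most mathematical functions (+, log, abs, etc.) will coerce to a double or integer, and most logical operations (&, |, any, etc.) will coerce to a logical. Comparison operators (<, >, ==) coerce both sides to the more flexible type, so comparing a number with a string compares them as strings. You will usually get a warning message if the coercion might lose information.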

1 < "2"
#> [1] TRUE
"1" > 2
#> [1] FALSE
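A few more cases of implicit coercion, as a sketch with arbitrary values. Note that when a number is compared with a string, it is the number that gets converted to a string, so the comparison is alphabetical rather than numerical:

```r
# Arithmetic coerces logicals to numbers
TRUE + TRUE
#> [1] 2

# Logical operators coerce numbers to logicals (0 is FALSE, anything else is TRUE)
1 & 0
#> [1] FALSE

# Comparing a number with a string compares them as strings:
# 10 becomes "10", and "10" sorts before "9"
10 < "9"
#> [1] TRUE
```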

Sometimes coercions, especially nonsensical ones, won’t work.

x <- c("a", "b", "c")
as.numeric(x)
#> Warning: NAs introduced by coercion
#> [1] NA NA NA
as.logical(x)
#> [1] NA NA NA

7.1.6 Challenges

Challenge 1: Create and examine your vector

Create a character vector called fruit that contains 4 of your favorite fruits. Then evaluate its structure using the commands below.


# First create your fruit vector 
# YOUR CODE HERE


# Examine your vector
length(fruit)
class(fruit)
str(fruit)

Challenge 2: Coercion


# 1. Create a vector of the sequence of numbers from 1 to 10.

# 2. Coerce that vector into a character vector

# 3. Add the element "11" to the end of the vector

# 4. Coerce it back to a numeric vector.

Challenge 3: Calculations on Vectors

Create a vector of the numbers 11-20, and multiply it by the original vector from Challenge 2.

7.2 Lists

Lists are different from atomic vectors because their elements can be of any type. Lists are sometimes called recursive vectors, because a list can contain other lists. This makes them fundamentally different from atomic vectors.

7.2.1 Creating Lists

You construct lists by using list() instead of c():

x <- list(1, "a", TRUE, c(4, 5, 6))
x
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] "a"
#> 
#> [[3]]
#> [1] TRUE
#> 
#> [[4]]
#> [1] 4 5 6

7.2.2 Naming Lists

As with vectors, we can attach names to each element of our list:

my_list <- list(name1 = elem1, 
                name2 = elem2)

This creates a list with components that are named name1, name2, and so on. If you want to name your lists after you’ve created them, you can use the names() function as you did with vectors. The following commands are fully equivalent to the assignment above:

my_list <- list(elem1, elem2)
names(my_list) <- c("name1", "name2")
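In the snippets above, elem1 and elem2 are placeholders for any R objects. A concrete version, with made-up contents and names, might look like this:

```r
# Hypothetical elements, just for illustration
my_list <- list(numbers = c(1, 2, 3),
                greeting = "hello")
names(my_list)
#> [1] "numbers"  "greeting"

# Same result, naming the elements after creation
my_list2 <- list(c(1, 2, 3), "hello")
names(my_list2) <- c("numbers", "greeting")
identical(my_list, my_list2)
#> [1] TRUE
```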

7.2.3 List Structure

A very useful tool for working with lists is str() because it focuses on reviewing the structure of a list, not the contents.

x <- list(a = c(1, 2, 3),
          b = c("Hello", "there"),
          c = 1:10)
str(x)
#> List of 3
#>  $ a: num [1:3] 1 2 3
#>  $ b: chr [1:2] "Hello" "there"
#>  $ c: int [1:10] 1 2 3 4 5 6 7 8 9 10

A list does not print to the console like a vector. Instead, each element of the list starts on a new line.

x.vec <- c(1,2,3)
x.list <- list(1,2,3)
x.vec
#> [1] 1 2 3
x.list
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2
#> 
#> [[3]]
#> [1] 3

Lists are used to build up many of the more complicated data structures in R. For example, both data frames and linear model objects (as produced by lm()) are lists:

head(mtcars)
#>                    mpg cyl disp  hp drat   wt qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.62 16.5  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.88 17.0  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.32 18.6  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.21 19.4  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.0  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.46 20.2  1  0    3    1
is.list(mtcars)
#> [1] TRUE
mod <- lm(mpg ~ wt, data = mtcars)
is.list(mod)
#> [1] TRUE

You could say that a list is some kind of super data type: you can store practically any piece of information in it!

For this reason, lists are extremely useful inside functions. You can “staple” together lots of different kinds of results into a single object that a function can return.

mod <- lm(mpg ~ wt, data = mtcars)
str(mod)
#> List of 12
#>  $ coefficients : Named num [1:2] 37.29 -5.34
#>   ..- attr(*, "names")= chr [1:2] "(Intercept)" "wt"
#>  $ residuals    : Named num [1:32] -2.28 -0.92 -2.09 1.3 -0.2 ...
#>   ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
#>  $ effects      : Named num [1:32] -113.65 -29.116 -1.661 1.631 0.111 ...
#>   ..- attr(*, "names")= chr [1:32] "(Intercept)" "wt" "" "" ...
#>  $ rank         : int 2
#>  $ fitted.values: Named num [1:32] 23.3 21.9 24.9 20.1 18.9 ...
#>   ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
#>  $ assign       : int [1:2] 0 1
#>  $ qr           :List of 5
#>   ..$ qr   : num [1:32, 1:2] -5.657 0.177 0.177 0.177 0.177 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
#>   .. .. ..$ : chr [1:2] "(Intercept)" "wt"
#>   .. ..- attr(*, "assign")= int [1:2] 0 1
#>   ..$ qraux: num [1:2] 1.18 1.05
#>   ..$ pivot: int [1:2] 1 2
#>   ..$ tol  : num 1e-07
#>   ..$ rank : int 2
#>   ..- attr(*, "class")= chr "qr"
#>  $ df.residual  : int 30
#>  $ xlevels      : Named list()
#>  $ call         : language lm(formula = mpg ~ wt, data = mtcars)
#>  $ terms        :Classes 'terms', 'formula'  language mpg ~ wt
#>   .. ..- attr(*, "variables")= language list(mpg, wt)
#>   .. ..- attr(*, "factors")= int [1:2, 1] 0 1
#>   .. .. ..- attr(*, "dimnames")=List of 2
#>   .. .. .. ..$ : chr [1:2] "mpg" "wt"
#>   .. .. .. ..$ : chr "wt"
#>   .. ..- attr(*, "term.labels")= chr "wt"
#>   .. ..- attr(*, "order")= int 1
#>   .. ..- attr(*, "intercept")= int 1
#>   .. ..- attr(*, "response")= int 1
#>   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
#>   .. ..- attr(*, "predvars")= language list(mpg, wt)
#>   .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
#>   .. .. ..- attr(*, "names")= chr [1:2] "mpg" "wt"
#>  $ model        :'data.frame':   32 obs. of  2 variables:
#>   ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#>   ..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
#>   ..- attr(*, "terms")=Classes 'terms', 'formula'  language mpg ~ wt
#>   .. .. ..- attr(*, "variables")= language list(mpg, wt)
#>   .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
#>   .. .. .. ..- attr(*, "dimnames")=List of 2
#>   .. .. .. .. ..$ : chr [1:2] "mpg" "wt"
#>   .. .. .. .. ..$ : chr "wt"
#>   .. .. ..- attr(*, "term.labels")= chr "wt"
#>   .. .. ..- attr(*, "order")= int 1
#>   .. .. ..- attr(*, "intercept")= int 1
#>   .. .. ..- attr(*, "response")= int 1
#>   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
#>   .. .. ..- attr(*, "predvars")= language list(mpg, wt)
#>   .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
#>   .. .. .. ..- attr(*, "names")= chr [1:2] "mpg" "wt"
#>  - attr(*, "class")= chr "lm"
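You can use the same pattern in your own functions. The sketch below defines a hypothetical helper, describe(), that staples three results together into one list. The function name and its contents are invented for this example, and the $ operator used to pull out each piece is covered with subsetting in the next chapter.

```r
# Hypothetical helper: summarise a numeric vector and return several results at once
describe <- function(x) {
  list(n = length(x),
       mean = mean(x),
       range = range(x))
}

res <- describe(c(2, 4, 6, 8))
res$n
#> [1] 4
res$mean
#> [1] 5
res$range
#> [1] 2 8
```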

7.2.4 Challenges

Challenge 1.

What are the four basic types of atomic vectors? How does a list differ from an atomic vector?

Challenge 2.

Why is 1 == "1" true? Why is -1 < FALSE true? Why is "one" < 2 false?

Challenge 3.

Create three vectors and combine them into a list. Assign them names.

Challenge 4.

If x is a list, what is the class of x[1]? How about x[[1]]?

7.3 Factors

Factors are special vectors that represent categorical data: variables that have a fixed and known set of possible values. Think: Democrat, Republican, Independent; Male, Female, Other; etc.

It is important that R knows whether it is dealing with a continuous or a categorical variable, as the statistical models you will develop in the future treat both types differently.

Historically, factors were much easier to work with than characters. As a result, many of the functions in base R automatically convert characters to factors. This means that factors often pop up in places where they’re not actually helpful.

7.3.1 Creating Factors

To create factors in R, you use the function factor(). The first thing that you have to do is create a vector that contains all the observations that belong to a limited number of categories. For example, party_vector contains the partyID of 5 different individuals:

party_vector <- c("Rep", "Rep", "Dem", "Rep", "Dem")

It is clear that there are two categories, or in R-terms factor levels, at work here: Dem and Rep.

The function factor() will encode the vector as a factor:

party_factor <- factor(party_vector)
party_vector
#> [1] "Rep" "Rep" "Dem" "Rep" "Dem"
party_factor
#> [1] Rep Rep Dem Rep Dem
#> Levels: Dem Rep

7.3.2 Summarizing a Factor

One of your favorite functions in R will be summary(). This will give you a quick overview of the contents of a variable. Let’s compare using summary() on both the character vector and the factor:

summary(party_vector)
#>    Length     Class      Mode 
#>         5 character character
summary(party_factor)
#> Dem Rep 
#>   2   3

7.3.3 Changing Factor Levels

When you create the factor, the factor levels are set to specific values. We can access those values with the levels() function.

levels(party_factor)
#> [1] "Dem" "Rep"

Any values not in the set of levels will be converted to NA, with a warning. Let’s say we want to add an Independent to our sample:

party_factor[5] <- "Ind"
#> Warning in `[<-.factor`(`*tmp*`, 5, value = "Ind"): invalid factor level,
#> NA generated
party_factor
#> [1] Rep  Rep  Dem  Rep  <NA>
#> Levels: Dem Rep

We first need to add “Ind” to our factor levels. This will allow us to add Independents to our sample:

levels(party_factor)
#> [1] "Dem" "Rep"
levels(party_factor) <- c("Dem", "Rep", "Ind")

party_factor[5] <- "Ind"
party_factor
#> [1] Rep Rep Dem Rep Ind
#> Levels: Dem Rep Ind

7.3.4 Factors are Integers

Factors are pretty much integers that have labels on them. Underneath, they’re really numbers (1, 2, 3…).

str(party_factor)
#>  Factor w/ 3 levels "Dem","Rep","Ind": 2 2 1 2 3

They are better than using simple integer labels because factors are self-describing. For example, democrat and republican are more descriptive than 1s and 2s.
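To see the integers and the labels separately, we can pull them apart. This is a minimal sketch that rebuilds a small party factor under the made-up name party:

```r
party <- factor(c("Rep", "Rep", "Dem", "Rep", "Dem"))

# The underlying integer codes
as.integer(party)
#> [1] 2 2 1 2 1

# The labels that the codes point to
levels(party)
#> [1] "Dem" "Rep"

# Codes plus labels recover the original values
levels(party)[as.integer(party)]
#> [1] "Rep" "Rep" "Dem" "Rep" "Dem"
```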

However, factors are NOT characters!!

While factors look (and often behave) like character vectors, they are actually integers. Be careful when treating them like strings.

x <- c("a", "b", "b", "a")
x <- as.factor(x)
c(x, "c")
#> [1] "1" "2" "2" "1" "c"

For this reason, it’s usually best to explicitly convert factors to character vectors if you need string-like behaviour.

x <- c("a", "b", "b", "a")
x <- as.factor(x)
x <- as.character(x)
c(x, "c")
#> [1] "a" "b" "b" "a" "c"

7.3.5 Challenges

Challenge 1.

What happens to a factor when you modify its levels?

f1 <- factor(letters)
levels(f1) <- rev(levels(f1))
f1
#>  [1] z y x w v u t s r q p o n m l k j i h g f e d c b a
#> Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a

Challenge 2.

What does this code do? How do f2 and f3 differ from f1?

f2 <- rev(factor(letters))
f3 <- factor(letters, levels = rev(letters))

7.4 Matrices

Matrices are 2-d vectors. That is, they are a collection of elements of the same data type (numeric, character, or logical), arranged into a fixed number of rows and columns.

By definition, if you want to combine different types of data (one column numbers, another column characters), you want a dataframe, not a matrix.
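To see why, here is a small sketch of what happens if you try anyway. It uses cbind(), which is introduced in the next section, on two made-up columns. Because a matrix can only hold one type, the numbers are coerced to character:

```r
# Binding a numeric column and a character column gives a character matrix
m <- cbind(c(1, 2), c("a", "b"))
typeof(m)
#> [1] "character"

# The first column no longer holds numbers
m[, 1]
#> [1] "1" "2"
```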

7.4.1 Creating Matrices

We can create a matrix using the matrix() function. In this function, we assign dimensions to a vector, like this:

m <- matrix(1:6, nrow = 2, ncol = 3)
m
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6

Notice that matrices fill column-wise. We can change this using the byrow argument:

m <- matrix(1:6, byrow = TRUE, nrow = 2, ncol = 3)
m
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    4    5    6

Another way to create matrices is to bind columns or rows using cbind() and rbind().

x <- 1:3
y <- 10:12
cbind(x, y)
#>      x  y
#> [1,] 1 10
#> [2,] 2 11
#> [3,] 3 12
# or 
rbind(x, y)
#>   [,1] [,2] [,3]
#> x    1    2    3
#> y   10   11   12

7.4.2 Matrix Dimensions

Use dim() to find out how many rows and columns are in a matrix (or dataframe):

dim(m)
#> [1] 2 3

We can transpose a matrix (or dataframe) with t():

m <- matrix(1:6, nrow = 2, ncol = 3)
m
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
t(m)
#>      [,1] [,2]
#> [1,]    1    2
#> [2,]    3    4
#> [3,]    5    6

7.4.3 Matrix Names

Just like vectors or lists, we can give matrices names that describe the rows and columns:

m <- matrix(1:6, nrow = 2, ncol = 3)

rownames(m) <- c("row1", "row2")
colnames(m) <- c("A", "B", "C")

m
#>      A B C
#> row1 1 3 5
#> row2 2 4 6

7.4.4 Challenge

Take a look at the vectors I’ve created about box office sales for the first three Harry Potter movies:

# Box office sales (in millions!)
philosophers_stone <- c(66.1, 317.6, 657.2)
chamber_secrets <- c(54.7, 261.9, 616.9)
prisoner_azkaban <- c(45.6, 249.5, 547.1)

# Vectors region and titles, used for naming
region <- c("UK", "US", "Other")
titles <- c("Philosopher's Stone", "Chamber of Secrets", "Prisoner of Azkaban")

Your challenge is to:

  1. Combine the first three vectors into a matrix
  2. Add names for the matrix’s rows (titles) and columns (region)
  3. Use rowSums() to find the total Worldwide Box Office sales for each movie.

7.5 Dataframes

A dataframe is a very important data type in R. It’s pretty much the de facto data structure for most tabular data and it’s also what we use for statistics.

Let’s say we’re working with the following survey data:

  • ‘Are you married?’ or ‘yes/no’ questions (logical)
  • ‘How old are you?’ (numeric)
  • ‘What is your opinion on Trump?’ or other ‘open-ended’ questions (character)

A matrix won’t work here because the dataset contains different data types.

A dataframe is a 2-dimensional data structure containing heterogeneous data types. Each column is a variable of a dataset, and the rows are observations.

NB: You might have heard of “tibbles,” used in the tidyverse suite of packages. Tibbles are like dataframes 2.0, tweaking some of the behavior of dataframes to make life easier for data analysis. For now, just think of tibbles and dataframes as the same thing and don’t worry about the difference.

7.5.1 Creating Dataframes

R contains a number of built-in datasets that are stored as dataframes. For example, the mtcars dataset contains information on automobile design and performance for 32 automobiles:

class(mtcars)
#> [1] "data.frame"
head(mtcars)
#>                    mpg cyl disp  hp drat   wt qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.62 16.5  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.88 17.0  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.32 18.6  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.21 19.4  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.0  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.46 20.2  1  0    3    1

We also create dataframes when we import data through read.csv or other data file input. We’ll talk more about importing data later in the class.

We can create a dataframe from scratch using data.frame(). This function takes vectors as input:

# Definition of vectors
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)

planets <- data.frame(name, type, diameter, rings)
planets
#>      name               type diameter rings
#> 1 Mercury Terrestrial planet    0.382 FALSE
#> 2   Venus Terrestrial planet    0.949 FALSE
#> 3   Earth Terrestrial planet    1.000 FALSE
#> 4    Mars Terrestrial planet    0.532 FALSE
#> 5 Jupiter          Gas giant   11.209  TRUE
#> 6  Saturn          Gas giant    9.449  TRUE
#> 7  Uranus          Gas giant    4.007  TRUE
#> 8 Neptune          Gas giant    3.883  TRUE

Beware: in versions of R before 4.0.0, data.frame()’s default behaviour turns strings into factors. Use stringsAsFactors = FALSE to suppress this behaviour as needed:

planets <- data.frame(name, type, diameter, rings, stringsAsFactors = FALSE)
planets
#>      name               type diameter rings
#> 1 Mercury Terrestrial planet    0.382 FALSE
#> 2   Venus Terrestrial planet    0.949 FALSE
#> 3   Earth Terrestrial planet    1.000 FALSE
#> 4    Mars Terrestrial planet    0.532 FALSE
#> 5 Jupiter          Gas giant   11.209  TRUE
#> 6  Saturn          Gas giant    9.449  TRUE
#> 7  Uranus          Gas giant    4.007  TRUE
#> 8 Neptune          Gas giant    3.883  TRUE

7.5.2 The Structure of Dataframes

Under the hood, a dataframe is a list of equal-length vectors. This makes it a 2-dimensional structure, so it shares properties of both the matrix and the list.

vec1 <- 1:3
vec2 <- c("a", "b", "c")
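# stringsAsFactors = TRUE stores vec2 as a factor (the default before R 4.0.0)
df <- data.frame(vec1, vec2, stringsAsFactors = TRUE)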

str(df)
#> 'data.frame':    3 obs. of  2 variables:
#>  $ vec1: int  1 2 3
#>  $ vec2: Factor w/ 3 levels "a","b","c": 1 2 3

The length() of a dataframe is the length of the underlying list and so is the same as ncol(); nrow() gives the number of rows.

vec1 <- 1:3
vec2 <- c("a", "b", "c")
df <- data.frame(vec1, vec2)

# these two are equivalent - number of columns
length(df)
#> [1] 2
ncol(df)
#> [1] 2

# get number of rows
nrow(df)
#> [1] 3

# get number of both columns and rows
dim(df)
#> [1] 3 2

7.5.3 Naming Dataframes

Like matrices, dataframes have colnames() and rownames(). However, since dataframes are really lists (of vectors) under the hood, names() and colnames() are the same thing.

vec1 <- 1:3
vec2 <- c("a", "b", "c")
df <- data.frame(vec1, vec2)

# these two are equivalent
names(df)
#> [1] "vec1" "vec2"
colnames(df)
#> [1] "vec1" "vec2"

# change the colnames
colnames(df) <- c("Number", "Character")
df
#>   Number Character
#> 1      1         a
#> 2      2         b
#> 3      3         c

names(df) <- c("Number", "Character")
df
#>   Number Character
#> 1      1         a
#> 2      2         b
#> 3      3         c

# change the rownames
rownames(df) 
#> [1] "1" "2" "3"
rownames(df) <- c("donut", "pickle", "pretzel")
df
#>         Number Character
#> donut        1         a
#> pickle       2         b
#> pretzel      3         c

7.5.4 Coercing Dataframes

Coerce an object to a dataframe with as.data.frame():

  • A vector will create a one-column dataframe.

  • A list will create one column for each element; it’s an error if they’re not all the same length.

  • A matrix will create a data frame with the same number of columns and rows as the matrix.
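The three cases above can be sketched as follows. The object names are made up for this example, and tryCatch() is used only so that the error in the last case does not stop the script:

```r
# A vector becomes a one-column dataframe
x <- c(1, 2, 3)
dim(as.data.frame(x))
#> [1] 3 1

# A list becomes one column per element
l <- list(a = 1:2, b = c("x", "y"))
dim(as.data.frame(l))
#> [1] 2 2

# A matrix keeps its shape
m <- matrix(1:6, nrow = 2)
dim(as.data.frame(m))
#> [1] 2 3

# List elements of lengths 2 and 3 cannot be lined up, so this is an error
bad <- list(a = 1:2, b = 1:3)
result <- tryCatch(as.data.frame(bad), error = function(e) "error")
result
#> [1] "error"
```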

7.5.5 Challenges

Challenge 1.

Create a 3x2 data frame called basket. The first column should contain the names of 3 fruits. The second column should contain the price of those fruits.

Challenge 2.

Now give your dataframe appropriate column and row names.

Challenge 3.

Add a third column called color, that tells me what color each fruit is.

7.5.6 Quiz

You can check your answers in the Answers section below.

  1. How is a list different from a vector?

  2. What are the four common types of vectors?

  3. What are names? How do you get them and set them?

  4. How is a matrix different from a data frame?

7.5.7 Answers

  1. The elements of a list can be any type (even a list); the elements of an atomic vector are all of the same type.

  2. The four common types of vector are logical, integer, double (sometimes called numeric), and character.

  3. Names allow you to attach labels to values. You can get and set names with names(x) and names(x) <- c("x", "y", ...).

  4. Every element of a matrix must be the same type; in a data frame, the different columns can have different types.