# Chapter 7 Data Classes and Structures

To make the best use of the R language, you’ll need a strong understanding of basic data structures, and how to operate on them.

This is **critical** to understand because these are the objects you will manipulate on a day-to-day basis in R. But they are not always as easy to work with as they seem at the outset. Dealing with object types and conversions is one of the most common sources of frustration for beginners.

R’s base data structures can be organised by their dimensionality (1d, 2d, or nd) and whether they’re homogeneous (all contents must be of the same type) or heterogeneous (the contents can be of different types). This gives rise to the five data structures most often used in data analysis:

| | Homogeneous | Heterogeneous |
|---|---|---|
| 1d | Atomic vector | List |
| 2d | Matrix | Dataframe |
| nd | Array | |

Each data structure has its own specifications and behavior. In the rest of this chapter, we will cover the types of data objects that exist in R and their attributes.
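As a quick orientation, the sketch below builds one small object of each kind from the table. The object names (`v`, `l`, `m`, `df`, `a`) are placeholders chosen for illustration only; each structure is covered in more detail in this chapter, apart from arrays, which appear only in the table.

```r
# One example of each base data structure (names are illustrative only)
v  <- c(1, 2, 3)                                   # 1d, homogeneous: atomic vector
l  <- list(1, "a", TRUE)                           # 1d, heterogeneous: list
m  <- matrix(1:6, nrow = 2)                        # 2d, homogeneous: matrix
df <- data.frame(id = 1:2, ok = c(TRUE, FALSE))    # 2d, heterogeneous: dataframe
a  <- array(1:24, dim = c(2, 3, 4))                # nd, homogeneous: array

# dim() reports the dimensions; it is NULL for 1d objects
dim(v)
#> NULL
dim(m)
#> [1] 2 3
dim(a)
#> [1] 2 3 4
```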

## 7.1 Vectors

Let’s start with one-dimensional (1d) objects. There are two kinds:

- **Atomic vectors**, also called simply **vectors**.
- **Lists**: Lists are distinct from atomic vectors because lists can contain other lists.

We’ll discuss **atomic vectors** first.

### 7.1.1 Creating Vectors

Vectors are 1-dimensional chains of values. We call each value an *element* of a vector.

Atomic vectors are usually created with `c()`, which is short for ‘combine’:

```
x <- c(1, 2, 3)
x
#> [1] 1 2 3
length(x)
#> [1] 3
```

We can also add elements to the end of a vector by passing the original vector into the `c()` function, like so:

```
z <- c("Beyonce", "Kelly", "Michelle", "LeToya")
z <- c(z, "Farrah")
z
#> [1] "Beyonce" "Kelly" "Michelle" "LeToya" "Farrah"
```

Notice that vectors are always flat, even if you nest calls to `c()`:

```
# these are equivalent
c(1, c(2, c(3, 4)))
#> [1] 1 2 3 4
c(1, 2, 3, 4)
#> [1] 1 2 3 4
```

### 7.1.2 Naming a Vector

We can also attach names to our vector. This helps us understand what each element refers to.

You can name the elements of a vector with the `names()` function. Have a look at this example:

```
days_month <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
names(days_month) <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
days_month
#> Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#> 31 28 31 30 31 30 31 31 30 31 30 31
```

You can name a vector when you create it:

```
some_vector <- c(name = "Rochelle Terman", profession = "Professor Extraordinaire")
some_vector
#> name profession
#> "Rochelle Terman" "Professor Extraordinaire"
```

Notice that in the first case, we surrounded each name with quotation marks. But we don’t have to do this when creating a named vector.

Names don’t have to be unique, and not all values need to have a name associated. However, names are most useful for subsetting, described in the next chapter. When subsetting, it is most useful when the names are unique.
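To make the point about missing and repeated names concrete, here is a minimal sketch. The vector `scores` and its values are made up for illustration.

```r
# Names may be missing for some elements and may repeat
scores <- c(a = 1, 2, a = 3)
names(scores)
#> [1] "a" ""  "a"

# Remove all names by assigning NULL
names(scores) <- NULL
scores
#> [1] 1 2 3
```

An element without a name gets the empty string `""` as its name.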

### 7.1.3 Calculations on Vectors

One of the most powerful things about vectors is that we can perform arithmetic calculations on them.

For example, we can sum up all the values in a numeric vector using `sum()`:

```
a <- c(1, -2, 3)
sum(a)
#> [1] 2
```

We can also add two vectors. It is important to know that if you add two vectors in R with `+`, it takes the element-wise sum. For example, the following three statements are completely equivalent:

```
c(1, 2, 3) + c(4, 5, 6)
c(1 + 4, 2 + 5, 3 + 6)
c(5, 7, 9)
```
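Other arithmetic operators work element-wise in the same way. The sketch below uses two made-up vectors, `prices` and `quantity`, to show element-wise multiplication followed by `sum()`.

```r
prices   <- c(2.5, 4, 1.25)   # hypothetical unit prices
quantity <- c(4, 1, 8)        # hypothetical quantities

# Element-wise multiplication: 2.5 * 4, 4 * 1, 1.25 * 8
cost <- prices * quantity
cost
#> [1] 10  4 10

# Combine with sum() to get a total
sum(cost)
#> [1] 24
```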

### 7.1.4 Types of Vectors

There are four common types of vectors, depending on the class:

- `logical`
- `integer`
- `numeric` (same as `double`)
- `character`

#### Logical Vectors

Logical vectors take on one of three possible values:

- `TRUE`
- `FALSE`
- `NA` (missing value)

```
c(TRUE, TRUE, FALSE, NA)
#> [1] TRUE TRUE FALSE NA
```

#### Numeric Vectors

Numeric vectors contain numbers. They can be stored as *integers* (whole numbers) or *doubles* (numbers with decimal points). In practice, you rarely need to concern yourself with this difference, but just know that they are different but related things.

```
c(1, 2, 335)
#> [1] 1 2 335
c(4.2, 4, 6, 53.2)
#> [1] 4.2 4.0 6.0 53.2
```
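If you do want to see the difference between integers and doubles, the following sketch may help. It uses `typeof()`, a base R function not otherwise used in this chapter, which reports how a vector is stored.

```r
# Plain numbers are stored as doubles, even when they look like whole numbers
typeof(c(1, 2, 335))
#> [1] "double"

# The L suffix (or the : operator) creates integers
typeof(c(1L, 2L, 335L))
#> [1] "integer"
typeof(1:3)
#> [1] "integer"

# class() reports "numeric" for doubles and "integer" for integers
class(4.2)
#> [1] "numeric"
class(1L)
#> [1] "integer"
```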

#### Character Vectors

Character vectors contain character (or ‘string’) values. Note that each value has to be surrounded by quotation marks *before* the comma.

```
c("Beyonce", "Kelly", "Michelle", "LeToya")
#> [1] "Beyonce" "Kelly" "Michelle" "LeToya"
```

### 7.1.5 Coercion

We can change or convert a vector’s type using the `as.*()` family of functions, such as `as.character()`:

```
num_var <- c(1, 2.5, 4.5)
class(num_var)
#> [1] "numeric"
as.character(num_var)
#> [1] "1" "2.5" "4.5"
```

Remember that all elements of a vector must be the same type. So when you attempt to combine different types, they will be **coerced** to the most “flexible” type.

For example, combining a character and a number yields a character:

```
c("a", 1)
#> [1] "a" "1"
```

Guess what the following do without running them first:

```
c(1.7, "a")
c(TRUE, 2)
c("a", TRUE)
```

#### TRUE == 1 and FALSE == 0.

Notice that when a logical vector is coerced to an integer or double, `TRUE` becomes 1 and `FALSE` becomes 0. This is very useful in conjunction with `sum()` and `mean()`:

```
x <- c(FALSE, FALSE, TRUE)
as.numeric(x)
#> [1] 0 0 1
# Total number of TRUEs
sum(x)
#> [1] 1
# Proportion that are TRUE
mean(x)
#> [1] 0.333
```

#### Coercion often happens automatically.

This is called *implicit coercion*. Most mathematical functions (`+`, `log`, `abs`, etc.) will coerce to a double or integer, and most logical operations (`&`, `|`, `any`, etc.) will coerce to a logical. You will usually get a warning message if the coercion might lose information.

Comparison operators coerce too: when a number is compared with a string, the number is converted to a string first.

```
1 < "2"
#> [1] TRUE
"1" > 2
#> [1] FALSE
```

Sometimes coercions, especially nonsensical ones, won’t work.

```
x <- c("a", "b", "c")
as.numeric(x)
#> Warning: NAs introduced by coercion
#> [1] NA NA NA
as.logical(x)
#> [1] NA NA NA
```

### 7.1.6 Challenges

#### Challenge 1: Create and examine your vector

Create a character vector called `fruit` that contains 4 of your favorite fruits. Then evaluate its structure using the commands below.

```
# First create your fruit vector
# YOUR CODE HERE
# Examine your vector
length(fruit)
class(fruit)
str(fruit)
```

#### Challenge 2: Coercion

```
# 1. Create a vector of a sequence of numbers between 1 to 10.
# 2. Coerce that vector into a character vector
# 3. Add the element "11" to the end of the vector
# 4. Coerce it back to a numeric vector.
```

#### Challenge 3: Calculations on Vectors

Create a vector of the numbers 11-20, and multiply it by the original vector from Challenge 2.

## 7.2 Lists

Lists are different from atomic vectors because their elements can be of **any type**. Lists are sometimes called recursive vectors, because a list can contain other lists. This makes them fundamentally different from atomic vectors.

### 7.2.1 Creating Lists

You construct lists by using `list()` instead of `c()`:

```
x <- list(1, "a", TRUE, c(4, 5, 6))
x
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] "a"
#>
#> [[3]]
#> [1] TRUE
#>
#> [[4]]
#> [1] 4 5 6
```

### 7.2.2 Naming Lists

As with vectors, we can attach names to each element of our list. In the template below, `elem1` and `elem2` stand for any R objects:

```
my_list <- list(name1 = elem1,
name2 = elem2)
```

This creates a list with components that are named `name1`, `name2`, and so on. If you want to name your lists after you’ve created them, you can use the `names()` function as you did with vectors. The following commands are fully equivalent to the assignment above:

```
my_list <- list(elem1, elem2)
names(my_list) <- c("name1", "name2")
```
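Here is a runnable version of the two approaches, as a small sketch. The values given to `elem1` and `elem2` are arbitrary stand-ins, and `list_a` and `list_b` are names invented for this example.

```r
# Hypothetical elements, for illustration only
elem1 <- c(1, 2, 3)
elem2 <- "hello"

# Name the elements at creation ...
list_a <- list(name1 = elem1, name2 = elem2)

# ... or afterwards with names()
list_b <- list(elem1, elem2)
names(list_b) <- c("name1", "name2")

# Both routes give the same object
identical(list_a, list_b)
#> [1] TRUE
```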

### 7.2.3 List Structure

A very useful tool for working with lists is `str()` because it focuses on reviewing the structure of a list, not the contents.

```
x <- list(a = c(1, 2, 3),
b = c("Hello", "there"),
c = 1:10)
str(x)
#> List of 3
#> $ a: num [1:3] 1 2 3
#> $ b: chr [1:2] "Hello" "there"
#> $ c: int [1:10] 1 2 3 4 5 6 7 8 9 10
```

A list does not print to the console like a vector. Instead, each element of the list starts on a new line.

```
x.vec <- c(1,2,3)
x.list <- list(1,2,3)
x.vec
#> [1] 1 2 3
x.list
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2
#>
#> [[3]]
#> [1] 3
```

Lists are used to build up many of the more complicated data structures in R. For example, both data frames and linear model objects (as produced by `lm()`) are lists:

```
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.62 16.5 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.88 17.0 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.21 19.4 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
is.list(mtcars)
#> [1] TRUE
mod <- lm(mpg ~ wt, data = mtcars)
is.list(mod)
#> [1] TRUE
```

You could say that a list is some kind of super data type: you can store practically any piece of information in it!

For this reason, lists are extremely useful inside functions. You can “staple” together lots of different kinds of results into a single object that a function can return.

```
mod <- lm(mpg ~ wt, data = mtcars)
str(mod)
#> List of 12
#> $ coefficients : Named num [1:2] 37.29 -5.34
#> ..- attr(*, "names")= chr [1:2] "(Intercept)" "wt"
#> $ residuals : Named num [1:32] -2.28 -0.92 -2.09 1.3 -0.2 ...
#> ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
#> $ effects : Named num [1:32] -113.65 -29.116 -1.661 1.631 0.111 ...
#> ..- attr(*, "names")= chr [1:32] "(Intercept)" "wt" "" "" ...
#> $ rank : int 2
#> $ fitted.values: Named num [1:32] 23.3 21.9 24.9 20.1 18.9 ...
#> ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
#> $ assign : int [1:2] 0 1
#> $ qr :List of 5
#> ..$ qr : num [1:32, 1:2] -5.657 0.177 0.177 0.177 0.177 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
#> .. .. ..$ : chr [1:2] "(Intercept)" "wt"
#> .. ..- attr(*, "assign")= int [1:2] 0 1
#> ..$ qraux: num [1:2] 1.18 1.05
#> ..$ pivot: int [1:2] 1 2
#> ..$ tol : num 1e-07
#> ..$ rank : int 2
#> ..- attr(*, "class")= chr "qr"
#> $ df.residual : int 30
#> $ xlevels : Named list()
#> $ call : language lm(formula = mpg ~ wt, data = mtcars)
#> $ terms :Classes 'terms', 'formula' language mpg ~ wt
#> .. ..- attr(*, "variables")= language list(mpg, wt)
#> .. ..- attr(*, "factors")= int [1:2, 1] 0 1
#> .. .. ..- attr(*, "dimnames")=List of 2
#> .. .. .. ..$ : chr [1:2] "mpg" "wt"
#> .. .. .. ..$ : chr "wt"
#> .. ..- attr(*, "term.labels")= chr "wt"
#> .. ..- attr(*, "order")= int 1
#> .. ..- attr(*, "intercept")= int 1
#> .. ..- attr(*, "response")= int 1
#> .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
#> .. ..- attr(*, "predvars")= language list(mpg, wt)
#> .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
#> .. .. ..- attr(*, "names")= chr [1:2] "mpg" "wt"
#> $ model :'data.frame': 32 obs. of 2 variables:
#> ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> ..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
#> ..- attr(*, "terms")=Classes 'terms', 'formula' language mpg ~ wt
#> .. .. ..- attr(*, "variables")= language list(mpg, wt)
#> .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
#> .. .. .. ..- attr(*, "dimnames")=List of 2
#> .. .. .. .. ..$ : chr [1:2] "mpg" "wt"
#> .. .. .. .. ..$ : chr "wt"
#> .. .. ..- attr(*, "term.labels")= chr "wt"
#> .. .. ..- attr(*, "order")= int 1
#> .. .. ..- attr(*, "intercept")= int 1
#> .. .. ..- attr(*, "response")= int 1
#> .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
#> .. .. ..- attr(*, "predvars")= language list(mpg, wt)
#> .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
#> .. .. .. ..- attr(*, "names")= chr [1:2] "mpg" "wt"
#> - attr(*, "class")= chr "lm"
```
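The same “stapling” idea can be sketched with a much smaller function. The helper `describe()` below is hypothetical, written only for this illustration: it returns several results of different types in one list, much as `lm()` does on a larger scale.

```r
# A hypothetical helper that returns several summaries at once
describe <- function(v) {
  list(
    n     = length(v),   # integer: number of elements
    mean  = mean(v),     # double: arithmetic mean
    range = range(v),    # numeric vector of length 2: min and max
    type  = class(v)     # character: class of the input
  )
}

result <- describe(c(2, 4, 9))
str(result)
```

Calling `str()` on the result shows one line per component, each with its own type.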

### 7.2.4 Challenges

#### Challenge 1.

What are the four basic types of atomic vectors? How does a list differ from an atomic vector?

#### Challenge 2.

Why is `1 == "1"` true? Why is `-1 < FALSE` true? Why is `"one" < 2` false?

#### Challenge 3.

Create three vectors and combine them into a list. Assign them names.

#### Challenge 4.

If `x` is a list, what is the class of `x[1]`? How about `x[[1]]`?

## 7.3 Factors

Factors are special vectors that represent *categorical* data: variables that have a fixed and known set of possible values. Think: Democrat, Republican, Independent; Male, Female, Other; etc.

It is important that R knows whether it is dealing with a continuous or a categorical variable, as the statistical models you will develop in the future treat both types differently.

Historically, factors were much easier to work with than characters. As a result, many of the functions in base R automatically convert characters to factors. This means that factors often pop up in places where they’re not actually helpful.

### 7.3.1 Creating Factors

To create factors in R, you use the function `factor()`. The first thing that you have to do is create a vector that contains all the observations that belong to a limited number of categories. For example, `party_vector` contains the party ID of 5 different individuals:

`party_vector <- c("Rep", "Rep", "Dem", "Rep", "Dem")`

It is clear that there are two categories, or in R terms **factor levels**, at work here: `Dem` and `Rep`.

The function `factor()` will encode the vector as a factor:

```
party_factor <- factor(party_vector)
party_vector
#> [1] "Rep" "Rep" "Dem" "Rep" "Dem"
party_factor
#> [1] Rep Rep Dem Rep Dem
#> Levels: Dem Rep
```

### 7.3.2 Summarizing a Factor

One of your favorite functions in R will be `summary()`. This will give you a quick overview of the contents of a variable. Let’s compare using `summary()` on both the character vector and the factor:

```
summary(party_vector)
#> Length Class Mode
#> 5 character character
summary(party_factor)
#> Dem Rep
#> 2 3
```

### 7.3.3 Changing Factor Levels

When you create the factor, the factor levels are set to specific values. We can access those values with the `levels()` function.

```
levels(party_factor)
#> [1] "Dem" "Rep"
```

Any values *not* in the set of levels will be converted to `NA`, with a warning. Let’s say we want to add an Independent to our sample:

```
party_factor[5] <- "Ind"
#> Warning in `[<-.factor`(`*tmp*`, 5, value = "Ind"): invalid factor level,
#> NA generated
party_factor
#> [1] Rep Rep Dem Rep <NA>
#> Levels: Dem Rep
```

We first need to add “Ind” to our factor levels. This will allow us to add Independents to our sample:

```
levels(party_factor)
#> [1] "Dem" "Rep"
levels(party_factor) <- c("Dem", "Rep", "Ind")
party_factor[5] <- "Ind"
party_factor
#> [1] Rep Rep Dem Rep Ind
#> Levels: Dem Rep Ind
```
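If you know all the categories in advance, another option is to declare them when the factor is created, using the `levels` argument of `factor()`. A minimal sketch, where `party_factor2` is a name invented for this example:

```r
party_vector <- c("Rep", "Rep", "Dem", "Rep", "Dem")

# Declare every allowed category when creating the factor
party_factor2 <- factor(party_vector, levels = c("Dem", "Rep", "Ind"))

# "Ind" is already a valid level, so this assignment produces no warning
party_factor2[5] <- "Ind"
party_factor2
#> [1] Rep Rep Dem Rep Ind
#> Levels: Dem Rep Ind
```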

### 7.3.4 Factors are Integers

Factors are pretty much integers that have labels on them. Underneath, it’s really numbers (1, 2, 3…).

```
str(party_factor)
#> Factor w/ 3 levels "Dem","Rep","Ind": 2 2 1 2 3
```

They are better than using simple integer labels because factors are self-describing. For example, `democrat` and `republican` are more descriptive than `1`s and `2`s.

However, **factors are NOT characters!!**

While factors look (and often behave) like character vectors, they are actually integers. Be careful when treating them like strings.

```
x <- c("a", "b", "b", "a")
x <- as.factor(x)
c(x, "c")
#> [1] "1" "2" "2" "1" "c"
```

For this reason, it’s usually best to explicitly **convert** factors to character vectors if you need string-like behaviour.

```
x <- c("a", "b", "b", "a")
x <- as.factor(x)
x <- as.character(x)
c(x, "c")
#> [1] "a" "b" "b" "a" "c"
```
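The integer storage matters most when the labels themselves look like numbers. The sketch below uses a made-up factor, `years`, to show that converting a factor straight to numeric returns the underlying codes rather than the labels.

```r
years <- factor(c("2010", "2012", "2010", "2016"))

# as.numeric() returns the underlying integer codes, not the labels
as.numeric(years)
#> [1] 1 2 1 3

# Convert to character first to recover the original numbers
as.numeric(as.character(years))
#> [1] 2010 2012 2010 2016
```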

### 7.3.5 Challenges

#### Challenge 1.

What happens to a factor when you modify its levels?

```
f1 <- factor(letters)
levels(f1) <- rev(levels(f1))
f1
#> [1] z y x w v u t s r q p o n m l k j i h g f e d c b a
#> Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a
```

#### Challenge 2.

What does this code do? How do `f2` and `f3` differ from `f1`?

```
f2 <- rev(factor(letters))
f3 <- factor(letters, levels = rev(letters))
```

## 7.4 Matrices

Matrices are 2-d vectors. That is, they are a collection of elements of the same data type (numeric, character, or logical), arranged into a fixed number of rows and columns.

By definition, if you want to combine different types of data (one column numbers, another column characters), you want a **dataframe**, not a matrix.

### 7.4.1 Creating Matrices

We can create a matrix using the `matrix()` function. In this function, we assign dimensions to a vector, like this:

```
m <- matrix(1:6, nrow = 2, ncol = 3)
m
#> [,1] [,2] [,3]
#> [1,] 1 3 5
#> [2,] 2 4 6
```

Notice that matrices fill column-wise. We can change this using the `byrow` argument:

```
m <- matrix(1:6, byrow = TRUE, nrow = 2, ncol = 3)
m
#> [,1] [,2] [,3]
#> [1,] 1 2 3
#> [2,] 4 5 6
```

Another way to create matrices is to bind columns or rows using `cbind()` and `rbind()`.

```
x <- 1:3
y <- 10:12
cbind(x, y)
#> x y
#> [1,] 1 10
#> [2,] 2 11
#> [3,] 3 12
# or
rbind(x, y)
#> [,1] [,2] [,3]
#> x 1 2 3
#> y 10 11 12
```
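Because a matrix can hold only one type, binding vectors of different types triggers the coercion rules from Section 7.1.5. The sketch below uses two made-up vectors, `num` and `chr`, and checks the storage type with `typeof()`.

```r
num <- c(1, 2, 3)
chr <- c("a", "b", "c")

# cbind() returns a matrix, so the numbers are coerced to character
mixed <- cbind(num, chr)
typeof(mixed)
#> [1] "character"
dim(mixed)
#> [1] 3 2
```

This is why mixed-type columns belong in a dataframe rather than a matrix.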

### 7.4.2 Matrix Dimensions

Use `dim()` to find out how many rows and columns are in a matrix (or dataframe):

```
dim(m)
#> [1] 2 3
```

We can transpose a matrix with `t()`. This also works on a dataframe, although the result is then a matrix:

```
m <- matrix(1:6, nrow = 2, ncol = 3)
m
#> [,1] [,2] [,3]
#> [1,] 1 3 5
#> [2,] 2 4 6
t(m)
#> [,1] [,2]
#> [1,] 1 2
#> [2,] 3 4
#> [3,] 5 6
```

### 7.4.3 Matrix Names

Just like vectors or lists, we can give matrices names that describe the rows and columns:

```
m <- matrix(1:6, nrow = 2, ncol = 3)
rownames(m) <- c("row1", "row2")
colnames(m) <- c("A", "B", "C")
m
#> A B C
#> row1 1 3 5
#> row2 2 4 6
```
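Row and column names can also be supplied when the matrix is created, through the `dimnames` argument of `matrix()`. A short sketch, where `m2` is a name chosen for this example:

```r
# Supply names at creation via dimnames,
# a list of (row names, column names)
m2 <- matrix(1:6, nrow = 2, ncol = 3,
             dimnames = list(c("row1", "row2"), c("A", "B", "C")))
rownames(m2)
#> [1] "row1" "row2"
colnames(m2)
#> [1] "A" "B" "C"
```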

### 7.4.4 Challenge

Take a look at the vectors I’ve created about box office sales for the first three Harry Potter movies:

```
# Box office sales (in millions!)
philosophers_stone <- c(66.1, 317.6, 657.2)
chamber_secrets <- c(54.7, 261.9, 616.9)
prisoner_azkaban <- c(45.6, 249.5, 547.1)
# Vectors region and titles, used for naming
region <- c("UK", "US", "Other")
titles <- c("Philosopher's Stone", "Chamber of Secrets", "Prisoner of Azkaban")
```

Your challenge is to:

- Combine the first three vectors into a matrix.
- Add names for the matrix’s rows (`titles`) and columns (`region`).
- Use `rowSums()` to find the total worldwide box office sales for each movie.

## 7.5 Dataframes

A dataframe is a very important data type in R. It’s pretty much the *de facto* data structure for most tabular data and it’s also what we use for statistics.

Let’s say we’re working with the following survey data:

- ‘Are you married?’ or other ‘yes/no’ questions (`logical`)
- ‘How old are you?’ (`numeric`)
- ‘What is your opinion on Trump?’ or other ‘open-ended’ questions (`character`)
- …

A matrix won’t work here because the dataset contains different data types.

A dataframe is a 2-dimensional data structure containing heterogeneous data types. Each column is a variable of a dataset, and the rows are observations.

NB: You might have heard of “tibbles,” used in the `tidyverse` suite of packages. Tibbles are like dataframes 2.0, tweaking some of the behavior of dataframes to make life easier for data analysis. For now, just think of tibbles and dataframes as the same thing and don’t worry about the difference.

### 7.5.1 Creating Dataframes

R contains a number of built-in datasets that are stored as dataframes. For example, the `mtcars` dataset contains information on automobile design and performance for 32 automobiles:

```
class(mtcars)
#> [1] "data.frame"
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.62 16.5 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.88 17.0 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.21 19.4 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
```

We also create dataframes when we import data through `read.csv()` or other data file input. We’ll talk more about importing data later in the class.

We can create a dataframe from scratch using `data.frame()`. This function takes vectors as input:

```
# Definition of vectors
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
planets <- data.frame(name, type, diameter, rings)
planets
#> name type diameter rings
#> 1 Mercury Terrestrial planet 0.382 FALSE
#> 2 Venus Terrestrial planet 0.949 FALSE
#> 3 Earth Terrestrial planet 1.000 FALSE
#> 4 Mars Terrestrial planet 0.532 FALSE
#> 5 Jupiter Gas giant 11.209 TRUE
#> 6 Saturn Gas giant 9.449 TRUE
#> 7 Uranus Gas giant 4.007 TRUE
#> 8 Neptune Gas giant 3.883 TRUE
```

Beware: in versions of R before 4.0.0, `data.frame()`’s default behaviour turns strings into factors (since R 4.0.0, strings are kept as characters by default). Use `stringsAsFactors = FALSE` to suppress this behaviour as needed:

```
planets <- data.frame(name, type, diameter, rings, stringsAsFactors = FALSE)
planets
#> name type diameter rings
#> 1 Mercury Terrestrial planet 0.382 FALSE
#> 2 Venus Terrestrial planet 0.949 FALSE
#> 3 Earth Terrestrial planet 1.000 FALSE
#> 4 Mars Terrestrial planet 0.532 FALSE
#> 5 Jupiter Gas giant 11.209 TRUE
#> 6 Saturn Gas giant 9.449 TRUE
#> 7 Uranus Gas giant 4.007 TRUE
#> 8 Neptune Gas giant 3.883 TRUE
```

### 7.5.2 The Structure of Dataframes

Under the hood, a dataframe is a list of equal-length vectors. This makes it a 2-dimensional structure, so it shares properties of both the matrix and the list.

```
vec1 <- 1:3
vec2 <- c("a", "b", "c")
df <- data.frame(vec1, vec2)
# Output below is from R < 4.0.0; since R 4.0.0, vec2 is stored as chr by default
str(df)
#> 'data.frame': 3 obs. of 2 variables:
#> $ vec1: int 1 2 3
#> $ vec2: Factor w/ 3 levels "a","b","c": 1 2 3
```

The `length()` of a dataframe is the length of the underlying list and so is the same as `ncol()`; `nrow()` gives the number of rows.

```
vec1 <- 1:3
vec2 <- c("a", "b", "c")
df <- data.frame(vec1, vec2)
# these two are equivalent - number of columns
length(df)
#> [1] 2
ncol(df)
#> [1] 2
# get number of rows
nrow(df)
#> [1] 3
# get number of both columns and rows
dim(df)
#> [1] 3 2
```

### 7.5.3 Naming Dataframes

Like matrices, dataframes have `colnames()` and `rownames()`. However, since dataframes are really lists (of vectors) under the hood, `names()` and `colnames()` are the same thing.

```
vec1 <- 1:3
vec2 <- c("a", "b", "c")
df <- data.frame(vec1, vec2)
# these two are equivalent
names(df)
#> [1] "vec1" "vec2"
colnames(df)
#> [1] "vec1" "vec2"
# change the colnames
colnames(df) <- c("Number", "Character")
df
#> Number Character
#> 1 1 a
#> 2 2 b
#> 3 3 c
names(df) <- c("Number", "Character")
df
#> Number Character
#> 1 1 a
#> 2 2 b
#> 3 3 c
# change the rownames
rownames(df)
#> [1] "1" "2" "3"
rownames(df) <- c("donut", "pickle", "pretzel")
df
#> Number Character
#> donut 1 a
#> pickle 2 b
#> pretzel 3 c
```

### 7.5.4 Coercing Dataframes

Coerce an object to a dataframe with `as.data.frame()`:

- A vector will create a one-column dataframe.
- A list will create one column for each element; it’s an error if they’re not all the same length.
- A matrix will create a data frame with the same number of columns and rows as the matrix.
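The three cases can be checked with `dim()`, as in the sketch below. The input values are arbitrary, and `try()` is used here only so that the expected error from the unequal-length list does not stop the script.

```r
# A vector becomes a one-column dataframe
dim(as.data.frame(c(10, 20, 30)))
#> [1] 3 1

# A list becomes one column per element
dim(as.data.frame(list(x = 1:3, y = c("a", "b", "c"))))
#> [1] 3 2

# A matrix keeps its number of rows and columns
dim(as.data.frame(matrix(1:6, nrow = 2)))
#> [1] 2 3

# Elements of different lengths cannot be lined up into columns
res <- try(as.data.frame(list(x = 1:3, y = 1:2)), silent = TRUE)
inherits(res, "try-error")
#> [1] TRUE
```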

### 7.5.5 Challenges

#### Challenge 1.

Create a 3x2 data frame called `basket`. The first column should contain the names of 3 fruits. The second column should contain the price of those fruits.

#### Challenge 2.

Now give your dataframe appropriate column and row names.

#### Challenge 3.

Add a third column called `color` that tells me what color each fruit is.

### 7.5.6 Quiz

You can check your answers in the Answers section (7.5.7).

How is a list different from a vector?

What are the four common types of vectors?

What are names? How do you get them and set them?

How is a matrix different from a data frame?

### 7.5.7 Answers

The elements of a list can be any type (even a list); the elements of an atomic vector are all of the same type.

The four common types of vector are logical, integer, double (sometimes called numeric), and character.

Names allow you to attach labels to values. You can get and set names with `names(x)` and `names(x) <- c("x", "y", ...)`.

Every element of a matrix must be the same type; in a data frame, the different columns can have different types.