Data Structures in R

Stefano Mezzini

\usepackage{amsmath}

There are many different data structure types in R, each with varying levels of complexity and uses. The simplest data structure in R is a vector. Vectors are one-dimensional arrays (i.e., sets of values) of a single class (such as characters, numbers, dates, etc.), and they have a similar structure to the mathematical concept of vectors. For example, the vector $\vec v$ with values 1, 4, 6, 2 would be:

$$\vec v = \begin{bmatrix}1 \\ 4 \\ 6 \\ 2\end{bmatrix}.$$

In R we can create vectors using the c() function, as follows:

1
2
v <- c(1, 4, 6, 2)
v
1
## [1] 1 4 6 2

(R prints vectors in a line rather than as columns to improve readability.)

Since vectors can only contain elements of a common type, c() will force all elements to be of a single class. In the example below, I create a vector with the number 1, the letter “a”, today’s date, and the value TRUE (a boolean value), but c() coerces all values to be characters:

1
c(1, 'a', Sys.Date(), TRUE)
1
## [1] "1"     "a"     "20185" "TRUE"

If we place two vectors (of the same class) side by side, we can create a matrix. Matrices are thus two-dimensional arrays

1
2
# 2D arrays: matrices
matrix(data = 1:8, nrow = 2)
1
2
3
##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8
1
matrix(data = 1:8, ncol = 2)
1
2
3
4
5
##      [,1] [,2]
## [1,]    1    5
## [2,]    2    6
## [3,]    3    7
## [4,]    4    8
1
matrix(data = 1:8, ncol = 2, byrow = TRUE)
1
2
3
4
5
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6
## [4,]    7    8
1
2
# matrix operations
matrix(1:9, ncol = 3)
1
2
3
4
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
1
matrix(1:9, ncol = 3) + 3
1
2
3
4
##      [,1] [,2] [,3]
## [1,]    4    7   10
## [2,]    5    8   11
## [3,]    6    9   12
1
matrix(1:9, ncol = 3) * 2 # NOT matrix multiplication
1
2
3
4
##      [,1] [,2] [,3]
## [1,]    2    8   14
## [2,]    4   10   16
## [3,]    6   12   18
1
matrix(1:9, ncol = 3) %*% 6:8 # matrix multiplication
1
2
3
4
##      [,1]
## [1,]   90
## [2,]  111
## [3,]  132
1
matrix(1:9, ncol = 3) %*% matrix(1:6, ncol = 2)
1
2
3
4
##      [,1] [,2]
## [1,]   30   66
## [2,]   36   81
## [3,]   42   96

If we want to create an object that has different data types, a matrix or vector won’t work because R will force all items to be of the same type.

1
class(c(0, 2))
1
## [1] "numeric"
1
class(c(0, 2, 'a')) # coerces all items to text
1
## [1] "character"

Instead, we need to use a list.

1
2
3
4
5
# grouping objects with different and types: lists
l <- list(letters = c('a', 'b', 'c'),
          numbers = 1:10,
          today = Sys.Date())
l
1
2
3
4
5
6
7
8
## $letters
## [1] "a" "b" "c"
## 
## $numbers
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $today
## [1] "2025-04-07"
1
l$letters
1
## [1] "a" "b" "c"
1
l$today
1
## [1] "2025-04-07"

If we want a list that has a table-like structure (which will likely be the case for a lot of the data you use in R), we can use a data frame.

1
2
data.frame(num = 1:10,
           abc = LETTERS[1:10])
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
##    num abc
## 1    1   A
## 2    2   B
## 3    3   C
## 4    4   D
## 5    5   E
## 6    6   F
## 7    7   G
## 8    8   H
## 9    9   I
## 10  10   J
1
2
data.frame(num = 1:5,
           abc = LETTERS[1:10])
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
##    num abc
## 1    1   A
## 2    2   B
## 3    3   C
## 4    4   D
## 5    5   E
## 6    1   F
## 7    2   G
## 8    3   H
## 9    4   I
## 10   5   J
1
2
data.frame(num = 1:10,
           abc = LETTERS[1:9])
1
## Error in data.frame(num = 1:10, abc = LETTERS[1:9]): arguments imply differing number of rows: 10, 9

The tidyverse set of packages provides a “fancy data frame” that does not recycle elements (unless they are a single value), and allows you to reference other columns you previously created:

1
2
3
library('tibble')
tibble(num = 1:10,
       abc = LETTERS[num])
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
## # A tibble: 10 × 2
##      num abc  
##    <int> <chr>
##  1     1 A    
##  2     2 B    
##  3     3 C    
##  4     4 D    
##  5     5 E    
##  6     6 F    
##  7     7 G    
##  8     8 H    
##  9     9 I    
## 10    10 J
1
2
tibble(num = 1:5,
       abc = LETTERS[1:10])
1
2
3
4
5
## Error in `tibble()`:
## ! Tibble columns must have compatible sizes.
## • Size 5: Existing data.
## • Size 10: Column `abc`.
## ℹ Only values of size one are recycled.
Built with Hugo
Theme Stack designed by Jimmy