When we talk about data, we can talk about data types, data classes, and data structures.

Types

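Data types are the fundamental building blocks for storing information. R has six atomic data types (logical, integer, double, complex, character, and raw), the data types from which other objects are created. Integer and double are collectively referred to as numeric. The three kinds of importance to us here are: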

Type        Representation
Numeric     Numbers
Character   Text
Logical     True and False Values

Character data, also known as strings, are always wrapped in “quotation marks”.

Numeric data can be stored two ways, as integers or as floating point, also called ‘double’.
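
As a minimal sketch of the distinction, the following uses base R's typeof() and is.numeric() to inspect how a few values are stored. The variable names are made up for illustration. A number typed without a suffix is stored as a double, while the L suffix requests an integer.

```r
# Illustrative values; the names are hypothetical and not part of the original text
whole   <- 5L      # the L suffix asks R for an integer
decimal <- 5       # without the suffix, R stores a double
word    <- "five"  # character data, wrapped in quotation marks
flag    <- TRUE    # logical

typeof(whole)      # "integer"
typeof(decimal)    # "double"
typeof(word)       # "character"
typeof(flag)       # "logical"

# Both storage modes count as numeric
is.numeric(whole)    # TRUE
is.numeric(decimal)  # TRUE
```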

Classes

Data types can have specific attributes that influence what we can and cannot do with these data. One of these attributes is a class.

Consider the following number: 20220301.

Without context, this is simply one big number, or perhaps a sequence of smaller numbers. Classify it as a date, however, and it gains a set of rules for how it is written (yyyymmdd) and a series of conventions for how dates behave (a specific calendar type, the length of a year, month, or day, and so on). With those in place we can perform date-specific operations on the data, like calculating a person’s age.

We’ll assign a number that could be a date to a variable:

(numbers <- 20220301) # create variable numbers as atomic type numeric
[1] 20220301
# convert numbers to class date and assign to a new variable
(numbers_as_date <- as.Date(as.character(numbers), '%Y%m%d')) 
[1] "2022-03-01"

We can then inquire about the class of each variable:

class(numbers) # inquire about the class
[1] "numeric"
class(numbers_as_date)
[1] "Date"

And we can see the utility of adding the Date class:

Sys.Date() # retrieve today's date
[1] "2023-02-14"
# calculate the number of days that have passed since numbers
(days_since_March_1 <- Sys.Date() - numbers) # doesn't make sense: subtracts 20220301 days from today
[1] "-53339-10-26"
(days_since_March_1 <- Sys.Date() - numbers_as_date) # works
Time difference of 350 days
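
To see that a class really is an attribute layered on top of an atomic type, the sketch below takes a Date apart using base R's typeof(), class(), and unclass(). The variable names d and gap are illustrative. A fixed end date stands in for Sys.Date() so that the result does not change from day to day.

```r
d <- as.Date("20220301", format = "%Y%m%d")

typeof(d)   # "double": the underlying atomic type
class(d)    # "Date": the attribute that adds date behaviour

# Removing the class reveals the stored value: days since 1970-01-01
unclass(d)  # 19052

# Date-aware subtraction returns a difftime object;
# as.numeric() extracts the plain count of days
gap <- as.Date("2023-02-14") - d
as.numeric(gap, units = "days")  # 350
```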

Structures

Data structures can be thought of as how these data are stored collectively: the structure that groups multiple values from a variable together, or the values from multiple variables together. R has a few basic data structures that you’ll frequently encounter. These include vectors, lists, matrices, and data frames.

Vectors

A vector is the simplest data structure in R. It is one-dimensional (think of it as a single column or row of data) and it can only contain data of exactly the same type. So, if you have a series of numbers or words in R, these will likely be contained within a vector. In fact, the built-in data set rivers is a vector:

rivers
  [1]  735  320  325  392  524  450 1459  135  465  600  330  336  280  315  870
 [16]  906  202  329  290 1000  600  505 1450  840 1243  890  350  407  286  280
 [31]  525  720  390  250  327  230  265  850  210  630  260  230  360  730  600
 [46]  306  390  420  291  710  340  217  281  352  259  250  470  680  570  350
 [61]  300  560  900  625  332 2348 1171 3710 2315 2533  780  280  410  460  260
 [76]  255  431  350  760  618  338  981 1306  500  696  605  250  411 1054  735
 [91]  233  435  490  310  460  383  375 1270  545  445 1885  380  300  380  377
[106]  425  276  210  800  420  350  360  538 1100 1205  314  237  610  360  540
[121] 1038  424  310  300  444  301  268  620  215  652  900  525  246  360  529
[136]  500  720  270  430  671 1770

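To test whether something is a vector, we have a couple of options. We can use is.vector(), but it also returns TRUE for lists and returns FALSE for vectors that carry attributes other than names. When we want to know whether an object is an atomic vector, it’s more appropriate to use is.atomic():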

is.vector(rivers)
[1] TRUE
is.atomic(rivers)
[1] TRUE
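
The following sketch illustrates two points made above: a vector holds a single type, and is.vector() and is.atomic() can disagree. The objects mixed, a_list, and lengths_km are hypothetical examples, not part of the rivers data.

```r
# Mixing types in c() coerces every value to the most flexible type
mixed <- c(1, "two", TRUE)
typeof(mixed)   # "character"
mixed           # "1" "two" "TRUE"

# is.vector() is also TRUE for lists, which are not atomic
a_list <- list(1, "two")
is.vector(a_list)   # TRUE
is.atomic(a_list)   # FALSE

# is.vector() is FALSE once a vector carries attributes other than names
lengths_km <- c(735, 320, 325)
attr(lengths_km, "units") <- "km"
is.vector(lengths_km)   # FALSE
is.atomic(lengths_km)   # TRUE
```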

Data Frames

A data frame essentially functions as a series of connected vectors, where each vector is a column. In this sense a data frame is also a special kind of list.

In a data frame, all vectors need to be of the same length. Each vector holds a single data type, but different vectors can hold different data types. Data frames also allow us to apply column names.

(data.frame(
  numbers = c(1,5,8,9, 11),
  words = c('I', 'want', 'to', 'learn', 'R')
))
  numbers words
1       1     I
2       5  want
3       8    to
4       9 learn
5      11     R
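
As a rough sketch of the "connected vectors" idea, the code below rebuilds the same data frame and inspects it with base R functions. The name df is illustrative, and stringsAsFactors = FALSE is set explicitly as an assumption so that the words column stays character on R versions before 4.0 as well.

```r
df <- data.frame(
  numbers = c(1, 5, 8, 9, 11),
  words = c("I", "want", "to", "learn", "R"),
  stringsAsFactors = FALSE  # explicit so older R versions do not create factors
)

is.list(df)         # TRUE: a data frame is built on a list
dim(df)             # 5 2 (rows, columns)
sapply(df, class)   # numbers is "numeric", words is "character"

# Each column can be pulled out as an ordinary vector
is.atomic(df$words)   # TRUE
mean(df$numbers)      # 6.8
```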

Lists

A list also essentially functions as a series of connected vectors, but frees us from the requirement that each one be the same length, as the columns of a data frame must be. The elements need not be vectors at all: you can also nest a list within a list, although this can start to get complicated.

(list(
  breakfast = c('Eggs', 'Muffins', 'Coffee'),
  lunch = c('Grilled Cheese Sandwich with Orange Juice'),
  numbers = c(1,4,6,7)
))
$breakfast
[1] "Eggs"    "Muffins" "Coffee" 

$lunch
[1] "Grilled Cheese Sandwich with Orange Juice"

$numbers
[1] 1 4 6 7
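
A short sketch of how elements of a list like the one above might be accessed. The name meals and the nested dinner element are hypothetical additions for illustration.

```r
meals <- list(
  breakfast = c("Eggs", "Muffins", "Coffee"),
  lunch = "Grilled Cheese Sandwich with Orange Juice",
  numbers = c(1, 4, 6, 7),
  dinner = list(main = "Soup", sides = c("Bread", "Salad"))  # hypothetical nested list
)

lengths(meals)            # breakfast 3, lunch 1, numbers 4, dinner 2

meals$breakfast[2]        # "Muffins"
meals[["numbers"]]        # double brackets return the element itself
class(meals["numbers"])   # "list": single brackets return a smaller list
meals$dinner$sides[1]     # "Bread": chained access into the nested list
```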

Matrices

A matrix resembles a data frame when displayed on screen, but is more accurately a vector with a dimension attribute that defines the number of rows and columns to divide the vector into. As a result, a matrix can only hold a single data type.

The following matrix holds a series of numeric data. Instead of having column names, we have column and row numbers.

(matrix(round(rnorm(12, 10, 1), 2), nrow = 3))
      [,1] [,2]  [,3]  [,4]
[1,] 11.03 9.06 10.93 10.55
[2,]  9.76 9.98  7.81  9.31
[3,] 10.45 8.97 10.05 11.51
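
To illustrate the point that a matrix is a vector with dimensions attached, this sketch turns a plain vector into a matrix by setting its dim attribute. The names values and m are illustrative.

```r
values <- 1:12
is.matrix(values)   # FALSE: just a vector so far

dim(values) <- c(3, 4)   # 3 rows, 4 columns, filled column by column
is.matrix(values)   # TRUE
values[2, 3]        # 8: row 2, column 3

# Because a matrix is a single vector, mixed input is coerced to one type
m <- matrix(c(1, 2, 3, "four"), nrow = 2)
typeof(m)           # "character"
```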

Vectors are the building blocks of data frames, lists, and matrices. A matrix is a single vector broken into columns of the same length and the same data type. A data frame joins vectors of the same length that may hold different data types. A list joins elements that may differ in both length and data type. Each is useful in certain situations.

Function   Description
class()    Reports the type of data or data structure.
is.*()     A family of functions for identifying data types and structures.