Data Types

Data types dictate what we can and cannot do with our data. R has 5 basic data types: numeric, complex, logical, character, and raw. These data types then form the foundation for other objects in R–some of these objects provide structure and some provide semantic meaning. One of the objects that provides structure that we've already encountered is a vector. While an example of semantic meaning might have us explicitly indicate the numbers 20220101 represent a date in the format yyyymmdd.

One of our tasks when we load data into R is ensure that the object that holds, or structures, our data, is the most appropriate and that R properly interprets the semantic class of our data.

In this session we'll explore numeric, logical, and character data, leaving complex and raw data for another time.

Naming variables

As we go through this workshop, we'll work through concepts through variable assignment and then asking questions about our variables. When naming variables in R, keep in mind the following:

  • Variable names should first and foremost be meaningful.
  • They cannot start with a number or a dot followed by a number.
  • They cannot contain spaces.
  • They can contain letters, numbers, dots, and underscores.
  • Some words are reserved and cannot be used, such as if and for.

More details can be found with ?make.names

Numeric and Logical Data

Numeric data

Numeric data in R can be further broken down into two types, integer and double or floating point

  • double or floating point (generally speaking, can have a decimal)
  • integer (Whole number, full stop)

Let’s assign a decimal and whole number to two respective variables:

decimalPoint <- 10.1

decimalPoint
## [1] 10.1
noDecimalPoint <- 10

noDecimalPoint
## [1] 10

We can then inquire about these data with type(of):

typeof(decimalPoint)
## [1] "double"
typeof(noDecimalPoint)
## [1] "double"

We see that they are both returned as "double". Curious. What if we want an integer? To designate that we want an integer we append the digits with an L

wholeNumber <- 10L

wholeNumber
## [1] 10

Now, if we ask about the data type, we are returned "integer":

typeof(wholeNumber)
## [1] "integer"

Note

Some functions return integers and others return doubles. R when processing your data will convert on the fly between integers and doubles. Generally this is a good thing - we don't need to think about which data type we're creating, because there’s some fluidity. But just in case you're doing something where the distinction is important, it's good to know that you'll need to be explicit.

When might the difference between integers and floating point numbers be of importance to you?

These data are stored a little differently and can impact things like testing equivalency. So it may be of interest to note that

output <- 1/49 * 49 ## calculate 1/49 times 49

output ## print the result
## [1] 1
identical(output, 1) ## ask if "output" is identical to 1
## [1] FALSE
all.equal(output, 1) ## ask if "output" and 1 are at least pretty close to the same number
## [1] TRUE

Logical Values

We can also create logical data, that is, data that is either True or False. We could, for example, record the flipping of a coin as either heads or tails, or we could measure it as the number of times we get heads, by declaring a value of heads to be true and a value of tails to be false.

We'll assign a few logical statements to a vector and assign it to a variable:

logicalVector <- c(TRUE,FALSE, FALSE, TRUE, TRUE, FALSE)

logicalVector
## [1]  TRUE FALSE FALSE  TRUE  TRUE FALSE

And then we can ask about it

typeof(logicalVector)
## [1] "logical"

In data analysis, using logical vectors can be extremely useful for subsetting data.

Classes

In addition to using typeof() for inquiring about data types, we can also use class() to get a bit more information about an R object. Occasionally, the output of typeof() and class() are the same.

It can be useful to think of the class as describing what an R object is and typeof telling us about the data it contains. As an analogy, we might have a book of poetry. Our object–our book–might have a class of book and if we inquired about the contents with typeof we'd get back character, since it is comprised of character data.

Just as we can store various types of text in various types of books, notepads etc - R can store various types data in various types of objects. The class of these objects will help us to understand how we can work with those stored data.

Vectors

A vector in R is one such object for storing data.

Vectors in R are atomic. That is, they can only contain one data type, whether that be numeric, logical, character etc. When we made myFristVector we created a container that has specific rules pertaining to the data in that container, namely, it's flat and unidirectional - think of it as being a single row or a single column - and all the data must be of the same type.

View()

We can get a better sense of the above with the View() function. View() is a great way of exploring data that mimics what we might be more used to if coming from an environment like Excel.

View(myFirstVector)

We can get an even better sense of this if we tell R to treat our vector as a different class of object that we'll explore more shortly–a data frame

View(as.data.frame(myFirstVector))

Note

Since vectors are atomic, a vector made up of:

  • integer data will be an integer vector
  • logical values will be a logical vector
  • character values will be a character vector

When we call class on a vector, R returns the data type, but omits telling us it's a vector, so the output reads the same as typeof().

What Have We Learned

  • R has 5 basic data types
  • R objects may have a class that describes that object
  • Numeric data can be either integer or double precision floating point, and R will convert between integer and floating point when needed.