Data types dictate what we can and cannot do with our data. R
has 5 basic data types: numeric, complex, logical, character, and raw. These data types then form the foundation for other objects in R
–some of these objects provide structure and some provide semantic meaning. One of the objects that provides structure that we've already encountered is a vector
. While an example of semantic meaning might have us explicitly indicate the numbers 20220101
represent a date in the format yyyymmdd
.
One of our tasks when we load data into R
is ensure that the object that holds, or structures, our data, is the most appropriate and that R
properly interprets the semantic class of our data.
In this session we'll explore numeric, logical, and character data, leaving complex and raw data for another time.
Naming variables
As we go through this workshop, we'll work through concepts through variable assignment and then asking questions about our variables. When naming variables in R
, keep in mind the following:
More details can be found with ?make.names
Numeric data in R can be further broken down into two types, integer and double or floating point
Let’s assign a decimal and whole number to two respective variables:
decimalPoint <- 10.1
decimalPoint
## [1] 10.1
noDecimalPoint <- 10
noDecimalPoint
## [1] 10
We can then inquire about these data with type(of)
:
typeof(decimalPoint)
## [1] "double"
typeof(noDecimalPoint)
## [1] "double"
We see that they are both returned as "double". Curious. What if we want an integer? To designate that we want an integer we append the digits with an L
wholeNumber <- 10L
wholeNumber
## [1] 10
Now, if we ask about the data type, we are returned "integer":
typeof(wholeNumber)
## [1] "integer"
Note
Some functions return integers and others return doubles. R
when processing your data will convert on the fly between integers and doubles. Generally this is a good thing - we don't need to think about which data type we're creating, because there’s some fluidity. But just in case you're doing something where the distinction is important, it's good to know that you'll need to be explicit.
When might the difference between integers and floating point numbers be of importance to you?
These data are stored a little differently and can impact things like testing equivalency. So it may be of interest to note that
output <- 1/49 * 49 ## calculate 1/49 times 49
output ## print the result
## [1] 1
identical(output, 1) ## ask if "output" is identical to 1
## [1] FALSE
all.equal(output, 1) ## ask if "output" and 1 are at least pretty close to the same number
## [1] TRUE
We can also create logical data, that is, data that is either True or False. We could, for example, record the flipping of a coin as either heads or tails, or we could measure it as the number of times we get heads, by declaring a value of heads to be true and a value of tails to be false.
We'll assign a few logical statements to a vector and assign it to a variable:
logicalVector <- c(TRUE,FALSE, FALSE, TRUE, TRUE, FALSE)
logicalVector
## [1] TRUE FALSE FALSE TRUE TRUE FALSE
And then we can ask about it
typeof(logicalVector)
## [1] "logical"
In data analysis, using logical vectors can be extremely useful for subsetting data.
In addition to using typeof()
for inquiring about data types, we can also use class()
to get a bit more information about an R
object. Occasionally, the output of typeof()
and class()
are the same.
It can be useful to think of the class
as describing what an R object is and typeof
telling us about the data it contains. As an analogy, we might have a book of poetry. Our object–our book–might have a class of book
and if we inquired about the contents with typeof
we'd get back character
, since it is comprised of character
data.
Just as we can store various types of text in various types of books, notepads etc - R
can store various types data in various types of objects. The class of these objects will help us to understand how we can work with those stored data.
A vector in R
is one such object for storing data.
Vectors in R are atomic
. That is, they can only contain one data type, whether that be numeric, logical, character etc. When we made myFristVector
we created a container that has specific rules pertaining to the data in that container, namely, it's flat and unidirectional - think of it as being a single row or a single column - and all the data must be of the same type.
View()
We can get a better sense of the above with the View()
function. View()
is a great way of exploring data that mimics what we might be more used to if coming from an environment like Excel.
View(myFirstVector)
We can get an even better sense of this if we tell R
to treat our vector as a different class of object that we'll explore more shortly–a data frame
View(as.data.frame(myFirstVector))
Note
Since vectors are atomic, a vector made up of:
When we call class
on a vector, R
returns the data type, but omits telling us it's a vector, so the output reads the same as typeof()
.
R
has 5 basic data typesR
objects may have a class that describes that objectR
will convert between integer and floating point when needed.