Lots of things. I used to work with someone who programmed music with it.
More commonly, R
is known for and widely used for statistical computations. And this will form the foundation of our exploration of R
in this session. But really, it's a programming language, so there is little that you can't do with it; you're not tied to using its statistical packages.
R
is also more than just a programming language and tool for statistical analyses. It's a community of developers and users. This has several implications.
R
parlance, these add-ons are called packages. These packages include data and functions that are intended to make your work simpler.In this session we're going to be working almost exclusively with the base R
packages–the suite of tools that comes with a basic install.
R
As a programming language, R
has certain features that make it both very accessible to non programmers, especially for data analysis and statistical computing, and very challenging to grapple at the same time. This makes it both relatively low barrier, but on the flip side, it makes it very easy to use it poorly. A few to note:
The Good
R
Allows for
R
provides you with all the tools you need for basic data analysis with no additional fancy footworkR
has that support network, so you'll almost always get an answer online.The Bad
R
has given rise to multiple syntaxes–there is more than one correct grammatical structure for how we articulate our instructions. When you've learned the syntax for base R
and you encounter a solution to your problem written in the Tidyverse syntax, things can be confusing.R
doesn't always throw you an error when it should–instead it occasionally does some guess work, or coercion, for you when you're working, something that can lead to unexpected results.We're going to touch on a number of things that you can do with R, without delving into any in too much detail - the goal here is put you in a place where you are comfortable enough to engage in further inquiry and to ask questions that will get you helpful answers.
No single resource will every be able to answer all of your questions; many will be task or discipline specific and often a solution will offer a model that needs to be adapted to your specific need.
Generally, in R
you’ll be working with functions–that other users and developers have built–and passing data to these functions to be processed.
Let's start with a simple example. Type the following into your console:
mean(1:256)
## [1] 128.5
What's happened?
R
, this sequence of numbers is called a vector and mean()
is a function.Let's unpack this process a bit
We'll create a stand alone vector that we can then do some descriptive stats on. We'll start with the concatenate function c()
; concatenate as in linking, or combining, things together.
c(1:256)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [91] 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
## [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
## [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
## [145] 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162
## [163] 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
## [181] 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198
## [199] 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216
## [217] 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234
## [235] 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252
## [253] 253 254 255 256
The above is just a simple output though, to do something with the data in this vector we want to store the data somehow, so we can retrieve it, not just print it to our console.
We'll store it in a variable myFirstVector
; we do that in R with the assignment <-
. This is an R
quirck. You could use =
. But no one does.
myFirstVector <- c(1:256)
This time, when you hit enter, seemingly nothing happens. But we can now recall our variable
myFirstVector
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [91] 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
## [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
## [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
## [145] 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162
## [163] 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
## [181] 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198
## [199] 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216
## [217] 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234
## [235] 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252
## [253] 253 254 255 256
Now that we have our vector–or our data as simple as it is–we can calculate the mean with the mean()
function. So, we'll pass myFirstVector
to mean()
mean(myFirstVector)
## [1] 128.5
Outcome: We've created a vector by assigning a series of values to a variable and calculated the mean of these values. So, from a data analysis point of view, at it's most basic we can generate data and calculate descriptive statistics on them.
But we can peal this back a little further and ask, what is happening when we call a function? To explore this, we'll write our own function to calculate the sample mean, where
\(\bar{X} = (\sum xi)/n\)
We'll store our sample mean calculation in a variable called sampleMean
and then define the function as the sum of our input divided by the number of values in that same input. We'll be reliant on two other functions to achieve this, sum
which tallies all the values we pass to it and length
which counts the number of values.
# line by line, here's what we're doing:
# give the function a name and indicate what we'll pass to it
# calculate the sum of all values in the object we pass in
# calculate how many values are in the object we pass in
# run the calculation
# return the calculation
sampleMean <- function(vector) {
sum <- sum(vector)
length <- length(vector)
result <- sum/length
return(result)
}
We can then pass a vector into our function, just like we did with mean(myFirstVector)
.
sampleMean(myFirstVector)
## [1] 128.5
Outcome: You can program specific tasks tailored to your needs.
You can also peal back the layers of your program to investigate what's happening; a critical task when we think about research transparency and reproducibility.
sampleMean
## function(vector) {
## sum <- sum(vector)
## length <- length(vector)
## result <- sum/length
## return(result)
## }
This is the essence of a lot of what we'll be exploring in this session; generating data and then passing that data to a series of functions that perform specific tasks with that data.
R
has a series of functions that allow us to ask questions about our data; all good data analysis begins with knowing our data which means being able to explore it.
For example, we can look at just the beginning, or just the end of our vector using the head()
and tail()
functions. To do this, we pass our object, in this case myFirstVector
to each of these functions.
head(myFirstVector)
## [1] 1 2 3 4 5 6
tail(myFirstVector)
## [1] 251 252 253 254 255 256
We can also inquire about the nature of our vector. We can ask, as we've seen, how long it is with length()
length(myFirstVector)
## [1] 256
Or about its range with range()
range(myFirstVector)
## [1] 1 256
We can ask if it is a vector with is.vector()
is.vector(myFirstVector)
## [1] TRUE
We can ask what type of data it contains with typeof()
typeof(myFirstVector)
## [1] "integer"
You can (among other things)
R
is classifying itmyNumbers
with 201 values ranging from 100 to 300R
R
is a language, and like any language it will take time to develop fluency. To get more fluent you will need to know how to get help.
There is a fair bit of help built right into R, and this is a great place to start.
We can type a function's name to reveal it's underlying code. We can also access it's documentation by preceding it's name with a ?
–?functionName
. For example:
?head
or
?mean
The output in RStudio will tell you the function name and the package it's included in. For example, the function mean()
is part of the base
package, and so you'll see mean {base}
. It will also generally provide you with a description, a template for it's usage, the arguments that you can pass to it, and some examples of how to use it.
For example, you'll note that one argument we can pass to mean()
is trim
, which allows us to indicate a proportion from the head and tail of the data to remove from the calculation. The help page tells us that the default value for trim
is 0
and that it accepts values up to 0.5
. A second argument is whether or not to remove missing values.
The help page also provides us with references and other related functions that might be of interest.
While a single ?
searches for the help page on a given function, two ??
does a key word search in all the documentation bundled with R
. For example if you're interested in plotting some data, you might search for plot
:
??plot
And you'll get back a list of vignettes, code demonstrations, and help pages documenting specific functions in specific packages. The vignettes and code demonstrations walk you through either the capabilities or uses of a specific function. The lists display the package name followed by two colons and then the specific function.
Suppose you want to know if there's a function for a particular task, like finding the standard deviation, in your installation of R
. We can search for the term
??"standard deviation"
We can then pull up one of the help pages, say for stats::sd
and get instructions for using it.
Note
If you did not wrap "regression model"
in quotations, it would fail to run. We'll talk more about using characters in R
shortly.
The apropos()
function searches specifically for variables and functions that contain your search term.
apropos("vector")
We can also search for functions that start or end with specific sets of characters.
To find variables and functions that start with a particular set of letters, like 'm' or 'ma' we prepend the letters with a ^
apropos("^m")
apropos("^ma")
To find variables and functions that end in a particular set of letters, like 'r' or 'or' we append the letters with a $
apropos("r$")
apropos("or$")
Examples & demos
Many functions and features in R come with examples and demos on how they might be used. We can often get both examples and demos with the example()
and demo()
functions.
example(plot) ## Examples using the plot function
demo(graphics) ## Demonstration of the graphics capabilities in R
demo() ## Show a list of all demos available
The help built into R
can help you explore, discover, and troubleshoot. However, eventually you'll need to get into online forums, blogs, and other reading materials to troubleshoot and discover more resources to support your particular area of research.
R
to calculate the medianm_func_var
that stores all the functions and variables on your system that start with the letter m
.head()
and tail()
list the first or last 6 data points of whatever is passed to them. Use the help page for head()
and tail()
to list the first and last 10 results of m_func_var
.