Last Updated: 2023-10-17
read.table() is R's primary function for importing tabular text data, and it lets the user specify a variety of options. read.csv() and read.delim() are two wrappers around read.table() that simplify importing comma-separated and tab-delimited files; the only difference between the two is the delimiter they expect.
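The wrapper relationship can be checked directly. The following is a minimal sketch, assuming a small made-up table written to temporary files (the rows and the names csv_path and tsv_path are illustrative, not taken from the gapminder data). It reads the same table through read.csv(), read.delim(), and read.table() with the wrapper defaults spelled out.

```r
# A small, made-up table used only for illustration.
rows <- data.frame(
  country = c("Canada", "Chile"),
  year = c(1952L, 1952L),
  pop = c(14785584L, 6377619L)
)

# Write it once as comma-separated and once as tab-separated text.
csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")
write.table(rows, csv_path, sep = ",", row.names = FALSE, quote = FALSE)
write.table(rows, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)

# The two wrappers differ only in the delimiter they expect.
from_csv <- read.csv(csv_path)
from_tsv <- read.delim(tsv_path)

# read.table() with the defaults that read.csv() sets on our behalf.
from_table <- read.table(
  csv_path,
  header = TRUE, sep = ",", quote = "\"",
  dec = ".", fill = TRUE, comment.char = ""
)

same_csv_tsv <- identical(from_csv, from_tsv)
same_csv_table <- identical(from_csv, from_table)
```

Both comparisons should come out TRUE for a simple file like this one, which is why the wrappers are usually the more convenient choice.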
Using read.csv() is as simple as specifying a file path or URL from which to import data…
# import from a local file path
data_local <- read.csv("../data/gapminder.csv")
# import from url
url <- 'https://raw.githubusercontent.com/jstaf/gapminder/master/gapminder/gapminder.csv'
data_url <- read.csv(url)
By default, read.csv() assumes that your file has a header row and that missing values are marked with the characters NA (empty fields in numeric and logical columns are also read as NA). Depending on the data source, however, we may need to adjust these parameters…
# a set of possible NA values
na_values <- c("NULL", "NA", "N/A", "99", "", " ")
data_custom_na <- read.csv("../data/gapminder.csv", na.strings = na_values)
# a file with no header
data_no_header <- read.csv("../data/gapminder.csv", header = FALSE)
It may also be a good idea to trim excess white space, which is not done by default…
data_nows <- read.csv("../data/gapminder.csv", strip.white = TRUE)
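To see what these arguments change, here is a minimal sketch that assumes a tiny made-up file written to a temporary path (the rows and the name path are illustrative only). It compares the default import with one that sets na.strings and strip.white, and then reads the same file with header = FALSE.

```r
# A made-up file with a trailing space in one name and two non-standard NA codes.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "country,year,pop",
  "Canada ,1952,14785584",
  "Chile,1952,N/A",
  "Cuba,1952,NULL"
), path)

# Defaults: "N/A" and "NULL" are kept as text, so pop is not read as numeric,
# and the trailing space in "Canada " is preserved.
data_default <- read.csv(path)

# Custom NA codes plus whitespace trimming.
na_values <- c("NULL", "NA", "N/A", "")
data_clean <- read.csv(path, na.strings = na_values, strip.white = TRUE)

# header = FALSE treats the first line as data and names the columns V1, V2, V3.
data_no_header <- read.csv(path, header = FALSE)

str(data_default)
str(data_clean)
```

In data_clean the pop column is numeric with two missing values, whereas in data_default it stays non-numeric because of the unrecognized codes.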
:::note
The logical values TRUE and FALSE can be abbreviated as T and F. However, T and F are ordinary variables that can be reassigned to point to other objects, whereas TRUE and FALSE are reserved words and cannot. It is advisable to always use TRUE and FALSE.
:::
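The difference is easy to demonstrate. This short sketch reassigns T, checks what it now holds, and then attempts the same with TRUE; the variable names masked, restored, and result are illustrative.

```r
# T is an ordinary variable, so it can be masked by an assignment.
T <- 0
masked <- isTRUE(T)    # T now points to 0, not to TRUE

# Removing our copy makes the original base definition visible again.
rm(T)
restored <- isTRUE(T)

# TRUE is a reserved word; trying to assign to it raises an error,
# which is caught here so the script keeps running.
result <- tryCatch(
  eval(parse(text = "TRUE <- 0")),
  error = function(e) "assignment to TRUE failed"
)
```

Code that passes header = T would silently change behavior while T is masked, which is the practical reason to spell out TRUE and FALSE.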
Additional options and their defaults can be found with ?read.table.
read.csv() and read_csv() are very similar. read_csv(), from the readr package, is generally faster, loads data into a tibble rather than a plain data frame, and has more user-friendly defaults. It does, however, require installing and loading an additional package.
As with read.csv() and read.delim(), read_csv() and read_tsv() (for tab-separated values) are wrappers around the more flexible read_delim().
By default, read_csv() assumes the file has a header row, trims leading and trailing whitespace from each field, and treats blank cells and the characters NA as missing data. To adjust these, use the arguments col_names = FALSE, na = na_values, and trim_ws = FALSE.
library(readr)
data_readr <- read_csv("../data/gapminder.csv")
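The effect of these three arguments can be sketched as follows, assuming the readr package is installed and using a small made-up file written to a temporary path (the rows and the name path are illustrative only).

```r
library(readr)

# A made-up file with a trailing space and a non-standard NA code.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "country,year,pop",
  "Canada ,1952,14785584",
  "Chile,1952,N/A"
), path)

# Defaults plus an extra NA code: whitespace is trimmed and "N/A" becomes NA.
data_readr <- read_csv(path, na = c("", "NA", "N/A"))

# Keep whitespace exactly as it appears in the file.
data_raw_ws <- read_csv(path, na = c("", "NA", "N/A"), trim_ws = FALSE)

# Treat the first line as data; columns are named X1, X2, X3.
data_no_header <- read_csv(path, col_names = FALSE)
```

Each call returns a tibble, and readr prints the column types it guessed as a message unless they are supplied explicitly.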
Base R does not include a function for reading Excel files. For this we'll use the tidyverse package readxl, which can read both legacy .xls and more recent .xlsx files.
library(readxl)
data_xls <- read_excel("../data/gapminder.xlsx")
Because readxl is part of the tidyverse, read_excel() returns a tibble. If you need a plain data frame, convert the result…
data_xls_df <- as.data.frame(read_excel("../data/gapminder.xlsx"))
You can select a particular sheet and cell range with the sheet and range arguments…
data_xls <- read_excel("../data/gapminder.xlsx", sheet = '1952', range = "A1:D5")
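For a version that runs without the gapminder workbook, the sketch below assumes readxl is installed and uses the sample file datasets.xlsx that ships with the package. The sheet name "mtcars" comes from that sample file, not from the gapminder data.

```r
library(readxl)

# readxl_example() returns the path to a sample workbook installed with readxl.
path <- readxl_example("datasets.xlsx")

# List the worksheet names before choosing one.
sheets <- excel_sheets(path)

# Read only cells A1:D5 of the "mtcars" sheet:
# one header row plus four data rows, across four columns.
data_range <- read_excel(path, sheet = "mtcars", range = "A1:D5")

# Convert the tibble to a plain data frame if needed.
data_range_df <- as.data.frame(data_range)
```

Calling excel_sheets() first is a convenient way to confirm the exact sheet names, since read_excel() stops with an error if the requested sheet does not exist.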
Additional options and their defaults can be found with ?read_excel or by visiting https://readxl.tidyverse.org/.