Creating Sample Datasets
This guide explains how to create sample datasets in R and Python. You can use these methods to generate a small version of your original dataset to bring to a data consultation, so that the analysis can be done efficiently on a manageable subset of your data. We assume you already know how to read in your data; if you need step-by-step instructions, they are available further down the page for both R and Python.
R
Prerequisites
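You will need the dplyr package installed. You can check whether it is installed with `"dplyr" %in% rownames(installed.packages())`, which returns `TRUE` if it is, and install it if necessary with `install.packages("dplyr")`.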
Aim for at least 10 samples (rows) per variable. If you are concerned that 10 samples per variable will be insufficient for demonstration purposes, you can instead take up to 40% of your data. Store the output as an R object.
10 samples per variable
- Replace `your_data_frame` with the name you assigned to your data on import.
- Replace `"path/to/your/file.RData"` with the path and file name to save your sampled data to.
Bring the resulting .RData file with you to your consultation.
40% of your observations
- Replace `your_data_frame` with the name you assigned to your data on import.
- Replace `"path/to/your/file.RData"` with the path and file name to save your sampled data to.
Bring the resulting .RData file with you to your consultation.
Python
Aim for at least 10 samples (rows) per variable. If you are concerned that 10 samples per variable will be insufficient for demonstration purposes, you can instead take up to 40% of your data. Store the output as a CSV file.
10 samples per variable
- Replace `your_data_frame` with the name you assigned to your data on import.
- Replace `"path/to/your/file.csv"` with the path and file name to save your sampled data to.
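The following is a minimal sketch of the 10-samples-per-variable approach in Python. It assumes the pandas library, which the text above does not name. So that the sketch runs on its own, it builds a hypothetical 500-row stand-in data frame; delete those lines and use your imported data instead. The seed (`random_state=42`) is an arbitrary choice that makes the sample reproducible. The sample size is capped at the number of rows because `DataFrame.sample` raises an error when asked for more rows than exist without replacement.

```python
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical stand-in for your imported data: 500 rows x 4 variables.
# Delete these two lines and use the data frame you created on import.
rng = np.random.default_rng(0)
your_data_frame = pd.DataFrame(rng.normal(size=(500, 4)), columns=["a", "b", "c", "d"])

# 10 rows per variable (column), capped at the number of rows available
n_rows = min(10 * your_data_frame.shape[1], len(your_data_frame))

# Draw rows without replacement; random_state=42 (arbitrary) makes the sample reproducible
sampled_data = your_data_frame.sample(n=n_rows, random_state=42)

# Save the sample as a CSV file, creating the output folder if it does not exist
output_path = Path("path/to/your/file.csv")
output_path.parent.mkdir(parents=True, exist_ok=True)
sampled_data.to_csv(output_path, index=False)
```

With the 4 columns in the stand-in data, this draws 40 rows.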
Bring the resulting .csv file with you to your consultation.
40% of your observations
- Replace `your_data_frame` with the name you assigned to your data on import.
- Replace `"path/to/your/file.csv"` with the path and file name to save your sampled data to.
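A matching sketch for the 40% option, under the same assumptions: pandas, a hypothetical stand-in data frame you would replace with your own, and an arbitrary seed. Passing `frac=0.4` instead of `n` asks pandas for 40% of the rows, rounded to a whole number of rows.

```python
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical stand-in for your imported data: 500 rows x 4 variables.
# Delete these two lines and use the data frame you created on import.
rng = np.random.default_rng(0)
your_data_frame = pd.DataFrame(rng.normal(size=(500, 4)), columns=["a", "b", "c", "d"])

# Draw 40% of the rows without replacement; random_state=42 (arbitrary) makes it reproducible
sampled_data = your_data_frame.sample(frac=0.4, random_state=42)

# Save the sample as a CSV file, creating the output folder if it does not exist
output_path = Path("path/to/your/file.csv")
output_path.parent.mkdir(parents=True, exist_ok=True)
sampled_data.to_csv(output_path, index=False)
```

On the 500-row stand-in this gives 200 rows.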
Bring the resulting .csv file with you to your consultation.
Importing data
Importing Data into R
Prerequisites
Make sure you have the readr package (for CSV), the readxl package (for Excel), or the jsonlite package (for JSON) installed, depending on your file type. If not, you can install them with `install.packages(c("readr", "readxl", "jsonlite"))`.
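- Import CSV file: `your_data_frame <- readr::read_csv("path/to/your/file.csv")`
- Import Excel file: `your_data_frame <- readxl::read_excel("path/to/your/file.xlsx")`
- Import JSON file: `your_data_frame <- jsonlite::fromJSON("path/to/your/file.json")`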
Importing Data in Python
- Import CSV file:
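A minimal sketch for reading a CSV file, assuming pandas. To keep it self-contained, it first writes a tiny hypothetical file, `example_data.csv`. With your own data, skip that step and pass your file's path to `pd.read_csv`.

```python
from pathlib import Path

import pandas as pd

# Hypothetical example file so the sketch runs on its own;
# skip this step and set csv_path to your own file instead.
csv_path = Path("example_data.csv")
csv_path.write_text("id,age,score\n1,34,7.5\n2,29,8.1\n3,41,6.9\n")

# Read the CSV into a DataFrame; by default the first row supplies the column names
your_data_frame = pd.read_csv(csv_path)
print(your_data_frame.head())
```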
- Import Excel file:
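A minimal sketch for reading an Excel workbook, assuming pandas with the openpyxl package installed (pandas uses openpyxl to read and write .xlsx files). The example workbook `example_data.xlsx` is hypothetical and only there so the sketch runs on its own; point `excel_path` at your own file instead.

```python
from pathlib import Path

import pandas as pd

# Hypothetical example workbook so the sketch runs on its own;
# skip this step and set excel_path to your own file instead.
excel_path = Path("example_data.xlsx")
pd.DataFrame({"id": [1, 2, 3], "age": [34, 29, 41]}).to_excel(excel_path, index=False)

# Read the first worksheet (sheet_name=0) into a DataFrame;
# pass a sheet name, e.g. sheet_name="Sheet2" (hypothetical), to read another sheet
your_data_frame = pd.read_excel(excel_path, sheet_name=0)
print(your_data_frame.head())
```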
- Import JSON file:
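A minimal sketch for reading a JSON file, assuming pandas. It also assumes the file holds a list of objects with one object per row (pandas calls this the "records" layout); other JSON layouts need a different `orient` value. The example file `example_data.json` is hypothetical; use your own path instead.

```python
import json
from pathlib import Path

import pandas as pd

# Hypothetical example file in "records" layout (a list of objects, one per row);
# skip this step and set json_path to your own file instead.
json_path = Path("example_data.json")
json_path.write_text(json.dumps([
    {"id": 1, "age": 34, "score": 7.5},
    {"id": 2, "age": 29, "score": 8.1},
]))

# orient="records" tells pandas that each JSON object is one row
your_data_frame = pd.read_json(json_path, orient="records")
print(your_data_frame.head())
```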