Creating Sample Datasets
This guide explains how to create sample datasets in R and Python. You can use these methods to generate a smaller version of your original dataset for data consultations, so that analysis can be done efficiently on a manageable subset of your data. We assume you know how to read in your data; if you need step-by-step instructions, they are available further down the page for both R and Python.
R
Prerequisites
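You will need dplyr installed. You can double check that you have it installed by running `requireNamespace("dplyr", quietly = TRUE)`, which returns `TRUE` if it is available, and install it if necessary with `install.packages("dplyr")`.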
We want at least 10 samples (rows) per variable (column) or, if there is concern that 10 samples per variable will be insufficient for demonstration purposes, at most 40% of your data. We want to store the output as an R object saved in an .RData file.
10 samples per variable
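- Draw the sample with dplyr, for example `sampled_data <- dplyr::slice_sample(your_data_frame, n = 10 * ncol(your_data_frame))`, replacing `your_data_frame` with the name you assigned to your data on import.
- Save it with `save(sampled_data, file = "path/to/your/file.RData")`, replacing `"path/to/your/file.RData"` with the path and file name to save your sampled data to.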
Bring the resulting .RData file with you to your consultation.
40% of your observations
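- Draw the sample with dplyr, for example `sampled_data <- dplyr::slice_sample(your_data_frame, prop = 0.4)`, replacing `your_data_frame` with the name you assigned to your data on import.
- Save it with `save(sampled_data, file = "path/to/your/file.RData")`, replacing `"path/to/your/file.RData"` with the path and file name to save your sampled data to.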
Bring the resulting .RData file with you to your consultation.
Python
We want at least 10 samples (rows) per variable (column) or, if there is concern that 10 samples per variable will be insufficient for demonstration purposes, at most 40% of your data. We want to store the output as a CSV file.
10 samples per variable
- Replace `your_data_frame` with the name you assigned to your data on import.
- Replace `"path/to/your/file.csv"` with the path and file name to save your sampled data to.
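A minimal Python sketch of the 10-samples-per-variable rule, assuming pandas is installed and your data is already loaded into a pandas DataFrame. The helper name `sample_ten_per_variable` and the seed value 42 are illustrative choices, not part of pandas. The sample size is capped at the number of rows available, because sampling without replacement cannot draw more rows than exist.

```python
import pandas as pd


def sample_ten_per_variable(df: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    """Return a random sample of 10 rows per column (variable) of df.

    Hypothetical helper: if df has fewer rows than 10 x columns,
    all rows are returned in shuffled order.
    """
    n_rows = min(10 * df.shape[1], len(df))
    # random_state fixes the draw so the same sample can be recreated later
    return df.sample(n=n_rows, random_state=seed)


# Replace your_data_frame and the output path, then uncomment:
# sampled_data = sample_ten_per_variable(your_data_frame)
# sampled_data.to_csv("path/to/your/file.csv", index=False)
```

Fixing `random_state` means that running the script again produces the same sample, which helps if you need to recreate exactly what you brought to the consultation.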
Bring the resulting .csv file with you to your consultation.
40% of your observations
- Replace `your_data_frame` with the name you assigned to your data on import.
- Replace `"path/to/your/file.csv"` with the path and file name to save your sampled data to.
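A minimal sketch of the 40% option, under the same assumptions (pandas installed, data already in a DataFrame). `sample_fraction` is a hypothetical helper name; `DataFrame.sample(frac=...)` does the actual work and rounds `frac` times the number of rows to a whole number of rows.

```python
import pandas as pd


def sample_fraction(df: pd.DataFrame, frac: float = 0.4, seed: int = 42) -> pd.DataFrame:
    """Return a random sample containing frac (default 40%) of the rows of df."""
    # pandas rounds frac * len(df) to a whole number of rows;
    # random_state makes the draw reproducible
    return df.sample(frac=frac, random_state=seed)


# Replace your_data_frame and the output path, then uncomment:
# sampled_data = sample_fraction(your_data_frame)
# sampled_data.to_csv("path/to/your/file.csv", index=False)
```

Passing `index=False` to `to_csv()` keeps pandas from writing the row index as an extra column in the file.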
Bring the resulting .csv file with you to your consultation.
Importing data
Importing Data into R
Prerequisites
Make sure you have the package for your file type installed: readr for CSV, readxl for Excel, or jsonlite for JSON. If not, you can install them using `install.packages(c("readr", "readxl", "jsonlite"))`.
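- Import a CSV file: `your_data_frame <- readr::read_csv("path/to/your/file.csv")`
- Import an Excel file: `your_data_frame <- readxl::read_excel("path/to/your/file.xlsx")`
- Import a JSON file: `your_data_frame <- jsonlite::fromJSON("path/to/your/file.json")`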
Importing Data in Python
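These use pandas, imported with `import pandas as pd`; reading .xlsx files also requires the openpyxl package.
- Import a CSV file: `your_data_frame = pd.read_csv("path/to/your/file.csv")`
- Import an Excel file: `your_data_frame = pd.read_excel("path/to/your/file.xlsx")`
- Import a JSON file: `your_data_frame = pd.read_json("path/to/your/file.json")`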
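If you work with files of different types, a small helper can choose the pandas reader from the file extension. This is a sketch, not a pandas feature: `load_data` and `READERS` are hypothetical names, and the mapping covers only CSV, Excel and JSON (legacy .xls files need the xlrd package rather than openpyxl).

```python
from pathlib import Path

import pandas as pd

# Hypothetical mapping from file extension to the pandas reader for that format
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,  # needs the openpyxl package
    ".xls": pd.read_excel,   # legacy Excel files need the xlrd package
    ".json": pd.read_json,
}


def load_data(path: str) -> pd.DataFrame:
    """Read a CSV, Excel or JSON file into a DataFrame based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    return READERS[suffix](path)


# Replace the path with your own file, then uncomment:
# your_data_frame = load_data("path/to/your/file.csv")
```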