
Python vs R - How To Decide?

Python and R are both very useful tools in academia, research, industry, and everywhere! They have a lot of similarities, but there are also many differences.

The purpose of this post is to help students decide which language to learn first, based on the languages' differences and similarities and on departmental practices.

First, let’s introduce each, and talk about their main purposes.

Introducing Python

Python is a general-purpose, object-oriented programming language. It was first released in 1991, and it has a large community of contributors who regularly update its libraries and improve their performance. It is one of the most popular programming languages in the world. Some of the most common libraries for data-related tasks include NumPy (for arrays), pandas (for data analysis and manipulation), and Matplotlib (for data visualization). Python is a powerful tool for machine learning, deep learning, and modelling. Jupyter Notebook is a useful interface to pair with Python because it produces clean, readable documents that can be shared with peers and users. Note that Jupyter Notebook also supports R.
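To give a feel for two of the libraries mentioned above, here is a minimal sketch that assumes NumPy and pandas are installed. The student names and scores are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical exam scores, invented for this example.
scores = np.array([72, 85, 90, 64])

# NumPy applies arithmetic to the whole array at once.
curved = scores + 5

# pandas wraps the data in a labelled table (a DataFrame).
df = pd.DataFrame({
    "student": ["Ana", "Ben", "Chloe", "Dev"],
    "score": curved,
})

# Keep students with a curved score of at least 80, highest first.
top = df[df["score"] >= 80].sort_values("score", ascending=False)
print(top)
```

The same filter-then-sort pattern carries over to much larger tables, which is a big part of why pandas is popular for data wrangling.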

Python is commonly used in disciplines such as computer science, data science, mathematics (optimization, pure math, finance, economics, etc.), physics, engineering, the social sciences, and more. Note that this is only a general trend. Your specific department at UBC, or your previous institution, may do things differently.

Introducing R

R was originally created for statistics, specializing in statistical analysis and visualization. It first appeared in 1993, but it generally has less community support and sees fewer advancements than Python. This is one of users' major complaints, and it is one reason why many people end up choosing Python over R. R's libraries and tools are best known for tasks such as cleaning data, creating visualizations, and training some machine learning and deep learning algorithms. Note that R can take significantly longer than alternatives such as Python to run machine learning algorithms.

R is commonly used in disciplines such as biology and math (statistics). Again, this is just a general consensus, and your specific department may use a different tool.

Similarities

Both R and Python are open source programming languages that are maintained and supported by large communities.

Differences

Here, we will discuss some of the differences between R and Python, which can help you decide which one to learn first.

R is mainly used for statistical analysis, while Python is traditionally considered better for data wrangling. Python is also more multi-purpose, so the skills you build transfer to other areas, such as web development or application development. Many learners also find Python's syntax more readable and easier to pick up.

The chart below lists more differences between the two.

| R | Python |
| --- | --- |
| mainly used for statistical analysis | better for data wrangling |
| leans towards statistical modelling and analytics | more multi-purpose (skills transfer to application or web development) |
| can perform deep statistical analysis in short code chunks | more readable syntax |
| data visualizations can be built from models with ggplot2 | scalable for machine learning and data analysis; Altair offers interactive and customizable visualizations |
| base R reads delimited text files (CSV, txt); Excel, JSON, and SQL sources need add-on packages | supports many data formats (CSV, JSON, SQL tables, web-requested data, etc.) |
| optimized for statistical analysis of large datasets | pandas lets you filter and sort data quickly |
| many workflows rely on non-core R packages (tidyverse for importing, visualizing, reporting, etc.) | standard libraries work together well (NumPy, SciPy, scikit-learn) |
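As a rough illustration of the data-format row in the chart, the sketch below reads CSV, JSON, and SQL data using only modules from Python's standard library (`csv`, `json`, and `sqlite3`). The records and the table name `results` are invented for this example. In practice, many people use pandas for the same job.

```python
import csv
import io
import json
import sqlite3

# Hypothetical records, invented for this example.
csv_text = "name,score\nAna,77\nBen,90\n"
json_text = '[{"name": "Chloe", "score": 95}]'

# CSV: each row becomes a dict keyed by the header line.
# Values are read as strings, so "77" is text until converted.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# JSON: parsed directly into Python lists and dicts.
json_rows = json.loads(json_text)

# SQL: an in-memory SQLite database, queried with plain SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (name TEXT, score INTEGER)")
conn.execute("INSERT INTO results VALUES ('Dev', 69)")
sql_rows = conn.execute("SELECT name, score FROM results").fetchall()
conn.close()

print(csv_rows[0]["name"], json_rows[0]["score"], sql_rows)
```

Each source ends up as ordinary Python lists, dicts, or tuples, so the rest of an analysis does not need to care where the data came from.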

General Preferences

In general, if you already have programming experience, Python is usually the recommended choice, because it is easy to pick up compared to many other languages. If you are new to programming, Python is also considered a great first language, while R may take some extra effort to master.

If your team or supervisor prefers one language over another, it is usually easier to stick with what the people around you are using. This will allow them to help you if you run into issues. Talk to your team to find out what the majority of them prefer to use.

Overall, R is better for statistical learning, and Python is better for machine learning, large-scale applications, and data analysis within web applications. Python and R both have great visualization tools, and you can read posts about those here as well, so visualization shouldn't be the deciding factor between the two.

R and Python are both wonderful tools to have in your toolbox for all academic and professional endeavours. When in doubt, just learn both! But if you don't have the time, hopefully this information helps you make a more informed decision. It is also easier to learn one if you are already familiar with the other, so feel free to keep the door open to learning the other option down the road.

This post is licensed under CC BY 4.0 by the author.