Matplotlib is a standard Python package for plotting. It covers the basics, but it does have limitations.
If you are familiar with R, you may have used ggplot2 before. In Python, a comparable package is Altair. We will take a look at some basic Altair plotting tools.
Let’s switch things up now and use a different data set to visualize. As in R, a few classic data sets are readily available in Python, usually through packages. The popular R data set iris is one of them. Let’s load the iris data set here.
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 0 5.1 3.5 1.4 0.2 setosa
## 1 4.9 3.0 1.4 0.2 setosa
## 2 4.7 3.2 1.3 0.2 setosa
## 3 4.6 3.1 1.5 0.2 setosa
## 4 5.0 3.6 1.4 0.2 setosa
## .. ... ... ... ... ...
## 145 6.7 3.0 5.2 2.3 virginica
## 146 6.3 2.5 5.0 1.9 virginica
## 147 6.5 3.0 5.2 2.0 virginica
## 148 6.2 3.4 5.4 2.3 virginica
## 149 5.9 3.0 5.1 1.8 virginica
##
## [150 rows x 5 columns]
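The code that produced this printout is not shown. As a minimal sketch, and assuming scikit-learn is installed (it may not be the package used originally), a data frame with the same R-style column names could be built like this:

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled copy of iris stands in for the R data set.
raw = load_iris()

# Use the R-style column names shown in the printout above.
iris = pd.DataFrame(
    raw.data,
    columns=["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"],
)
# raw.target holds integer codes 0-2; map them to the species names.
iris["Species"] = [str(raw.target_names[code]) for code in raw.target]

print(iris)
```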
Let’s start by looking at matplotlib.
We will take a look at a scatterplot of the first two columns in iris.
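A minimal sketch of such a scatterplot follows. To keep it runnable on its own, it rebuilds six rows from the iris printout; in practice, use the full iris data frame.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Six rows copied from the iris printout (three setosa, three virginica).
iris = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7, 6.7, 6.3, 6.5],
    "Sepal.Width": [3.5, 3.0, 3.2, 3.0, 2.5, 3.0],
})

# Plot the first column on the x-axis against the second on the y-axis.
points = plt.scatter(iris["Sepal.Length"], iris["Sepal.Width"])
plt.show()
```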
How about a boxplot?
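As a sketch, plt.boxplot() draws a boxplot of a single numeric column. The choice of sepal length here is an assumption made for illustration, and the six rows are copied from the iris printout so the example runs on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Six sepal lengths copied from the iris printout.
iris = pd.DataFrame({"Sepal.Length": [5.1, 4.9, 4.7, 6.7, 6.3, 6.5]})

# plt.boxplot() returns a dict of the drawn parts (boxes, medians, whiskers).
box_parts = plt.boxplot(iris["Sepal.Length"])
plt.show()
```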
If we wanted to add a title and axis labels to the plot:
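A minimal sketch using plt.title(), plt.xlabel(), and plt.ylabel() is shown here. The label text is made up for illustration, and the data are six rows copied from the iris printout.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Six rows copied from the iris printout.
iris = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7, 6.7, 6.3, 6.5],
    "Sepal.Width": [3.5, 3.0, 3.2, 3.0, 2.5, 3.0],
})

plt.scatter(iris["Sepal.Length"], iris["Sepal.Width"])

# Each call returns the text object it created on the current axes.
title = plt.title("Iris sepal measurements")  # hypothetical title text
x_label = plt.xlabel("Sepal length")
y_label = plt.ylabel("Sepal width")
plt.show()
```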
We can also use the functions plt.hist() and plt.bar() to generate histograms and bar charts, respectively.
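As a sketch of the two functions: plt.hist() bins a numeric column, while plt.bar() draws one bar per category. The number of bins and the choice to count rows per species are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Six rows copied from the iris printout (three setosa, three virginica).
iris = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7, 6.7, 6.3, 6.5],
    "Species": ["setosa"] * 3 + ["virginica"] * 3,
})

# Histogram: how the values of a numeric column are distributed.
plt.figure()
counts, bin_edges, _ = plt.hist(iris["Sepal.Length"], bins=5)

# Bar chart: one bar per category, here the number of rows per species.
species_counts = iris["Species"].value_counts()
plt.figure()
bars = plt.bar(species_counts.index.tolist(), species_counts.tolist())
plt.show()
```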
Now, let’s take a look at a few of Altair’s functions.
In order to use Altair, we have to change the column names. Altair treats a dot in a field name as access to a nested field, so column names of the form Column.Name do not work as they are.
## SepalLength SepalWidth PetalLength PetalWidth Species
## 0 5.1 3.5 1.4 0.2 setosa
## 1 4.9 3.0 1.4 0.2 setosa
## 2 4.7 3.2 1.3 0.2 setosa
## 3 4.6 3.1 1.5 0.2 setosa
## 4 5.0 3.6 1.4 0.2 setosa
## .. ... ... ... ... ...
## 145 6.7 3.0 5.2 2.3 virginica
## 146 6.3 2.5 5.0 1.9 virginica
## 147 6.5 3.0 5.2 2.0 virginica
## 148 6.2 3.4 5.4 2.3 virginica
## 149 5.9 3.0 5.1 1.8 virginica
##
## [150 rows x 5 columns]
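One way to get column names like these is to strip the dots from every name. This is a sketch on a two-row stand-in for iris; the original renaming code is not shown, so the exact method is an assumption.

```python
import pandas as pd

# Two rows copied from the iris printout, with the R-style column names.
iris = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9],
    "Sepal.Width": [3.5, 3.0],
    "Petal.Length": [1.4, 1.4],
    "Petal.Width": [0.2, 0.2],
    "Species": ["setosa", "setosa"],
})

# regex=False makes "." a literal dot rather than "any character".
iris.columns = iris.columns.str.replace(".", "", regex=False)
print(iris.columns.tolist())
```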
Let’s check the data types.
## SepalLength float64
## SepalWidth float64
## PetalLength float64
## PetalWidth float64
## Species object
## dtype: object
Now that we’re ready, let’s view a scatter plot of the first two columns in iris.
If we wanted to view this same scatterplot but also distinguish by colour, we could add in one small line at the end. Also, let’s add some axis titles and change the scale to reduce the white space.
Note that Altair only accepts the US spelling of colour: the argument is color.
We see that the red and orange are a bit hard to distinguish. We can add in different shapes to help distinguish between species.
Tooltips
There is a feature in Altair called a tooltip that allows users to interact with the plot.
Let’s add a tooltip to the scatterplot above to see how it looks.
Notice that when you hover over the point, it lists the species value, because that is the one variable that we specified under the tooltip argument.
We can add as many different columns to the tooltip as we want.
Another feature we can add is the ability to make the graph interactive. This allows the user to drag to pan and scroll to zoom.
Notice also that there is a function called mark_circle(), which is different from mark_point(): by default, mark_point() draws hollow points, while mark_circle() draws filled circles.
We can show the same graph as above, but with mark_circle() instead of mark_point().
Let’s take a look at mark_line(), and let’s remove the shape argument.
We can overlay plots on top of each other. Let’s plot the scatterplot and lines together.
We can also show multiple plots at once using the operators we learned in part 3.
Recall:
- or means horizontal and is represented by |
- and means vertical and is represented by &
Try looking at them horizontally by typing line | point.
Then, try looking at them stacked vertically by typing line & point.
We can also combine these features to design whatever layout you would like. First, let’s introduce a boxplot.
If we wanted to show a boxplot for the different petal lengths, we could do so like this:
To make it look nicer, we could add colour to each species, and then store it as a variable.
If you want to show multiple plots in the same window, you can show two or more. Try writing line | point | box.
You can also mix and match operators. Using brackets will help to organize the layout. Try writing line | (point & box).
Many different Altair charts can be created using mark_bar(), mark_line(), mark_point(), mark_rect(), and many more!
A complete list can be found in the Altair documentation.