Visualizing with Altair in Python

Matplotlib is the standard plotting package in Python. It handles basic plots well, but it does have limitations.

If you are familiar with R, you may have used ggplot2 before. In Python, a comparable package is Altair, which is also built around a grammar of graphics. We will take a look at some basic Altair plotting tools.

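Let’s switch things up now and use a different data set to visualize. Many of R’s built-in data sets can also be loaded in Python. A popular R data set, iris, is one of them: the statsmodels function get_rdataset() downloads it from the Rdatasets collection, so an internet connection is needed. Let’s load the iris data set here.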

import statsmodels.api as sm
iris = sm.datasets.get_rdataset('iris').data
iris
##      Sepal.Length  Sepal.Width  Petal.Length  Petal.Width    Species
## 0             5.1          3.5           1.4          0.2     setosa
## 1             4.9          3.0           1.4          0.2     setosa
## 2             4.7          3.2           1.3          0.2     setosa
## 3             4.6          3.1           1.5          0.2     setosa
## 4             5.0          3.6           1.4          0.2     setosa
## ..            ...          ...           ...          ...        ...
## 145           6.7          3.0           5.2          2.3  virginica
## 146           6.3          2.5           5.0          1.9  virginica
## 147           6.5          3.0           5.2          2.0  virginica
## 148           6.2          3.4           5.4          2.3  virginica
## 149           5.9          3.0           5.1          1.8  virginica
## 
## [150 rows x 5 columns]

Let’s first start by looking at matplotlib.

We will take a look at a scatterplot of the first two columns in iris.

import matplotlib.pyplot as plt
plt.scatter(x = iris['Sepal.Length'], y = iris['Sepal.Width'])

How about a boxplot?

new_data = iris[["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]]
new_data.boxplot()

If we wanted to add a title and axis labels to the plot:

new_data.boxplot()
plt.title("Sample Boxplot")
plt.xlabel("Measurements")
plt.ylabel("Values")

We can also use the functions plt.hist() and plt.bar() to generate histograms and bar charts, respectively.
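
As a minimal sketch of those two functions: plt.hist() bins a numeric variable and counts the values in each bin, while plt.bar() draws one bar per category with heights that we supply. The numbers below are the ten rows visible in the printed iris output above, typed in by hand so the example runs without downloading anything. The Agg backend is an assumption that lets the script run without a display; in a notebook you can leave that line out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt

# Sepal lengths of the ten rows visible in the printed iris output above
sepal_lengths = [5.1, 4.9, 4.7, 4.6, 5.0, 6.7, 6.3, 6.5, 6.2, 5.9]

# Histogram: bin a numeric variable and count the values in each bin
plt.figure()
counts, bin_edges, _ = plt.hist(sepal_lengths, bins=5)
plt.title("Histogram of Sepal Length")

# Bar chart: one bar per category, with heights we provide ourselves
species_counts = {"setosa": 5, "virginica": 5}  # rows per species in the printed sample
plt.figure()
bars = plt.bar(list(species_counts), list(species_counts.values()))
plt.title("Rows per Species (printed sample)")
```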

Now, let’s take a look at a few of Altair’s functions.

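In order to use Altair, we have to change the column names. Altair treats a period in a field name as access to a nested field, so names of the form Column.Name are not read as plain column names. Renaming the columns is the simplest fix.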

# rename columns
iris = iris.rename(columns={'Sepal.Length': 'SepalLength', 
                            'Sepal.Width': 'SepalWidth',
                           'Petal.Length': 'PetalLength',
                           'Petal.Width': 'PetalWidth'})
iris
##      SepalLength  SepalWidth  PetalLength  PetalWidth    Species
## 0            5.1         3.5          1.4         0.2     setosa
## 1            4.9         3.0          1.4         0.2     setosa
## 2            4.7         3.2          1.3         0.2     setosa
## 3            4.6         3.1          1.5         0.2     setosa
## 4            5.0         3.6          1.4         0.2     setosa
## ..           ...         ...          ...         ...        ...
## 145          6.7         3.0          5.2         2.3  virginica
## 146          6.3         2.5          5.0         1.9  virginica
## 147          6.5         3.0          5.2         2.0  virginica
## 148          6.2         3.4          5.4         2.3  virginica
## 149          5.9         3.0          5.1         1.8  virginica
## 
## [150 rows x 5 columns]
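
The mapping above is typed out by hand. As a minimal sketch, the same mapping could be built programmatically. Note that strip_dots is a hypothetical helper name used for illustration; it is not part of pandas or Altair.

```python
def strip_dots(name: str) -> str:
    """Remove periods from a column name, e.g. 'Sepal.Length' -> 'SepalLength'."""
    return name.replace(".", "")

# Column names as they appear in the original iris output
columns = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"]

# Build the old-name -> new-name mapping; names without a period map to themselves
renamed = {old: strip_dots(old) for old in columns}
print(renamed)

# With the DataFrame loaded, the mapping could then be applied with:
# iris = iris.rename(columns=renamed)
```

This scales better when a data set has many dotted column names, at the cost of being a little less explicit than the hand-written dictionary.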

Let’s check the data types.

iris.dtypes
## SepalLength    float64
## SepalWidth     float64
## PetalLength    float64
## PetalWidth     float64
## Species         object
## dtype: object

Now that we’re ready, let’s view a scatter plot of the first two columns in iris.

import altair as alt
alt.Chart(iris).mark_point().encode(
      x = 'SepalLength',
      y = 'SepalWidth'
)

If we wanted to view this same scatterplot but also distinguish by colour, we could add in one small line at the end. Also, let’s add some axis titles and change the scale to reduce the white space.

alt.Chart(iris, title="Comparing Sepal Length to Sepal Width").mark_point().encode(
      x = alt.X('SepalLength', title = 'Sepal Length', scale = alt.Scale(domain = (4,9))),
      y = alt.Y('SepalWidth', title = 'Sepal Width', scale = alt.Scale(domain = (1.5,4.5))),
      color = 'Species'
)

Unfortunately, Altair only accepts the US spelling of colour: the argument must be written as color.

We see that the red and orange are a bit hard to distinguish. We can add in different shapes to help distinguish between species.

alt.Chart(iris, title="Comparing Sepal Length to Sepal Width").mark_point().encode(
      x = alt.X('SepalLength', title = 'Sepal Length', scale = alt.Scale(domain = (4,9))),
      y = alt.Y('SepalWidth', title = 'Sepal Width', scale = alt.Scale(domain = (1.5,4.5))),
      color = 'Species',
      shape = 'Species'
)

Tooltips

There is a feature in Altair called a tooltip that allows users to interact with the plot.

Let’s add a tooltip to the scatterplot above to see how it looks. Note that this blog formatting will not allow for the interactive elements, but if you copy this code into Jupyter Notebook or RStudio, you will see that hovering over an individual point lists the values specified in the tooltip argument.

alt.Chart(iris, title="Comparing Sepal Length to Sepal Width").mark_point().encode(
      x = alt.X('SepalLength', title = 'Sepal Length', scale = alt.Scale(domain = (4,9))),
      y = alt.Y('SepalWidth', title = 'Sepal Width', scale = alt.Scale(domain = (1.5,4.5))),
      color = 'Species',
      shape = 'Species',
      tooltip = 'Species'
)

Notice that when you hover over the point, it lists the species value, because that is the one variable that we specified under the tooltip argument.

We can add as many different columns to the tooltip as we want.

alt.Chart(iris, title="Comparing Sepal Length to Sepal Width").mark_point().encode(
      x = alt.X('SepalLength', title = 'Sepal Length', scale = alt.Scale(domain = (4,9))),
      y = alt.Y('SepalWidth', title = 'Sepal Width', scale = alt.Scale(domain = (1.5,4.5))),
      color = 'Species',
      shape = 'Species',
      tooltip = (['Species','SepalLength','SepalWidth','PetalLength','PetalWidth'])
)

Another feature we can add is the ability to make the graph interactive by calling .interactive(), which lets the user pan and zoom. As with tooltips, this blog formatting will not show the interactive elements, but if you copy this code into Jupyter Notebook or RStudio, you will see that you can scroll to zoom and drag to pan, which changes the X and Y axis ranges.

alt.Chart(iris, title="Comparing Sepal Length to Sepal Width").mark_point().encode(
      x = alt.X('SepalLength', title = 'Sepal Length', scale = alt.Scale(domain = (4,9))),
      y = alt.Y('SepalWidth', title = 'Sepal Width', scale = alt.Scale(domain = (1.5,4.5))),
      color = 'Species',
      shape = 'Species',
      tooltip = (['Species','SepalLength','SepalWidth','PetalLength','PetalWidth'])
).interactive()

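Notice also that there is a function called mark_circle(), which is different from mark_point(): by default, mark_point() draws hollow points, while mark_circle() draws filled circles.

We can show the same graph as above, but with mark_circle() instead of mark_point(). Since a circle mark always has the same shape, the shape argument may have no visible effect here.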

alt.Chart(iris, title="Comparing Sepal Length to Sepal Width").mark_circle().encode(
      x = alt.X('SepalLength', title = 'Sepal Length', scale = alt.Scale(domain = (4,9))),
      y = alt.Y('SepalWidth', title = 'Sepal Width', scale = alt.Scale(domain = (1.5,4.5))),
      color = 'Species',
      shape = 'Species',
      tooltip = (['Species','SepalLength','SepalWidth','PetalLength','PetalWidth'])
).interactive()

Let’s take a look at mark_line(), and let’s remove the shape argument.

alt.Chart(iris, title="Comparing Sepal Length to Sepal Width").mark_line().encode(
      x = alt.X('SepalLength', title = 'Sepal Length', scale = alt.Scale(domain = (4,9))),
      y = alt.Y('SepalWidth', title = 'Sepal Width', scale = alt.Scale(domain = (1.5,4.5))),
      color = 'Species',
      tooltip = (['Species','SepalLength','SepalWidth','PetalLength','PetalWidth'])
).interactive()

We can overlay plots on top of each other. Let’s plot the scatterplot and lines together.

line = alt.Chart(iris, title="Comparing Sepal Length to Sepal Width").mark_line().encode(
      x = alt.X('SepalLength', title = 'Sepal Length', scale = alt.Scale(domain = (4,9))),
      y = alt.Y('SepalWidth', title = 'Sepal Width', scale = alt.Scale(domain = (1.5,4.5))),
      color = 'Species',
      tooltip = (['Species','SepalLength','SepalWidth','PetalLength','PetalWidth'])
).interactive()

point = alt.Chart(iris).mark_point().encode(
      x = 'SepalLength',
      y = 'SepalWidth',
      color = 'Species'
)

line + point

We can also show multiple plots at once using the operators we learned in part 3.

Recall:

  • | (or) places charts side by side, horizontally
  • & (and) stacks charts vertically

Try looking at them horizontally by typing line | point

Then, try looking at them stacked vertically by typing line & point

We can also combine these operators to design whatever layout we would like. First, let’s introduce a boxplot.

If we wanted to show a boxplot for the different petal lengths, we could do so like this:

alt.Chart(iris, title = 'Petal Lengths of Species').mark_boxplot().encode(
      x = alt.X('Species', title = 'Type of Species'),
      y = alt.Y('PetalLength', title = 'Petal Length')
)

To make it look nicer, we could add colour to each species, and then store it as a variable.

box = alt.Chart(iris, title = 'Petal Lengths of Species').mark_boxplot().encode(
      x = alt.X('Species', title = 'Type of Species'),
      y = alt.Y('PetalLength', title = 'Petal Length'),
      color = 'Species'
)
box

You can place more than two plots in the same layout. Try writing line | point | box.

You can also mix and match operators. Using parentheses will help to organize the layout. Try writing line | (point & box).

Many different Altair charts can be created using mark_bar(), mark_line(), mark_point(), mark_rect(), and so many more!

A complete list can be found here!

This post is licensed under CC BY 4.0 by the author.