When designing a bar plot for a data set with two or more categorical variables, one may need to group (cluster) some of the bars by category. Such a clustered (grouped) bar plot is easy to create if you already know how to code bar plots.
Before going any further, if you are not so familiar with vertical bar plots, have a quick look at the page dedicated to them.
Here we will take the example of the average amount of precipitation (response variable precipitations) measured in 2016, 2017 and 2018 (first categorical variable year) at two different field stations near Bergen, namely Lygra and Østerbø (second categorical variable location). The dataframe for this tutorial is as follows:
# dataframe
df <- data.frame(location, year, precipitations)
# structure of the dataframe
str(df)
## 'data.frame': 6 obs. of 3 variables:
## $ location : Factor w/ 2 levels "Lygra","Østerbø": 1 1 1 2 2 2
## $ year : Factor w/ 3 levels "2016","2017",..: 1 2 3 1 2 3
## $ precipitations: num 1316 1453 1230 583 626 ...
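The tutorial does not show how the three vectors behind df were created. Below is a minimal sketch of one way to build a dataframe with the same structure; the name df_demo is hypothetical and the precipitation values are placeholders, not the measurements from Lygra and Østerbø. Since R 4.0.0, data.frame() no longer converts character vectors to factors by default, so we wrap location and year in factor() ourselves. The year variable in particular should be a factor, so that fill= treats it as a discrete grouping variable rather than a continuous one.

```r
# Hypothetical dataframe with the same structure as df (placeholder values only)
df_demo <- data.frame(
  # two stations, three years each
  location = factor(rep(c("Lygra", "Østerbø"), each = 3)),
  # years stored as a factor, not as numbers
  year = factor(rep(c("2016", "2017", "2018"), times = 2)),
  # made-up numbers, NOT the real Bergen measurements
  precipitations = c(100, 120, 110, 50, 60, 55)
)

str(df_demo)
```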
As for any simple bar plot, the function to use for drawing the bars is geom_col(). Since we have two categorical variables and the measurement variable to map, the function aes() has to involve all three variables, something like aes(location, year, precipitations). However, written like this the variables do not land on the right aesthetics (year would end up on the Y-axis), so we have to map each of them explicitly and ask ggplot to group and color the bars according to one of the categories. We will use fill= to do so. Our plan is to:
- put location on the X-axis,
- put precipitations on the Y-axis,
- set fill= based on the predictor variable year,
- draw the bars with geom_col().
NB: without any additional argument, geom_col() combined with fill= draws stacked bars (see further below). To put the bars side by side, we must use the argument position = "dodge".
The code for the plot is as follows:
# load ggplot2 (needed for all the plots in this tutorial)
library(ggplot2)
ggplot(df, aes(x = location, y = precipitations, fill = year)) +
  geom_col(position = "dodge")
Alternatively, we may replace fill= with color=. While fill= colors the entire bars, color= only changes the color of their outlines:
ggplot(df, aes(x = location, y = precipitations, color = year)) +
  geom_col(position = "dodge")
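fill= and color= can also be combined. As a sketch, using a small hypothetical dataframe toy with made-up values, we map fill= to year inside aes() and set a fixed colour outside aes(): each bar then gets its year color, and all bars share the same black outline.

```r
library(ggplot2)

# Hypothetical data with the same structure as df (made-up values)
toy <- data.frame(
  location = factor(rep(c("Lygra", "Østerbø"), each = 3)),
  year = factor(rep(c("2016", "2017", "2018"), times = 2)),
  precipitations = c(100, 120, 110, 50, 60, 55)
)

# fill is mapped inside aes(), so it varies with year;
# colour is set outside aes(), so it is the same for every bar
p <- ggplot(toy, aes(x = location, y = precipitations, fill = year)) +
  geom_col(position = "dodge", colour = "black")
p
```

Values set outside aes() are constants for the whole layer and do not produce a legend, which is why only the fill colors appear in the legend here.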
Here is what happens when one forgets the argument position = "dodge":
ggplot(df, aes(x = location, y = precipitations, fill = year)) +
  geom_col()
As you can see, the bars are no longer side by side but stacked on top of each other. This is how one makes a stacked bar plot, as explained further on the page dedicated to stacked bar plots.
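To see what stacking actually does, we can inspect the numbers ggplot computes before drawing. The sketch below (again with a hypothetical toy dataframe and made-up values) uses ggplot_build() to extract the layer data: with stacking, each segment starts where the previous one ends, so the top of each stack (ymax) equals the sum of the yearly values for that location.

```r
library(ggplot2)

# Hypothetical data with the same structure as df (made-up values)
toy <- data.frame(
  location = factor(rep(c("Lygra", "Østerbø"), each = 3)),
  year = factor(rep(c("2016", "2017", "2018"), times = 2)),
  precipitations = c(100, 120, 110, 50, 60, 55)
)

# default position for geom_col() is stacking
p <- ggplot(toy, aes(x = location, y = precipitations, fill = year)) +
  geom_col()

# computed layer data: one row per bar segment, with ymin/ymax per segment
built <- ggplot_build(p)$data[[1]]

# top of each stack, by x position (1 = first location level, 2 = second)
tops <- tapply(built$ymax, built$x, max)
# total precipitation per location, in the same level order
totals <- tapply(toy$precipitations, toy$location, sum)

print(tops)
print(totals)
```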
An alternative to position = "dodge" is position = "dodge2", which does a similar job while adding a small gap between the bars. This is not life-changing, but it contributes to the æsthetics of your plot:
ggplot(df, aes(x = location, y = precipitations, fill = year)) +
  geom_col(position = "dodge2")
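The string "dodge2" is a shortcut for position_dodge2() with its default settings. If the default gap looks too narrow or too wide, a sketch like the one below (hypothetical toy data with made-up values) passes the padding argument explicitly; padding is the proportion by which each bar is shrunk to leave space between neighbours, and it defaults to 0.1. The value 0.3 here is an arbitrary choice for illustration.

```r
library(ggplot2)

# Hypothetical data with the same structure as df (made-up values)
toy <- data.frame(
  location = factor(rep(c("Lygra", "Østerbø"), each = 3)),
  year = factor(rep(c("2016", "2017", "2018"), times = 2)),
  precipitations = c(100, 120, 110, 50, 60, 55)
)

# explicit dodge2 position with a wider gap than the default (0.1)
dodge_pos <- position_dodge2(padding = 0.3)

p <- ggplot(toy, aes(x = location, y = precipitations, fill = year)) +
  geom_col(position = dodge_pos)
p
```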
To draw a horizontal clustered bar plot, we simply add coord_flip() to the code:
ggplot(df, aes(x = location, y = precipitations, fill = year)) +
  geom_col(position = "dodge2") +
  coord_flip()
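With ggplot2 3.3.0 or later, coord_flip() is not strictly required: geom_col() detects the orientation from the aesthetics, so swapping x and y also draws horizontal clustered bars. A sketch with a hypothetical toy dataframe and made-up values:

```r
library(ggplot2)

# Hypothetical data with the same structure as df (made-up values)
toy <- data.frame(
  location = factor(rep(c("Lygra", "Østerbø"), each = 3)),
  year = factor(rep(c("2016", "2017", "2018"), times = 2)),
  precipitations = c(100, 120, 110, 50, 60, 55)
)

# continuous variable on x, discrete variable on y:
# geom_col() infers horizontal bars without coord_flip()
p <- ggplot(toy, aes(x = precipitations, y = location, fill = year)) +
  geom_col(position = "dodge2")
p
```

One practical difference is that the axes keep their own names (x for precipitations, y for location), which can make later customisation of the axes less confusing than with coord_flip(), where x and y are swapped only at drawing time.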
In this section, you will learn how to set/modify all the necessary elements that make a plot complete and comprehensible. Such elements are: