When designing a boxplot for a data set with two or more categorical variables, you may need to group (cluster) some of the boxes by category. Such a clustered (grouped) boxplot is easy to create if you already know how to draw boxplots.

Before going any further, if you are not so familiar with boxplots, have a quick look at this page:

In this example, values is the response variable, and category1 and category2 are the categorical predictor variables. The dataframe for this tutorial is as follows:

# dataframe
df <- data.frame(values, category1, category2)
# structure of the dataframe
str(df)
## 'data.frame':    400 obs. of  3 variables:
##  $ values   : num  15.5 23.5 31.9 29.1 23.5 ...
##  $ category1: Factor w/ 4 levels "A","B","C","D": 1 1 1 1 1 1 1 1 1 1 ...
##  $ category2: Factor w/ 2 levels "1","2": 1 2 1 2 1 2 1 2 1 2 ...

As the structure of the dataframe above shows, category1 has 4 levels (A, B, C and D) and category2 has only 2 levels (1 and 2).
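The tutorial does not show how values, category1 and category2 were created. A minimal sketch that produces a dataframe with the same structure might look like the following. The seed, the group means and the standard deviation are assumptions chosen only for illustration, so the numbers will not match the str() output above.

```r
# Hypothetical data: the original vectors are not shown in the tutorial.
set.seed(42)  # assumed seed, only to make the random draws reproducible

# 4 levels, 100 observations each (A first, then B, C, D)
category1 <- factor(rep(c("A", "B", "C", "D"), each = 100))

# 2 levels, alternating 1, 2, 1, 2, ...
category2 <- factor(rep(c("1", "2"), times = 200))

# Assumed group means and spread, chosen so that the boxes differ visibly
group_mean <- 20 + 3 * as.integer(category1) + 4 * as.integer(category2)
values <- rnorm(400, mean = group_mean, sd = 5)

df <- data.frame(values, category1, category2)
str(df)
```

With this layout, every combination of category1 and category2 contains 50 observations, so each of the eight boxes is based on the same amount of data.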

As for any boxplot, the function to use for drawing the boxes is geom_boxplot(). Since we have two categorical variables and the response variable to map, the function aes() will look more or less like this: aes(values, category1, category2). However, we have to order the variables properly and ask ggplot to group and color the boxes according to one of the categories. We will use fill= to do so. Our plan is to map category1 to the x-axis, values to the y-axis, and category2 to fill=, so that each level of category1 gets one box per level of category2.

Here is the code:

ggplot(df, aes(x = category1, y = values, fill = category2)) +
  geom_boxplot()



Alternatively, we may replace fill= with color=. While fill= colors the inside of the boxes, color= changes only the color of the box outlines and lines (whiskers and median):

ggplot(df, aes(x = category1, y = values, color = category2)) +
  geom_boxplot()
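To try both mappings without building a dataframe first, the same pattern can be sketched on ToothGrowth, a dataset that ships with R. This is only an illustration on different data, not part of the original example. Converting dose to a factor is a choice made here so that it behaves like category1, and layer_data() is used to inspect what ggplot2 actually draws.

```r
library(ggplot2)

# ToothGrowth ships with R: len (numeric), supp (factor, 2 levels), dose (3 values)
tg <- ToothGrowth
tg$dose <- factor(tg$dose)  # treat dose as a categorical variable

# Same mapping as in the tutorial: one box per dose and supplement
p_fill  <- ggplot(tg, aes(x = dose, y = len, fill = supp)) +
  geom_boxplot()
p_color <- ggplot(tg, aes(x = dose, y = len, color = supp)) +
  geom_boxplot()

# layer_data() returns one row per drawn box, with its resolved aesthetics
boxes_fill  <- layer_data(p_fill)
boxes_color <- layer_data(p_color)

nrow(boxes_fill)            # number of boxes: 3 doses x 2 supplements
unique(boxes_fill$fill)     # fill varies by supplement when fill= is mapped
unique(boxes_color$colour)  # outline colour varies when color= is mapped
```

Printing p_fill or p_color draws the corresponding plot. Note that ggplot2 stores the outline aesthetic under the name colour, whichever spelling is used in aes().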



Adding plot title, axis titles, ticks, labels and other essential elements

In this section, you will learn how to set/modify all the necessary elements that make a plot complete and comprehensible. Such elements are: