When designing a boxplot for a data set with two or more categorical variables, you may need to group (cluster) some of the boxes by category. Such a clustered (grouped) boxplot is easy to create if you already know how to draw boxplots.

Before going any further, if you are not so familiar with boxplots, have a quick look at this page:

Here we will take the following example where `values` is the response variable, and `category1` and `category2` are the categorical predictor variables. The dataframe for this tutorial is as follows:

```
# dataframe
df <- data.frame(values, category1, category2)
# structure of the dataframe
str(df)
```

```
## 'data.frame': 400 obs. of 3 variables:
## $ values : num 15.5 23.5 31.9 29.1 23.5 ...
## $ category1: Factor w/ 4 levels "A","B","C","D": 1 1 1 1 1 1 1 1 1 1 ...
## $ category2: Factor w/ 2 levels "1","2": 1 2 1 2 1 2 1 2 1 2 ...
```
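The vectors `values`, `category1` and `category2` are not shown here. If you want to follow along, a minimal sketch such as the following builds a dataframe with the same structure. The numbers are simulated with `rnorm()` (the mean, standard deviation and seed are assumptions made for illustration only), so your boxes will not match those of the original data.

```r
# Simulated stand-in data: the real values used in the tutorial are not known
set.seed(123)  # arbitrary seed, only for reproducibility

# 4 levels, 100 consecutive observations each
category1 <- factor(rep(c("A", "B", "C", "D"), each = 100))
# 2 levels alternating 1, 2, 1, 2, ...
category2 <- factor(rep(c("1", "2"), times = 200))
# assumed distribution: normal with mean 25 and sd 5
values <- rnorm(400, mean = 25, sd = 5)

df <- data.frame(values, category1, category2)
str(df)
```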

As you may guess from the structure of the dataframe above, `category1` has 4 levels (`A`, `B`, `C` and `D`) and `category2` has only 2 levels (`1` and `2`).

As for any boxplot, the function to use for drawing the boxes is `geom_boxplot()`. Since we have two categorical variables and the response variable to map, the function `aes()` will look more or less like this: `aes(values, category1, category2)`. However, we have to put the variables in the proper order and ask ggplot to group and color the boxes according to one of the categories. We will use `fill=` to do so. Our plan is to:

- plot `values` on the Y-axis,
- plot `category1` on the X-axis,
- cluster the `category2` levels with `fill=`,
- draw the plot with `geom_boxplot()`.

Here is the code:

```
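library(ggplot2)

# values on Y, category1 on X, one box per category2 level within each category1 level
ggplot(df, aes(x = category1, y = values, fill = category2)) +
  geom_boxplot()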
```
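To check that `fill = category2` really splits each `category1` level into one box per `category2` level, you can inspect the data that ggplot computes for the layer. The sketch below uses a simulated dataframe (the values are made up for illustration) and the ggplot2 function `layer_data()`, which returns one row per drawn box for a boxplot layer.

```r
library(ggplot2)

# Simulated stand-in data (assumption: the tutorial's real values are not shown)
set.seed(42)
df <- data.frame(
  values    = rnorm(400, mean = 25, sd = 5),
  category1 = factor(rep(c("A", "B", "C", "D"), each = 100)),
  category2 = factor(rep(c("1", "2"), times = 200))
)

p <- ggplot(df, aes(x = category1, y = values, fill = category2)) +
  geom_boxplot()

# One row per box: 4 levels of category1 x 2 levels of category2
boxes <- layer_data(p)
n_boxes <- nrow(boxes)
n_boxes
```

With 4 levels in `category1` and 2 levels in `category2`, `n_boxes` should be 8.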

Alternatively, we may replace `fill=` with `color=`. While `fill=` colors the entire boxes, `color=` changes the color of the box frames and lines only:

```
ggplot(df, aes(x = category1, y = values, color = category2)) +
  geom_boxplot()
```

In this section, you will learn how to set/modify all the necessary elements that make a plot complete and comprehensible. Such elements are:

- plot title,
- axis title,
- axis scale,
- axis ticks,
- category labels,
- legend,
- secondary Y-axis,
- colors,
- etc.
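
As a rough preview, a few of these elements (titles, category labels, axis breaks, colors and legend position) could be set as in the sketch below. The dataframe is simulated, and the title, label texts, break values and color names are all assumptions chosen for illustration, not values from the original tutorial.

```r
library(ggplot2)

# Simulated stand-in data (assumption: the tutorial's real values are not shown)
set.seed(1)
df <- data.frame(
  values    = rnorm(400, mean = 25, sd = 5),
  category1 = factor(rep(c("A", "B", "C", "D"), each = 100)),
  category2 = factor(rep(c("1", "2"), times = 200))
)

p <- ggplot(df, aes(x = category1, y = values, fill = category2)) +
  geom_boxplot() +
  # plot title, axis titles and legend title (hypothetical texts)
  labs(title = "Values by category", x = "Category 1",
       y = "Values", fill = "Category 2") +
  # category labels on the X-axis (hypothetical texts)
  scale_x_discrete(labels = c(A = "Group A", B = "Group B",
                              C = "Group C", D = "Group D")) +
  # axis ticks on the Y-axis
  scale_y_continuous(breaks = seq(0, 50, by = 10)) +
  # one fill color per level of category2
  scale_fill_manual(values = c("1" = "steelblue", "2" = "orange")) +
  # legend placement
  theme(legend.position = "bottom")

p
```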