Boxplots summarize the median and spread of several samples side by side. Like violin plots, they represent the distribution of each sample. The difference is that a boxplot displays the quartiles, whereas a violin plot displays the probability density.

We will see how to use ggplot() to draw a boxplot representing 4 groups of 150 data points each. This example is based on the same data set used to illustrate violin plots and jitter plots, among others. Here is the dataframe:

# dataframe
df <- data.frame(group, response)
str(df)
## 'data.frame':    600 obs. of  2 variables:
##  $ group   : Factor w/ 4 levels "Gr1","Gr2","Gr3",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ response: num  19.4 27.2 15 22.9 24.7 ...
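
The vectors group and response are not defined in this section. For readers who want to run the examples on their own, the following is a minimal sketch of how a comparable data frame could be simulated. The fourth group name, the group means, the standard deviation and the seed are assumptions chosen for illustration, not the values behind the data shown above.

```r
# Hypothetical simulation: the name "Gr4", the means and the sd below are
# assumptions, not the values used to build the original data set.
set.seed(1)                      # make the random draws reproducible
n_per_group <- 150

# 4 groups of 150 observations each, stored as a factor
group <- factor(rep(c("Gr1", "Gr2", "Gr3", "Gr4"), each = n_per_group))

# Normally distributed responses with a different (assumed) mean per group
response <- rnorm(4 * n_per_group,
                  mean = rep(c(20, 25, 30, 35), each = n_per_group),
                  sd = 5)

df <- data.frame(group, response)
str(df)
```

The resulting data frame has the same structure as the one above (600 observations, a 4-level factor and a numeric column), but the numbers will differ.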



We first map group to the x-axis and response to the y-axis with aes(group, response), and we use geom_boxplot() to draw the plot. The code is as follows:

ggplot(df, aes(group, response)) +     
  geom_boxplot() 
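
To make the link between the boxes and the data concrete, here is a short sketch in base R that computes the three values each box is built from: the first quartile, the median and the third quartile of every group. It runs on simulated data (the group means, standard deviation and seed are assumptions), and the name box_stats is only illustrative.

```r
# Simulated stand-in for df (assumed means, sd and seed; illustration only)
set.seed(1)
df <- data.frame(
  group = factor(rep(c("Gr1", "Gr2", "Gr3", "Gr4"), each = 150)),
  response = rnorm(600, mean = rep(c(20, 25, 30, 35), each = 150), sd = 5)
)

# Split the responses by group, then compute the quartiles of each group.
# Rows are groups; columns are the 25%, 50% (median) and 75% quantiles.
box_stats <- t(sapply(split(df$response, df$group),
                      quantile, probs = c(0.25, 0.5, 0.75)))
print(box_stats)
```

In the plot, the bottom and top edges of each box should match the 25% and 75% columns, and the line inside the box should match the 50% column.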

Note that potential outliers appear in the figure as additional dots above and below the whiskers. If relevant, you may hide them with the argument outlier.shape = NA. This only hides the points: they remain in the data and still count when the range of the y-axis is computed:

ggplot(df, aes(group, response)) +     
  geom_boxplot(outlier.shape = NA) 
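
Before hiding points, it can help to know which observations are drawn as outliers. By default, geom_boxplot() draws as individual points the values that lie more than 1.5 times the interquartile range beyond the first or third quartile. The sketch below applies that rule by hand in base R, on simulated data (assumed means, standard deviation and seed); the helper is_outlier() is a hypothetical name, not a ggplot2 function.

```r
# Simulated stand-in for df (assumed means, sd and seed; illustration only)
set.seed(1)
df <- data.frame(
  group = factor(rep(c("Gr1", "Gr2", "Gr3", "Gr4"), each = 150)),
  response = rnorm(600, mean = rep(c(20, 25, 30, 35), each = 150), sd = 5)
)

# Hypothetical helper: TRUE for values further than coef * IQR from the box
is_outlier <- function(x, coef = 1.5) {
  q <- unname(quantile(x, probs = c(0.25, 0.75)))
  spread <- coef * (q[2] - q[1])          # coef times the interquartile range
  x < q[1] - spread | x > q[2] + spread
}

# Apply the rule within each group; ave() returns 0/1, so convert to logical
df$outlier <- ave(df$response, df$group, FUN = is_outlier) == 1

# Number of flagged and unflagged observations per group
table(df$group, df$outlier)
```

Rows of df where outlier is TRUE are the observations that would be hidden by outlier.shape = NA.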



Alternatively, you may highlight these outliers by changing their color, fill, shape and size with outlier.color=, outlier.fill=, outlier.shape= and outlier.size=. Note that outlier.fill= only has a visible effect with the fillable shapes 21 to 25, such as shape 23 (a filled diamond) used here:

ggplot(df, aes(group, response)) +
  geom_boxplot(outlier.color = "darkblue", outlier.fill = "yellow", outlier.size = 3, outlier.shape = 23)

The colors of the boxes are set with color= (outline, whiskers and median line) and fill= (interior of the box):

ggplot(df, aes(group, response)) +
  geom_boxplot(color = "darkblue", fill = "lightblue")



Finally, you can adjust the width of the boxes with the argument width=:

ggplot(df, aes(group, response)) +
  geom_boxplot(width = 0.2)



Adding plot title, axis titles, ticks, labels and other essential elements

In this section, you will learn how to set/modify all the necessary elements that make a plot complete and comprehensible. Such elements are: