Here we will see a couple of common ways to add colors to a plot so that the categories (groups) of a discrete variable can be told apart. Check this page to see how to choose a color palette for your plot. Let’s start with the code for a reference plot without colors, where df is a data frame containing a discrete variable category and a numeric variable values:

library(ggplot2)

ggplot(df, aes(category, values)) +
    geom_violin()
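The examples in this tutorial assume an existing data frame df. If you want to reproduce them, the snippet below builds a small hypothetical one. The column names category and values are taken from the code above, and the numbers are simulated, so your plot will differ from the one in this tutorial.

```r
library(ggplot2)

# Hypothetical example data: the names 'df', 'category' and 'values'
# match the aes() calls used throughout this tutorial; values are simulated.
set.seed(42)
df <- data.frame(
  category = rep(c("A", "B", "C", "D"), each = 50),
  values   = c(rnorm(50, 10, 2), rnorm(50, 12, 3),
               rnorm(50, 9, 1.5), rnorm(50, 14, 2.5))
)

# Reference plot without colors: one violin per category
p <- ggplot(df, aes(category, values)) +
  geom_violin()
print(p)
```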



fill= vs. color=

There are two easy ways to bring colors to categories. You may color the outline of the items (using color= followed by the name of the variable), fill the items (using fill= followed by the name of the variable), or do both. Because these arguments map a variable to a visual property, they must be placed inside aes(). They may go either in the aes() within ggplot(), in which case every geometry in the plot inherits them, or in the aes() within the geometry function, here geom_violin(). We will use the latter in this tutorial.
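To make the placement rule concrete, here is a sketch that maps fill=category once in the global aes() of ggplot() and once in the aes() of geom_violin(). It uses a small hypothetical data frame and checks that both versions produce the same fill colors.

```r
library(ggplot2)

# Hypothetical example data (simulated)
set.seed(1)
df <- data.frame(
  category = rep(c("A", "B", "C"), each = 30),
  values   = rnorm(90)
)

# Mapping in the global aes(): every layer inherits fill = category
p_global <- ggplot(df, aes(category, values, fill = category)) +
  geom_violin()

# Mapping in the layer's own aes(): only geom_violin() uses it
p_layer <- ggplot(df, aes(category, values)) +
  geom_violin(aes(fill = category))

# Extract the fill colors ggplot2 computed for each version
fills_global <- unique(ggplot_build(p_global)$data[[1]]$fill)
fills_layer  <- unique(ggplot_build(p_layer)$data[[1]]$fill)
identical(fills_global, fills_layer)
```

A fixed color, by contrast, goes outside aes(). For example, geom_violin(fill = "steelblue") paints every violin the same color and produces no legend.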

Let’s first see what color= does to our plot:

ggplot(df, aes(category, values)) +   
    geom_violin(aes(color=category))



And here is the same plot using fill= instead:

ggplot(df, aes(category, values)) +   
    geom_violin(aes(fill=category))
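Since color= and fill= can be combined, one possible variant maps the same variable to both. In this sketch the data are hypothetical, and alpha = 0.5 lightens the fill of the violins while the outline keeps its full color.

```r
library(ggplot2)

# Hypothetical example data (simulated)
set.seed(2)
df <- data.frame(
  category = rep(c("A", "B", "C"), each = 30),
  values   = rnorm(90)
)

# Same variable mapped to both the outline (color) and the interior (fill);
# alpha makes the fill semi-transparent so the outline stands out
p_both <- ggplot(df, aes(category, values)) +
  geom_violin(aes(fill = category, color = category), alpha = 0.5)
print(p_both)
```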



Changing the color palette

ggplot2 uses a default color palette made of evenly spaced hues. This palette works well when only a few categories are displayed, but neighboring colors become hard to tell apart once there are more than about 6-8 categories. You may then want to switch to another palette, such as viridis or brewer (ColorBrewer).

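The functions that apply to discrete variables are scale_color_viridis_d() and scale_fill_viridis_d() for viridis, and scale_color_brewer() and scale_fill_brewer() for brewer. Use the color version together with color= and the fill version together with fill=.

Their use is explained in more detail on a separate page.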
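As a sketch of switching palettes, the code below uses hypothetical data with eight categories and applies the viridis and ColorBrewer fill scales. The palette name "Set2" was picked for illustration only. Any qualitative Brewer palette with at least eight colors would work. If you map color= instead of fill=, use the scale_color_* counterparts.

```r
library(ggplot2)

# Hypothetical data with 8 categories, where default hues get hard to separate
set.seed(3)
df <- data.frame(
  category = rep(LETTERS[1:8], each = 25),
  values   = rnorm(200)
)

base <- ggplot(df, aes(category, values)) +
  geom_violin(aes(fill = category))

# viridis palette for a discrete fill
p_viridis <- base + scale_fill_viridis_d()

# ColorBrewer palette; "Set2" is a qualitative palette with up to 8 colors
p_brewer <- base + scale_fill_brewer(palette = "Set2")

print(p_viridis)
print(p_brewer)
```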

Removing the legend from the plot

Finally, if you think the colors and labels speak for themselves and the legend that ggplot2 adds automatically to the right of the plot is redundant, remove it by adding the argument legend.position="none" to theme(), as follows:

ggplot(df, aes(category, values)) +
    geom_violin(aes(fill=category)) +
    theme(legend.position="none")
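theme(legend.position = "none") hides every legend in the plot. If you only want to hide some of them, ggplot2 also provides guides() and the show.legend argument of a layer. The sketch below shows the three options side by side on hypothetical data. In each case the violins keep their fill colors, and only the legend is dropped.

```r
library(ggplot2)

# Hypothetical example data (simulated)
set.seed(4)
df <- data.frame(
  category = rep(c("A", "B", "C"), each = 30),
  values   = rnorm(90)
)

# Option 1: hide all legends through the theme
p_theme <- ggplot(df, aes(category, values)) +
  geom_violin(aes(fill = category)) +
  theme(legend.position = "none")

# Option 2: hide only the legend of the fill scale
p_guides <- ggplot(df, aes(category, values)) +
  geom_violin(aes(fill = category)) +
  guides(fill = "none")

# Option 3: keep this layer out of every legend
p_layer <- ggplot(df, aes(category, values)) +
  geom_violin(aes(fill = category), show.legend = FALSE)
```

Older releases of ggplot2 accepted guides(fill = FALSE) for the same purpose as option 2.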