Here we will see how to add/change colors to a plot when working with continuous variables. To illustrate this, we will use a scatter plot with 3 continuous variables: values1 which is plotted along the X-axis, values2 which is plotted along the Y-axis, and values3 which range will be represented with colors. Here is the code for this reference plot without colors (i.e. values3 is not involved yet):
ggplot(df, aes(values1, values2)) +
geom_point()
To bring colors to the dots as a function of values3, we use the argument color=
in aes()
, either directly within ggplot()
or within geom_point()
. In the following example, we use geom_point(aes(color=...))
. Note that we also use size=2
to increase the size of the dots:
ggplot(df, aes(values1, values2)) +
geom_point(aes(color = values3), size = 2)
The settings by default tell ggplot to use a blue gradient to represent the variable values3
. This gradient, represented in the legend to the right of the plot, is indexed on values3
. The lower the value of values3
, the darker the color of the dot.
Instead of a relatively simple blue gradient, you may want to use a palette of nicer colors. The palettes provided by viridis
or brewer/distiller
are among the most frequently used. Note that the functions that will apply for continuous variables are:
scale_color_viridis_c()
,scale_color_distiller()
.Their use is further explained HERE.
Now, if you do not find the exact set of colors that you want, you may use the color/gradient of your choice by adding scale_color_gradient()
with the arguments low=
and high=
to the original code. The function will produce a two-color gradient ranging from the color indicated by low=
to the color indicated by high=
. Whether you indicate the colors by their R name or by their Hex code is up to you. Here are two examples:
ggplot(df, aes(values1, values2)) +
geom_point(aes(color = values3), size = 2) +
scale_color_gradient(low = "yellow", high = "darkblue") # left plot, colors coded with R names
ggplot(df, aes(values1, values2)) +
geom_point(aes(color = values3), size = 2) +
scale_color_gradient(low = "#AF7AC5", high = "#E74C3C") # right plot, colors coded with Hex codes
scale_fill_gradient2()
is a function that also produces a gradient of colors, but this time based on three colors defined by low=
, mid=
and high=
. This color mid=
is special in the way that you can use it to highlight a breaking point in your data (encoded by the argument midpoint=
). For example, if the average of the variable values3
has a special meaning for the interpretation of the data, you may set midpoint = mean(values3)
. Thus all values close to the average will get a color close to the one defined by mid=
. Here is the code and the result:
ggplot(df, aes(values1, values2, values3)) +
geom_point(aes(color = values3), size = 2) +
scale_color_gradient2(low = "yellow", mid = "darkblue", high = "red", midpoint = mean(values3))