This section shows how to add or change colors in a plot when working with continuous variables. To illustrate, we use a scatter plot with three continuous variables: values1, plotted along the X-axis; values2, plotted along the Y-axis; and values3, whose range will be represented with colors. Here is the code for the reference plot without colors (i.e. values3 is not involved yet):

ggplot(df, aes(values1, values2)) +
  geom_point()
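
The data frame df is not defined in this section. To run the examples, a stand-in can be built as in the minimal sketch below. The random values, the seed and the number of rows are assumptions made for illustration only; only the column names values1, values2 and values3 come from the text.

```r
library(ggplot2)

# Hypothetical sample data: three continuous variables named as in the text
set.seed(42)
df <- data.frame(
  values1 = rnorm(100),                     # plotted along the X-axis
  values2 = rnorm(100),                     # plotted along the Y-axis
  values3 = runif(100, min = 0, max = 50)   # later mapped to color
)

# Reference plot without colors
p <- ggplot(df, aes(values1, values2)) +
  geom_point()
p
```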



Bringing colors to the dots

To bring colors to the dots as a function of values3, we use the argument color= in aes(), either directly within ggplot() or within geom_point(). In the following example, we use geom_point(aes(color=...)). Note that we also use size=2 to increase the size of the dots:

ggplot(df, aes(values1, values2)) +
  geom_point(aes(color = values3), size = 2)

By default, ggplot uses a blue gradient to represent the variable values3. This gradient, shown in the legend to the right of the plot, is indexed on values3: the lower the value of values3, the darker the color of the dot.
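
As mentioned above, color= can also be mapped directly within ggplot() instead of geom_point(). The sketch below, which assumes a hypothetical data frame df built from random values, draws the same plot both ways; with a single layer the resulting dot colors should be identical.

```r
library(ggplot2)

# Hypothetical sample data (assumption: any data frame with these three columns works)
set.seed(42)
df <- data.frame(values1 = rnorm(100), values2 = rnorm(100), values3 = runif(100, 0, 50))

# Color mapped inside geom_point(): applies to this layer only
p_local <- ggplot(df, aes(values1, values2)) +
  geom_point(aes(color = values3), size = 2)

# Color mapped inside ggplot(): inherited by every layer added afterwards
p_global <- ggplot(df, aes(values1, values2, color = values3)) +
  geom_point(size = 2)

p_global
```

Mapping within ggplot() is convenient when several layers should share the same color mapping; mapping within geom_point() keeps it local to the dots.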

Changing the color palette

Instead of a relatively simple blue gradient, you may want to use a palette of nicer colors. The palettes provided by viridis or brewer/distiller are among the most frequently used. Note that the functions that apply to continuous variables are scale_color_viridis_c() for the viridis palettes and scale_color_distiller() for the brewer palettes.

Their use is further explained HERE.
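
As an illustration of the two palette families mentioned above, here is a minimal sketch using scale_color_viridis_c() and scale_color_distiller() from ggplot2. The data frame and the choice of palettes ("magma" and "Spectral") are assumptions made for the example, not recommendations.

```r
library(ggplot2)

# Hypothetical sample data (assumption: any data frame with these three columns works)
set.seed(42)
df <- data.frame(values1 = rnorm(100), values2 = rnorm(100), values3 = runif(100, 0, 50))

base_plot <- ggplot(df, aes(values1, values2)) +
  geom_point(aes(color = values3), size = 2)

# viridis palette for a continuous variable
p_viridis <- base_plot + scale_color_viridis_c(option = "magma")

# brewer palette interpolated into a continuous gradient
p_distiller <- base_plot + scale_color_distiller(palette = "Spectral")

p_viridis
p_distiller
```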

Using a gradient based on two given colors

Now, if you do not find the exact set of colors that you want, you may use the color/gradient of your choice by adding scale_color_gradient() with the arguments low= and high= to the original code. The function will produce a two-color gradient ranging from the color indicated by low= to the color indicated by high=. Whether you indicate the colors by their R name or by their Hex code is up to you. Here are two examples:

ggplot(df, aes(values1, values2)) +
  geom_point(aes(color = values3), size = 2) +
  scale_color_gradient(low = "yellow", high = "darkblue")   # left plot, colors coded with R names

ggplot(df, aes(values1, values2)) +
  geom_point(aes(color = values3), size = 2) +
  scale_color_gradient(low = "#AF7AC5", high = "#E74C3C")   # right plot, colors coded with Hex codes



Using a gradient based on three given colors and a breaking point

scale_color_gradient2() is a function that also produces a gradient of colors, but this time based on three colors defined by low=, mid= and high=. The color mid= is special in that you can use it to highlight a breaking point in your data (set by the argument midpoint=). For example, if the average of the variable values3 has a special meaning for the interpretation of the data, you may set midpoint = mean(df$values3). Note that midpoint= is evaluated outside aes(), so the column must be referred to as df$values3. All values close to the average will then get a color close to the one defined by mid=. Here is the code:

ggplot(df, aes(values1, values2)) +
  geom_point(aes(color = values3), size = 2) +
  scale_color_gradient2(low = "yellow", mid = "darkblue", high = "red", midpoint = mean(df$values3))
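
The breaking point does not have to be the average. The sketch below, given as an illustration only, uses a fixed threshold instead; the data frame, the threshold value of 25 and the three colors are all assumptions chosen for the example.

```r
library(ggplot2)

# Hypothetical sample data (assumption: any data frame with these three columns works)
set.seed(42)
df <- data.frame(values1 = rnorm(100), values2 = rnorm(100), values3 = runif(100, 0, 50))

# Hypothetical threshold with a special meaning for the data
threshold <- 25

# Dots near the threshold are drawn in the mid color;
# dots below drift toward low, dots above toward high
p_mid <- ggplot(df, aes(values1, values2)) +
  geom_point(aes(color = values3), size = 2) +
  scale_color_gradient2(low = "blue", mid = "grey90", high = "red", midpoint = threshold)

p_mid
```

A pale mid color such as the one used here tends to make values on either side of the threshold stand out, whereas a dark mid color draws attention to the values near the threshold itself.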