When a scatter plot becomes so dense that its symbols overlap heavily, it loses clarity and no longer serves its purpose. This is called overplotting. In such a case, there are a few alternatives, one of them being a 2D density plot (the 2D version of the density plot described HERE). The 2D density plot adds an extra dimension to the scatter plot by showing the density of observations on a color scale.

Let’s take the following example where the variable to be plotted along the X-axis (values1) and the variable to be plotted along the Y-axis (values2) both contain 4000 points (!). Here is the dataframe:

# dataframe
df <- data.frame(values1, values2)
str(df)
## 'data.frame':    4000 obs. of  2 variables:
##  $ values1: num  54.3 41.8 49.6 41.7 67.1 ...
##  $ values2: num  5.31 6 7.08 5.51 5.6 ...
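
The vectors values1 and values2 are not defined in this excerpt. To follow along, one option is to simulate them. The distributions, means, standard deviations and seed below are assumptions chosen only to resemble the printed output; they are not the original data:

```r
# Hypothetical stand-in data: NOT the original values1/values2.
# The normal distributions and their parameters are illustrative assumptions.
set.seed(123)                                # make the simulation reproducible
values1 <- rnorm(4000, mean = 50, sd = 10)   # variable for the X-axis
values2 <- rnorm(4000, mean = 6,  sd = 1)    # variable for the Y-axis

df <- data.frame(values1, values2)
str(df)                                      # 4000 obs. of 2 numeric variables
```

Any pair of numeric vectors of equal length would work in the same way.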

If we attempt to draw a regular scatter plot with geom_point(), we obtain this unreadable figure:

ggplot(df, aes(values1, values2)) +
  geom_point()

In such a case of overplotting, it is logical to switch to a different plot type. To draw a 2D density plot, we will either use stat_density_2d() or geom_density_2d() with a series of necessary arguments. Here is an example with stat_density_2d():

ggplot(df, aes(values1, values2)) +                       
  stat_density_2d(aes(fill = ..level..), geom = "polygon") 

And here is the same example with geom_density_2d():

ggplot(df, aes(values1, values2)) +                       
  geom_density_2d(aes(color = ..level..))

A color bar guide named level automatically appears on the right because the fill or color aesthetic is mapped to ..level.., a variable computed by the stat (written after_stat(level) in ggplot2 3.3.0 and later). It shows the shades of blue (the default colors) used in the figure, so that you can read off the approximate density level of each region.
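
To see what the level values refer to, it can help to compute a 2D kernel density estimate directly. ggplot2 relies on MASS::kde2d() for this estimate. The sketch below is only an illustration: it assumes simulated data in place of the original vectors, and the 100 x 100 grid size is an arbitrary choice:

```r
library(MASS)   # provides kde2d()

# Hypothetical stand-in data (assumption, not the original vectors)
set.seed(123)
values1 <- rnorm(4000, mean = 50, sd = 10)
values2 <- rnorm(4000, mean = 6,  sd = 1)

# Estimate the density on a regular grid of 100 x 100 points
dens <- kde2d(values1, values2, n = 100)

str(dens)        # a list: grid coordinates x and y, density matrix z
range(dens$z)    # lowest and highest estimated density on the grid
```

The contours drawn by the density layers are lines of equal height on a surface like dens$z, and level is that height.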

If you feel like changing the colors used in the color bar guide, you can use scale_fill_gradient(), scale_color_gradient(), scale_fill_viridis_c() or scale_color_viridis_c().

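ggplot(df, aes(values1, values2)) +                            # left plot with stat_density_2d
  stat_density_2d(aes(fill = ..level..), geom = "polygon") +
  scale_fill_viridis_c()                                       # any of the fill scales above works here

ggplot(df, aes(values1, values2)) +                            # right plot with geom_density_2d
  geom_density_2d(aes(color = ..level..)) +
  scale_color_viridis_c()                                      # any of the color scales above works here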
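
As a further illustration, a custom two-color gradient can be set with scale_fill_gradient(). This is a minimal sketch under stated assumptions: the data are simulated, and the colors "lightyellow" and "darkred" are arbitrary choices, not those of the original figure:

```r
library(ggplot2)

# Hypothetical stand-in data (assumption, not the original vectors)
set.seed(123)
df <- data.frame(values1 = rnorm(4000, mean = 50, sd = 10),
                 values2 = rnorm(4000, mean = 6,  sd = 1))

# after_stat(level) is the current spelling of ..level.. (ggplot2 >= 3.3.0)
p <- ggplot(df, aes(values1, values2)) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon") +
  scale_fill_gradient(low = "lightyellow", high = "darkred")  # low to high density

p
```

For the line version drawn with geom_density_2d(), the matching function would be scale_color_gradient() with the same low and high arguments.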

Read this page to learn more about color palettes.

Note that it is possible to highlight the borders between the levels in the stat_density_2d() plot by adding contours. To do so, simply add the argument color= (or colour=) to the function stat_density_2d(). You can also adjust the thickness of these contour lines by adding size=:

ggplot(df, aes(values1, values2)) +
  stat_density_2d(aes(fill = ..level..), geom = "polygon", colour = "red", size = 1.5)

The same applies to lines in the geom_density_2d() plot:

ggplot(df, aes(values1, values2)) +
  geom_density_2d(aes(color = ..level..), size = 1.5)
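
With recent ggplot2 releases, the calls above still work but may emit deprecation warnings: line thickness is set with linewidth= instead of size= since version 3.4.0, and ..level.. is written after_stat(level). A hedged sketch of the equivalent call, assuming ggplot2 3.4.0 or later and simulated data in place of the original vectors:

```r
library(ggplot2)

# Hypothetical stand-in data (assumption, not the original vectors)
set.seed(123)
df <- data.frame(values1 = rnorm(4000, mean = 50, sd = 10),
                 values2 = rnorm(4000, mean = 6,  sd = 1))

# Same contour plot as above, using the newer spellings:
#   ..level..  ->  after_stat(level)
#   size =     ->  linewidth =   (for lines, ggplot2 >= 3.4.0)
p_lines <- ggplot(df, aes(values1, values2)) +
  geom_density_2d(aes(color = after_stat(level)), linewidth = 1.5)

p_lines
```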

Adding plot title, axis titles, ticks, labels and other essential elements

In this section, you will learn how to set/modify all the necessary elements that make a plot complete and comprehensible. Such elements are: