When a scatter plot is so dense that its symbols overlap heavily, it loses clarity and no longer serves its purpose. This is called overplotting. There are a few alternatives in such a case, one of them being the 2D histogram. While a regular one-dimensional histogram uses bars to display the count of observations per interval (or bin) along the X-axis, the 2D histogram displays a grid of squares formed by intersecting the intervals (or bins) on the X-axis with those on the Y-axis. Each square is thus defined by one interval on each axis, and its color (or shade) encodes the number of observations that fall inside it. A variant of this plot, the hexbin plot, uses hexagons instead of squares.

Let’s take the following example, where the variable plotted along the X-axis (values1) and the variable plotted along the Y-axis (values2) each contain 4000 values. Here is the data frame:

# dataframe
df <- data.frame(values1, values2)
str(df)
## 'data.frame':    4000 obs. of  2 variables:
##  $ values1: num  54 27.5 48.3 52.2 52.3 ...
##  $ values2: num  5.43 4.59 5.62 6.58 5.16 ...
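
The vectors values1 and values2 are not defined in this section. To run the examples on your own machine, stand-in data can be simulated. The sketch below is only an assumption: the normal distributions and their parameters are hypothetical, chosen to give 4000 points on roughly the same scale as the str() output above, and they are not the original data.

```r
# Hypothetical stand-in data: NOT the original dataset used for the figures.
set.seed(42)                                 # make the simulation reproducible
n <- 4000                                    # same number of observations as above

values1 <- rnorm(n, mean = 45, sd = 10)      # assumed scale for the X variable
values2 <- rnorm(n, mean = 5.5, sd = 0.7)    # assumed scale for the Y variable

# Same construction as in the text
df <- data.frame(values1, values2)
str(df)
```

The figures you obtain with this data will differ in detail from those described here, but the overplotting problem and its solution remain the same.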



If we attempt to draw a regular scatter plot with geom_point(), we obtain this unreadable figure:

ggplot(df, aes(values1, values2)) +
  geom_point()



In such a case of overplotting, it is logical to switch to a different plot type. To draw a 2D histogram, we will use geom_bin2d():

ggplot(df, aes(values1, values2)) +
  geom_bin2d()

A color bar guide named count automatically appears to the right of the plot. It shows the shades of blue (the default color) used in the figure and lets you estimate the count for each square.

As with the regular histogram, you can change the bins, either by setting their width with binwidth= or by setting the number of bins with bins= (30 along each axis by default):

ggplot(df, aes(values1, values2)) +
  geom_bin2d(bins = 70)

The squares become smaller and, since each of them now holds fewer observations, the scale of the color bar guide is redefined accordingly.
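
The binwidth= alternative mentioned above can be sketched as follows. The widths c(2, 0.25) are hypothetical values picked for the simulated stand-in data in this block (an assumption, not the original values1 and values2); adapt them to the units of your own variables.

```r
library(ggplot2)

# Hypothetical stand-in data (assumed; not the original values1/values2)
set.seed(42)
df <- data.frame(
  values1 = rnorm(4000, mean = 45, sd = 10),
  values2 = rnorm(4000, mean = 5.5, sd = 0.7)
)

# binwidth takes one width per axis: c(width along X, width along Y),
# expressed in the units of each variable
p_binwidth <- ggplot(df, aes(values1, values2)) +
  geom_bin2d(binwidth = c(2, 0.25))
p_binwidth

# layer_data() returns the bins computed for the layer;
# summing the count column gives the total number of binned observations
bins_df <- layer_data(p_binwidth)
sum(bins_df$count)
```

Because the two axes use different units, the bins are defined by a width along X and a width along Y rather than by a single number. If both bins= and binwidth= are supplied, binwidth= takes precedence.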

If you feel like changing the colors used in the color bar guide, you can use scale_fill_gradient() or scale_fill_viridis_c() for example:

ggplot(df, aes(values1, values2)) +
  geom_bin2d(bins = 70) +
  scale_fill_viridis_c()
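
The other option named above, scale_fill_gradient(), lets you pick the two ends of the gradient yourself. Here is a minimal sketch; the colors "lightyellow" and "darkred" are arbitrary choices for illustration, and the data is simulated stand-in data (an assumption, not the original dataset).

```r
library(ggplot2)

# Hypothetical stand-in data (assumed; not the original values1/values2)
set.seed(42)
df <- data.frame(
  values1 = rnorm(4000, mean = 45, sd = 10),
  values2 = rnorm(4000, mean = 5.5, sd = 0.7)
)

# low = color for the smallest counts, high = color for the largest counts
p_gradient <- ggplot(df, aes(values1, values2)) +
  geom_bin2d(bins = 70) +
  scale_fill_gradient(low = "lightyellow", high = "darkred")
p_gradient
```

A light color for low counts and a dark color for high counts tends to make the densest regions stand out against the plot background.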

Read this page to learn more about color palettes.
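
The hexbin variant mentioned at the start of this section can be drawn in much the same way with geom_hex(). The sketch below assumes that the hexbin package is installed, since geom_hex() depends on it, and it again relies on simulated stand-in data rather than the original values1 and values2.

```r
library(ggplot2)

# Hypothetical stand-in data (assumed; not the original values1/values2)
set.seed(42)
df <- data.frame(
  values1 = rnorm(4000, mean = 45, sd = 10),
  values2 = rnorm(4000, mean = 5.5, sd = 0.7)
)

# geom_hex() needs the hexbin package, so only build the plot if it is available
if (requireNamespace("hexbin", quietly = TRUE)) {
  p_hex <- ggplot(df, aes(values1, values2)) +
    geom_hex(bins = 30) +          # hexagonal bins instead of squares
    scale_fill_viridis_c()
  print(p_hex)
} else {
  message("Install the hexbin package to draw this plot.")
}
```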

Adding plot title, axis titles, ticks, labels and other essential elements

In this section, you will learn how to set/modify all the necessary elements that make a plot complete and comprehensible. Such elements are: