When a scatter plot becomes so dense that its symbols overlap massively, the plot loses clarity and no longer serves its purpose. This is called overplotting. There are a few alternatives in such a case, one of them being the hexbin plot (i.e. a hexagonal heatmap of 2D bin counts). While a regular one-dimensional histogram uses bars to display the count of observations per interval (or bin) along the X-axis, the hexbin plot divides the plane spanned by the X- and Y-axes into hexagonal bins and counts the observations that fall into each of them. Each hexagon is thus defined by its position along the X- and Y-axes and by a color (or shade) that encodes the local count. Note that the hexbin plot is a variant of the 2D histogram, which uses rectangular bins instead of hexagons.

Let’s take an example in which the variable plotted along the X-axis (values1) and the variable plotted along the Y-axis (values2) each contain 4000 values. Here is the dataframe that combines them:

# dataframe
df <- data.frame(values1, values2)
str(df)
## 'data.frame':    4000 obs. of  2 variables:
##  $ values1: num  42.9 50.2 28.2 53.3 39.7 ...
##  $ values2: num  3.54 4.62 6.12 5.3 5.21 ...
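
The code that creates values1 and values2 is not shown here. To reproduce the figures of this section, one possibility is to simulate two numeric vectors of 4000 values each. The sketch below is only an assumption: the seed, the normal distributions and their means and standard deviations are hypothetical choices made to roughly resemble the values printed by str(), not the original data.

```r
# hypothetical stand-in for the original data (all parameters are assumptions)
set.seed(1)                                   # arbitrary seed, for reproducibility
values1 <- rnorm(4000, mean = 40, sd = 10)    # variable for the X-axis
values2 <- rnorm(4000, mean = 5, sd = 1)      # variable for the Y-axis

# dataframe
df <- data.frame(values1, values2)
str(df)
```

Any other pair of numeric vectors of equal length would work just as well, as long as there are enough points to cause overplotting.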



If we attempt to draw a regular scatter plot with geom_point(), we obtain this unreadable figure:

ggplot(df, aes(values1, values2)) +
  geom_point()



In such a case of overplotting, it makes sense to switch to a different plot type. To draw a hexbin plot, we will use geom_hex(), which requires the hexbin package to be installed:

ggplot(df, aes(values1, values2)) +
  geom_hex()  

A color bar guide named count automatically appears to the right of the plot. It shows the shades of blue (the default color scale) used in the figure and helps you estimate the count in each hexagon.
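
If reading counts from the color bar is not precise enough, the exact count of each hexagon can be extracted from the plot. The following sketch uses layer_data() from ggplot2 for that purpose. The simulated dataframe is a hypothetical stand-in for df, and the object names p and hex_counts are chosen for illustration only.

```r
library(ggplot2)   # geom_hex() also needs the hexbin package to be installed

# hypothetical stand-in for df (simulated values, not the original data)
set.seed(1)
df <- data.frame(values1 = rnorm(4000, mean = 40, sd = 10),
                 values2 = rnorm(4000, mean = 5, sd = 1))

p <- ggplot(df, aes(values1, values2)) +
  geom_hex()

# data computed for the first (and only) layer: one row per hexagon
hex_counts <- layer_data(p, 1)
head(hex_counts[, c("x", "y", "count")])

# the counts of all hexagons add up to the number of observations
sum(hex_counts$count)
```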

As with a regular histogram, you can change the size of the bins, either by defining their width with binwidth= or by defining the number of bins with bins= (30 by default):

ggplot(df, aes(values1, values2)) +
  geom_hex(bins = 70)  

The size of the hexagons changes and the scale of the color bar guide is redefined accordingly.
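
The example above uses bins=. As a complement, here is a minimal sketch of the binwidth= alternative. For geom_hex(), binwidth is given as a vector of two values, the bin width along the X-axis and the bin width along the Y-axis, both expressed in the units of the data. The widths 2 and 0.25 are arbitrary values chosen for the simulated stand-in data below, which is itself an assumption and not the original df.

```r
library(ggplot2)   # geom_hex() also needs the hexbin package to be installed

# hypothetical stand-in for df (simulated values, not the original data)
set.seed(1)
df <- data.frame(values1 = rnorm(4000, mean = 40, sd = 10),
                 values2 = rnorm(4000, mean = 5, sd = 1))

# hexagons 2 units wide along X and 0.25 units high along Y
p_binwidth <- ggplot(df, aes(values1, values2)) +
  geom_hex(binwidth = c(2, 0.25))
p_binwidth
```

Because the two axes often cover very different ranges, the two widths usually need to be chosen separately.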

If you want to change the colors used for the hexagons and the color bar guide, you can use, for example, scale_fill_gradient() or scale_fill_viridis_c():

ggplot(df, aes(values1, values2)) +
  geom_hex(bins = 70) + 
  scale_fill_viridis_c()
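
As a complement, here is a sketch of the scale_fill_gradient() option, which builds a two-color gradient from the colors given to low= and high=. The two colors used here are arbitrary, and the simulated dataframe is a hypothetical stand-in for df.

```r
library(ggplot2)   # geom_hex() also needs the hexbin package to be installed

# hypothetical stand-in for df (simulated values, not the original data)
set.seed(1)
df <- data.frame(values1 = rnorm(4000, mean = 40, sd = 10),
                 values2 = rnorm(4000, mean = 5, sd = 1))

# low counts are drawn in light yellow, high counts in dark red
p_gradient <- ggplot(df, aes(values1, values2)) +
  geom_hex(bins = 70) +
  scale_fill_gradient(low = "lightyellow", high = "darkred")
p_gradient
```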

Read this page to learn more about color palettes.



Adding plot title, axis titles, ticks, labels and other essential elements

In this section, you will learn how to set/modify all the necessary elements that make a plot complete and comprehensible. Such elements are: