Any plot designed with the function ggplot() has 3 necessary components: the dataframe that contains the variables to plot, aes() that takes care of positioning/mapping the variables on the plot, and geom() (a placeholder for functions such as geom_point()) that takes care of drawing the graph (bar graph, line, scatter plot, etc.).
Simplified a lot, these 3 components can be seen as layers which, added on top of each other, form the resulting plot. The code for the “simplest plot ever” has the following syntax:
ggplot(data = dataframe, aes(x = variable1, y = variable2)) +
geom()
Note that both the dataframe and aes() are inside ggplot() while geom() is added to ggplot().
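The placement of the three components can be sketched with a small, self-contained example. The data frame demo and its columns a and b are hypothetical and serve only this illustration; the last three lines simply look inside the plot object to show where each component ends up.

```r
library(ggplot2)

# Hypothetical data frame, used only for this illustration
demo <- data.frame(a = 1:5, b = c(2, 4, 3, 6, 5))

# Data and aesthetics go inside ggplot(); the geometry is added with `+`
p <- ggplot(data = demo, aes(x = a, y = b)) +
  geom_point()

# Each component is stored separately in the plot object
p$data                        # the data frame passed to ggplot()
names(p$mapping)              # the aesthetics set in aes()
class(p$layers[[1]]$geom)[1]  # the geometry added with `+`
```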
Let’s take a simple example where we draw a scatter plot based on the variables var1 and var2 which are contained in the dataframe df. Here is the code for the variables and dataframe:
# variable1
var1 <- c(1,2,3,4,5,6,7,8,9,10)
# variable2
var2 <- c(45,12,48,79,65,32,78,95,12,75)
# dataframe
df <- data.frame(var1, var2)
The code for the scatter plot is:
ggplot(data = df, aes(x = var1, y = var2)) + # maps var1 and var2 contained in df
geom_point() # draws the dots of the scatter plot
It is possible to write the code in a simpler manner by removing data =, x = and y =. However, make sure that the variables are written in the right order in aes(), since the first variable is taken as x and the second as y. Inverting them results, at best, in a plot with swapped axes.
The code for the scatter plot in the example above may be simplified in the following way:
ggplot(df, aes(var1, var2)) +
geom_point()
However, inverting the variables results in something rather different:
ggplot(df, aes(var2, var1)) +
geom_point()
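One way to avoid this pitfall is to keep the argument names in aes(): with named arguments the order no longer matters. The sketch below rebuilds the same data and compares a positional and a named call; the object names p_positional and p_named are chosen for this illustration, and rlang::as_label() (from the rlang package, which ggplot2 depends on) is only used to show which variable ended up on the x axis.

```r
library(ggplot2)

# Same data as in the example above
var1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
var2 <- c(45, 12, 48, 79, 65, 32, 78, 95, 12, 75)
df <- data.frame(var1, var2)

# Positional: the first variable becomes x, the second becomes y
p_positional <- ggplot(df, aes(var2, var1)) +
  geom_point()

# Named: the order is irrelevant, var1 stays on the x axis
p_named <- ggplot(df, aes(y = var2, x = var1)) +
  geom_point()

# Which variable is mapped to x in each plot?
rlang::as_label(p_positional$mapping$x)
rlang::as_label(p_named$mapping$x)
```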
Let’s now assume that you change your mind and want to draw a bar plot instead of a scatter plot. The only thing you need to change in the code is the layer that is responsible for drawing, i.e. the geometry. You thus replace geom_point() with a more appropriate function such as geom_col(). The code becomes:
ggplot(data = df, aes(x = var1, y = var2)) +
geom_col() # draws bars instead of dots
The relative simplicity of the syntax and structure of ggplot() allows you not only to modify layers rapidly, but also to add other layers, thus creating multiple plots, faceted plots, etc.
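As a minimal sketch of adding a layer, the code below draws a line and the dots in the same plot. The choice of geom_line() is only an illustration; any other geometry could be added the same way. Layers are drawn in the order in which they are added.

```r
library(ggplot2)

# Same data as in the example above
var1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
var2 <- c(45, 12, 48, 79, 65, 32, 78, 95, 12, 75)
df <- data.frame(var1, var2)

# Two geometries sharing the same data and aesthetics
p_two_layers <- ggplot(data = df, aes(x = var1, y = var2)) +
  geom_line() +   # added first, so it is drawn underneath
  geom_point()    # added second, so the dots sit on top of the line

length(p_two_layers$layers)  # number of layers in the plot
```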
To see why all 3 components are necessary, let’s omit each of them in turn. First, we omit the function geom_point(). ggplot() is left with the dataframe and the æsthetics, but it does not know what to do with them. As a result, it shows you the axes, scales, grid and background, but nothing more:
ggplot(data = df, aes(x = var1, y = var2))
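The empty plot is still a valid object, which can be checked by storing it and counting its layers. This is a small sketch; the names p_empty and p_full are chosen for the illustration.

```r
library(ggplot2)

# Same data as in the example above
var1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
var2 <- c(45, 12, 48, 79, 65, 32, 78, 95, 12, 75)
df <- data.frame(var1, var2)

# Data and aesthetics only: nothing to draw yet
p_empty <- ggplot(data = df, aes(x = var1, y = var2))
length(p_empty$layers)

# The missing geometry can still be added afterwards
p_full <- p_empty + geom_point()
length(p_full$layers)
```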
Here we omit the dataframe df. Doing so, ggplot() knows that you want to draw a scatter plot and that the data is in two variables called var1 and var2, but it does not find the data frame that contains them. Since data is the first argument of ggplot(), aes() is read in its place, and the call ends with an error message:
ggplot(aes(x = var1, y = var2)) +
geom_point()
## Error: `data` must be a data frame, or other object coercible by `fortify()`, not an S3 object with class uneval
## Did you accidentally pass `aes()` to the `data` argument?
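If you want to inspect this error without stopping a script, the call can be wrapped in tryCatch(). This is only a sketch: the name msg is chosen for the illustration, and the exact wording of the message may differ between versions of ggplot2.

```r
library(ggplot2)

# Capture the error message instead of halting execution
msg <- tryCatch(
  {
    ggplot(aes(x = var1, y = var2)) +
      geom_point()
    "no error"
  },
  error = function(e) conditionMessage(e)
)

msg  # the text of the error raised by ggplot()
```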
Here we omit the function aes() and its content. Doing so, ggplot() knows that the data is in the dataframe df and that you want to draw a scatter plot, but it cannot map the data. It ends again with an error message:
ggplot(data = df) +
geom_point()
## Error: geom_point requires the following missing aesthetics: x, y
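Two details may be worth a quick sketch here. First, the plot object itself is created without complaint; the error only appears when the plot is built for drawing, which the code below triggers with ggplot_build(). Second, geom_point() is satisfied as soon as x and y are mapped somewhere, including inside the geometry itself. The names p_missing, build_error and p_ok are chosen for this illustration.

```r
library(ggplot2)

# Same data as in the example above
var1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
var2 <- c(45, 12, 48, 79, 65, 32, 78, 95, 12, 75)
df <- data.frame(var1, var2)

# Creating the object works; the aesthetics are only checked at build time
p_missing <- ggplot(data = df) +
  geom_point()

build_error <- tryCatch(
  {
    ggplot_build(p_missing)
    NULL
  },
  error = function(e) conditionMessage(e)
)
build_error  # message about the missing aesthetics

# Mapping x and y inside the geometry provides what geom_point() needs
p_ok <- ggplot(data = df) +
  geom_point(aes(x = var1, y = var2))
```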