A multiple area plot (also called a stacked area plot) is useful for visualizing how different components or groups in a dataset contribute to a sum or a population.
Here we will plot data from Statistics Norway describing the Norwegian population (response variable population) in 2019. Each observation is the number of people of a given gender and age (predictor variables). Data source: Statistics Norway, the national statistical institute of Norway (ssb.no).
The dataframe for this tutorial is as follows:
# dataframe
df <- data.frame(age, population, gender)
# structure of the dataframe
str(df)
## 'data.frame': 212 obs. of 3 variables:
## $ age : int 0 1 2 3 4 5 6 7 8 9 ...
## $ population: num 28556 29649 31187 31157 31303 ...
## $ gender : Factor w/ 2 levels "Men","Women": 1 1 1 1 1 1 1 1 1 1 ...
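The vectors age, population and gender are not defined in this tutorial. To run the examples without the original data, a stand-in with the same structure can be sketched as below. The counts are invented for illustration and are not the Statistics Norway figures; the curve shape and the helper name midpoint are assumptions.

```r
# Synthetic stand-in for the Statistics Norway data.
# The numbers are invented for illustration and are NOT real population counts.
ages <- 0:105                                   # 106 ages, so 2 * 106 = 212 rows
age <- rep(ages, times = 2)
gender <- factor(rep(c("Men", "Women"), each = length(ages)))

# Hypothetical curve: flat for young ages, declining for older ages,
# with a slightly later decline for Women so that the two areas differ.
midpoint <- ifelse(gender == "Women", 55, 50)
population <- round(30000 * exp(-(pmax(age - midpoint, 0) / 25)^2))

df <- data.frame(age, population, gender)
str(df)
```

With this stand-in, the plotting code in the rest of the section runs as written, although the shapes of the areas will differ from those of the real data.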
The function to use for drawing the areas is geom_area(). We must map the variables with the function aes(), which will look more or less like this: aes(x = age, y = population, fill = gender). Our plan is to:
- map age to the X-axis,
- map population to the Y-axis,
- use fill= to display the data in colored areas based on the predictor variable gender.
The code for the plot is as follows:
library(ggplot2)
ggplot(df, aes(x = age, y = population, fill = gender)) +
  geom_area()
As you can see, geom_area() automatically creates a stacked area plot, where the values projected onto the Y-axis are cumulative, not absolute. If you prefer to plot the absolute values, in other words if you want the two genders to overlap, you may use the argument position = position_dodge():
ggplot(df, aes(x = age, y = population, fill = gender)) +
  geom_area(position = position_dodge())
However, the opacity of the top area (Women) makes it impossible to read the values for Men. This may be solved by making the layers semi-transparent with alpha=:
ggplot(df, aes(x = age, y = population, fill = gender)) +
  geom_area(position = position_dodge(), alpha = 0.5)
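Another way to obtain overlapping areas is position = "identity", which turns stacking off so that each area is drawn from zero. The sketch below is a minimal illustration on a small toy dataframe whose values are invented; the names toy and p are assumptions made for the example.

```r
library(ggplot2)

# Toy data with invented values, for illustration only
toy <- data.frame(
  age = rep(0:4, times = 2),
  population = c(10, 12, 11, 9, 8, 9, 11, 12, 10, 9),
  gender = factor(rep(c("Men", "Women"), each = 5))
)

# position = "identity" disables stacking: both areas start at y = 0 and overlap.
# alpha keeps the area drawn underneath visible.
p <- ggplot(toy, aes(x = age, y = population, fill = gender)) +
  geom_area(position = "identity", alpha = 0.5)
p
```

Replacing toy with df should give a plot comparable to the semi-transparent one above.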
Finally, here is a trick if you want to display both genders symmetrically, one above the X-axis and the other below it. Use ifelse() to create a conditional display, where the values for the level Women are displayed normally (positive values, population) while the others are displayed as negative values (-population):
ggplot(df, aes(x = age, y = population, fill = gender)) +
  geom_area(aes(y = ifelse(gender == "Women", population, -population)))
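With this trick, the Y-axis shows negative numbers for the lower area. One possible refinement, sketched below on invented toy data, is to store the signed values in a separate column and to pass abs to the labels argument of scale_y_continuous() so that the axis shows magnitudes. The column name signed_population and the object names toy and p are hypothetical.

```r
library(ggplot2)

# Toy data with invented values, for illustration only
toy <- data.frame(
  age = rep(0:4, times = 2),
  population = c(10, 12, 11, 9, 8, 9, 11, 12, 10, 9),
  gender = factor(rep(c("Men", "Women"), each = 5))
)

# Signed copy of the response: Women above the X-axis, Men below it
toy$signed_population <- ifelse(toy$gender == "Women",
                                toy$population, -toy$population)

p <- ggplot(toy, aes(x = age, y = signed_population, fill = gender)) +
  geom_area() +
  scale_y_continuous(labels = abs) +  # label the axis with absolute values
  labs(y = "population")
p
```

Only the axis labels change here; the plotted values for Men are still negative.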
In this section, you will learn how to set/modify all the necessary elements that make a plot complete and comprehensible. Such elements are: