Scatter plots that display a large amount of data tend to be confusing, and it is often difficult to determine whether there is a pattern or trend in the data distribution. One strategy to better visualize a trend is to add a smoother, a line that smooths out the noise or inherent variation in the data set.
Before going any further, if you are not familiar with scatter plots, have a quick look at the page on scatter plots first.
Adding a smoother is simple: add an extra layer, in the form of the function geom_smooth(), to the code of the scatter plot. Let’s take the following example. The dataframe for this example is as follows:
# dataframe
df <- data.frame(values_x, values_y)
# structure of the dataframe
str(df)
## 'data.frame': 600 obs. of 2 variables:
## $ values_x: int 1 2 3 4 5 6 7 8 9 10 ...
## $ values_y: num 61.3 32 16.6 46.4 75.5 ...
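The vectors values_x and values_y are not defined in the text. To follow along, one option is to simulate something similar. The sketch below is an assumption, not the original data: it builds 600 integer x values and a noisy, gently curved y, so its str() output will differ from the one shown above.

```r
# Hypothetical stand-in data: the original values_x / values_y are not given.
set.seed(42)                       # make the random noise reproducible
values_x <- 1:600                  # integer sequence, matching the str() output
values_y <- 50 + 20 * sin(values_x / 60) + rnorm(600, sd = 15)  # trend + noise

# Same construction as in the text
df <- data.frame(values_x, values_y)
str(df)
```

Any other pair of numeric vectors of equal length would work just as well for the plots that follow.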
And the corresponding scatter plot is:
# load ggplot2 (needed for all the plots below)
library(ggplot2)
ggplot(df, aes(values_x, values_y)) +
  geom_point()
To add the smoother, we add geom_smooth() to the code:
ggplot(df, aes(values_x, values_y)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
As you can see on the plot above, the plot now displays a smoother (the blue line) AND a confidence band around it (the grey area surrounding the smoother, a 95% confidence interval by default). A message, not a warning, also appears in the console. It lists the default parameters that geom_smooth() has used, since we have not given it any information or arguments.
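If the console message is unwanted, one way to avoid it is to state the method and formula explicitly, so that geom_smooth() has nothing to choose on its own. The sketch below assumes the ggplot2 package is installed and uses simulated stand-in data (hypothetical, not the original df) so that it runs on its own.

```r
library(ggplot2)

# Hypothetical stand-in data: the original values_x / values_y are not given.
set.seed(42)
values_x <- 1:600
values_y <- 50 + 20 * sin(values_x / 60) + rnorm(600, sd = 15)
df <- data.frame(values_x, values_y)

# Spelling out the values reported in the message makes the choice explicit
p <- ggplot(df, aes(values_x, values_y)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

print(p)  # draw the plot
```

The resulting plot should be the same as with a bare geom_smooth() call on this data; only the message goes away.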
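It is possible to adjust the shape of the smoother by changing the wiggliness of the line. To do so, use the argument span= and give it a value between 0 and 1; smaller values produce a wigglier line, larger values a smoother one. Note that span= only has an effect with the loess method: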
ggplot(df, aes(values_x, values_y)) +
  geom_point() +
  geom_smooth(span = .2)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
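To get a feel for what span= does, it can help to fit the loess curve directly with stats::loess(), which is the function behind method = 'loess'. The sketch below uses simulated stand-in data (hypothetical, not the original df) and a made-up helper, roughness(), that sums the squared second differences of the fitted curve as a crude measure of wiggliness.

```r
# Hypothetical stand-in data: the original values_x / values_y are not given.
set.seed(42)
x <- 1:600
y <- 50 + 20 * sin(x / 60) + rnorm(600, sd = 15)

# Illustrative helper (not part of any package): a larger value means a wigglier curve
roughness <- function(span) {
  fit <- loess(y ~ x, span = span)          # local regression with the given span
  sum(diff(predict(fit), differences = 2)^2)
}

rough_small <- roughness(0.2)  # small span: each local fit uses 20% of the points
rough_large <- roughness(0.9)  # large span: each local fit uses 90% of the points

rough_small > rough_large      # the small-span curve is the wigglier one
```

A small span follows local bumps in the data closely, while a large span averages over most of the data and yields a much flatter curve.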
If you are interested in the smoother but not in the confidence band, you can ask geom_smooth() to omit it. Just add the argument se=FALSE:
ggplot(df, aes(values_x, values_y)) +
  geom_point() +
  geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
geom_smooth() can also be used to overlay a line of best fit. For that, we use the argument method="lm" in geom_smooth():
ggplot(df, aes(values_x, values_y)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
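The line drawn with method="lm" is an ordinary linear regression of y on x, so its intercept and slope can be recovered with lm(). The sketch below is only an illustration on simulated stand-in data (hypothetical, with a true slope of 0.05 chosen for the example), not the original df.

```r
# Hypothetical stand-in data with a known linear trend (slope 0.05)
set.seed(42)
values_x <- 1:600
values_y <- 50 + 0.05 * values_x + rnorm(600, sd = 15)
df <- data.frame(values_x, values_y)

# Same model that geom_smooth(method = "lm") fits with the formula y ~ x
fit <- lm(values_y ~ values_x, data = df)

coef(fit)                               # intercept and slope of the best-fit line
slope <- unname(coef(fit)["values_x"])  # estimated slope, close to the true 0.05
```

Calling summary(fit) additionally reports standard errors and R-squared, which the plot alone does not show.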
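Finally, you may adjust the thickness, color and line type with size=, color= and linetype=, respectively (from ggplot2 3.4.0 onward, linewidth= is the preferred name for line thickness, and size= triggers a deprecation warning for lines):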
ggplot(df, aes(values_x, values_y)) +
  geom_point() +
  geom_smooth(size = 2, color = "orange", linetype = "dashed")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
In this section, you will learn how to set/modify all the necessary elements that make a plot complete and comprehensible. Such elements are: