A scatter plot is one of the simplest plots to draw. Each data point is represented by a dot (or a square, cross, circle or other simple geometric shape), provided that the data can be described by two variables, one plotted along the X-axis and the other along the Y-axis.

Let’s take a simple example to see how to build a scatter plot with ggplot(). Here is the dataframe:

# dataframe
df <- data.frame(x,y)
str(df)
## 'data.frame':    100 obs. of  2 variables:
##  $ x: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ y: num  1.11 2.15 3.01 3.29 4.53 ...
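
The vectors x and y are not defined in the listing above. Here is a minimal sketch of how comparable data could be simulated, assuming a linear trend with Gaussian noise. The seed, slope and noise level are illustrative assumptions, so the values will differ from the output shown:

```r
# Hypothetical data: the original x and y are not given in the text
set.seed(42)                     # arbitrary seed, only for reproducibility
x <- 1:100                       # integer sequence, matching "int 1 2 3 ..."
y <- x + rnorm(100, sd = 0.5)    # assumed: y follows x plus random noise
df <- data.frame(x, y)
str(df)
```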



Let’s map the variables x and y to the X- and Y-axes by typing ggplot(df, aes(x, y)), and use geom_point() to draw the scatter plot:

ggplot(df, aes(x,y)) +
    geom_point()



This scatter plot is rather plain; it could use some color and larger dots, for example. We can pass arguments such as size= and color= to geom_point() to make the dots larger and blue:

ggplot(df, aes(x,y)) +
    geom_point(size=2, color="blue")



You can also change the symbol by adding the argument shape=. Check this table and pick the number that matches the shape you want; for example, 17 draws a filled triangle:

ggplot(df, aes(x,y)) +
  geom_point(size=2, color="blue", shape=17)
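
Shapes 21 to 25 are a special case worth knowing: they have an outline and an interior, controlled by color= and fill= respectively. Here is a small sketch using a stand-in dataframe (the data and the chosen colors are assumptions for illustration only):

```r
library(ggplot2)

# Hypothetical stand-in data; any dataframe with numeric x and y would do
df <- data.frame(x = 1:10, y = (1:10) + 0.5)

# Shape 21 is a circle with a separate outline (color) and interior (fill)
p <- ggplot(df, aes(x, y)) +
  geom_point(shape = 21, size = 3, color = "black", fill = "skyblue")
p
```

For the other shape numbers, fill= has no visible effect and color= sets the color of the whole symbol.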



Scatter plot with groups

Until now, we have been drawing scatter plots representing only one sample. We can also use such a plot to compare several groups or samples by mapping color= or shape= to a grouping variable inside aes(). To illustrate this, we need a more appropriate dataframe:

# dataframe
df2 <- data.frame(X, group, values)
str(df2)
## 'data.frame':    300 obs. of  3 variables:
##  $ X     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ group : Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
##  $ values: num  1.08 0.85 1.01 1.12 1.07 ...
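
As before, X, group and values are not defined in the listing. One way such a dataframe could be built is sketched below, assuming three groups of 100 observations each with a different mean. The means, spread and seed are assumptions, not the original data:

```r
# Hypothetical data: the original X, group and values are not given in the text
set.seed(1)                                        # arbitrary seed
X <- rep(1:100, times = 3)                         # 1..100 repeated for each group
group <- factor(rep(c("A", "B", "C"), each = 100)) # 100 observations per group
# Assumed group means of 1, 2 and 3 with a small standard deviation
values <- rnorm(300, mean = rep(c(1, 2, 3), each = 100), sd = 0.1)
df2 <- data.frame(X, group, values)
str(df2)
```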



Here is the code that creates a scatter plot with three groups distinguished by color:

ggplot(df2, aes(x = X, y = values)) + 
  geom_point(aes(color = group), size = 2)

Read this page to learn more about color palettes.

And here is the same plot, this time with shapes to differentiate between groups:

ggplot(df2, aes(x = X, y = values)) + 
  geom_point(aes(shape = group), size = 2)
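
The two mappings can also be combined, which keeps the groups distinguishable when the plot is printed in grayscale. Here is a short sketch with a small stand-in dataframe (the data is hypothetical and only mimics the structure of df2):

```r
library(ggplot2)

# Hypothetical stand-in with the same columns as df2
set.seed(1)
df2 <- data.frame(
  X      = rep(1:20, times = 3),
  group  = factor(rep(c("A", "B", "C"), each = 20)),
  values = rnorm(60, mean = rep(c(1, 2, 3), each = 20), sd = 0.1)
)

# Map the same variable to both color and shape; ggplot2 merges the legends
p <- ggplot(df2, aes(x = X, y = values)) +
  geom_point(aes(color = group, shape = group), size = 2)
p
```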



Adding plot title, axis titles, ticks, labels and other essential elements

In this section, you will learn how to set/modify all the necessary elements that make a plot complete and comprehensible. Such elements are: