You have been running a one-way ANOVA with lm() and anova(), and the F-test has revealed the existence of a significant difference between some of the tested groups. But which groups? To answer that question, you will need to run the appropriate post-hoc tests to assess the significance of differences between pairs of group means. The functions emmeans() and glht() will help you do this.

We will reuse the example introduced here (one-way ANOVA). To create the corresponding dataframe, use the following code:

# response variable
size <- c(25,22,28,24,26,24,22,21,23,25,26,30,25,24,21,27,28,23,25,24,20,22,24,23,22,24,20,19,21,22)

# predictor variable
location <- as.factor(c(rep("ForestA",10), rep("ForestB",10), rep("ForestC",10)))

# dataframe
my.dataframe <- data.frame(size,location)


For recall, here is the corresponding plot:

and the model that was fitted with lm():

# linear model stored in the object 'model'
model <- lm(size ~ location, data = my.dataframe)



Alternative 1: running the test with emmeans()

emmeans() is part of the package emmeans, which we first need to activate:

library(emmeans)


The next step consists in “feeding” the linear model to emmeans(). The syntax is emmeans(model, pairwise ~ predictor) where predictor is the predictor variable which levels have to be compared pairwise:

# running emmeans()
emmeans(model, pairwise ~ location)
## $emmeans
##  location emmean    SE df lower.CL upper.CL
##  ForestA    24.0 0.684 27     22.6     25.4
##  ForestB    25.3 0.684 27     23.9     26.7
##  ForestC    21.7 0.684 27     20.3     23.1
## 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast          estimate    SE df t.ratio p.value
##  ForestA - ForestB     -1.3 0.967 27 -1.345  0.3836 
##  ForestA - ForestC      2.3 0.967 27  2.379  0.0619 
##  ForestB - ForestC      3.6 0.967 27  3.723  0.0026 
## 
## P value adjustment: tukey method for comparing a family of 3 estimates

The output shows first some statistics for each group in the upper part, while the lower part displays the three pairwise comparisons and p-values. Only the last comparison is associated with a p-value under 0.05.



Alternative 2: running the test with glht()

glht() is part of the package multcomp which we first need to activate:

library(multcomp)


Running the test with glht() involves two steps:

Let’s apply this to our example:

# running glht()
post.hoc <- glht(model, linfct = mcp(location = 'Tukey'))

# displaying the result table with summary()
summary(post.hoc)
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: lm(formula = size ~ location, data = my.dataframe)
## 
## Linear Hypotheses:
##                        Estimate Std. Error t value Pr(>|t|)   
## ForestB - ForestA == 0   1.3000     0.9669   1.345   0.3836   
## ForestC - ForestA == 0  -2.3000     0.9669  -2.379   0.0618 . 
## ForestC - ForestB == 0  -3.6000     0.9669  -3.723   0.0025 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

The output shows all three pairwise comparisons, p-values and corresponding stars. And again, only the last comparison shows a p-value under 0.05.