You have been running a one-way ANOVA with lm()
and anova()
, and the F-test has revealed the existence of a significant difference between some of the tested groups. But which groups? To answer that question, you will need to run the appropriate post-hoc tests to assess the significance of differences between pairs of group means. The functions emmeans()
and glht()
will help you do this.
We will reuse the example introduced here (one-way ANOVA). To create the corresponding dataframe, use the following code:
# response variable
size <- c(25,22,28,24,26,24,22,21,23,25,26,30,25,24,21,27,28,23,25,24,20,22,24,23,22,24,20,19,21,22)
# predictor variable
location <- as.factor(c(rep("ForestA",10), rep("ForestB",10), rep("ForestC",10)))
# dataframe
my.dataframe <- data.frame(size,location)
For recall, here is the corresponding plot:
and the model that was fitted with lm()
:
# linear model stored in the object 'model'
model <- lm(size ~ location, data = my.dataframe)
emmeans()
emmeans()
is part of the package emmeans
, which we first need to activate:
library(emmeans)
The next step consists in “feeding” the linear model to emmeans()
. The syntax is emmeans(model, pairwise ~ predictor)
where predictor
is the predictor variable which levels have to be compared pairwise:
# running emmeans()
emmeans(model, pairwise ~ location)
## $emmeans
## location emmean SE df lower.CL upper.CL
## ForestA 24.0 0.684 27 22.6 25.4
## ForestB 25.3 0.684 27 23.9 26.7
## ForestC 21.7 0.684 27 20.3 23.1
##
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## ForestA - ForestB -1.3 0.967 27 -1.345 0.3836
## ForestA - ForestC 2.3 0.967 27 2.379 0.0619
## ForestB - ForestC 3.6 0.967 27 3.723 0.0026
##
## P value adjustment: tukey method for comparing a family of 3 estimates
The output shows first some statistics for each group in the upper part, while the lower part displays the three pairwise comparisons and p-values. Only the last comparison is associated with a p-value under 0.05.
glht()
glht()
is part of the package multcomp
which we first need to activate:
library(multcomp)
Running the test with glht()
involves two steps:
glht
for which the syntax is glht(model, linfct = mcp(predictor = 'Tukey'))
where predictor
is the predictor variable which levels have to be compared pairwise,summary()
.Let’s apply this to our example:
# running glht()
post.hoc <- glht(model, linfct = mcp(location = 'Tukey'))
# displaying the result table with summary()
summary(post.hoc)
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: lm(formula = size ~ location, data = my.dataframe)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## ForestB - ForestA == 0 1.3000 0.9669 1.345 0.3836
## ForestC - ForestA == 0 -2.3000 0.9669 -2.379 0.0618 .
## ForestC - ForestB == 0 -3.6000 0.9669 -3.723 0.0025 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
The output shows all three pairwise comparisons, p-values and corresponding stars. And again, only the last comparison shows a p-value under 0.05.