You have been running a two-way ANOVA with lm()
and anova()
, and the F-test has revealed a main effect of at least one of the factors, and possibly an interaction. You would like to investigate further to find out which group means are significantly different from each other. To do so, you will need to run the appropriate post-hoc tests to assess the significance of differences between pairs of group means. The function emmeans()
will help you do this.
We will reuse the example introduced here (factorial design (two-way ANOVA)). To create the corresponding dataframe, use the following code:
# response variable
size <- c(25,22,28,24,26,24,22,21,23,25,26,30,25,24,21,27,28,23,25,24,20,22,24,23,22,24,20,19,21,22,24,27,26,24,25,27,22,28,25,24,27,29,26,27,25,27,28,24,24,26,21,23,25,20,25,23,25,19,22,21)
# predictor variables
location <- as.factor(c(rep("ForestA",10), rep("ForestB",10), rep("ForestC",10), rep("ForestA",10), rep("ForestB",10), rep("ForestC",10)))
year <- as.factor(c(rep("2005",30), rep("2015",30)))
# dataframe
my.dataframe <- data.frame(size,location,year)
For recall, here is the corresponding plot:
and the model that was fitted with lm()
was:
# linear model stored in the object 'model'
model <- lm(size ~ location + year, data = my.dataframe)
The analysis of variance had shown that there is a main effect of location
, but not year
, and no interaction location:year
(hence the use of +
instead of *
).
emmeans()
emmeans()
is part of the package emmeans
, which we first need to activate:
library(emmeans)
The next step consists in “feeding” the linear model to emmeans()
. The syntax is emmeans(model, pairwise ~ predictor1 * predictor2)
where predictor1
and predictor2
are the predictor variables which levels have to be compared pairwise:
# running emmeans()
emmeans(model, pairwise ~ location * year)
## $emmeans
## location year emmean SE df lower.CL upper.CL
## ForestA 2005 24.1 0.515 56 23.1 25.1
## ForestB 2005 25.3 0.515 56 24.3 26.3
## ForestC 2005 21.6 0.515 56 20.5 22.6
## ForestA 2015 25.1 0.515 56 24.1 26.1
## ForestB 2015 26.3 0.515 56 25.3 27.3
## ForestC 2015 22.5 0.515 56 21.5 23.6
##
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## ForestA,2005 - ForestB,2005 -1.200 0.631 56 -1.902 0.4118
## ForestA,2005 - ForestC,2005 2.550 0.631 56 4.042 0.0022
## ForestA,2005 - ForestA,2015 -0.967 0.515 56 -1.876 0.4271
## ForestA,2005 - ForestB,2015 -2.167 0.815 56 -2.660 0.0999
## ForestA,2005 - ForestC,2015 1.583 0.815 56 1.944 0.3872
## ForestB,2005 - ForestC,2005 3.750 0.631 56 5.943 <.0001
## ForestB,2005 - ForestA,2015 0.233 0.815 56 0.286 0.9997
## ForestB,2005 - ForestB,2015 -0.967 0.515 56 -1.876 0.4271
## ForestB,2005 - ForestC,2015 2.783 0.815 56 3.417 0.0143
## ForestC,2005 - ForestA,2015 -3.517 0.815 56 -4.317 0.0009
## ForestC,2005 - ForestB,2015 -4.717 0.815 56 -5.791 <.0001
## ForestC,2005 - ForestC,2015 -0.967 0.515 56 -1.876 0.4271
## ForestA,2015 - ForestB,2015 -1.200 0.631 56 -1.902 0.4118
## ForestA,2015 - ForestC,2015 2.550 0.631 56 4.042 0.0022
## ForestB,2015 - ForestC,2015 3.750 0.631 56 5.943 <.0001
##
## P value adjustment: tukey method for comparing a family of 6 estimates
The lower part of the table shows all 15 pairwise comparisons and the corresponding p-values.
However, this full analysis may only be done if the model originally includes the interaction term location:year
. In our analysis, there was no interaction found, only a main effect of location
. We may thus use the syntax emmeans(model, pairwise ~ predictor1 | predictor2)
to obtain pairwise comparisons of all levels of predictor1
presented separately for each level of predictor2
:
# running emmeans()
emmeans(model, pairwise ~ location | year)
## $emmeans
## year = 2005:
## location emmean SE df lower.CL upper.CL
## ForestA 24.1 0.515 56 23.1 25.1
## ForestB 25.3 0.515 56 24.3 26.3
## ForestC 21.6 0.515 56 20.5 22.6
##
## year = 2015:
## location emmean SE df lower.CL upper.CL
## ForestA 25.1 0.515 56 24.1 26.1
## ForestB 26.3 0.515 56 25.3 27.3
## ForestC 22.5 0.515 56 21.5 23.6
##
## Confidence level used: 0.95
##
## $contrasts
## year = 2005:
## contrast estimate SE df t.ratio p.value
## ForestA - ForestB -1.20 0.631 56 -1.902 0.1476
## ForestA - ForestC 2.55 0.631 56 4.042 0.0005
## ForestB - ForestC 3.75 0.631 56 5.943 <.0001
##
## year = 2015:
## contrast estimate SE df t.ratio p.value
## ForestA - ForestB -1.20 0.631 56 -1.902 0.1476
## ForestA - ForestC 2.55 0.631 56 4.042 0.0005
## ForestB - ForestC 3.75 0.631 56 5.943 <.0001
##
## P value adjustment: tukey method for comparing a family of 3 estimates
This new table shows all three pairwises comparisons (and corresponding p-values) of ForestA
, ForestB
and ForestC
first for 2005
, and then for 2015
.
Another option to obtain pairwise comparisons is to run the result of emmeans(model, ~ predictor1 * predictor2)
through the function pairs()
with the argument simple = "each"
. This creates a table with pairs of means grouped by each single factor. The syntax is pairs(result, simple = "each")
:
# running pairs()
emm <- emmeans(model, ~ location * year)
pairs(emm, simple = "each")
## $`simple contrasts for location`
## year = 2005:
## contrast estimate SE df t.ratio p.value
## ForestA - ForestB -1.20 0.631 56 -1.902 0.1476
## ForestA - ForestC 2.55 0.631 56 4.042 0.0005
## ForestB - ForestC 3.75 0.631 56 5.943 <.0001
##
## year = 2015:
## contrast estimate SE df t.ratio p.value
## ForestA - ForestB -1.20 0.631 56 -1.902 0.1476
## ForestA - ForestC 2.55 0.631 56 4.042 0.0005
## ForestB - ForestC 3.75 0.631 56 5.943 <.0001
##
## P value adjustment: tukey method for comparing a family of 3 estimates
##
## $`simple contrasts for year`
## location = ForestA:
## contrast estimate SE df t.ratio p.value
## 2005 - 2015 -0.967 0.515 56 -1.876 0.0658
##
## location = ForestB:
## contrast estimate SE df t.ratio p.value
## 2005 - 2015 -0.967 0.515 56 -1.876 0.0658
##
## location = ForestC:
## contrast estimate SE df t.ratio p.value
## 2005 - 2015 -0.967 0.515 56 -1.876 0.0658
If you are not interested in the full table, but only the part showing the breakdown based on the levels of a specific factor, run pairs(emm, simple = "predictor")
instead:
# running pairs()
emm <- emmeans(model, ~ location * year)
pairs(emm, simple = "location")
## year = 2005:
## contrast estimate SE df t.ratio p.value
## ForestA - ForestB -1.20 0.631 56 -1.902 0.1476
## ForestA - ForestC 2.55 0.631 56 4.042 0.0005
## ForestB - ForestC 3.75 0.631 56 5.943 <.0001
##
## year = 2015:
## contrast estimate SE df t.ratio p.value
## ForestA - ForestB -1.20 0.631 56 -1.902 0.1476
## ForestA - ForestC 2.55 0.631 56 4.042 0.0005
## ForestB - ForestC 3.75 0.631 56 5.943 <.0001
##
## P value adjustment: tukey method for comparing a family of 3 estimates