post-hoc-tests-lm2.utf8

You have been running a two-way ANOVA with lm() and anova(), and the F-test has revealed a main effect of at least one of the factors, and possibly an interaction. You would like to investigate further to find out which group means are significantly different from each other. To do so, you will need to run the appropriate post-hoc tests to assess the significance of differences between pairs of group means. The function emmeans() will help you do this.

We will reuse the example introduced here (factorial design (two-way ANOVA)). To create the corresponding dataframe, use the following code:

# response variable
size <- c(25,22,28,24,26,24,22,21,23,25,26,30,25,24,21,27,28,23,25,24,20,22,24,23,22,24,20,19,21,22,24,27,26,24,25,27,22,28,25,24,27,29,26,27,25,27,28,24,24,26,21,23,25,20,25,23,25,19,22,21)

# predictor variables
location <- as.factor(c(rep("ForestA",10), rep("ForestB",10), rep("ForestC",10), rep("ForestA",10), rep("ForestB",10), rep("ForestC",10)))
year <- as.factor(c(rep("2005",30), rep("2015",30)))

# dataframe
my.dataframe <- data.frame(size,location,year)

For recall, here is the corresponding plot:

and the model that was fitted with lm() was:

# linear model stored in the object 'model'
model <- lm(size ~ location + year, data = my.dataframe)

The analysis of variance had shown that there is a main effect of location, but not year, and no interaction location:year (hence the use of + instead of *).

Running the test with `emmeans()`

emmeans() is part of the package emmeans, which we first need to activate:

library(emmeans)

The next step consists in “feeding” the linear model to emmeans(). The syntax is emmeans(model, pairwise ~ predictor1 * predictor2) where predictor1 and predictor2 are the predictor variables which levels have to be compared pairwise:

# running emmeans()
emmeans(model, pairwise ~ location * year)

## $emmeans
##  location year emmean    SE df lower.CL upper.CL
##  ForestA  2005   24.1 0.515 56     23.1     25.1
##  ForestB  2005   25.3 0.515 56     24.3     26.3
##  ForestC  2005   21.6 0.515 56     20.5     22.6
##  ForestA  2015   25.1 0.515 56     24.1     26.1
##  ForestB  2015   26.3 0.515 56     25.3     27.3
##  ForestC  2015   22.5 0.515 56     21.5     23.6
## 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast                    estimate    SE df t.ratio p.value
##  ForestA,2005 - ForestB,2005   -1.200 0.631 56 -1.902  0.4118 
##  ForestA,2005 - ForestC,2005    2.550 0.631 56  4.042  0.0022 
##  ForestA,2005 - ForestA,2015   -0.967 0.515 56 -1.876  0.4271 
##  ForestA,2005 - ForestB,2015   -2.167 0.815 56 -2.660  0.0999 
##  ForestA,2005 - ForestC,2015    1.583 0.815 56  1.944  0.3872 
##  ForestB,2005 - ForestC,2005    3.750 0.631 56  5.943  <.0001 
##  ForestB,2005 - ForestA,2015    0.233 0.815 56  0.286  0.9997 
##  ForestB,2005 - ForestB,2015   -0.967 0.515 56 -1.876  0.4271 
##  ForestB,2005 - ForestC,2015    2.783 0.815 56  3.417  0.0143 
##  ForestC,2005 - ForestA,2015   -3.517 0.815 56 -4.317  0.0009 
##  ForestC,2005 - ForestB,2015   -4.717 0.815 56 -5.791  <.0001 
##  ForestC,2005 - ForestC,2015   -0.967 0.515 56 -1.876  0.4271 
##  ForestA,2015 - ForestB,2015   -1.200 0.631 56 -1.902  0.4118 
##  ForestA,2015 - ForestC,2015    2.550 0.631 56  4.042  0.0022 
##  ForestB,2015 - ForestC,2015    3.750 0.631 56  5.943  <.0001 
## 
## P value adjustment: tukey method for comparing a family of 6 estimates

The lower part of the table shows all 15 pairwise comparisons and the corresponding p-values.

However, this full analysis may only be done if the model originally includes the interaction term location:year. In our analysis, there was no interaction found, only a main effect of location. We may thus use the syntax emmeans(model, pairwise ~ predictor1 | predictor2) to obtain pairwise comparisons of all levels of predictor1 presented separately for each level of predictor2:

# running emmeans()
emmeans(model, pairwise ~ location | year)

## $emmeans
## year = 2005:
##  location emmean    SE df lower.CL upper.CL
##  ForestA    24.1 0.515 56     23.1     25.1
##  ForestB    25.3 0.515 56     24.3     26.3
##  ForestC    21.6 0.515 56     20.5     22.6
## 
## year = 2015:
##  location emmean    SE df lower.CL upper.CL
##  ForestA    25.1 0.515 56     24.1     26.1
##  ForestB    26.3 0.515 56     25.3     27.3
##  ForestC    22.5 0.515 56     21.5     23.6
## 
## Confidence level used: 0.95 
## 
## $contrasts
## year = 2005:
##  contrast          estimate    SE df t.ratio p.value
##  ForestA - ForestB    -1.20 0.631 56 -1.902  0.1476 
##  ForestA - ForestC     2.55 0.631 56  4.042  0.0005 
##  ForestB - ForestC     3.75 0.631 56  5.943  <.0001 
## 
## year = 2015:
##  contrast          estimate    SE df t.ratio p.value
##  ForestA - ForestB    -1.20 0.631 56 -1.902  0.1476 
##  ForestA - ForestC     2.55 0.631 56  4.042  0.0005 
##  ForestB - ForestC     3.75 0.631 56  5.943  <.0001 
## 
## P value adjustment: tukey method for comparing a family of 3 estimates

This new table shows all three pairwises comparisons (and corresponding p-values) of ForestA, ForestB and ForestC first for 2005, and then for 2015.

Another option to obtain pairwise comparisons is to run the result of emmeans(model, ~ predictor1 * predictor2) through the function pairs() with the argument simple = "each". This creates a table with pairs of means grouped by each single factor. The syntax is pairs(result, simple = "each"):

# running pairs()
emm <- emmeans(model, ~ location * year)
pairs(emm, simple = "each")

## $`simple contrasts for location`
## year = 2005:
##  contrast          estimate    SE df t.ratio p.value
##  ForestA - ForestB    -1.20 0.631 56 -1.902  0.1476 
##  ForestA - ForestC     2.55 0.631 56  4.042  0.0005 
##  ForestB - ForestC     3.75 0.631 56  5.943  <.0001 
## 
## year = 2015:
##  contrast          estimate    SE df t.ratio p.value
##  ForestA - ForestB    -1.20 0.631 56 -1.902  0.1476 
##  ForestA - ForestC     2.55 0.631 56  4.042  0.0005 
##  ForestB - ForestC     3.75 0.631 56  5.943  <.0001 
## 
## P value adjustment: tukey method for comparing a family of 3 estimates 
## 
## $`simple contrasts for year`
## location = ForestA:
##  contrast    estimate    SE df t.ratio p.value
##  2005 - 2015   -0.967 0.515 56 -1.876  0.0658 
## 
## location = ForestB:
##  contrast    estimate    SE df t.ratio p.value
##  2005 - 2015   -0.967 0.515 56 -1.876  0.0658 
## 
## location = ForestC:
##  contrast    estimate    SE df t.ratio p.value
##  2005 - 2015   -0.967 0.515 56 -1.876  0.0658

If you are not interested in the full table, but only the part showing the breakdown based on the levels of a specific factor, run pairs(emm, simple = "predictor") instead:

# running pairs()
emm <- emmeans(model, ~ location * year)
pairs(emm, simple = "location")

## year = 2005:
##  contrast          estimate    SE df t.ratio p.value
##  ForestA - ForestB    -1.20 0.631 56 -1.902  0.1476 
##  ForestA - ForestC     2.55 0.631 56  4.042  0.0005 
##  ForestB - ForestC     3.75 0.631 56  5.943  <.0001 
## 
## year = 2015:
##  contrast          estimate    SE df t.ratio p.value
##  ForestA - ForestB    -1.20 0.631 56 -1.902  0.1476 
##  ForestA - ForestC     2.55 0.631 56  4.042  0.0005 
##  ForestB - ForestC     3.75 0.631 56  5.943  <.0001 
## 
## P value adjustment: tukey method for comparing a family of 3 estimates

Running the test with emmeans()

Running the test with `emmeans()`