Whereas one-way ANOVA allows for the comparison of three or more group means based on the different levels of a single factor, **factorial design** allows for the comparison of groups based on **several independent variables and their various levels**. Thus, comparing observations (such as tree size, egg size…) categorized by two predictor variables (for example, location and species) may be done via a **two-way ANOVA**; if three predictor variables are involved (for example, location, species and year), then a **three-way ANOVA** may be used, and so on. Below, we will focus on two-way ANOVA, but you’ll quickly see how the functions may be adapted to more factors.

Using factorial ANOVA, we get to assess **main effects**, in other words the impact of single factors (predictor variables) on a given response variable (plant size, egg number, individual bodyweight…). We also get to see whether there exist **interactions** between factors, i.e. whether the effect of a specific factor is influenced by (one of the) other factors.

As for one-way ANOVA, the function to use is `lm()`, followed by `anova()`.

Let’s see that in an example similar to the one that we looked at in one-way ANOVA. Here, we check whether the average size of blue ground beetles (*Carabus intricatus*) differs depending on their location. We now introduce a new factor: measurements were performed in 2005 and in 2015 at the same 3 forests A, B and C. In each location, we measured the size (in millimeters) of 10 individuals (making it a balanced design). In total, 60 individuals were measured. The two factors are `location` and `year`.

Here is the code for creating the dataframe:

```
# response variable
size <- c(25,22,28,24,26,24,22,21,23,25,26,30,25,24,21,27,28,23,25,24,20,22,24,23,22,24,20,19,21,22,24,27,26,24,25,27,22,28,25,24,27,29,26,27,25,27,28,24,24,26,21,23,25,20,25,23,25,19,22,21)
# predictor variables
location <- as.factor(c(rep("ForestA",10), rep("ForestB",10), rep("ForestC",10), rep("ForestA",10), rep("ForestB",10), rep("ForestC",10)))
year <- as.factor(c(rep("2005",30), rep("2015",30)))
# dataframe
my.dataframe <- data.frame(size,location,year)
```

We start by visualizing the data with boxplots. Since we have several factors, we’ll group some of the boxes according to these factors.

```
# requires the ggplot2 package
library(ggplot2)

ggplot(my.dataframe, aes(x = location, y = size, fill = year)) +
  geom_boxplot()
```
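Another useful visualization for a factorial design is an interaction plot, which draws the mean response for each combination of the factors; roughly parallel lines hint at the absence of an interaction. Here is a minimal sketch using base R’s `interaction.plot()` (the data frame is rebuilt inside the chunk so it runs on its own):

```
# same data as in my.dataframe above, rebuilt so this chunk is standalone
size <- c(25,22,28,24,26,24,22,21,23,25,26,30,25,24,21,27,28,23,25,24,
          20,22,24,23,22,24,20,19,21,22,24,27,26,24,25,27,22,28,25,24,
          27,29,26,27,25,27,28,24,24,26,21,23,25,20,25,23,25,19,22,21)
location <- factor(rep(rep(c("ForestA", "ForestB", "ForestC"), each = 10), 2))
year <- factor(rep(c("2005", "2015"), each = 30))
my.dataframe <- data.frame(size, location, year)

# mean size for each location-by-year cell
cell.means <- tapply(my.dataframe$size,
                     list(my.dataframe$location, my.dataframe$year),
                     mean)
cell.means

# lines that are roughly parallel suggest little or no interaction
interaction.plot(x.factor = my.dataframe$location,
                 trace.factor = my.dataframe$year,
                 response = my.dataframe$size,
                 xlab = "location", trace.label = "year",
                 ylab = "mean size (mm)")
```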

The assumptions are broadly the same as for one-way ANOVA:

- **independence of observations** (each individual is represented by 1 entry/measurement ONLY),
- **normality of distribution** (to be tested for each group, for example with the Shapiro-Wilk test),
- **homogeneity of variance** (to be tested, for example, with Levene’s test),
- groups contain **no outliers**.
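As a sketch of how these checks can be run, here is one possible approach in base R. Note that Levene’s test mentioned above lives in the `car` package (`car::leveneTest`); Bartlett’s test is used below instead simply to avoid the extra dependency, and the data frame is rebuilt so the chunk is standalone:

```
# same data as in my.dataframe above, rebuilt so this chunk is standalone
size <- c(25,22,28,24,26,24,22,21,23,25,26,30,25,24,21,27,28,23,25,24,
          20,22,24,23,22,24,20,19,21,22,24,27,26,24,25,27,22,28,25,24,
          27,29,26,27,25,27,28,24,24,26,21,23,25,20,25,23,25,19,22,21)
location <- factor(rep(rep(c("ForestA", "ForestB", "ForestC"), each = 10), 2))
year <- factor(rep(c("2005", "2015"), each = 30))
my.dataframe <- data.frame(size, location, year)

# Shapiro-Wilk normality test in each of the 6 location-by-year groups;
# p-values above 0.05 give no evidence against normality
p.norm <- tapply(my.dataframe$size,
                 interaction(my.dataframe$location, my.dataframe$year),
                 function(x) shapiro.test(x)$p.value)
p.norm

# homogeneity of variance across the 6 groups (stand-in for Levene's test)
bt <- bartlett.test(size ~ interaction(location, year), data = my.dataframe)
bt
```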

The syntax is `lm(response ~ predictor1 * predictor2, data = dataframe)`, where `response` is the response variable, `predictor1` and `predictor2` are the predictor variables or factors (which categorize the observations), and `dataframe` is the name of the dataframe that contains the data. We first fit the linear model with `lm()` and store it in the object `model`; we then compute and display the analysis of variance table using `anova()`:

```
model <- lm(size ~ location * year, data = my.dataframe)
anova(model)
```

```
## Analysis of Variance Table
##
## Response: size
##               Df  Sum Sq Mean Sq F value    Pr(>F)
## location       2 146.700  73.350 17.8178 1.142e-06 ***
## year           1  14.017  14.017  3.4049   0.07049 .
## location:year  2   0.633   0.317  0.0769   0.92606
## Residuals     54 222.300   4.117
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The table above shows the F value and p-value for each of the main effects, `location` and `year`, as well as for the interaction `location:year`. It also conveniently indicates with stars where significance levels are reached. For instance, `size` is significantly affected by the factor `location`, and the p-value is very small (p < 0.001, ***). There is no indication of an effect of `year`, nor of an interaction between the two predictor variables.

Since there is no interaction, we may remove the interaction term from the model and run this syntax instead: `lm(response ~ predictor1 + predictor2, data = dataframe)`. The only difference is that a `+` replaces the `*`:

```
model2 <- lm(size ~ location + year, data = my.dataframe)
anova(model2)
```

```
## Analysis of Variance Table
##
## Response: size
##           Df  Sum Sq Mean Sq F value    Pr(>F)
## location   2 146.700  73.350 18.4252 7.101e-07 ***
## year       1  14.017  14.017  3.5209   0.06581 .
## Residuals 56 222.933   3.981
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Here again, `size` is significantly affected by the factor `location`, but not by `year`. Note that the p-values differ slightly from those we obtained for the first model.
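Since the additive model is nested within the full model, the interaction term can also be tested by comparing the two fits directly with `anova()`; this reproduces the `location:year` row of the first table. A minimal sketch (the data frame is rebuilt so the chunk runs standalone):

```
# same data as in my.dataframe above, rebuilt so this chunk is standalone
size <- c(25,22,28,24,26,24,22,21,23,25,26,30,25,24,21,27,28,23,25,24,
          20,22,24,23,22,24,20,19,21,22,24,27,26,24,25,27,22,28,25,24,
          27,29,26,27,25,27,28,24,24,26,21,23,25,20,25,23,25,19,22,21)
location <- factor(rep(rep(c("ForestA", "ForestB", "ForestC"), each = 10), 2))
year <- factor(rep(c("2005", "2015"), each = 30))
my.dataframe <- data.frame(size, location, year)

model  <- lm(size ~ location * year, data = my.dataframe)  # with interaction
model2 <- lm(size ~ location + year, data = my.dataframe)  # additive only

# F-test for dropping the interaction term from the full model
cmp <- anova(model2, model)
cmp
```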

**But this does not tell us anything about which group means are significantly different from each other.**

Indeed, this test tells you about main effects and interactions, but not which groups differ significantly from which others. For that, we need *post hoc* tests, i.e. multiple comparisons in linear models with a factorial design.
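One common choice of *post hoc* test is Tukey’s HSD, available in base R by refitting with `aov()` and calling `TukeyHSD()`. A minimal sketch on the additive model (the data frame is rebuilt so the chunk runs standalone):

```
# same data as in my.dataframe above, rebuilt so this chunk is standalone
size <- c(25,22,28,24,26,24,22,21,23,25,26,30,25,24,21,27,28,23,25,24,
          20,22,24,23,22,24,20,19,21,22,24,27,26,24,25,27,22,28,25,24,
          27,29,26,27,25,27,28,24,24,26,21,23,25,20,25,23,25,19,22,21)
location <- factor(rep(rep(c("ForestA", "ForestB", "ForestC"), each = 10), 2))
year <- factor(rep(c("2005", "2015"), each = 30))
my.dataframe <- data.frame(size, location, year)

# refit the additive model with aov() so that TukeyHSD() can be applied
aov.model <- aov(size ~ location + year, data = my.dataframe)

# pairwise comparisons between locations, with family-wise adjusted p-values
tk <- TukeyHSD(aov.model, which = "location")
tk
```

In this balanced design, the `diff` column simply contains the differences between the marginal location means.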

If you need an alternative test to the factorial ANOVA because, for example, you suspect violation of the assumption of normality of distribution, you may use a non-parametric test called **Friedman rank sum test (or simply Friedman’s test)**.
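Note that `friedman.test()` expects an unreplicated complete block design (exactly one observation per group within each block), so it cannot be applied directly to the 10 replicates per cell in our example. As a purely illustrative sketch of the syntax, one could collapse the data to cell means and treat `year` as the block (the data frame is rebuilt so the chunk runs standalone):

```
# same data as in my.dataframe above, rebuilt so this chunk is standalone
size <- c(25,22,28,24,26,24,22,21,23,25,26,30,25,24,21,27,28,23,25,24,
          20,22,24,23,22,24,20,19,21,22,24,27,26,24,25,27,22,28,25,24,
          27,29,26,27,25,27,28,24,24,26,21,23,25,20,25,23,25,19,22,21)
location <- factor(rep(rep(c("ForestA", "ForestB", "ForestC"), each = 10), 2))
year <- factor(rep(c("2005", "2015"), each = 30))
my.dataframe <- data.frame(size, location, year)

# collapse to one value (the mean) per location-by-year cell
agg <- aggregate(size ~ location + year, data = my.dataframe, FUN = mean)

# Friedman's test: location as the group, year as the block
fr <- friedman.test(size ~ location | year, data = agg)
fr
```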