library(lme4); library(lmerTest); library(emmeans)
library(dplyr); library(broom.mixed); library(performance)
6 RCBD Design with Several Crossed Factors
6.1 Background
Factorial design involves studying the impact of multiple factors simultaneously. Each factor can have multiple levels, and combinations of these levels form the experimental conditions. This design allows us to understand the main effects of individual factors and their interactions on the response variable. The statistical model for factorial design is: \[y_{ij} = \mu + \tau_i+ \beta_j + (\tau\beta)_{ij} + \epsilon_{ij}\] Where:
\(\mu\) = experiment mean
\(\tau\) = effect of factor A
\(\beta\) = effect of factor B
\(\tau\beta\) = interaction effect of factor A and B.
Assumptions of this model includes: independent and identically distributed error terms with a constant variance \(\sigma^2\).
6.2 Example Analysis
First step is to load the libraries required for the analysis:
library(nlme); library(broom.mixed); library(emmeans)
library(dplyr); library(performance)
Next, we will load the dataset named ‘cochran.factorial’ from the agridat package. This data comprises a yield response of beans to different levels of manure (d), nitrogen (n), phosphorus.
The objective of this example is evaluate the individual and interactive effect of “d”, “n”, “p”, and “k” treatments on bean yield.
Note, while importing the data, d, n, p, and k were converted into factor variables using the mutate()
function from dplyr package. This helps in reducing the extra steps of converting each single variable to factor manually.
library(agridat)
<- agridat::cochran.factorial %>%
data1 mutate(d = as.factor(d),
n = as.factor(n),
p = as.factor(p),
k = as.factor(k))
block | blocking unit |
rep | replication unit |
trt | treatment factor, 16 levels |
d | dung treatment, 2 levels |
n | nitrogen treatment, 2 levels |
p | phosphorus treatment, 2 levels |
k | potassium treatment, 2 levels |
yield | yield (lbs) |
6.2.1 Data Integrity Checks
- Check structure of the data
First step is to Verify the class of variables, where rep, block, d, n, p, and k are supposed to be a factor/character and yield should be numeric.
str(data1)
'data.frame': 32 obs. of 8 variables:
$ rep : Factor w/ 2 levels "R1","R2": 1 1 1 1 1 1 1 1 1 1 ...
$ block: Factor w/ 2 levels "B1","B2": 1 1 1 1 1 1 1 1 2 2 ...
$ trt : Factor w/ 16 levels "(1)","d","dk",..: 15 10 2 14 5 6 9 11 8 12 ...
$ yield: int 45 55 53 36 41 48 55 42 50 44 ...
$ d : Factor w/ 2 levels "0","1": 2 2 1 2 1 1 1 2 1 2 ...
$ n : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 2 1 2 1 ...
$ p : Factor w/ 2 levels "0","1": 1 2 2 1 2 1 1 2 1 2 ...
$ k : Factor w/ 2 levels "0","1": 2 1 2 1 1 2 1 2 2 1 ...
This looks good.
- Inspect the independent variables
we are inspecting levels of independent variables to make sure the expected levels are present in the data.
table(data1$d, data1$n, data1$p, data1$k)
, , = 0, = 0
0 1
0 2 2
1 2 2
, , = 1, = 0
0 1
0 2 2
1 2 2
, , = 0, = 1
0 1
0 2 2
1 2 2
, , = 1, = 1
0 1
0 2 2
1 2 2
The design looks well balanced.
- Check the extent of missing data
colSums(is.na(data1))
rep block trt yield d n p k
0 0 0 0 0 0 0 0
There are no missing values in this data set.
- Inspect the dependent variable
This is the last step is to inspect the dependent variable to ensure it looks as expected.

hist(data1$yield, , col = "turquoise")
No extreme values are observed in the dependent variable, and the distribution looks as expected.
6.2.2 Model fitting
Model fitting with R is exactly the same as shown in previous chapters: we need to include all fixed effects, as well as the interaction, which is represented by using the colon indicator ‘:’.
The model syntax is:
yield ~ d + n + p + k + d:n + d:p + d:k + n:p + n:k + p:k + d:n:p:k
which can be abbreviated as:
yield ~ d*n*p*k
In this analysis, “d”, “n”, “p”, & “k” are fixed factors and block is a random effect.
<- lmer(yield ~ d*n*p*k + (1|block),
model1_lmer data = data1,
na.action = na.exclude)
tidy(model1_lmer)
# A tibble: 18 × 8
effect group term estimate std.error statistic df p.value
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed <NA> (Intercept) 49 3.70 13.2 16.0 4.91e-10
2 fixed <NA> d1 -9.5 5.24 -1.81 16.0 8.84e- 2
3 fixed <NA> n1 0.500 5.24 0.0955 16.0 9.25e- 1
4 fixed <NA> p1 -11.5 5.24 -2.20 16.0 4.31e- 2
5 fixed <NA> k1 1.00 5.24 0.191 16.0 8.51e- 1
6 fixed <NA> d1:n1 13.5 7.82 1.73 16.0 1.03e- 1
7 fixed <NA> d1:p1 15.5 7.82 1.98 16.0 6.49e- 2
8 fixed <NA> n1:p1 9.50 7.82 1.22 16.0 2.42e- 1
9 fixed <NA> d1:k1 4.00 7.82 0.512 16.0 6.16e- 1
10 fixed <NA> n1:k1 0.500 7.82 0.0639 16.0 9.50e- 1
11 fixed <NA> p1:k1 3.00 7.82 0.384 16.0 7.06e- 1
12 fixed <NA> d1:n1:p1 -14.5 12.1 -1.19 16.0 2.50e- 1
13 fixed <NA> d1:n1:k1 -17.0 12.1 -1.40 16.0 1.81e- 1
14 fixed <NA> d1:p1:k1 -7.00 12.1 -0.576 16.0 5.72e- 1
15 fixed <NA> n1:p1:k1 -4.50 12.1 -0.370 16.0 7.16e- 1
16 fixed <NA> d1:n1:p1:k1 25.0 19.9 1.26 16.0 2.27e- 1
17 ran_pars block sd__(Intercep… 1.26 NA NA NA NA
18 ran_pars Residual sd__Observati… 4.92 NA NA NA NA
<- lme(yield ~ d*n*p*k,
model2_lme random = ~ 1|block,
data = data1,
na.action = na.exclude)
tidy(model2_lme)
# A tibble: 18 × 8
effect group term estimate std.error df statistic p.value
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed <NA> (Intercept) 49 4.79 15 10.2 3.66e-8
2 fixed <NA> d1 -9.50 6.77 15 -1.40 1.81e-1
3 fixed <NA> n1 0.500 6.77 15 0.0739 9.42e-1
4 fixed <NA> p1 -11.5 6.77 15 -1.70 1.10e-1
5 fixed <NA> k1 1.00 6.77 15 0.148 8.85e-1
6 fixed <NA> d1:n1 13.5 11.6 15 1.16 2.63e-1
7 fixed <NA> d1:p1 15.5 11.6 15 1.34 2.02e-1
8 fixed <NA> n1:p1 9.50 11.6 15 0.818 4.26e-1
9 fixed <NA> d1:k1 4.00 11.6 15 0.345 7.35e-1
10 fixed <NA> n1:k1 0.500 11.6 15 0.0431 9.66e-1
11 fixed <NA> p1:k1 3.00 11.6 15 0.258 8.00e-1
12 fixed <NA> d1:n1:p1 -14.5 21.0 15 -0.690 5.01e-1
13 fixed <NA> d1:n1:k1 -17.0 21.0 15 -0.809 4.31e-1
14 fixed <NA> d1:p1:k1 -7.00 21.0 15 -0.333 7.44e-1
15 fixed <NA> n1:p1:k1 -4.50 21.0 15 -0.214 8.33e-1
16 fixed <NA> d1:n1:p1:k1 25.0 39.7 15 0.630 5.38e-1
17 ran_pars block sd_(Intercept) 3.28 NA NA NA NA
18 ran_pars Residual sd_Observation 4.92 NA NA NA NA
The tidy()
function from the broom.mixed package provides a short summary output of the model.
6.2.3 Check Model Assumptions
check_model(model1_lmer, check = c('normality', 'linearity'))
check_model(model2_lme, check = c('normality', 'linearity'))
The linearity and homogeneity of variance plots show no trend. There are modest departure in the normality of residuals with heavy tails.
6.2.4 Inference
We can get an ANOVA table for the linear mixed model using the function anova()
, which works for both lmer()
and lme()
models.
anova(model1_lmer, type = "3")
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
d 2.00 2.00 1 16 0.0825 0.777658
n 325.12 325.12 1 16 13.4072 0.002107 **
p 6.12 6.12 1 16 0.2526 0.622113
k 4.50 4.50 1 16 0.1856 0.672379
d:n 32.00 32.00 1 16 1.3196 0.267550
d:p 242.00 242.00 1 16 9.9794 0.006079 **
n:p 78.12 78.12 1 16 3.2216 0.091583 .
d:k 6.12 6.12 1 16 0.2526 0.622113
n:k 32.00 32.00 1 16 1.3196 0.267550
p:k 24.50 24.50 1 16 1.0103 0.329789
d:n:p 2.00 2.00 1 16 0.0825 0.777658
d:n:k 10.13 10.13 1 16 0.4175 0.527334
d:p:k 15.13 15.13 1 16 0.6237 0.441219
n:p:k 32.00 32.00 1 16 1.3196 0.267550
d:n:p:k 38.26 38.26 1 16 1.5778 0.227113
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(model2_lme, type = "marginal")
numDF denDF F-value p-value
(Intercept) 1 15 104.83445 <.0001
d 1 15 1.97029 0.1808
n 1 15 0.00546 0.9421
p 1 15 2.88720 0.1099
k 1 15 0.02183 0.8845
d:n 1 15 1.35278 0.2630
d:p 1 15 1.78330 0.2017
n:p 1 15 0.66990 0.4259
d:k 1 15 0.11876 0.7352
n:k 1 15 0.00186 0.9662
p:k 1 15 0.06680 0.7996
d:n:p 1 15 0.47580 0.5009
d:n:k 1 15 0.65401 0.4313
d:p:k 1 15 0.11089 0.7437
n:p:k 1 15 0.04583 0.8334
d:n:p:k 1 15 0.39719 0.5380
Here we did not observe any difference in group variance of interaction effects. Among all treatment factors, only “p” had a significant effect on bean yield when evaluated at significance level of 0.05.
Let’s find estimates for some of the factors such as N, P and the N-by-K interactions in order to understand the combined effect of nitrogen and potassium on bean yield.
emmeans(model1_lmer, specs = ~ n)
NOTE: Results may be misleading due to involvement in interactions
n emmean SE df lower.CL upper.CL
0 43.8 1.52 37 40.7 46.8
1 50.1 1.52 37 47.0 53.2
Results are averaged over the levels of: d, p, k
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
emmeans(model1_lmer, specs = ~ p)
NOTE: Results may be misleading due to involvement in interactions
p emmean SE df lower.CL upper.CL
0 47.4 1.52 37 44.3 50.5
1 46.5 1.52 37 43.4 49.6
Results are averaged over the levels of: d, n, k
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
emmeans(model1_lmer, specs = ~ n:p)
NOTE: Results may be misleading due to involvement in interactions
n p emmean SE df lower.CL upper.CL
0 0 45.8 1.95 25.4 41.7 49.8
1 0 49.0 1.95 25.4 45.0 53.0
0 1 41.8 1.95 25.4 37.7 45.8
1 1 51.2 1.95 25.4 47.2 55.3
Results are averaged over the levels of: d, k
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
emmeans(model2_lme, specs = ~ n)
NOTE: Results may be misleading due to involvement in interactions
n emmean SE df lower.CL upper.CL
0 43.8 2.63 1 10.4 77.1
1 50.1 2.63 1 16.7 83.5
Results are averaged over the levels of: d, p, k
Degrees-of-freedom method: containment
Confidence level used: 0.95
emmeans(model2_lme, specs = ~ p)
NOTE: Results may be misleading due to involvement in interactions
p emmean SE df lower.CL upper.CL
0 47.4 2.63 1 14.0 80.8
1 46.5 2.63 1 13.1 79.9
Results are averaged over the levels of: d, n, k
Degrees-of-freedom method: containment
Confidence level used: 0.95
emmeans(model2_lme, specs = ~ n:p)
NOTE: Results may be misleading due to involvement in interactions
n p emmean SE df lower.CL upper.CL
0 0 45.8 2.9 1 8.88 82.6
1 0 49.0 2.9 1 12.13 85.9
0 1 41.8 2.9 1 4.88 78.6
1 1 51.2 2.9 1 14.38 88.1
Results are averaged over the levels of: d, k
Degrees-of-freedom method: containment
Confidence level used: 0.95
The code chunks showed above estimated marginal means for individual effects of “n” and “p” and their interaction (n:p) effect on bean yield. Make sure to give an attention to the warning message that means are averaged over the “d”, “p”, and “k” levels. It’s an important detail that needs to be pointed out while making conclusions. When working with factorial designs make sure to carefully interpret ANOVA and estimated marginal means for main and interaction effects.