library(lme4); library(lmerTest); library(emmeans)
library(dplyr); library(broom.mixed); library(performance)
5 RCBD Design with Several Crossed Factors
5.1 Background
Factorial design involves studying the impact of multiple factors simultaneously. Each factor can have multiple levels, and combinations of these levels form the experimental conditions. This design allows us to understand the main effects of individual factors and their interactions on the response variable. The statistical model for factorial design is: \[y_{ij} = \mu + \tau_i+ \beta_j + \tau_i\beta_j + \epsilon_{ij}\] Where: \(\mu\) = experiment mean, \(\tau\) = effect of factor A, \(\beta\) = effect of factor B, and \(\tau\beta\) = interaction effect of factor A and B.
Assumptions of this model includes: independent and identically distributed error terms with a constant variance.
5.2 Example Analysis
First step is to load the libraries required for the analysis:
library(nlme); library(broom.mixed); library(emmeans)
library(dplyr); library(performance)
Next, we will load the dataset named ‘cochran.factorial’ from the ‘agridat’ package. This data comprises a yield response of beans to different levels of manure (d), nitrogen (n), phosphorus The goal of this analysis is the estimate the effect on d, n, p, k, and their interactions on bean yield.
Note, while importing the data, d, n, p, and k were converted into factor variables using the mutate()
function from dplyr package. This helps in reducing the extra steps of converting each single variable to factor manually.
library(agridat)
<- agridat::cochran.factorial %>%
data1 mutate(d = as.factor(d),
n = as.factor(n),
p = as.factor(p),
k = as.factor(k))
block | blocking unit |
rep | replication unit |
trt | treatment factor, 16 levels |
d | dung treatment, 2 levels |
n | nitrogen treatment, 2 levels |
p | phosphorus treatment, 2 levels |
k | potassium treatment, 2 levels |
yield | yield (lbs) |
The objective of this example is evaluate the individual and interactive effect of “d”, “n”, “p”, and “k” treatments on yield.
5.2.1 Data Integrity Checks
Verify the class of variables, where rep, block, d, n, p, and k are supposed to be a factor/character and yield should be numeric/integer.
str(data1)
'data.frame': 32 obs. of 8 variables:
$ rep : Factor w/ 2 levels "R1","R2": 1 1 1 1 1 1 1 1 1 1 ...
$ block: Factor w/ 2 levels "B1","B2": 1 1 1 1 1 1 1 1 2 2 ...
$ trt : Factor w/ 16 levels "(1)","d","dk",..: 15 10 2 14 5 6 9 11 8 12 ...
$ yield: int 45 55 53 36 41 48 55 42 50 44 ...
$ d : Factor w/ 2 levels "0","1": 2 2 1 2 1 1 1 2 1 2 ...
$ n : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 2 1 2 1 ...
$ p : Factor w/ 2 levels "0","1": 1 2 2 1 2 1 1 2 1 2 ...
$ k : Factor w/ 2 levels "0","1": 2 1 2 1 1 2 1 2 2 1 ...
This looks good.
Next step is to inspect the independent variables and make sure the expected levels are present in the data.
table(data1$d, data1$n, data1$p, data1$k)
, , = 0, = 0
0 1
0 2 2
1 2 2
, , = 1, = 0
0 1
0 2 2
1 2 2
, , = 0, = 1
0 1
0 2 2
1 2 2
, , = 1, = 1
0 1
0 2 2
1 2 2
The design looks well balanced.
Last step is to inspect the dependent variable to ensure its distribution follows the bell-shaped curve and no skewness is there.
hist(data1$yield)
The range is roughly falling into the expected range. I didn’t observe any extreme observations (too high/low), indicating no issues with data. don’t see
5.2.2 Model fitting
Model fitting with R is exactly the same as shown in previous chapters: we need to include all effect, as well as the interaction, which is represented by using the colon indicator ‘:’. Therefore, model syntax is:
yield ~ d + n + p + k + d:n + d:p + d:k + n:p + n:k + p:k + d:n:p:k
which can be abbreviated as:
yield ~ d*n*p*k
<- lmer(yield ~ d*n*p*k + (1|block),
model1_lmer data = data1,
na.action = na.exclude)
tidy(model1_lmer)
# A tibble: 18 × 8
effect group term estimate std.error statistic df p.value
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed <NA> (Intercept) 49 3.70 13.2 16.0 4.91e-10
2 fixed <NA> d1 -9.5 5.24 -1.81 16.0 8.84e- 2
3 fixed <NA> n1 0.500 5.24 0.0955 16.0 9.25e- 1
4 fixed <NA> p1 -11.5 5.24 -2.20 16.0 4.31e- 2
5 fixed <NA> k1 1.00 5.24 0.191 16.0 8.51e- 1
6 fixed <NA> d1:n1 13.5 7.82 1.73 16.0 1.03e- 1
7 fixed <NA> d1:p1 15.5 7.82 1.98 16.0 6.49e- 2
8 fixed <NA> n1:p1 9.50 7.82 1.22 16.0 2.42e- 1
9 fixed <NA> d1:k1 4.00 7.82 0.512 16.0 6.16e- 1
10 fixed <NA> n1:k1 0.500 7.82 0.0639 16.0 9.50e- 1
11 fixed <NA> p1:k1 3.00 7.82 0.384 16.0 7.06e- 1
12 fixed <NA> d1:n1:p1 -14.5 12.1 -1.19 16.0 2.50e- 1
13 fixed <NA> d1:n1:k1 -17.0 12.1 -1.40 16.0 1.81e- 1
14 fixed <NA> d1:p1:k1 -7.00 12.1 -0.576 16.0 5.72e- 1
15 fixed <NA> n1:p1:k1 -4.50 12.1 -0.370 16.0 7.16e- 1
16 fixed <NA> d1:n1:p1:k1 25.0 19.9 1.26 16.0 2.27e- 1
17 ran_pars block sd__(Intercep… 1.26 NA NA NA NA
18 ran_pars Residual sd__Observati… 4.92 NA NA NA NA
<- lme(yield ~ d*n*p*k,
model2_lme random = ~ 1|block,
data = data1,
na.action = na.exclude)
tidy(model2_lme)
# A tibble: 18 × 8
effect group term estimate std.error df statistic p.value
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed <NA> (Intercept) 49 4.79 15 10.2 3.66e-8
2 fixed <NA> d1 -9.5 6.77 15 -1.40 1.81e-1
3 fixed <NA> n1 0.500 6.77 15 0.0739 9.42e-1
4 fixed <NA> p1 -11.5 6.77 15 -1.70 1.10e-1
5 fixed <NA> k1 1.00 6.77 15 0.148 8.85e-1
6 fixed <NA> d1:n1 13.5 11.6 15 1.16 2.63e-1
7 fixed <NA> d1:p1 15.5 11.6 15 1.34 2.02e-1
8 fixed <NA> n1:p1 9.50 11.6 15 0.818 4.26e-1
9 fixed <NA> d1:k1 4.00 11.6 15 0.345 7.35e-1
10 fixed <NA> n1:k1 0.500 11.6 15 0.0431 9.66e-1
11 fixed <NA> p1:k1 3.00 11.6 15 0.258 8.00e-1
12 fixed <NA> d1:n1:p1 -14.5 21.0 15 -0.690 5.01e-1
13 fixed <NA> d1:n1:k1 -17.0 21.0 15 -0.809 4.31e-1
14 fixed <NA> d1:p1:k1 -7.00 21.0 15 -0.333 7.44e-1
15 fixed <NA> n1:p1:k1 -4.50 21.0 15 -0.214 8.33e-1
16 fixed <NA> d1:n1:p1:k1 25.0 39.7 15 0.630 5.38e-1
17 ran_pars block sd_(Intercept) 3.28 NA NA NA NA
18 ran_pars Residual sd_Observation 4.92 NA NA NA NA
Instead of summary()
function, we used tidy()
function from ‘broom.mixed’ package to get a short summary output of the model.
5.2.3 Check Model Assumptions
check_model(model1_lmer, check = c('normality', 'linearity'))
check_model(model2_lme, check = c('normality', 'linearity'))
The linearity and homogeneity of variance plots show no trend. The normal Q-Q plots for the overall residuals and for the random effects all fall nearly on a straight line so we can be satisfied with that.
5.2.4 Inference
We can get an ANOVA table for the linear mixed model using the function anova()
, which works for both lmer()
and lme()
models..
::Anova(model1_lmer, type = 'III', test.statistic="F") car
Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)
Response: yield
F Df Df.res Pr(>F)
(Intercept) 175.2030 1 20.439 1.729e-11 ***
d 3.2928 1 20.439 0.08429 .
n 0.0091 1 20.439 0.92484
p 4.8252 1 20.439 0.03974 *
k 0.0365 1 20.439 0.85040
d:n 2.9812 1 25.421 0.09637 .
d:p 3.9300 1 25.421 0.05834 .
n:p 1.4763 1 25.421 0.23552
d:k 0.2617 1 25.421 0.61335
n:k 0.0041 1 25.421 0.94951
p:k 0.1472 1 25.421 0.70440
d:n:p 1.4251 1 37.012 0.24016
d:n:k 1.9589 1 37.012 0.16996
d:p:k 0.3321 1 37.012 0.56789
n:p:k 0.1373 1 37.012 0.71313
d:n:p:k 1.5778 1 66.709 0.21346
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(model2_lme, type = "marginal")
numDF denDF F-value p-value
(Intercept) 1 15 104.83445 <.0001
d 1 15 1.97029 0.1808
n 1 15 0.00546 0.9421
p 1 15 2.88720 0.1099
k 1 15 0.02183 0.8845
d:n 1 15 1.35278 0.2630
d:p 1 15 1.78330 0.2017
n:p 1 15 0.66990 0.4259
d:k 1 15 0.11876 0.7352
n:k 1 15 0.00186 0.9662
p:k 1 15 0.06680 0.7996
d:n:p 1 15 0.47580 0.5009
d:n:k 1 15 0.65401 0.4313
d:p:k 1 15 0.11089 0.7437
n:p:k 1 15 0.04583 0.8334
d:n:p:k 1 15 0.39719 0.5380
Let’s find estimates for some of the factors such as n, p, and n:k interaction. We will try the random intercept model first.
emmeans(model1_lmer, specs = ~ n)
NOTE: Results may be misleading due to involvement in interactions
n emmean SE df lower.CL upper.CL
0 43.8 1.52 37 40.7 46.8
1 50.1 1.52 37 47.0 53.2
Results are averaged over the levels of: d, p, k
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
emmeans(model1_lmer, specs = ~ p)
NOTE: Results may be misleading due to involvement in interactions
p emmean SE df lower.CL upper.CL
0 47.4 1.52 37 44.3 50.5
1 46.5 1.52 37 43.4 49.6
Results are averaged over the levels of: d, n, k
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
emmeans(model1_lmer, specs = ~ n:k)
NOTE: Results may be misleading due to involvement in interactions
n k emmean SE df lower.CL upper.CL
0 0 42.4 1.95 25.4 38.4 46.4
1 0 50.8 1.95 25.4 46.7 54.8
0 1 45.1 1.95 25.4 41.1 49.1
1 1 49.5 1.95 25.4 45.5 53.5
Results are averaged over the levels of: d, p
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
emmeans(model2_lme, specs = ~ n)
NOTE: Results may be misleading due to involvement in interactions
n emmean SE df lower.CL upper.CL
0 43.8 2.63 1 10.4 77.1
1 50.1 2.63 1 16.7 83.5
Results are averaged over the levels of: d, p, k
Degrees-of-freedom method: containment
Confidence level used: 0.95
emmeans(model2_lme, specs = ~ p)
NOTE: Results may be misleading due to involvement in interactions
p emmean SE df lower.CL upper.CL
0 47.4 2.63 1 14.0 80.8
1 46.5 2.63 1 13.1 79.9
Results are averaged over the levels of: d, n, k
Degrees-of-freedom method: containment
Confidence level used: 0.95
emmeans(model2_lme, specs = ~ n:k)
NOTE: Results may be misleading due to involvement in interactions
n k emmean SE df lower.CL upper.CL
0 0 42.4 2.9 1 5.50 79.2
1 0 50.8 2.9 1 13.88 87.6
0 1 45.1 2.9 1 8.25 82.0
1 1 49.5 2.9 1 12.63 86.4
Results are averaged over the levels of: d, p
Degrees-of-freedom method: containment
Confidence level used: 0.95
- Unbalanced factorial design