6 Split Plot Design
Split-plot design is frequently used for factorial experiments. Such design may incorporate one or more of the completely randomized (CRD), completely randomized block (RCBD). The main principle is that there are whole plots or whole units, to which the levels of one or more factors are applied. Thus each whole plot becomes a block for the subplot treatments.
6.1 Details for Split Plot Designs
- Whole Plot Randomized as a completely randomized design
The statistical model structure this design:
\[y_{ijk} = \mu + \alpha_i + \beta_k + (\alpha_j\beta_k) + \epsilon_{ij} + \delta_{ijk} \] Where:
\(\mu\)= overall experimental mean, \(\alpha\) = main effect of whole plot (fixed), \(\beta\) = main effect of subplot (fixed), \(\alpha\)\(\tau\) = interaction between factors A and B, \(\epsilon_{ij}\) = whole plot error, \(\delta_{ijk}\) = subplot error.
\[ \epsilon \sim N(0, \sigma_\epsilon)\]
\[\ \delta \sim N(0, \sigma_\delta)\]
Both the error and the rep effects are assumed to be normally distributed with a mean of zero and standard deviations of \(\sigma_\epsilon\) and \(\sigma_\delta\), respectively.
- Whole Plot Randomized as an RCBD
This is also referred as “Split-Block RCB” design. The statistical model structure for split plot design:
\[y_{ijk} = \mu + \rho_j + \alpha_i + \beta_k + (\alpha_i\beta_k) + \epsilon_{ij} + \delta_{ijk}\] Where:
\(\mu\) = overall experimental mean, \(\rho\) = block effect (random), \(\alpha\) = main effect of whole plot (fixed), \(\beta\) = main effect of subplot (fixed), \(\alpha\)\(\beta\) = interaction between factors A and B, \(\epsilon_{ij}\) = whole plot error, \(\delta_{ijk}\) = subplot error.
\[ \epsilon \sim N(0, \sigma_\epsilon)\]
\[\ \delta \sim N(0, \sigma_\delta)\]
Both the overall error and the rep effects are assumed to be normally distributed with a mean of zero and standard deviations of \(\sigma\) and \(\sigma_\delta\), respectively.
In these model, the error terms, \(\epsilon\) are assumed to be “iid”, that is, independently and identically distributed. This means they have constant variance and they each individual error term is independent from the others.
6.2 Analysis Examples
Load required libraries
library(lme4); library(lmerTest); library(emmeans)
library(dplyr); library(performance); library(ggplot2)
library(broom.mixed)
library(nlme); library(performance); library(emmeans)
library(dplyr); library(ggplot2); library(broom.mixed)
6.2.1 Example model for CRD Split Plot Designs
Let’s import height data. It is located here if you want to download it yourself (recommended).
The data (Height data) for this example involves a CRD split plot designed experiment. Treatments are 4 Timings (times) and 8 managements (manage). The whole plots are times and management represents subplot and 3 replications.
<- readxl::read_excel(here::here("data", "height_data.xlsx")) height_data
rep | replication unit |
time | Main plot with 4 levels |
Manage | Split-plot with 8 levels |
sample | two sampling units per each rep |
height | yield (lbs per acre) |
6.2.1.1 Data integrity checks
- Run a cross tabulation using
table()
to check the arrangement of whole-plots and sub-plots.
table(height_data$time, height_data$manage)
M1 M2 M3 M4 M5 M6 M7 M8
T1 6 6 6 6 6 6 6 6
T2 6 6 6 6 6 6 6 6
T3 6 6 6 6 6 6 6 6
T4 6 6 6 6 6 6 6 6
The levels of whole plots and subplots are balanced.
- Look at structure of the data using
str()
, this will help in identifying class of the variable. In this data set, class of the whole-plot, sub-plot, and block should be factor/character and response variable (height) should be numeric.
str(height_data)
tibble [192 × 5] (S3: tbl_df/tbl/data.frame)
$ time : chr [1:192] "T1" "T1" "T1" "T1" ...
$ manage: chr [1:192] "M1" "M2" "M3" "M4" ...
$ rep : chr [1:192] "R1" "R1" "R1" "R1" ...
$ sample: chr [1:192] "S1" "S1" "S1" "S1" ...
$ height: num [1:192] 104.5 92.3 96.8 94.7 105.7 ...
The ‘time’, ‘manage’, and ‘rep’ are character and variable height is numeric. The structure of the data is in format as needed.
- Check the number of missing values in each column.
apply(height_data, 2, function(x) sum(is.na(x)))
time manage rep sample height
0 0 0 0 0
- Exploratory boxplot to look at the height observations at different times with variable managements.
ggplot(data = height_data, aes(y = height, x = time)) +
geom_boxplot(aes(fill = manage), alpha = 0.6)
Last, check the dependent variable by plotting a histogram of height data.
hist(height_data$height, main = "", xlab = "yield")
The distribution of height data looks close to normal.
6.2.1.2 Model building
Recall the model:
\[y_{ijk} = \mu + \gamma_i + \alpha_j + \beta_k + (\alpha_j\beta_k) + \epsilon_{ijk}\]
For this model, \(\gamma\) = block/rep effect (random), \(\alpha\) = main effect of whole plot (fixed), \(\beta\) = main effect of subplot (fixed), \(\alpha\)\(\beta\) = interaction between factors A and B (fixed).
In order to test the main effects of the treatments as well as the interaction between two factors, we can specify that in model as: time + manage + time:manage
or time*manage
.
When dealing with split plot design across reps or blocks, the random effects needs to be nested hierarchically, from largest unit to smallest. For example, in this example the random effects will be designated as (1 | rep/time)
. This implies that we use random intercept at each of the rep and time (whole-plot) level.
<- lmer(height ~ time*manage + (1|rep/time), data = height_data)
model_lmer tidy(model_lmer)
# A tibble: 35 × 8
effect group term estimate std.error statistic df p.value
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed <NA> (Intercept) 108. 3.19 33.9 4.38 0.00000181
2 fixed <NA> timeT2 3.18 2.63 1.21 104. 0.229
3 fixed <NA> timeT3 -2.25 2.63 -0.855 104. 0.394
4 fixed <NA> timeT4 1.28 2.63 0.488 104. 0.627
5 fixed <NA> manageM2 -4.45 2.55 -1.74 152. 0.0832
6 fixed <NA> manageM3 -5.30 2.55 -2.08 152. 0.0395
7 fixed <NA> manageM4 -6.18 2.55 -2.42 152. 0.0166
8 fixed <NA> manageM5 -5.02 2.55 -1.97 152. 0.0511
9 fixed <NA> manageM6 -3.42 2.55 -1.34 152. 0.183
10 fixed <NA> manageM7 -9.75 2.55 -3.82 152. 0.000193
# ℹ 25 more rows
<-lme(height ~ time*manage,
model_lme random = ~ 1|rep/time, data = height_data)
tidy(model_lme)
Warning in tidy.lme(model_lme): ran_pars not yet implemented for multiple
levels of nesting
# A tibble: 32 × 7
effect term estimate std.error df statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed (Intercept) 108. 3.19 152 33.9 9.59e-73
2 fixed timeT2 3.18 2.63 6 1.21 2.72e- 1
3 fixed timeT3 -2.25 2.63 6 -0.855 4.25e- 1
4 fixed timeT4 1.28 2.63 6 0.488 6.43e- 1
5 fixed manageM2 -4.45 2.55 152 -1.74 8.32e- 2
6 fixed manageM3 -5.30 2.55 152 -2.08 3.95e- 2
7 fixed manageM4 -6.18 2.55 152 -2.42 1.66e- 2
8 fixed manageM5 -5.02 2.55 152 -1.97 5.11e- 2
9 fixed manageM6 -3.42 2.55 152 -1.34 1.83e- 1
10 fixed manageM7 -9.75 2.55 152 -3.82 1.93e- 4
# ℹ 22 more rows
6.2.1.3 Check Model Assumptions
Before interpreting the model we should investigate the assumptions of the model to ensure any conclusions we draw are valid. There are assumptions that we can check are 1. Homogeneity (equal variance) 2. normality of residuals 3. values with high leverage.
We will use check_model()
function from ‘performance’ package. The plots generated using this code gives a visual check of various assumptions including normality of residuals, normality of random effects, heteroscedasticity, homogeneity of variance, and multicollinearity.
check_model(model_lmer, check = c('normality', 'linearity'))
check_model(model_lme, check = c('normality', 'linearity'))
In this case the residuals fit the assumptions of the model well.
6.2.1.4 Inference
The anova()
function prints the the rows of analysis of variance table for whole-plot, sub-plot, and their interactions. We observed a significant effect of manage factor only.
::Anova(model_lmer, type = 'III', test.statistics = "F") car
Analysis of Deviance Table (Type III Wald chisquare tests)
Response: height
Chisq Df Pr(>Chisq)
(Intercept) 1148.5658 1 < 2e-16 ***
time 4.5139 3 0.21105
manage 15.9090 7 0.02596 *
time:manage 24.3349 21 0.27711
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(model_lme, type = "marginal")
numDF denDF F-value p-value
(Intercept) 1 152 1148.6202 <.0001
time 3 6 1.5046 0.3061
manage 7 152 2.2727 0.0315
time:manage 21 152 1.1588 0.2955
We can further compute estimated marginal means for each fixed effect and interaction effect can be obtained using emmeans()
.
<- emmeans(model_lmer, ~ time) m1
NOTE: Results may be misleading due to involvement in interactions
m1
time emmean SE df lower.CL upper.CL
T1 103 2.7 2.27 92.8 114
T2 106 2.7 2.27 95.5 116
T3 100 2.7 2.27 89.8 111
T4 104 2.7 2.27 94.0 115
Results are averaged over the levels of: manage
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
<- emmeans(model_lme, ~ time) m2
NOTE: Results may be misleading due to involvement in interactions
m2
time emmean SE df lower.CL upper.CL
T1 103 2.7 2 91.6 115
T2 106 2.7 2 94.2 118
T3 100 2.7 2 88.6 112
T4 104 2.7 2 92.8 116
Results are averaged over the levels of: manage
Degrees-of-freedom method: containment
Confidence level used: 0.95
Further, a pairwise comparison or contrasts can be analyzed using estimated means. In this model, ‘time’ factor has 4 levels. We can use pairs()
function to evaluate pairwise comparison among different ‘time’ levels.
Here’s a example using pairs()
function to compare difference in height among different time points.
pairs(m1)
contrast estimate SE df t.ratio p.value
T1 - T2 -2.68 1.11 6 -2.426 0.1719
T1 - T3 2.95 1.11 6 2.665 0.1287
T1 - T4 -1.21 1.11 6 -1.091 0.7072
T2 - T3 5.63 1.11 6 5.091 0.0089
T2 - T4 1.48 1.11 6 1.334 0.5767
T3 - T4 -4.15 1.11 6 -3.756 0.0358
Results are averaged over the levels of: manage
Degrees-of-freedom method: kenward-roger
P value adjustment: tukey method for comparing a family of 4 estimates
pairs(m2)
contrast estimate SE df t.ratio p.value
T1 - T2 -2.68 1.11 6 -2.426 0.1719
T1 - T3 2.95 1.11 6 2.665 0.1287
T1 - T4 -1.21 1.11 6 -1.091 0.7072
T2 - T3 5.63 1.11 6 5.091 0.0089
T2 - T4 1.48 1.11 6 1.334 0.5767
T3 - T4 -4.15 1.11 6 -3.756 0.0358
Results are averaged over the levels of: manage
Degrees-of-freedom method: containment
P value adjustment: tukey method for comparing a family of 4 estimates
pairs()
The default p-value adjustment in pairs()
function is “tukey”, other options include “holm”, “hochberg”, “BH”, “BY”, and “none”. In addition, it’s okay to use this function when independent variable has few factors (2-4). For variable with multiple levels, it’s better to use custom contrasts. For more information on custom contrasts please visit Chapter 12.
6.2.2 Example model for RCBD Split Plot Designs
The oats data used in this example is from the MASS package. The design is RCBD split plot with 6 blocks, 3 main plots and 4 subplots. The primary outcome variable was oat yield.
block | blocking unit |
Variety (V) | Main plot with 3 levels |
Nitrogen (N) | Split-plot with 4 levels |
yield (Y) | yield (lbs per acre) |
The objective of this analysis is to study the impact of different varieties and nitrogen application rates on oat yields.
To fully examine the yield of oats due to varieties and nutrient levels in a split plots. We will need to statistically analyse and compare the effects of varieties (main plot), nutrient levels (subplot), their interaction.
library(MASS)
data("oats")
head(oats,5)
B V N Y
1 I Victory 0.0cwt 111
2 I Victory 0.2cwt 130
3 I Victory 0.4cwt 157
4 I Victory 0.6cwt 174
5 I Golden.rain 0.0cwt 117
6.2.2.1 Data integrity checks
Let’s look at the structure of the data. The “B”, “V”, and “N” needs to be ‘factor’ and “Y” should be numeric.
str(oats)
'data.frame': 72 obs. of 4 variables:
$ B: Factor w/ 6 levels "I","II","III",..: 1 1 1 1 1 1 1 1 1 1 ...
$ V: Factor w/ 3 levels "Golden.rain",..: 3 3 3 3 1 1 1 1 2 2 ...
$ N: Factor w/ 4 levels "0.0cwt","0.2cwt",..: 1 2 3 4 1 2 3 4 1 2 ...
$ Y: int 111 130 157 174 117 114 161 141 105 140 ...
Next, run the table() command to verify the levels of main-plots and sub-plots.
table(oats$V, oats$N)
0.0cwt 0.2cwt 0.4cwt 0.6cwt
Golden.rain 6 6 6 6
Marvellous 6 6 6 6
Victory 6 6 6 6
- Check the number of missing values in each column.
apply(oats, 2, function(x) sum(is.na(x)))
B V N Y
0 0 0 0
Last, check the dependent variable by plotting a histogram of yield data.
hist(oats$Y, main = "", xlab = "yield")
6.2.2.2 Model Building the Model
We are evaluating the effect of V, N and their interaction on yield. The 1|B/V
implies that random intercepts vary with block and V within each block.
Recall the model:
\[y_{ijk} = \mu + \rho_j + \alpha_i + \beta_k + (\alpha_i\beta_k) + \epsilon_{ij} + \delta_{ijk}\] Where:
\(\mu\) = overall experimental mean, \(\rho\) = block effect (random), \(\alpha\) = main effect of whole plot (fixed), \(\beta\) = main effect of subplot (fixed), \(\alpha\)\(\beta\) = interaction between factors A and B, \(\epsilon_{ij}\) = whole plot error, \(\delta_{ijk}\) = subplot error.
<- lmer(Y ~ V + N + V:N + (1|B/V),
model2_lmer data = oats,
na.action = na.exclude)
tidy(model2_lmer)
# A tibble: 15 × 8
effect group term estimate std.error statistic df p.value
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed <NA> (Intercept) 80.0 9.11 8.78 16.1 1.55e-7
2 fixed <NA> VMarvellous 6.67 9.72 0.686 30.2 4.98e-1
3 fixed <NA> VVictory -8.50 9.72 -0.875 30.2 3.89e-1
4 fixed <NA> N0.2cwt 18.5 7.68 2.41 45.0 2.02e-2
5 fixed <NA> N0.4cwt 34.7 7.68 4.51 45.0 4.58e-5
6 fixed <NA> N0.6cwt 44.8 7.68 5.84 45.0 5.48e-7
7 fixed <NA> VMarvellous:N0… 3.33 10.9 0.307 45.0 7.60e-1
8 fixed <NA> VVictory:N0.2c… -0.333 10.9 -0.0307 45.0 9.76e-1
9 fixed <NA> VMarvellous:N0… -4.17 10.9 -0.383 45.0 7.03e-1
10 fixed <NA> VVictory:N0.4c… 4.67 10.9 0.430 45.0 6.70e-1
11 fixed <NA> VMarvellous:N0… -4.67 10.9 -0.430 45.0 6.70e-1
12 fixed <NA> VVictory:N0.6c… 2.17 10.9 0.199 45.0 8.43e-1
13 ran_pars V:B sd__(Intercept) 10.3 NA NA NA NA
14 ran_pars B sd__(Intercept) 14.6 NA NA NA NA
15 ran_pars Residual sd__Observation 13.3 NA NA NA NA
<- lme(Y ~ V + N + V:N ,
model2_lme random = ~1|B/V,
data = oats,
na.action = na.exclude)
tidy(model2_lme)
Warning in tidy.lme(model2_lme): ran_pars not yet implemented for multiple
levels of nesting
# A tibble: 12 × 7
effect term estimate std.error df statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed (Intercept) 80 9.11 45 8.78 2.56e-11
2 fixed VMarvellous 6.67 9.72 10 0.686 5.08e- 1
3 fixed VVictory -8.50 9.72 10 -0.875 4.02e- 1
4 fixed N0.2cwt 18.5 7.68 45 2.41 2.02e- 2
5 fixed N0.4cwt 34.7 7.68 45 4.51 4.58e- 5
6 fixed N0.6cwt 44.8 7.68 45 5.84 5.48e- 7
7 fixed VMarvellous:N0.2cwt 3.33 10.9 45 0.307 7.60e- 1
8 fixed VVictory:N0.2cwt -0.333 10.9 45 -0.0307 9.76e- 1
9 fixed VMarvellous:N0.4cwt -4.17 10.9 45 -0.383 7.03e- 1
10 fixed VVictory:N0.4cwt 4.67 10.9 45 0.430 6.70e- 1
11 fixed VMarvellous:N0.6cwt -4.67 10.9 45 -0.430 6.70e- 1
12 fixed VVictory:N0.6cwt 2.17 10.9 45 0.199 8.43e- 1
6.2.2.3 Check Model Assumptions
As shown in example 1, We need to verify the normality of residuals and homogeneous variance. Here we are using the check_model()
function from the performance package.
check_model(model2_lmer, check = c('normality', 'linearity'))
check_model(model2_lme, check = c('normality', 'linearity'))
6.2.2.4 Inference
We can evaluate the model for the analysis of variance, for V, N and their interaction effect.
::Anova(model2_lmer, type = "III", test.statistics = "F") car
Analysis of Deviance Table (Type III Wald chisquare tests)
Response: Y
Chisq Df Pr(>Chisq)
(Intercept) 77.1664 1 < 2.2e-16 ***
V 2.4491 2 0.2939
N 39.0683 3 1.679e-08 ***
V:N 1.8169 6 0.9357
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(model2_lme, type = "marginal")
numDF denDF F-value p-value
(Intercept) 1 45 77.16729 <.0001
V 2 10 1.22454 0.3344
N 3 45 13.02273 <.0001
V:N 6 45 0.30282 0.9322
Next, we can estimate marginal means for V, N, or their interaction (V*N) effect.
<- emmeans(model2_lmer, ~ V *N)
emm1 emm1
V N emmean SE df lower.CL upper.CL
Golden.rain 0.0cwt 80.0 9.11 16.1 60.7 99.3
Marvellous 0.0cwt 86.7 9.11 16.1 67.4 106.0
Victory 0.0cwt 71.5 9.11 16.1 52.2 90.8
Golden.rain 0.2cwt 98.5 9.11 16.1 79.2 117.8
Marvellous 0.2cwt 108.5 9.11 16.1 89.2 127.8
Victory 0.2cwt 89.7 9.11 16.1 70.4 109.0
Golden.rain 0.4cwt 114.7 9.11 16.1 95.4 134.0
Marvellous 0.4cwt 117.2 9.11 16.1 97.9 136.5
Victory 0.4cwt 110.8 9.11 16.1 91.5 130.1
Golden.rain 0.6cwt 124.8 9.11 16.1 105.5 144.1
Marvellous 0.6cwt 126.8 9.11 16.1 107.5 146.1
Victory 0.6cwt 118.5 9.11 16.1 99.2 137.8
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
<- emmeans(model2_lme, ~ V *N)
emm1 emm1
V N emmean SE df lower.CL upper.CL
Golden.rain 0.0cwt 80.0 9.11 5 56.6 103.4
Marvellous 0.0cwt 86.7 9.11 5 63.3 110.1
Victory 0.0cwt 71.5 9.11 5 48.1 94.9
Golden.rain 0.2cwt 98.5 9.11 5 75.1 121.9
Marvellous 0.2cwt 108.5 9.11 5 85.1 131.9
Victory 0.2cwt 89.7 9.11 5 66.3 113.1
Golden.rain 0.4cwt 114.7 9.11 5 91.3 138.1
Marvellous 0.4cwt 117.2 9.11 5 93.8 140.6
Victory 0.4cwt 110.8 9.11 5 87.4 134.2
Golden.rain 0.6cwt 124.8 9.11 5 101.4 148.2
Marvellous 0.6cwt 126.8 9.11 5 103.4 150.2
Victory 0.6cwt 118.5 9.11 5 95.1 141.9
Degrees-of-freedom method: containment
Confidence level used: 0.95
In the next chapter, we will continue with extension of split plot design called split-split plot design.