9 Incomplete Block Design
9.1 Background
The block design in Chapter 4 was complete, meaning that every block contained all of the treatments. In practice, it is not always possible to fit all treatments into each block, and there are also situations where it is advisable to keep the number of treatments per block small.
In such cases, incomplete block designs are used, where we have to decide which subset of treatments to use in an individual block. This works well if we have enough blocks. However, if we only have a small number of blocks, there is a risk that certain quantities are no longer estimable.
To avoid having a disconnected design, a balanced incomplete block design can be used.
The statistical model for a balanced incomplete block design is:
\[y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}\] Where:
\(\mu\) = overall experimental mean
\(\alpha_i\) = treatment effects (fixed)
\(\beta_j\) = block effects (random)
\(\epsilon_{ij}\) = error terms
\[ \epsilon_{ij} \sim N(0, \sigma)\]
\[ \beta_j \sim N(0, \sigma_b)\]
There are a few key points that we need to keep in mind while designing incomplete block designs:
- A drawback of this design is that block effects and treatment effects are confounded.
- To eliminate block effects, it is better to compare treatments within a block.
- No treatment should appear more than once in any block, as a repeated treatment contributes nothing to within-block comparisons.
An excellent description of incomplete block design is provided in ANOVA and Mixed Models by Lukas Meier.
Because the blocks are incomplete, the Type I and Type III sums of squares will be different. That is, the missing treatments in each block represent missing observations (but not missing ‘at random’).
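For readers who want to construct such a design themselves, here is a minimal sketch using the agricolae package (this package is an assumption here and is not used in the analysis below); it searches for a balanced incomplete block arrangement of seven hypothetical treatments in blocks of size three.
library(agricolae)

trt <- paste0("T", 1:7)                    # 7 hypothetical treatments
bib <- design.bib(trt, k = 3, seed = 42)   # blocks of size k = 3

bib$parameters    # replication (r) and other design parameters
head(bib$book)    # field book: block, plot, and treatment assignment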
9.2 Example Analysis
We will demonstrate an example data set designed in a balanced incomplete block design. First, load the libraries for analysis and estimation:
lme4
library(lme4); library(lmerTest); library(emmeans)
library(dplyr); library(broom.mixed); library(performance)
nlme
library(nlme); library(broom.mixed); library(emmeans)
library(dplyr); library(performance)
The data used in this example is the weiss.incblock data set from the agridat package: https://kwstat.github.io/agridat/reference/weiss.incblock.html
library(agridat)
data(weiss.incblock)
dat <- weiss.incblock
9.2.2 Data integrity checks
The first thing is to make sure the data are what we expect. Here are the steps:
- make sure data are the expected data type
- check the extent of missing data
- inspect the independent variables and make sure the expected levels are present in the data
- inspect the dependent variable to ensure its distribution is following expectations
str(dat)
'data.frame': 186 obs. of 5 variables:
$ block: Factor w/ 31 levels "B01","B02","B03",..: 1 2 3 4 5 6 7 8 9 10 ...
$ gen : Factor w/ 31 levels "G01","G02","G03",..: 24 15 20 18 20 5 22 1 9 14 ...
$ yield: num 29.8 24.2 30.5 20 35.2 25 23.6 23.6 29.3 25.5 ...
$ row : int 42 36 30 24 18 12 6 42 36 30 ...
$ col : int 1 1 1 1 1 1 1 2 2 2 ...
These look okay with block and gen being factor variables and yield, row, and col being numeric variables.
Next, check the independent variables. Running a cross tabulation is often sufficient to ascertain this.
dat$row <- as.factor(dat$row)
table(dat$gen, dat$block)
B01 B02 B03 B04 B05 B06 B07 B08 B09 B10 B11 B12 B13 B14 B15 B16 B17 B18
G01 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1
G02 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0
G03 0 0 1 0 0 0 0 1 1 1 0 0 1 0 0 0 0 0
G04 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0
G05 0 0 0 0 1 1 0 1 0 0 0 0 0 1 1 0 0 0
G06 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0
G07 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
G08 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0
G09 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0
G10 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 1
G11 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0
G12 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0
G13 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0
G14 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
G15 0 1 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0
G16 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0
G17 0 0 0 0 0 0 1 0 0 1 1 1 0 0 1 0 1 0
G18 0 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 1
G19 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0
G20 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0
G21 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0
G22 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 1
G23 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0
G24 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0
G25 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0
G26 1 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
G27 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0
G28 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1
G29 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 1 0 0
G30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
G31 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0
B19 B20 B21 B22 B23 B24 B25 B26 B27 B28 B29 B30 B31
G01 1 0 0 0 0 0 0 0 0 0 1 1 0
G02 0 1 0 0 0 0 0 0 1 1 0 0 0
G03 0 0 0 0 1 0 0 0 0 0 0 0 0
G04 0 0 1 0 0 0 1 0 0 0 0 0 1
G05 0 0 0 0 0 1 0 0 0 0 0 0 0
G06 1 1 0 1 0 0 0 0 0 0 0 0 1
G07 0 0 0 1 0 1 0 0 0 0 1 0 0
G08 0 0 1 1 0 0 0 0 0 1 0 0 0
G09 0 0 0 1 0 0 1 0 1 0 0 1 0
G10 0 0 0 1 1 0 0 0 0 0 0 0 0
G11 1 0 0 0 0 0 0 0 1 0 0 0 0
G12 0 0 1 0 1 0 0 0 0 0 0 1 0
G13 0 0 0 0 0 0 0 0 0 0 1 0 1
G14 0 0 0 0 0 1 1 0 0 1 0 0 0
G15 0 1 0 0 0 0 0 0 0 0 0 0 0
G16 1 0 1 0 0 1 0 0 0 0 0 0 0
G17 0 0 0 0 0 0 0 0 0 0 0 0 0
G18 0 0 0 0 0 0 0 0 1 0 0 0 0
G19 0 1 0 0 1 0 1 0 0 0 1 0 0
G20 0 0 0 0 0 0 0 0 0 1 0 1 1
G21 1 0 0 0 1 0 0 0 0 1 0 0 0
G22 0 0 0 0 0 0 0 0 0 0 0 0 1
G23 0 1 0 0 0 1 0 0 0 0 0 1 0
G24 0 0 0 0 0 0 1 0 0 0 0 0 0
G25 0 0 1 0 0 0 0 0 1 0 1 0 0
G26 0 0 0 1 0 0 0 1 0 0 0 0 0
G27 1 0 0 0 0 0 1 1 0 0 0 0 0
G28 0 1 1 0 0 0 0 1 0 0 0 0 0
G29 0 0 0 0 0 0 0 1 0 0 0 1 0
G30 0 0 0 0 1 1 0 1 1 0 0 0 1
G31 0 0 0 0 0 0 0 1 0 1 1 0 0
There are 31 varieties and the design is balanced: no variety appears more than once within a block, and every variety occurs in the same number of blocks (six of the 31).
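A quick numerical confirmation of that balance (a sketch built on the cross tabulation above) checks that no variety is repeated within a block, that every variety has the same number of replications, and that every pair of varieties shares the same number of blocks:
N <- table(dat$gen, dat$block)        # incidence matrix: varieties x blocks
all(N <= 1)                           # TRUE: no variety repeated within a block
table(rowSums(N))                     # replications per variety (r)
co_occur <- tcrossprod(N)             # times each pair of varieties shares a block
table(co_occur[lower.tri(co_occur)])  # lambda: identical for every pair in a BIBD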
Here is a quick check I run to count the number of missing values in each column.
apply(dat, 2, function(x) sum(is.na(x)))
block gen yield row col
0 0 0 0 0
We observed no missing data!
Last, check the dependent variable. A histogram is often quite sufficient to accomplish this. This is designed to be a quick check, so no need to spend time making the plot look good.
hist(dat$yield, main = "", xlab = "yield")
This data set is ready for analysis!
9.2.3 Model Building
lme4
model_icbd <- lmer(yield ~ gen + (1|block),
                   data = dat,
                   na.action = na.exclude)
tidy(model_icbd)
# A tibble: 33 × 8
effect group term estimate std.error statistic df p.value
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed <NA> (Intercept) 24.6 0.922 26.7 153. 2.30e-59
2 fixed <NA> genG02 2.40 1.17 2.06 129. 4.17e- 2
3 fixed <NA> genG03 8.04 1.17 6.88 129. 2.31e-10
4 fixed <NA> genG04 2.37 1.17 2.03 129. 4.42e- 2
5 fixed <NA> genG05 1.60 1.17 1.37 129. 1.73e- 1
6 fixed <NA> genG06 7.39 1.17 6.32 129. 3.82e- 9
7 fixed <NA> genG07 -0.419 1.17 -0.359 129. 7.20e- 1
8 fixed <NA> genG08 3.04 1.17 2.60 129. 1.04e- 2
9 fixed <NA> genG09 4.84 1.17 4.14 129. 6.22e- 5
10 fixed <NA> genG10 -0.0429 1.17 -0.0367 129. 9.71e- 1
# ℹ 23 more rows
#model_icbd1 <- lmer(yield ~ gen + (1|block) + (1|row:block),
# data = dat,
# na.action = na.exclude)
#tidy(model_icbd1)
nlme
model_icbd <- lme(yield ~ gen,
                  random = ~ 1|block,
                  data = dat,
                  na.action = na.exclude)
tidy(model_icbd)
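Since the block effects are modeled as random, it can also be useful to look at the estimated variance components. A brief sketch (works for either fit; the intraclass correlation from the performance package gives the share of total variance attributable to blocks):
VarCorr(model_icbd)           # block and residual variance components
performance::icc(model_icbd)  # proportion of variance due to blocks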
9.2.4 Check Model Assumptions
check_model(model_icbd)
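check_model() from the performance package produces a panel of diagnostic plots. If that package is unavailable, a minimal base-R sketch of the two most common checks, constant variance and normality of residuals, is:
plot(fitted(model_icbd), resid(model_icbd),
     xlab = "Fitted values", ylab = "Residuals")      # look for random scatter around zero
abline(h = 0, lty = 2)

qqnorm(resid(model_icbd)); qqline(resid(model_icbd))  # points should follow the line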
9.2.5 Inference
anova(model_icbd)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
gen 1901.1 63.369 30 129.06 17.675 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
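To see the point made in the background about Type I versus Type III sums of squares, a fixed-effects version of the model makes the non-orthogonality visible: with incomplete blocks, the sequential (Type I) sum of squares for gen depends on the fitting order. A sketch:
# Block and gen are non-orthogonal in an incomplete block design, so order matters
fit1 <- lm(yield ~ block + gen, data = dat)
anova(fit1)   # gen tested after adjusting for block

fit2 <- lm(yield ~ gen + block, data = dat)
anova(fit2)   # gen tested without adjusting for block: a different sum of squares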
emmeans(model_icbd, ~ gen)
gen emmean SE df lower.CL upper.CL
G01 24.6 0.923 153 22.7 26.4
G02 27.0 0.923 153 25.2 28.8
G03 32.6 0.923 153 30.8 34.4
G04 26.9 0.923 153 25.1 28.8
G05 26.2 0.923 153 24.4 28.0
G06 32.0 0.923 153 30.1 33.8
G07 24.2 0.923 153 22.3 26.0
G08 27.6 0.923 153 25.8 29.4
G09 29.4 0.923 153 27.6 31.2
G10 24.5 0.923 153 22.7 26.4
G11 27.1 0.923 153 25.2 28.9
G12 29.3 0.923 153 27.4 31.1
G13 29.9 0.923 153 28.1 31.8
G14 24.2 0.923 153 22.4 26.1
G15 26.1 0.923 153 24.3 27.9
G16 25.9 0.923 153 24.1 27.8
G17 19.7 0.923 153 17.9 21.5
G18 25.7 0.923 153 23.9 27.5
G19 29.0 0.923 153 27.2 30.9
G20 33.2 0.923 153 31.3 35.0
G21 31.1 0.923 153 29.3 32.9
G22 25.2 0.923 153 23.3 27.0
G23 29.8 0.923 153 28.0 31.6
G24 33.6 0.923 153 31.8 35.5
G25 27.0 0.923 153 25.2 28.8
G26 27.1 0.923 153 25.3 29.0
G27 23.8 0.923 153 22.0 25.6
G28 26.5 0.923 153 24.6 28.3
G29 24.8 0.923 153 22.9 26.6
G30 36.2 0.923 153 34.4 38.0
G31 27.1 0.923 153 25.3 28.9
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
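The estimated marginal means are usually followed by comparisons. A short sketch of all pairwise comparisons (Tukey-adjusted by default in emmeans) and of comparing every genotype against a single reference level (G01 is chosen here only for illustration):
emm <- emmeans(model_icbd, ~ gen)

pairs(emm)                                           # all pairwise genotype differences
contrast(emm, method = "trt.vs.ctrl", ref = "G01")   # each genotype vs. the reference G01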