library(lme4); library(lmerTest); library(emmeans); library(performance)
library(dplyr); library(broom.mixed); library(agridat); library(desplot)
10 Latin Square Design
10.1 Background
Latin square design In the Latin Square design, two blocking factors are arranged across the row and the column of the square. This allows blocking of two nuisance factors across rows and columns to reduce even more experimental error. The requirement of Latin square design is that all t treatments appears only once in each row and column and number of replications is equal to number of treatments.
Advantages of Latin square design are: 1. The design is particularly appropriate for comparing t treatment means in the presence of two sources of extraneous variation, each measured at t levels. 2. The analysis is quite simple.
Disadvantage: 1. A Latin square can be constructed for any value of t, however, it is best suited for comparing t treatments when 5≤t≤10.
Any additional extraneous sources of variability tend to inflate the error term, making it more difficult to detect differences among the treatment means.
The effect of each treatment on the response must be approximately the same across rows and columns.
Statistical model for a response in Latin square design is:
\(Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + \epsilon_{ijk}\)
where, \(\mu\) is the experiment mean, \(\alpha_i's\) are treatment effects, \(\beta\) and \(\gamma\) are the row- and column specific effects.
Assumptions of this design includes normality and independent distribution of error (\(\epsilon_{ijk}\)) terms. And there is no interaction between two blocking (rows & columns) factors and treatments.
10.2 Example Analysis
Let’s start the analysis firstly by loading the required libraries:
library(nlme); library(broom.mixed); library(emmeans); library(performance)
library(dplyr); library(agridat); library(desplot)
The data used in this example is extracted from the agridat
package. In this experiment, 5 treatments (A = Dusted before rains. B = Dusted after rains. C = Dusted once each week. D = Drifting, once each week. E = Not dusted) were tested to control stem rust in wheat.
<- agridat::goulden.latin dat
trt | treatment factor, 5 levels |
row | row position for each plot |
col | column position for each plot |
yield | wheat yield |
10.2.1 Data integrity checks
Firstly, let’s verify the class of variables in the dataset using str()
function in base R
str(dat)
'data.frame': 25 obs. of 4 variables:
$ trt : Factor w/ 5 levels "A","B","C","D",..: 2 3 4 5 1 4 1 3 2 5 ...
$ yield: num 4.9 9.3 7.6 5.3 9.3 6.4 4 15.4 7.6 6.3 ...
$ row : int 5 4 3 2 1 5 4 3 2 1 ...
$ col : int 1 1 1 1 1 2 2 2 2 2 ...
Here yield and trt are classified as numeric and factor variables, respectively, as needed. But we need to change ‘row’ and ‘col’ from integer t factor/character.
<- dat |>
dat1 mutate(row = as.factor(row),
col = as.factor(col))
Next, to verify if the data meets the assumption of the Latin square design let’s plot the field layout for this experiment.
::desplot(data = dat, flip = TRUE,
desplotform = yield ~ row + col,
out1 = row, out1.gpar=list(col="black", lwd=3),
out2 = col, out2.gpar=list(col="black", lwd=3),
text = trt, cex = 1, shorten = "no",
main = "Field layout",
show.key = FALSE)
This looks great! Here we can see that there are equal number of treatments, rows, and columns. Treatments were randomized in such a way that one treatment doesn’t appear more than once in each row and column.
Next step is to check if there are any missing values in response variable.
apply(dat, 2, function(x) sum(is.na(x)))
trt yield row col
0 0 0 0
And we do not have any missing values in the data.
Before fitting the model, let’s create a histogram of response variable to see if there are extreme values.
hist(dat$yield, main = "", xlab = "yield")
10.2.2 Model fitting
Here we will fit a model to evaluate the impact of fungicide treatments on wheat yield with trt as a fixed effect and row & col as a random effect.
<- lmer(yield ~ trt + (1|row) + (1|col),
m1_a data = dat1,
na.action = na.exclude)
tidy(m1_a)
# A tibble: 8 × 8
effect group term estimate std.error statistic df p.value
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed <NA> (Intercept) 6.84 0.942 7.26 11.9 1.03e-5
2 fixed <NA> trtB -0.380 0.967 -0.393 12.0 7.01e-1
3 fixed <NA> trtC 6.28 0.967 6.50 12.0 2.96e-5
4 fixed <NA> trtD 1.12 0.967 1.16 12.0 2.69e-1
5 fixed <NA> trtE -1.92 0.967 -1.99 12.0 7.04e-2
6 ran_pars row sd__(Intercept) 1.37 NA NA NA NA
7 ran_pars col sd__(Intercept) 0.483 NA NA NA NA
8 ran_pars Residual sd__Observation 1.53 NA NA NA NA
$dummy <- factor(1)
dat<- lme(yield ~ trt,
m1_b random =list(~1|row, ~1|col),
#list(dummy = pdBlocked(list(
# pdIdent(~row - 1),
# pdIdent(~col - 1)))),
data = dat,
na.action = na.exclude)
summary(m1_b)
Linear mixed-effects model fit by REML
Data: dat
AIC BIC logLik
106.0974 114.0633 -45.04872
Random effects:
Formula: ~1 | row
(Intercept)
StdDev: 1.344469
Formula: ~1 | col %in% row
(Intercept) Residual
StdDev: 1.494696 0.628399
Fixed effects: yield ~ trt
Value Std.Error DF t-value p-value
(Intercept) 6.84 0.9419764 16 7.261328 0.0000
trtB -0.38 1.0254756 16 -0.370560 0.7158
trtC 6.28 1.0254756 16 6.123987 0.0000
trtD 1.12 1.0254756 16 1.092176 0.2909
trtE -1.92 1.0254756 16 -1.872302 0.0796
Correlation:
(Intr) trtB trtC trtD
trtB -0.544
trtC -0.544 0.500
trtD -0.544 0.500 0.500
trtE -0.544 0.500 0.500 0.500
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-0.5686726 -0.2469684 -0.1061146 0.2349101 0.7617205
Number of Observations: 25
Number of Groups:
row col %in% row
5 25
#VarCorr(m1_b)
10.2.3 Check Model Assumptions
check_model(m1_a, check = c("linearity", "normality"))
check_model(m1_b, check = c("linearity", "normality"))
10.2.4 Inference
We can look look at the analysis of variance for treatment effect on yield using anova()
function.
anova(m1_a, type = "1")
Type I Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
trt 196.61 49.152 4 12 21.032 2.366e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m1_b, type = "sequential")
numDF denDF F-value p-value
(Intercept) 1 16 132.38123 <.0001
trt 4 16 18.69608 <.0001
Here we observed a significant impact on fungicide treatment on crop yield. Let’s have a look at the estimated marginal means of wheat yield with each treatment using emmeans()
function.
emmeans(m1_a, ~ trt)
trt emmean SE df lower.CL upper.CL
A 6.84 0.942 11.9 4.79 8.89
B 6.46 0.942 11.9 4.41 8.51
C 13.12 0.942 11.9 11.07 15.17
D 7.96 0.942 11.9 5.91 10.01
E 4.92 0.942 11.9 2.87 6.97
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
emmeans(m1_b, ~ trt)
trt emmean SE df lower.CL upper.CL
A 6.84 0.942 4 4.22 9.46
B 6.46 0.942 4 3.84 9.08
C 13.12 0.942 4 10.50 15.74
D 7.96 0.942 4 5.34 10.58
E 4.92 0.942 4 2.30 7.54
Degrees-of-freedom method: containment
Confidence level used: 0.95