library(nlme); library(performance); library(emmeans)
library(dplyr); library(broom.mixed); library(ggplot2)
11 Repeated measures mixed models
In the previous chapters we have covered how to run linear mixed models for different experiment designs. All of the examples in those chapters were independent measure designs, where each subject was assigned to a different treatment. Now we will move on to experiment with repeated measures effects.
Studies that involve repeated observations of the exact same experimental units (or subjects) requires a repeated measures component in analysis to properly model correlations across time of each subject. This is common in any studies that are evaluated across different time periods. For example, if samples are collected over the different time periods from same subject, we have to model the repeated measures effect while analyzing the main effects.
In these models, the ‘iid’ assumption (independently and identically distributed) is being violated often, so we need to introduce specialized covariance structures that can account for these correlations between error terms.
There are several types of covariance structures:
Structure name | R function | Assumption |
---|---|---|
Autoregressive (AR1) | corAR1() |
observations which are more proximate are more correlated than measures that are more distant |
Compound symmetry | corCompSymm() |
Correlation is equal for all time gaps between observations |
Unstructured | CorSymm() |
Correlations are different for the various time gaps |
To read more about selecting appropriate covariance structure based on your data, please refer to this link.
The repeated measures syntax in nlme follow this convention: corr = corAR1(value = (b/w -1 & 1), form = ~ t|g, fixed = (T or F))
.
One can use differnt correlation structure classes such as CorAR1()
, corCompSymm()
, CorSymm()
.
For form()
, ~ t or ~ t|g, specifying a time covariate t and, optionally a grouping factor g. When we use ~t|g form, the correlation structure is assumed to apply only to observations within the same grouping level.
The default starting value
is zero, and if fixed = FALSE
(the current nlme default), this value will be allowed to change during the model fitting process. A covariate for this correlation structure must be a integer value.
There are several other options in the nlme machinery (search “cor” for more options and details on the syntax).
Fitting models with correlated observations requires new libraries including mmrm and nlme. The lmer package allows random effects only.
In this tutorial we will analyze the data with repeated measures from different experiment designs including randomized complete block design, split plot, and split-split plot design.
For examples used in this chapter we will fitting model using mmrm
and lme
packages. So, let’s start with loading the required libraries for this analysis.
library(mmrm); library(performance); library(emmeans)
library(dplyr); library(broom.mixed); library(ggplot2)
12 Example Analysis
First, we will start with the first example from a randomized complete block design with repeated measures.
12.1 RCBD Repeated Measures
The example shown below contains data from a sorghum trial laid out as a randomized complete block design (5 blocks) with variety (4 varieties) treatment effect. The response variable ‘y’ is the leaf area index assessed in five consecutive weeks on each plot.
We need to have time as numeric and factor variable. In the model, to assess the week effect, week was used as a factor (factweek). For the correlation matrix, week needs to be numeric (week).
<- agriTutorial::sorghum %>%
dat mutate(week = as.numeric(factweek),
block = as.character(varblock))
block | blocking unit |
Replicate | replication unit |
Week | Time points when data was collected |
variety | treatment factor, 4 levels |
y | yield (lbs) |
12.1.1 Data Integrity Checks
Let’s do preliminary data check including evaluating data structure, distribution of treatments, number of missing values, and distribution of response variable.
str(dat)
'data.frame': 100 obs. of 9 variables:
$ y : num 5 4.84 4.02 3.75 3.13 4.42 4.3 3.67 3.23 2.83 ...
$ variety : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
$ Replicate: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 2 2 2 2 2 ...
$ factweek : Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5 ...
$ factplot : Factor w/ 20 levels "1","2","3","4",..: 1 1 1 1 1 2 2 2 2 2 ...
$ varweek : int 1 2 3 4 5 1 2 3 4 5 ...
$ varblock : int 1 1 1 1 1 2 2 2 2 2 ...
$ week : num 1 2 3 4 5 1 2 3 4 5 ...
$ block : chr "1" "1" "1" "1" ...
In this data, we have block, factplot, factweek as factor variables and y & week as numeric.
table(dat$variety, dat$block)
1 2 3 4 5
1 5 5 5 5 5
2 5 5 5 5 5
3 5 5 5 5 5
4 5 5 5 5 5
The cross tabulation shows a equal number of variety treatments in each block.
ggplot(data = dat, aes(y = y, x = factweek, fill = variety)) +
geom_boxplot() +
#scale_fill_brewer(palette="Dark2") +
scale_fill_viridis_d(option = "F") +
theme_bw()
Looks like variety ‘1’ has the lowest yield and showed drastic reduction in yield over weeks compared to other varieties. One last step before we fit model is to look at the distribution of response variable.
hist(dat$y, main = "", xlab = "yield")
12.1.2 Model Building
Let’s fit the basic model first using lme()
from the nlme package.
<- lme(y ~ variety + factweek + variety:factweek,
lm1 random = ~1|block/factplot,
data = dat,
na.action = na.exclude)
The model fitted above doesn’t account for the repeated measures effect. To account for the variation caused by repeated measurements, we can model the correlation among responses for a given subject which is plot (factor variable) in this case.
By adding this correlation structure, we are accounting for variation caused by repeated measurements over weeks for each plot. The AR1 structure assumes that data points collected more proximate are more correlated. Whereas, the compound symmetry structure assumes that correlation is equal for all time gaps. Here, we will fit model with both correlation structures and compare models to find out the best fit model.
In this analysis, time variable is week
and it must be numeric.
<- corAR1(form = ~ week|block/factplot, value = 0.2, fixed = FALSE)
cs1 <- corCompSymm(form = ~ week|block/factplot, value = 0.2, fixed = FALSE) cs2
In the code chunk above, we fitted two correlation structures including AR1 and compound symmetry matrices. Next we will update the model lm1, with these two matrices. In nlme, please search the help tool to know more about functions for different correlation structure classes.
<- update(lm1, corr = cs1)
lm2 <- update(lm1, corr= cs2) lm3
Now let’s compare how model fitness differs among models with no correlation structure (lm1), with AR1 correlation structure (lm2), and with compound symmetry structure (lm3). We will compare these models by using anova()
or by compare_performance()
function from the ‘performance’ library.
anova(lm1, lm2, lm3)
Model df AIC BIC logLik Test L.Ratio p-value
lm1 1 23 18.837478 73.62409 13.58126
lm2 2 24 -2.347391 54.82125 25.17370 1 vs 2 23.18487 <.0001
lm3 3 24 20.837478 78.00612 13.58126
<- compare_performance(lm1, lm2, lm3) result
Some of the nested models seem to be identical and probably only vary in
their random effects.
print_md(result)
Name | Model | AIC (weights) | AICc (weights) | BIC (weights) | R2 (cond.) | R2 (marg.) | ICC | RMSE | Sigma |
---|---|---|---|---|---|---|---|---|---|
lm1 | lme | -50.5 (<.001) | -36.0 (<.001) | 9.4 (<.001) | 0.99 | 0.37 | 0.98 | 0.10 | 0.13 |
lm2 | lme | -77.5 (>.999) | -61.5 (>.999) | -15.0 (>.999) | 0.97 | 0.41 | 0.95 | 0.15 | 0.18 |
lm3 | lme | -48.5 (<.001) | -32.5 (<.001) | 14.0 (<.001) | 0.98 | 0.37 | 0.98 | 0.11 | 0.14 |
We prefer to chose model with lower AIC and BIC values. In this scenario, we will move forward with lm2 model containing AR1 structure.
Let’s run a tidy()
on lm2 model to look at the estimates for random and fixed effects.
tidy(lm2)
Warning in tidy.lme(lm2): ran_pars not yet implemented for multiple levels of
nesting
# A tibble: 20 × 7
effect term estimate std.error df statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed (Intercept) 4.24 0.291 64 14.6 5.44e-22
2 fixed variety2 0.906 0.114 12 7.94 4.05e- 6
3 fixed variety3 0.646 0.114 12 5.66 1.05e- 4
4 fixed variety4 0.912 0.114 12 8.00 3.78e- 6
5 fixed factweek2 -0.196 0.0571 64 -3.44 1.04e- 3
6 fixed factweek3 -0.836 0.0755 64 -11.1 1.60e-16
7 fixed factweek4 -1.16 0.0867 64 -13.3 4.00e-20
8 fixed factweek5 -1.54 0.0943 64 -16.3 1.57e-24
9 fixed variety2:factweek2 0.0280 0.0807 64 0.347 7.30e- 1
10 fixed variety3:factweek2 0.382 0.0807 64 4.73 1.26e- 5
11 fixed variety4:factweek2 -0.0140 0.0807 64 -0.174 8.63e- 1
12 fixed variety2:factweek3 0.282 0.107 64 2.64 1.03e- 2
13 fixed variety3:factweek3 0.662 0.107 64 6.20 4.55e- 8
14 fixed variety4:factweek3 0.388 0.107 64 3.64 5.55e- 4
15 fixed variety2:factweek4 0.228 0.123 64 1.86 6.77e- 2
16 fixed variety3:factweek4 0.744 0.123 64 6.06 7.86e- 8
17 fixed variety4:factweek4 0.390 0.123 64 3.18 2.28e- 3
18 fixed variety2:factweek5 0.402 0.133 64 3.01 3.70e- 3
19 fixed variety3:factweek5 0.672 0.133 64 5.04 4.11e- 6
20 fixed variety4:factweek5 0.222 0.133 64 1.66 1.01e- 1
12.1.3 Check Model Assumptions
check_model(lm2, check = c('normality', 'linearity'))
12.1.4 Inference
The ANOVA table suggests a significant effect of the variety, week, and variety x week interaction effect.
anova(lm2, type = "marginal")
numDF denDF F-value p-value
(Intercept) 1 64 212.10509 <.0001
variety 3 12 28.28895 <.0001
factweek 4 64 74.79758 <.0001
variety:factweek 12 64 7.03546 <.0001
We can estimate the marginal means for variety and week effect and their interaction using emmeans()
function.
<- emmeans(lm2, ~ variety) mean_1
NOTE: Results may be misleading due to involvement in interactions
mean_1
variety emmean SE df lower.CL upper.CL
1 3.50 0.288 4 2.70 4.29
2 4.59 0.288 4 3.79 5.39
3 4.63 0.288 4 3.84 5.43
4 4.61 0.288 4 3.81 5.40
Results are averaged over the levels of: factweek
Degrees-of-freedom method: containment
Confidence level used: 0.95
<- emmeans(lm2, ~ variety*factweek)
mean_2 mean_2
variety factweek emmean SE df lower.CL upper.CL
1 1 4.24 0.291 4 3.43 5.05
2 1 5.15 0.291 4 4.34 5.96
3 1 4.89 0.291 4 4.08 5.70
4 1 5.15 0.291 4 4.35 5.96
1 2 4.05 0.291 4 3.24 4.85
2 2 4.98 0.291 4 4.17 5.79
3 2 5.07 0.291 4 4.27 5.88
4 2 4.94 0.291 4 4.14 5.75
1 3 3.41 0.291 4 2.60 4.21
2 3 4.59 0.291 4 3.79 5.40
3 3 4.71 0.291 4 3.91 5.52
4 3 4.71 0.291 4 3.90 5.51
1 4 3.09 0.291 4 2.28 3.89
2 4 4.22 0.291 4 3.41 5.03
3 4 4.48 0.291 4 3.67 5.28
4 4 4.39 0.291 4 3.58 5.20
1 5 2.70 0.291 4 1.89 3.51
2 5 4.01 0.291 4 3.20 4.82
3 5 4.02 0.291 4 3.21 4.83
4 5 3.83 0.291 4 3.03 4.64
Degrees-of-freedom method: containment
Confidence level used: 0.95
Here is a quick step to make sure your fitting model correctly: make sure to have two time variables in your data one being numeric (e.g. ‘day’ as number) and other being factor/character(e.g. ‘day_factor’ as a factor/character). Where, numeric variable is used for fitting correlation matrix and factor/character variable used in model statement to evaluate the time variable effect on response variable.
12.2 Split Plot Repeated Measures
Recall, we have evaluated split plot design Chapter 5. In this example we will use the same methodology used in Chapter 5 and update it with repeated measures component.
Next, let’s load “Yield” data. It is located here.
<- read.csv(here::here("data/Yield.csv")) Yield
This example contains yield data in a split-plot design. The yield data was collected repeatedly from the same Reps over 5 Sample_times. In this data set, we have:
Rep | replication unit |
Variety | Main plot, 2 levels |
Fertilizer | Split plot, 3 levels |
Yield | crop yield |
Sample_time | time points for data collection |
12.2.1 Data Integrity Checks
Firstly, we need to look at the class of variables in the data set.
str(Yield)
'data.frame': 120 obs. of 6 variables:
$ Sample_time: int 1 1 1 1 1 1 1 1 1 1 ...
$ Variety : chr "VAR1" "VAR1" "VAR1" "VAR1" ...
$ Fertilizer : int 1 1 1 1 2 2 2 2 3 3 ...
$ Rep : int 1 2 3 4 1 2 3 4 1 2 ...
$ pH : num 7.07 7.06 7.08 7.09 7.13 7.12 7.15 7.14 7.18 7.18 ...
$ Yield : num 0.604 0.595 3.145 3.091 2.415 ...
We will now convert the fertilizer and Rep into factor. In addition, we need to create a new factor variable (sample_time1) to analyze the time effect.
For lme(), independent variables in a character/factor form works fine. But, for mmrm() independent variables must be a factor. Thus, for sake of consistancy, we will be using independent variables in factor class.
$Variety <- factor(Yield$Variety)
Yield$Fertilizer <- factor(Yield$Fertilizer)
Yield$Sample_time1 <- factor(Yield$Sample_time)
Yield$Rep <- factor(Yield$Rep) Yield
To fit model, we first need to convert Variety, Fertilizer, and Sample_time as factors. In addition, we need to create a new variable named ‘plot’ with a unique value for each plot. In addition, we need a create variable for each subject which is plot in this case and contains a unique value for each plot. The plot variable is needed to model the variation in each plot over the sampling time. The plot will be used as a subject with repeated measures. The subject variable can be factor or numeric but the time (it could be year, or sample_time) has to be a factor.
##creating a plot variable
$plot <- factor(paste(Yield$Rep, Yield$Fertilizer, Yield$Variety, sep='-'))
Yield$Rep2 <- factor(paste(Yield$Rep, Yield$Variety, sep='-'))
Yieldtable(Yield$plot)
1-1-VAR1 1-1-VAR2 1-2-VAR1 1-2-VAR2 1-3-VAR1 1-3-VAR2 2-1-VAR1 2-1-VAR2
5 5 5 5 5 5 5 5
2-2-VAR1 2-2-VAR2 2-3-VAR1 2-3-VAR2 3-1-VAR1 3-1-VAR2 3-2-VAR1 3-2-VAR2
5 5 5 5 5 5 5 5
3-3-VAR1 3-3-VAR2 4-1-VAR1 4-1-VAR2 4-2-VAR1 4-2-VAR2 4-3-VAR1 4-3-VAR2
5 5 5 5 5 5 5 5
table(Yield$Fertilizer, Yield$Variety)
VAR1 VAR2
1 20 20
2 20 20
3 20 20
Looks like a well balanced design with 2 variety treatments and 3 fertilizer treatments.
Before fitting a model, let’s check the distribution of the response variable.
hist(Yield$Yield)
12.2.2 Model fit
This data can be analyzed either using nlme
or mmrm
.
using lme() from nlme package.
Let’s say we want to fit a model using AR1 structure as shown in the RCBD repeated measures example. Previously, we used lme()
from nlme package to fit the model. In this example, along with nlme()
we will also mmrm()
function from the mmrm package. In addition, instead of summary()
function we will use tidy()
function from the ‘broom.mixed’ package to look at estimates of mixed and random effects. This will generate a tidy workflow in particular by providing standardized verbs that provide information on estimates, standard errors, confidence intervals, etc.
= corAR1(form = ~ Sample_time|Rep/Variety/plot, value = 0.2, fixed = FALSE)
corr_str1
<- lme(Yield ~ Sample_time1*Variety*Fertilizer,
fit1 random = ~ 1|Rep/Variety/plot,
corr= corr_str1,
data = Yield, na.action= na.exclude)
tidy(fit1)
# A tibble: 30 × 7
effect term estimate std.error df statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed (Intercept) 1.86 0.708 72 2.63 0.0105
2 fixed Sample_time12 0.515 0.688 72 0.748 0.457
3 fixed Sample_time13 0.787 0.674 72 1.17 0.247
4 fixed Sample_time14 1.35 0.675 72 2.00 0.0496
5 fixed Sample_time15 2.84 0.675 72 4.21 0.0000731
6 fixed VarietyVAR2 -0.996 0.861 3 -1.16 0.331
7 fixed Fertilizer2 1.27 0.861 12 1.47 0.167
8 fixed Fertilizer3 2.07 0.861 12 2.40 0.0333
9 fixed Sample_time12:VarietyVAR2 0.739 0.974 72 0.759 0.451
10 fixed Sample_time13:VarietyVAR2 0.269 0.954 72 0.282 0.779
# ℹ 20 more rows
<- mmrm(formula = Yield ~ Sample_time1*Variety*Fertilizer +
fit2 ar1(Sample_time1|Rep/plot),
data = Yield)
tidy(fit2)
# A tibble: 30 × 6
term estimate std.error df statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 2.86 0.464 12.7 6.16 0.0000387
2 Sample_time12 0.656 0.310 1.81 2.12 0.182
3 Sample_time13 1.40 0.414 2.29 3.39 0.0636
4 Sample_time14 1.46 0.484 2.87 3.01 0.0605
5 Sample_time15 2.47 0.549 3.14 4.50 0.0186
6 VarietyVAR2 -1.07 0.656 12.7 -1.63 0.128
7 Fertilizer2 1.67 0.656 12.7 2.55 0.0245
8 Fertilizer3 0.595 0.656 12.7 0.908 0.381
9 Sample_time12:VarietyVAR2 -0.591 0.438 1.81 -1.35 0.321
10 Sample_time13:VarietyVAR2 -0.412 0.586 2.29 -0.704 0.546
# ℹ 20 more rows
12.2.3 Model diagnostics
We will use check_model()
from ‘performance’ package to evaluate the model fitness of model fitted using nlme (mod1). However, the mmrm model class doesn’t work with performance package, so we will evalute the model diagnostics by plotting the residuals using base R functions.
check_model(fit1, check = c('normality', 'linearity'))
plot(residuals(fit2), xlab = "fitted values", ylab = "residuals")
qqnorm(residuals(fit2)); qqline(residuals(fit2))
These diagnostic plots look great! The linearity and homogeneity of variance plots show no trend. The normal Q-Q plots for the overall residuals and for the random effects fall on a straight line so we can be satisfied with that.
12.2.4 Inference
anova(fit1, type = "marginal")
numDF denDF F-value p-value
(Intercept) 1 72 6.899272 0.0105
Sample_time1 4 72 5.318690 0.0008
Variety 1 3 1.338879 0.3310
Fertilizer 2 12 2.936073 0.0916
Sample_time1:Variety 4 72 0.998154 0.4143
Sample_time1:Fertilizer 8 72 8.158884 <.0001
Variety:Fertilizer 2 12 0.237417 0.7923
Sample_time1:Variety:Fertilizer 8 72 0.731698 0.6631
#car::Anova(fit2, type = "III")
#Anova.mmrm(fit2, type = "III")
The ANOVA showed a significant effect of Sample_time and Sample_time x Fertilizer interaction effect.
Next, we can estimate marginal means and confidence intervals for the independent variables using emmeans()
.
emmeans(fit1,~ Sample_time1)
NOTE: Results may be misleading due to involvement in interactions
Sample_time1 emmean SE df lower.CL upper.CL
1 2.65 0.438 3 1.25 4.04
2 4.40 0.438 3 3.01 5.79
3 5.53 0.438 3 4.13 6.92
4 7.26 0.438 3 5.87 8.66
5 8.82 0.438 3 7.42 10.21
Results are averaged over the levels of: Variety, Fertilizer
Degrees-of-freedom method: containment
Confidence level used: 0.95
emmeans(fit1,~ Sample_time1|Fertilizer)
NOTE: Results may be misleading due to involvement in interactions
Fertilizer = 1:
Sample_time1 emmean SE df lower.CL upper.CL
1 1.36 0.562 3 -0.427 3.15
2 2.25 0.562 3 0.458 4.03
3 2.28 0.562 3 0.495 4.07
4 2.65 0.562 3 0.861 4.44
5 3.66 0.562 3 1.874 5.45
Fertilizer = 2:
Sample_time1 emmean SE df lower.CL upper.CL
1 3.04 0.562 3 1.248 4.82
2 5.17 0.562 3 3.383 6.96
3 6.46 0.562 3 4.668 8.24
4 8.72 0.562 3 6.935 10.51
5 10.09 0.562 3 8.304 11.88
Fertilizer = 3:
Sample_time1 emmean SE df lower.CL upper.CL
1 3.55 0.562 3 1.762 5.34
2 5.78 0.562 3 3.995 7.57
3 7.84 0.562 3 6.051 9.63
4 10.42 0.562 3 8.630 12.21
5 12.69 0.562 3 10.905 14.48
Results are averaged over the levels of: Variety
Degrees-of-freedom method: containment
Confidence level used: 0.95
emmeans(fit2,~ Sample_time1)
NOTE: Results may be misleading due to involvement in interactions
Sample_time1 emmean SE df lower.CL upper.CL
1 3.43 0.189 12.7 3.02 3.84
2 5.21 0.169 12.7 4.84 5.58
3 6.59 0.163 11.9 6.23 6.94
4 7.96 0.169 12.7 7.60 8.33
5 9.65 0.189 12.7 9.24 10.06
Results are averaged over the levels of: Variety, Fertilizer, Rep
Confidence level used: 0.95
emmeans(fit2,~ Sample_time1|Fertilizer)
NOTE: Results may be misleading due to involvement in interactions
Fertilizer = 1:
Sample_time1 emmean SE df lower.CL upper.CL
1 2.32 0.328 12.7 1.61 3.03
2 2.68 0.293 12.7 2.05 3.32
3 3.52 0.283 11.9 2.90 4.14
4 3.54 0.293 12.7 2.91 4.18
5 4.27 0.328 12.7 3.56 4.98
Fertilizer = 2:
Sample_time1 emmean SE df lower.CL upper.CL
1 4.26 0.328 12.7 3.55 4.97
2 6.37 0.293 12.7 5.74 7.01
3 7.82 0.283 11.9 7.21 8.44
4 9.66 0.293 12.7 9.03 10.30
5 11.54 0.328 12.7 10.83 12.25
Fertilizer = 3:
Sample_time1 emmean SE df lower.CL upper.CL
1 3.70 0.328 12.7 2.99 4.41
2 6.58 0.293 12.7 5.94 7.21
3 8.42 0.283 11.9 7.81 9.04
4 10.69 0.293 12.7 10.05 11.32
5 13.14 0.328 12.7 12.43 13.85
Results are averaged over the levels of: Variety, Rep
Confidence level used: 0.95
To explore more about contrasts and emmeans please refer to Chapter 12.
12.3 Split-split plot repeated measures
Recall, we have evaluated the split-split experiment design in Chapter 5, where we had a one factor in main-plot, other in subplot and the third factor in sub-subplot. In this example we will be adding a repeated measures compoenet to the split-split plot design.
<- read.csv(here::here("data", "split_split_repeated.csv")) phos
plot | experimental unit |
block | replication unit |
Ptrt | Main plot, 2 levels |
Inoc | Split plot, 2 levels |
Cv | Split-split plot, 5 levels |
time | time points for data collection |
P_leaf | leaf phosphorous content |
12.3.1 Data Integrity Checks
str(phos)
'data.frame': 240 obs. of 7 variables:
$ plot : int 1 1 1 2 2 2 3 3 3 4 ...
$ bloc : int 1 1 1 1 1 1 1 1 1 1 ...
$ Ptrt : chr "high" "high" "high" "high" ...
$ Inoc : chr "none" "none" "none" "none" ...
$ Cv : chr "LOUISE" "LOUISE" "LOUISE" "BlancaG" ...
$ time : chr "PT1" "PT2" "PT3" "PT1" ...
$ P_leaf: num 3154 2331 247 3016 2160 ...
$time = as.factor(phos$time)
phos<- phos %>%
phos1 mutate(time1 = as.numeric(time),
rep = as.character(bloc),
plot = as.character(plot))
table(phos1$Ptrt, phos1$Inoc, phos1$Cv)
, , = ALPOWA
myco none
high 12 12
low 12 12
, , = BlancaG
myco none
high 12 12
low 12 12
, , = LOUISE
myco none
high 12 12
low 12 12
, , = OTIS
myco none
high 12 12
low 12 12
, , = WALWORTH
myco none
high 12 12
low 12 12
Looks like a well balanced design with 2 variety treatments and 3 fertilizer treatments.
Before fitting a model, let’s check the distribution of the response variable.
hist(phos1$P_leaf)
12.3.2 Model fit
= corAR1(form = ~ time1|rep/Ptrt/Inoc/plot, value = 0.2, fixed = FALSE)
corr_str1
<- lme(P_leaf ~ time*Ptrt*Inoc*Cv,
fit1 random = ~ 1|rep/Ptrt/Inoc/plot,
corr= corr_str1,
data = phos1, na.action= na.exclude)
tidy(fit1)
# A tibble: 60 × 7
effect term estimate std.error df statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed (Intercept) 3175. 82.6 120 38.4 2.63e-69
2 fixed timePT2 -866. 91.6 120 -9.46 3.41e-16
3 fixed timePT3 -3015. 96.9 120 -31.1 2.66e-59
4 fixed Ptrtlow -185. 101. 3 -1.84 1.64e- 1
5 fixed Inocnone 129. 97.6 6 1.33 2.33e- 1
6 fixed CvBlancaG 48.4 97.6 48 0.496 6.22e- 1
7 fixed CvLOUISE -23.2 97.6 48 -0.238 8.13e- 1
8 fixed CvOTIS 2.49 97.6 48 0.0255 9.80e- 1
9 fixed CvWALWORTH -413. 97.6 48 -4.23 1.03e- 4
10 fixed timePT2:Ptrtlow 104. 129. 120 0.800 4.25e- 1
# ℹ 50 more rows
check_model(fit1, check = c('normality', 'linearity'))
We see a cluster of values in residuals which was due to large number of observations having low values.
anova(fit1, type = "marginal")
numDF denDF F-value p-value
(Intercept) 1 120 1477.6555 <.0001
time 2 120 518.4625 <.0001
Ptrt 1 3 3.3729 0.1636
Inoc 1 6 1.7592 0.2330
Cv 4 48 7.5577 0.0001
time:Ptrt 2 120 0.6765 0.5103
time:Inoc 2 120 2.2797 0.1067
Ptrt:Inoc 1 6 2.4771 0.1666
time:Cv 8 120 2.4426 0.0175
Ptrt:Cv 4 48 0.5051 0.7321
Inoc:Cv 4 48 2.1222 0.0925
time:Ptrt:Inoc 2 120 0.8339 0.4369
time:Ptrt:Cv 8 120 0.2320 0.9843
time:Inoc:Cv 8 120 1.0401 0.4100
Ptrt:Inoc:Cv 4 48 0.4733 0.7551
time:Ptrt:Inoc:Cv 8 120 0.4155 0.9098
emmeans(fit1,~ time)
NOTE: Results may be misleading due to involvement in interactions
time emmean SE df lower.CL upper.CL
PT1 3096 46.2 3 2948.7 3242
PT2 2270 46.2 3 2122.7 2416
PT3 198 46.2 3 50.8 345
Results are averaged over the levels of: Ptrt, Inoc, Cv
Degrees-of-freedom method: containment
Confidence level used: 0.95
emmeans(fit1,~ time|Cv)
NOTE: Results may be misleading due to involvement in interactions
Cv = ALPOWA:
time emmean SE df lower.CL upper.CL
PT1 3201 55.5 3 3024.57 3378
PT2 2225 55.5 3 2047.87 2401
PT3 178 55.5 3 1.53 355
Cv = BlancaG:
time emmean SE df lower.CL upper.CL
PT1 3183 55.5 3 3006.50 3360
PT2 2334 55.5 3 2157.45 2511
PT3 210 55.5 3 32.95 386
Cv = LOUISE:
time emmean SE df lower.CL upper.CL
PT1 3121 55.5 3 2944.36 3298
PT2 2366 55.5 3 2189.56 2543
PT3 174 55.5 3 -2.43 351
Cv = OTIS:
time emmean SE df lower.CL upper.CL
PT1 3228 55.5 3 3051.65 3405
PT2 2253 55.5 3 2076.66 2430
PT3 234 55.5 3 56.86 410
Cv = WALWORTH:
time emmean SE df lower.CL upper.CL
PT1 2744 55.5 3 2567.30 2921
PT2 2170 55.5 3 1992.90 2346
PT3 193 55.5 3 15.88 369
Results are averaged over the levels of: Ptrt, Inoc
Degrees-of-freedom method: containment
Confidence level used: 0.95
Really low P leaf content at PT3 in all the cultivars.
The biggest advantage of mixed models is their incredible flexibility. They handle clustered individuals as well as repeated measures (even in the same model). They handle crossed random factors as well as nested
The biggest disadvantage of mixed models, at least for someone new to them, is their incredible flexibility. It’s easy to mis-specify a mixed model, and this is a place where a little knowledge is definitely dangerous.