library(nlme); library(performance); library(emmeans)
library(dplyr); library(broom.mixed); library(ggplot2)
12 Repeated measures mixed models
In the previous chapters we have covered how to run linear mixed models for different experiment designs. All of the examples in those chapters were independent measure designs, where each subject was assigned to a different treatment. Now we will move on to experiment with repeated measures effects (also called “longtudinal data”).
Studies that involve repeated observations of the exact same experimental units (or subjects) requires a repeated measures component in analysis to properly model correlations across time for each subject. This is common in studies that are evaluated across different time periods. For example, if samples are collected over the different time periods from same subject, we have to model the repeated measures effect while analyzing the main effects.
In these models, the ‘iid’ assumption (independently and identically distributed) is being violated often, so we need to introduce specialized covariance structures that can account for these correlations between error terms.
Fitting models with correlated observations requires new libraries including mmrm and nlme. The lme4 package allows random effects only. In this chapter, we will analyze the data with repeated measures from different experiment designs including randomized complete block design, split plot, and split-split plot design.
There are several types of covariance structures:
Structure name | nlme function | mmrm function |
---|---|---|
Autoregressive (AR1) | corAR1() |
ar1 |
Compound symmetry | corCompSymm() |
|
Unstructured | CorSymm() |
To read more about selecting appropriate covariance structure based on your data, please refer to this link.
The repeated measures syntax in nlme follow this convention:
corr = corAR1(value = 0, form = ~ t|g, fixed = FALSE)
.
The argument for ‘value’ is the starting value for iterations (zero, unless you specify something else), and if fixed = FALSE
(the current nlme default), this value will be allowed to change during the model fitting process. The argument structure for form
, ~ t
or ~ t|g
, specifying a time covariate ~t|g
form, the correlation structure is assumed to apply only to observations within the same grouping level. The covariate for this correlation structure must be a integer value. corCompSymm()
and corSymm()
follow the same argument syntax.
There are other covariance structures (e.g. corARMA()
, corCAR1()
), but we have found that corAR1()
and corCompSymm()
work for most circumstances. Type ?cor
in an R console for more options and details on the syntax
For this chapter we will be analyzing using mmrm
and lme
packages. So, let’s start with loading the required libraries for this analysis.
library(mmrm); library(performance); library(emmeans)
library(dplyr); library(broom.mixed); library(ggplot2)
13 Example Analysis
First, we will start with the first example with randomized complete block design with repeated measures.
13.1 RCBD Repeated Measures
The example shown below contains data from a sorghum trial laid out as a randomized complete block design (5 blocks) with variety (4 varieties) treatment effect. The response variable ‘y’ is the leaf area index assessed in five consecutive weeks on each plot.
We need to have time as numeric and factor variable. In the model, to assess the week effect, week was used as a factor (factweek). For the correlation matrix, week needs to be numeric (week).
<- read.csv(here::here("data", "repeated_meas.csv")) |>
dat mutate(block = as.character(varblock),
factweek = as.character(factweek),
variety = as.character(variety))
block | blocking unit |
Replicate | replication unit |
Week | Time points when data was collected |
variety | treatment factor, 4 levels |
y | leaf area index |
13.1.1 Data Integrity Checks
Let’s do preliminary data check including evaluating data structure, distribution of treatments, number of missing values, and distribution of response variable.
- Check structure of the data
str(dat)
'data.frame': 100 obs. of 10 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ y : num 5 4.84 4.02 3.75 3.13 4.42 4.3 3.67 3.23 2.83 ...
$ variety : chr "1" "1" "1" "1" ...
$ Replicate: int 1 1 1 1 1 2 2 2 2 2 ...
$ factweek : chr "1" "2" "3" "4" ...
$ factplot : int 1 1 1 1 1 2 2 2 2 2 ...
$ varweek : int 1 2 3 4 5 1 2 3 4 5 ...
$ varblock : int 1 1 1 1 1 2 2 2 2 2 ...
$ week : int 1 2 3 4 5 1 2 3 4 5 ...
$ block : chr "1" "1" "1" "1" ...
In this data, we have block, factplot, factweek as factor variables and y & week as numeric.
- Inspect the independent variables
table(dat$variety, dat$block)
1 2 3 4 5
1 5 5 5 5 5
2 5 5 5 5 5
3 5 5 5 5 5
4 5 5 5 5 5
The cross tabulation shows a equal number of variety treatments in each block.
- Check the extent of missing data
colSums(is.na(dat))
X y variety Replicate factweek factplot varweek varblock
0 0 0 0 0 0 0 0
week block
0 0
No missing values
- Inspect the dependent variable
ggplot(data = dat, aes(y = y, x = factweek, fill = variety)) +
geom_boxplot() +
theme_bw()
Looks like variety ‘1’ has the lowest yield and showed drastic reduction in yield over weeks compared to other varieties. One last step before we fit model is to look at the distribution of response variable.
hist(dat$y, main = "", xlab = "leaf area index", col = "turquoise")
13.1.2 Model Building
Let’s fit the model first using lme()
from the nlme package.
<- lme(y ~ variety + factweek + variety:factweek,
lm1 random = ~1|block/factplot,
data = dat,
na.action = na.exclude)
The model fitted above doesn’t account for the repeated measures effect. To account for the variation caused by repeated measurements, we can model the correlation among responses for a given subject which is plot (factor variable) in this case.
By adding this correlation structure, we are accounting for variation caused by repeated measurements over weeks for each plot. The AR1 structure assumes that data points collected more proximate are more correlated. Whereas, the compound symmetry structure assumes that correlation is equal for all time gaps. Here, we will fit model with both correlation structures and compare models to find out the best fit model.
In this analysis, time variable is week
and it must be numeric.
<- corAR1(form = ~ week|block/factplot, value = 0.2, fixed = FALSE)
cs1 <- corCompSymm(form = ~ week|block/factplot, value = 0.2, fixed = FALSE) cs2
In the code chunk above, we fitted two correlation structures including AR1 and compound symmetry matrices. Next we will update the model lm1, with these two matrices.
<- update(lm1, corr = cs1)
lm2 <- update(lm1, corr= cs2) lm3
Now let’s compare how model fitness differs among models with no correlation structure (lm1), with AR1 correlation structure (lm2), and with compound symmetry structure (lm3). We will compare these models by using anova()
or by compare_performance()
function from the performance library.
anova(lm1, lm2, lm3)
Model df AIC BIC logLik Test L.Ratio p-value
lm1 1 23 18.837478 73.62409 13.58126
lm2 2 24 -2.347391 54.82125 25.17370 1 vs 2 23.18487 <.0001
lm3 3 24 20.837478 78.00612 13.58126
Let’s compare the models performance to select a model that fits better.
<- compare_performance(lm1, lm2, lm3) result
Some of the nested models seem to be identical and probably only vary in
their random effects.
print_md(result)
Name | Model | AIC (weights) | AICc (weights) | BIC (weights) | R2 (cond.) | R2 (marg.) | ICC | RMSE | Sigma |
---|---|---|---|---|---|---|---|---|---|
lm1 | lme | -50.5 (<.001) | -36.0 (<.001) | 9.4 (<.001) | 0.99 | 0.37 | 0.98 | 0.10 | 0.13 |
lm2 | lme | -77.5 (>.999) | -61.5 (>.999) | -15.0 (>.999) | 0.97 | 0.41 | 0.95 | 0.15 | 0.18 |
lm3 | lme | -48.5 (<.001) | -32.5 (<.001) | 14.0 (<.001) | 0.98 | 0.37 | 0.98 | 0.11 | 0.14 |
We prefer to chose model with lower AIC and BIC values. In this scenario, we will move forward with lm2 model containing AR1 structure.
Let’s run a tidy()
on lm2 model to look at the estimates for random and fixed effects.
tidy(lm2)
Warning in tidy.lme(lm2): ran_pars not yet implemented for multiple levels of
nesting
# A tibble: 20 × 7
effect term estimate std.error df statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed (Intercept) 4.24 0.291 64 14.6 5.44e-22
2 fixed variety2 0.906 0.114 12 7.94 4.05e- 6
3 fixed variety3 0.646 0.114 12 5.66 1.05e- 4
4 fixed variety4 0.912 0.114 12 8.00 3.78e- 6
5 fixed factweek2 -0.196 0.0571 64 -3.44 1.04e- 3
6 fixed factweek3 -0.836 0.0755 64 -11.1 1.60e-16
7 fixed factweek4 -1.16 0.0867 64 -13.3 4.00e-20
8 fixed factweek5 -1.54 0.0943 64 -16.3 1.57e-24
9 fixed variety2:factweek2 0.0280 0.0807 64 0.347 7.30e- 1
10 fixed variety3:factweek2 0.382 0.0807 64 4.73 1.26e- 5
11 fixed variety4:factweek2 -0.0140 0.0807 64 -0.174 8.63e- 1
12 fixed variety2:factweek3 0.282 0.107 64 2.64 1.03e- 2
13 fixed variety3:factweek3 0.662 0.107 64 6.20 4.55e- 8
14 fixed variety4:factweek3 0.388 0.107 64 3.64 5.55e- 4
15 fixed variety2:factweek4 0.228 0.123 64 1.86 6.77e- 2
16 fixed variety3:factweek4 0.744 0.123 64 6.06 7.86e- 8
17 fixed variety4:factweek4 0.39 0.123 64 3.18 2.28e- 3
18 fixed variety2:factweek5 0.402 0.133 64 3.01 3.70e- 3
19 fixed variety3:factweek5 0.672 0.133 64 5.04 4.11e- 6
20 fixed variety4:factweek5 0.222 0.133 64 1.66 1.01e- 1
13.1.3 Check Model Assumptions
check_model(lm2, check = c('normality', 'linearity'))
13.1.4 Inference
anova(lm2, type = "marginal")
numDF denDF F-value p-value
(Intercept) 1 64 212.10509 <.0001
variety 3 12 28.28895 <.0001
factweek 4 64 74.79758 <.0001
variety:factweek 12 64 7.03546 <.0001
The ANOVA table suggests a significant effect of the variety, week, and variety x week interaction effect.
We can estimate the marginal means for variety and week effect and their interaction using emmeans()
function.
<- emmeans(lm2, ~ variety) mean_1
NOTE: Results may be misleading due to involvement in interactions
mean_1
variety emmean SE df lower.CL upper.CL
1 3.50 0.288 4 2.70 4.29
2 4.59 0.288 4 3.79 5.39
3 4.63 0.288 4 3.84 5.43
4 4.61 0.288 4 3.81 5.40
Results are averaged over the levels of: factweek
Degrees-of-freedom method: containment
Confidence level used: 0.95
<- emmeans(lm2, ~ variety*factweek)
mean_2 mean_2
variety factweek emmean SE df lower.CL upper.CL
1 1 4.24 0.291 4 3.43 5.05
2 1 5.15 0.291 4 4.34 5.96
3 1 4.89 0.291 4 4.08 5.70
4 1 5.15 0.291 4 4.35 5.96
1 2 4.05 0.291 4 3.24 4.85
2 2 4.98 0.291 4 4.17 5.79
3 2 5.07 0.291 4 4.27 5.88
4 2 4.94 0.291 4 4.14 5.75
1 3 3.41 0.291 4 2.60 4.21
2 3 4.59 0.291 4 3.79 5.40
3 3 4.71 0.291 4 3.91 5.52
4 3 4.71 0.291 4 3.90 5.51
1 4 3.09 0.291 4 2.28 3.89
2 4 4.22 0.291 4 3.41 5.03
3 4 4.48 0.291 4 3.67 5.28
4 4 4.39 0.291 4 3.58 5.20
1 5 2.70 0.291 4 1.89 3.51
2 5 4.01 0.291 4 3.20 4.82
3 5 4.02 0.291 4 3.21 4.83
4 5 3.83 0.291 4 3.03 4.64
Degrees-of-freedom method: containment
Confidence level used: 0.95
Here is a quick step to make sure your fitting model correctly: make sure to have two time variables in your data one being numeric (e.g. ‘day’ as number) and other being factor/character(e.g. ‘day_factor’ as a factor/character). Where, numeric variable is used for fitting correlation matrix and factor/character variable used in model statement to evaluate the time variable effect on response variable.
13.2 Split Plot Repeated Measures
Recall, we have evaluated split plot design Chapter 7. In this example we will use the same methodology used in Chapter 5 and update it with repeated measures component.
Next, let’s load “Yield” data. It is located here.
<- read.csv(here::here("data/Yield.csv")) Yield
This example contains yield data in a split-plot design. The yield data was collected repeatedly from the same Reps over 5 Sample_times. In this data set, we have:
Rep | replication unit |
Variety | Main plot, 2 levels |
Fertilizer | Split plot, 3 levels |
Yield | crop yield |
Sample_time | time points for data collection |
13.2.1 Data Integrity Checks
- Check structure of the data
Firstly, we need to look at the class of variables in the data set.
str(Yield)
'data.frame': 120 obs. of 6 variables:
$ Sample_time: int 1 1 1 1 1 1 1 1 1 1 ...
$ Variety : chr "VAR1" "VAR1" "VAR1" "VAR1" ...
$ Fertilizer : int 1 1 1 1 2 2 2 2 3 3 ...
$ Rep : int 1 2 3 4 1 2 3 4 1 2 ...
$ pH : num 7.07 7.06 7.08 7.09 7.13 7.12 7.15 7.14 7.18 7.18 ...
$ Yield : num 0.604 0.595 3.145 3.091 2.415 ...
We will now convert the fertilizer and Rep into factor. In addition, we need to create a new factor variable (sample_time1) to analyze the time effect.
For lme(), independent variables in a character/factor form works fine. But, for mmrm() independent variables must be a factor. Thus, for sake of consistancy, we will be using independent variables in factor class.
$Variety <- factor(Yield$Variety)
Yield$Fertilizer <- factor(Yield$Fertilizer)
Yield$Sample_time1 <- factor(Yield$Sample_time)
Yield$Rep <- factor(Yield$Rep) Yield
To fit model, we first need to convert Variety, Fertilizer, and Sample_time as factors. In addition, we need to create a variable for each subject which is plot in this case and contains a unique value for each plot. The plot variable is needed to model the variation in each plot over the sampling time. The plot will be used as a subject with repeated measures. The subject variable can be factor or numeric but the time (it could be year, or sample_time) has to be a factor.
- Inspect the independent variables
##creating a plot variable
$plot <- factor(paste(Yield$Rep, Yield$Fertilizer, Yield$Variety, sep='-'))
Yield
table(Yield$plot)
1-1-VAR1 1-1-VAR2 1-2-VAR1 1-2-VAR2 1-3-VAR1 1-3-VAR2 2-1-VAR1 2-1-VAR2
5 5 5 5 5 5 5 5
2-2-VAR1 2-2-VAR2 2-3-VAR1 2-3-VAR2 3-1-VAR1 3-1-VAR2 3-2-VAR1 3-2-VAR2
5 5 5 5 5 5 5 5
3-3-VAR1 3-3-VAR2 4-1-VAR1 4-1-VAR2 4-2-VAR1 4-2-VAR2 4-3-VAR1 4-3-VAR2
5 5 5 5 5 5 5 5
table(Yield$Fertilizer, Yield$Variety)
VAR1 VAR2
1 20 20
2 20 20
3 20 20
Looks like a well balanced design with 2 variety treatments and 3 fertilizer treatments.
- Check the extent of missing data
colSums(is.na(Yield))
Sample_time Variety Fertilizer Rep pH Yield
0 0 0 0 0 0
Sample_time1 plot
0 0
No missing values in the dataset.
- Inspect the dependent variable
Before fitting a model, let’s check the distribution of the response variable.
hist(Yield$Yield, xlab = "yield", col = "turquoise")
13.2.2 Model fit
This data can be analyzed either using nlme
or mmrm
.
Let’s say we want to fit a model using AR1 structure as shown in the RCBD repeated measures example. Previously, we used lme()
from nlme package to fit the model. In this example, along with nlme()
we will also mmrm()
function from the mmrm package. In addition, instead of summary()
function we will use tidy()
function from the broom.mixed package to look at estimates of mixed and random effects. This will generate a tidy workflow in particular by providing standardized verbs that provide information on estimates, standard errors, confidence intervals, etc.
= corAR1(form = ~ Sample_time|Rep/Variety/plot, value = 0.2, fixed = FALSE)
corr_str1
<- lme(Yield ~ Sample_time1*Variety*Fertilizer,
fit1 random = ~ 1|Rep/Variety/plot,
corr= corr_str1,
data = Yield, na.action= na.exclude)
tidy(fit1)
# A tibble: 30 × 7
effect term estimate std.error df statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed (Intercept) 1.86 0.708 72 2.63 0.0105
2 fixed Sample_time12 0.515 0.688 72 0.748 0.457
3 fixed Sample_time13 0.787 0.674 72 1.17 0.247
4 fixed Sample_time14 1.35 0.675 72 2.00 0.0496
5 fixed Sample_time15 2.84 0.675 72 4.21 0.0000731
6 fixed VarietyVAR2 -0.996 0.861 3 -1.16 0.331
7 fixed Fertilizer2 1.27 0.861 12 1.47 0.167
8 fixed Fertilizer3 2.07 0.861 12 2.40 0.0333
9 fixed Sample_time12:VarietyVAR2 0.739 0.974 72 0.759 0.451
10 fixed Sample_time13:VarietyVAR2 0.269 0.954 72 0.282 0.779
# ℹ 20 more rows
<- mmrm(formula = Yield ~ Sample_time1*Variety*Fertilizer +
fit2 ar1(Sample_time1|Rep/plot),
data = Yield)
tidy(fit2)
# A tibble: 30 × 6
term estimate std.error df statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 2.86 0.464 12.7 6.16 0.0000387
2 Sample_time12 0.656 0.310 1.81 2.12 0.182
3 Sample_time13 1.40 0.414 2.29 3.39 0.0636
4 Sample_time14 1.46 0.484 2.87 3.01 0.0605
5 Sample_time15 2.47 0.549 3.14 4.50 0.0186
6 VarietyVAR2 -1.07 0.656 12.7 -1.63 0.128
7 Fertilizer2 1.67 0.656 12.7 2.55 0.0245
8 Fertilizer3 0.595 0.656 12.7 0.908 0.381
9 Sample_time12:VarietyVAR2 -0.591 0.438 1.81 -1.35 0.321
10 Sample_time13:VarietyVAR2 -0.412 0.586 2.29 -0.704 0.546
# ℹ 20 more rows
13.2.3 Check Model Assumptions
We will use check_model()
from performance package to evaluate the model fitness of model fitted using nlme (mod1). However, the mmrm()
model class doesn’t work with performance package, so we will evaluate the model diagnostics by plotting the residuals using base R functions.
check_model(fit1, check = c('normality', 'linearity'))
plot(residuals(fit2), xlab = "fitted values", ylab = "residuals")
qqnorm(residuals(fit2)); qqline(residuals(fit2))
These diagnostic plots look great! The linearity and homogeneity of variance plots show no trend. The normal Q-Q plots for the overall residuals and for the random effects fall on a straight line so we can be satisfied with that.
13.2.4 Inference
anova(fit1, type = "marginal")
numDF denDF F-value p-value
(Intercept) 1 72 6.899272 0.0105
Sample_time1 4 72 5.318690 0.0008
Variety 1 3 1.338879 0.3310
Fertilizer 2 12 2.936073 0.0916
Sample_time1:Variety 4 72 0.998154 0.4143
Sample_time1:Fertilizer 8 72 8.158884 <.0001
Variety:Fertilizer 2 12 0.237417 0.7923
Sample_time1:Variety:Fertilizer 8 72 0.731698 0.6631
::Anova(fit2, type = "III") car
mmrm() registered as car::Anova extension
Analysis of Fixed Effect Table (Type III F tests)
Num Df Denom Df F Statistic Pr(>=F)
Sample_time1 4 2.0000 193.465 0.005149 **
Variety 1 8.0183 5.002 0.055656 .
Fertilizer 2 9.1210 135.839 1.631e-07 ***
Sample_time1:Variety 4 2.0000 1.057 0.539021
Sample_time1:Fertilizer 8 2.0000 26.383 0.037022 *
Variety:Fertilizer 2 9.1210 3.122 0.092685 .
Sample_time1:Variety:Fertilizer 8 2.0000 0.540 0.781917
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA showed a significant effect of sample time and sample time-by-fertilizer interaction effect.
Next, we can estimate marginal means and confidence intervals at different levels of the independent variables using emmeans()
.
emmeans(fit1,~ Sample_time1)
NOTE: Results may be misleading due to involvement in interactions
Sample_time1 emmean SE df lower.CL upper.CL
1 2.65 0.438 3 1.25 4.04
2 4.40 0.438 3 3.01 5.79
3 5.53 0.438 3 4.13 6.92
4 7.26 0.438 3 5.87 8.66
5 8.82 0.438 3 7.42 10.21
Results are averaged over the levels of: Variety, Fertilizer
Degrees-of-freedom method: containment
Confidence level used: 0.95
emmeans(fit1,~ Sample_time1|Fertilizer)
NOTE: Results may be misleading due to involvement in interactions
Fertilizer = 1:
Sample_time1 emmean SE df lower.CL upper.CL
1 1.36 0.562 3 -0.427 3.15
2 2.25 0.562 3 0.458 4.03
3 2.28 0.562 3 0.495 4.07
4 2.65 0.562 3 0.861 4.44
5 3.66 0.562 3 1.874 5.45
Fertilizer = 2:
Sample_time1 emmean SE df lower.CL upper.CL
1 3.04 0.562 3 1.248 4.82
2 5.17 0.562 3 3.383 6.96
3 6.46 0.562 3 4.668 8.24
4 8.72 0.562 3 6.935 10.51
5 10.09 0.562 3 8.304 11.88
Fertilizer = 3:
Sample_time1 emmean SE df lower.CL upper.CL
1 3.55 0.562 3 1.762 5.34
2 5.78 0.562 3 3.995 7.57
3 7.84 0.562 3 6.051 9.63
4 10.42 0.562 3 8.630 12.21
5 12.69 0.562 3 10.905 14.48
Results are averaged over the levels of: Variety
Degrees-of-freedom method: containment
Confidence level used: 0.95
emmeans(fit2,~ Sample_time1)
NOTE: Results may be misleading due to involvement in interactions
Sample_time1 emmean SE df lower.CL upper.CL
1 3.43 0.189 12.7 3.02 3.84
2 5.21 0.169 12.7 4.84 5.58
3 6.59 0.163 11.9 6.23 6.94
4 7.96 0.169 12.7 7.60 8.33
5 9.65 0.189 12.7 9.24 10.06
Results are averaged over the levels of: Variety, Fertilizer, Rep
Confidence level used: 0.95
emmeans(fit2,~ Sample_time1|Fertilizer)
NOTE: Results may be misleading due to involvement in interactions
Fertilizer = 1:
Sample_time1 emmean SE df lower.CL upper.CL
1 2.32 0.328 12.7 1.61 3.03
2 2.68 0.293 12.7 2.05 3.32
3 3.52 0.283 11.9 2.90 4.14
4 3.54 0.293 12.7 2.91 4.18
5 4.27 0.328 12.7 3.56 4.98
Fertilizer = 2:
Sample_time1 emmean SE df lower.CL upper.CL
1 4.26 0.328 12.7 3.55 4.97
2 6.37 0.293 12.7 5.74 7.01
3 7.82 0.283 11.9 7.21 8.44
4 9.66 0.293 12.7 9.03 10.30
5 11.54 0.328 12.7 10.83 12.25
Fertilizer = 3:
Sample_time1 emmean SE df lower.CL upper.CL
1 3.70 0.328 12.7 2.99 4.41
2 6.58 0.293 12.7 5.94 7.21
3 8.42 0.283 11.9 7.81 9.04
4 10.69 0.293 12.7 10.05 11.32
5 13.14 0.328 12.7 12.43 13.85
Results are averaged over the levels of: Variety, Rep
Confidence level used: 0.95
To explore more about contrasts and emmeans please refer to Chapter 13.
13.3 Split-split Plot Repeated Measures
Recall, we have evaluated the split-split experiment design in Chapter 8, where we had a one factor in main-plot, other in subplot and the third factor in sub-subplot. In this example we will be adding a repeated measures component to the split-split plot design.
<- read.csv(here::here("data", "split_split_repeated.csv")) phos
plot | experimental unit |
block | replication unit |
Ptrt | Main plot, 2 levels |
Inoc | Split plot, 2 levels |
Cv | Split-split plot, 5 levels |
time | time points for data collection |
P_leaf | leaf phosphorous content |
13.3.1 Data Integrity Checks
- Check structure of the data
str(phos)
'data.frame': 240 obs. of 7 variables:
$ plot : int 1 1 1 2 2 2 3 3 3 4 ...
$ bloc : int 1 1 1 1 1 1 1 1 1 1 ...
$ Ptrt : chr "high" "high" "high" "high" ...
$ Inoc : chr "none" "none" "none" "none" ...
$ Cv : chr "Louise" "Louise" "Louise" "Blanca Grande" ...
$ time : chr "PT1" "PT2" "PT3" "PT1" ...
$ P_leaf: num 3154 2331 247 3016 2160 ...
We need to have two variables for time one being a factor other being a numeric.
$time = as.factor(phos$time)
phos<- phos %>%
phos1 mutate(time1 = as.numeric(time),
rep = as.character(bloc),
plot = as.character(plot))
- Inspect the independent variables
table(phos1$Cv, phos1$Ptrt, phos1$Inoc)
, , = myco
high low
Alpowa 12 12
Blanca Grande 12 12
Louise 12 12
Otis 12 12
Walworth 12 12
, , = none
high low
Alpowa 12 12
Blanca Grande 12 12
Louise 12 12
Otis 12 12
Walworth 12 12
Looks like a well balanced design with 2 variety treatments and 3 fertilizer treatments.
- Check the extent of missing data
colSums(is.na(phos1))
plot bloc Ptrt Inoc Cv time P_leaf time1 rep
0 0 0 0 0 0 0 0 0
No missing values in data.
- Inspect the dependent variable
Before fitting a model, let’s check the distribution of the response variable.
Here note that we observed uneven distribution of response variable with a bimodal distribution and a noticeable gap in the 500 to 1500 range. Given this odd distribution, it may be tempting to consider a transformation in order to attempt to impose normality. It’s important to remember that the assumption of normality applies to the residuals, not the raw data. Plotting the data is for checking the data looks as expected, a judgement that requires some knowledge of the experiment (this was Julia Piaskowski’s PhD research). In this case, the time points, PT1 and PT2, reflect early wheat growth stages (tillering and jointing, respectively), and the final time point is senescent leaf tissue at grain maturity. At that physiological stage, it is normal for phosphorus leaf concentration to be much lower. Since the data look as expected, we will proceed with a general linear model and evaluate the residuals from the model fitting process when deciding if a non-normal distribution is appropriate for the data.
13.3.2 Model fit
= corCompSymm(form = ~ time1|rep/Ptrt/Inoc/plot, value = 0.2, fixed = FALSE)
corr_str1
<- lme(P_leaf ~ time*Ptrt*Inoc*Cv,
fit1 random = ~ 1|rep/Ptrt/Inoc/plot,
corr = corr_str1,
data = phos1, na.action= na.exclude)
13.3.3 Check model assumptions
check_model(fit1, check = c('qq', 'linearity'), detrend=FALSE)
This model fit a first glance is not ideal, but that LOESS line is trying to model a space where there are no data (between 500 and 1500 ppm P leaf concentration), so that can introduce artifacts. Performance does have an option for testing for heteroscedascity:
check_heteroscedasticity(fit1)
Warning in deviance.lme(x, ...): deviance undefined for REML fit
OK: Error variance appears to be homoscedastic (p > .999).
These results do confirm our suspicions that the residuals were not as heteroscedastic as they first appeared. However, the boxplot indicated a a difference in variance for each time time point. This addressed in the chapter on variance components.
13.3.4 Inference
anova(fit1, type = "marginal")
numDF denDF F-value p-value
(Intercept) 1 120 1458.2396 <.0001
time 2 120 518.4432 <.0001
Ptrt 1 3 3.3677 0.1638
Inoc 1 6 1.7697 0.2317
Cv 4 48 7.6028 0.0001
time:Ptrt 2 120 0.6790 0.5091
time:Inoc 2 120 1.9811 0.1424
Ptrt:Inoc 1 6 2.4919 0.1655
time:Cv 8 120 2.4113 0.0189
Ptrt:Cv 4 48 0.5082 0.7299
Inoc:Cv 4 48 2.1349 0.0909
time:Ptrt:Inoc 2 120 0.8410 0.4338
time:Ptrt:Cv 8 120 0.2223 0.9863
time:Inoc:Cv 8 120 0.9462 0.4816
Ptrt:Inoc:Cv 4 48 0.4761 0.7530
time:Ptrt:Inoc:Cv 8 120 0.3886 0.9249
emmeans(fit1, ~ Inoc|Cv)
NOTE: Results may be misleading due to involvement in interactions
Cv = Alpowa:
Inoc emmean SE df lower.CL upper.CL
myco 1832 53 3 1663 2000
none 1904 53 3 1736 2073
Cv = Blanca Grande:
Inoc emmean SE df lower.CL upper.CL
myco 1919 53 3 1750 2088
none 1899 53 3 1730 2068
Cv = Louise:
Inoc emmean SE df lower.CL upper.CL
myco 1876 53 3 1707 2045
none 1898 53 3 1730 2067
Cv = Otis:
Inoc emmean SE df lower.CL upper.CL
myco 1855 53 3 1686 2023
none 1956 53 3 1787 2124
Cv = Walworth:
Inoc emmean SE df lower.CL upper.CL
myco 1667 53 3 1498 1836
none 1737 53 3 1568 1906
Results are averaged over the levels of: time, Ptrt
Degrees-of-freedom method: containment
Confidence level used: 0.95
emmeans(fit1, ~ time|Cv)
NOTE: Results may be misleading due to involvement in interactions
Cv = Alpowa:
time emmean SE df lower.CL upper.CL
PT1 3201 56.4 3 3021.89 3381
PT2 2225 56.4 3 2045.20 2404
PT3 178 56.4 3 -1.14 358
Cv = Blanca Grande:
time emmean SE df lower.CL upper.CL
PT1 3183 56.4 3 3003.83 3363
PT2 2334 56.4 3 2154.77 2513
PT3 210 56.4 3 30.28 389
Cv = Louise:
time emmean SE df lower.CL upper.CL
PT1 3121 56.4 3 2941.69 3300
PT2 2366 56.4 3 2186.89 2546
PT3 174 56.4 3 -5.11 354
Cv = Otis:
time emmean SE df lower.CL upper.CL
PT1 3228 56.4 3 3048.98 3408
PT2 2253 56.4 3 2073.98 2433
PT3 234 56.4 3 54.19 413
Cv = Walworth:
time emmean SE df lower.CL upper.CL
PT1 2744 56.4 3 2564.63 2923
PT2 2170 56.4 3 1990.22 2349
PT3 193 56.4 3 13.21 372
Results are averaged over the levels of: Ptrt, Inoc
Degrees-of-freedom method: containment
Confidence level used: 0.95