Field Guide to the R Mixed Model Wilderness
  1. 14  Variance and Variance Components
  • Preface
  • 1  Introduction
  • 2  Zen and the Art of Statistical Analysis
  • 3  Mixed Model Background
  • 4  Model Prep & Workflow
  • Experiment designs
    • 5  Randomized Complete Block Design
    • 6  Factorial RCBD Design
    • 7  Split Plot Design
    • 8  Split-Split Plot Design
    • 9  Strip Plot Design
    • 10  Incomplete Block Design
    • 11  Latin Square Design
  • 12  Repeated Measures
  • 13  Marginal Means and Contrasts
  • 14  Variance and Variance Components
  • 15  Troubleshooting
  • 16  Additional Resources
  • References

Table of contents

  • 14.1 Variance component estimation for 2+ random effects
  • 14.2 Multiple crossed variance components
    • 14.2.1 Model fitting
    • 14.2.2 Inference on random effects
  • 14.3 Nested variance components
    • 14.3.1 Model fitting
    • 14.3.2 Inference
  • 14.4 Variance estimation under heteroscedasticity
    • 14.4.1 Case 1: unequal variance due to a factor
    • 14.4.2 Case 2: Variance is related to the fitted values
    • 14.4.3 Case 3: Variance is related to a factor under other complex circumstances
  • 14.5 Coefficient of Variation
  • View source
  • Report an issue

14  Variance & Variance Components

Mixed, hierarchical or multilevel models provide the advantage of being able to estimate the variance of random variables and model correlations within the grouping structure of random variables. Instead of looking at a variable as a collection of specific levels to estimate, random effects view variables as being a random draw from a probability distribution.

The decision of how to designate a variable as random or fixed depends on (1) your experimental aims with regard to inference and (2) your data structure. 1 There is a philosophical consider and practical consideration. The philosophical approach is that random variable represents a small sample of a population you want to make inference on. The practical consideration is that when there are few sample levels, the random estimation procedure is not very reliable for those conditions. Ben Bolker has an excellent summary that we strongly recommend that you read to learn more about this. Below is an excerpt from that on the consequences of too few levels to estimate a random effect:

1 This animation show how accounting for grouping structure as random effects can impact the results of a linear model.

Treating factors with small numbers of levels as random will in the best case lead to very small and/or imprecise estimates of random effects; in the worst case it will lead to various numerical difficulties such as lack of convergence, zero variance estimates, etc.. (A small simulation exercise shows that at least the estimates of the standard deviation are downwardly biased in this case; it’s not clear whether/how this bias would affect the point estimates of fixed effects or their estimated confidence intervals.) In the classical method-of-moments approach these problems may not arise (because the sums of squares are always well defined as long as there are at least two units), but the underlying problems of lack of power are there nevertheless.

14.1 Variance component estimation for 2+ random effects

14.2 Multiple crossed variance components

This is a common scenario: an experiment is conducted and their are multiple grouping levels, none of which are nested within each other, but the observations have multiple group memberships. A common example is a multi-environmental trial where multiple crop genotypes are evaluated at multiple locations and years. We can consider genotypes, locations and years as random effects depending on our experimental aims. Very few R packges can handle crossed random effects, but lme4 can!

The data used in this example is a collection of potato field trials conducted across 3 locations and 17 years. Each year, the same potato clones[^vc-1] are evaluated across all 3 locations, but each year, different clones are evaluated. Some clones are evaluted for multiple years, and a small number are evaluted each year. These potato clones were evaluated for the length-to-width ratio (LxW). We want to know how much location, year and clone contribute to this trait.

[^vc-1] A clone is a genetically distinct potato genotype that is vegetatively propagated. It may be a released variety or an experimental breeding line.

year year of trial, 16 levels
state location of trial, 3 levels
clone potato genotype
LxW length-to-width ratio
potato <- read.csv(here::here("data", "potato_tuber_size.csv"))

Number of observations for each location and year:

table(potato$year, potato$state)
      
       ID OR WA
  2005 30 30 30
  2006 30 30 30
  2007 29 29 29
  2008 25 25 25
  2009 16 17 17
  2010 28 28 28
  2011 18 24 26
  2012 30 32 32
  2013 24 25 25
  2014 33 33 33
  2015 29 31 31
  2016 25 25 25
  2017 18 21 22
  2018 25 25 25
  2019 22 23 23
  2020 23 24 24

Total number of clones evaluated:

length(unique(potato$clone))
[1] 181

Total counts for how often invividual clones were evaluated:

potato |> count(clone, name = "Frequency of Evaluation") |> count(`Frequency of Evaluation`, name = "No. of Clones") 
   Frequency of Evaluation No. of Clones
1                        2             4
2                        3            92
3                        4             1
4                        6            30
5                        8             2
6                        9            20
7                       10             1
8                       11             2
9                       12            11
10                      13             1
11                      14             4
12                      15             8
13                      16             1
14                      17             1
15                      18             1
16                      91             1
17                      93             1

Distribution of the dependent variable, LxW or length-to-width ratio:

Distribution of Length-to-Width Ratio

14.2.1 Model fitting

For this analysis, location (“state”) is a fixed effect because it only has 3 levels, and the researchers are only interested in understanding clone performance in those locations (Idaho, Oregon and Washington). The remaining effects are random because (1) they represent a sample of the full population that inference is desired for, and (2) each has a sufficient number of levels to make estimation feasible.

potato_m1 <- lmer(LxW ~ state + (1|year) + (1|state:year) + (1|clone),
                  data = potato)
performance::check_model(potato_m1, check = c('qq', 'linearity'), detrend=FALSE, alpha = 0)

14.2.2 Inference on random effects

The variance components for each random effect can be extracted most easily using the tidy() function from broom.mixed. We can also a bit of extra code to calculate the percent of variance explained by each component.

var_comps <- tidy(potato_m1, "ran_pars", scales = "vcov") |> 
  dplyr::select(group, variance = "estimate") |> 
  mutate(`percent variance` = round(variance/sum(variance) * 100, 1))

var_comps
# A tibble: 4 × 3
  group      variance `percent variance`
  <chr>         <dbl>              <dbl>
1 clone      0.00344                61.6
2 state:year 0.000468                8.4
3 year       0.000301                5.4
4 Residual   0.00138                24.6

::: note-information, collapse=false ## log likelihood ratio tests ANOVA as classically defined (F-tests contrasting the between group and within group variation) is not an option for random effects. There are several ways to test if random effects are impactful on the model overall. One of the most reliable and popular methods is the log likelihood ratio test. In brief, a reduced model is refit from a full specified model omitting a random variable. The log likelihood from the two models (the full specified and reduced models) are compared and a p-value is computed for that difference given the change in number of parameters estimated. The null hypothesis is that the models are equivalent and the alternative hypothesis is that the models are not equivalent. Hence, low p-values provide evidence that the omitted factor is impactful on the dependent variable of interest. :::

The function ranova() in lmerTest conducts log-likelihood ratio tests for all random effects in a model.[^vc-2]

ranova(potato_m1)
ANOVA-like table for random-effects: Single term deletions

Model:
(1/LxW) ~ state + (1 | year) + (1 | state:year) + (1 | clone)
                 npar logLik     AIC     LRT Df Pr(>Chisq)    
<none>              7 2041.9 -4069.7                          
(1 | year)          6 2039.3 -4066.7    5.07  1    0.02436 *  
(1 | state:year)    6 1953.7 -3895.4  176.34  1    < 2e-16 ***
(1 | clone)         6 1477.3 -2942.7 1129.03  1    < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

[^vc-2] It is also possible to conduct likelihood tests manually by constructing reduced models and comparing it to the fully specificed (or more specified) model. In such cases, the fixed effects need to be identical between models, and they need to fit using maximimun likelihood instead of REML.

It is possible to make inferences on specific levels of random effects. This take a different analytical approach than fixed effects and has a different term, predictions, or more specifically Best Linear Unbiased Predictions, commonly called “BLUPs” (rhymes with “cups”). The estimated marginal means from fixed effects are technically BLUEs, or Best Linear Unbiased Estimates. By their nature, BLUPs “shrink” the estimates towards zero, reducing their overall spread compared to fixed effects. VSNi has a short summary on BLUPs and BLUEs.

Recall that random effects are distributed with a mean of zero and a standard deviation, \(\sigma_b\) that is estimated in the model fitting procedure. We can add the overall model intercept to all BLUPs in order to shift them to a scale that may be more intuitive for some. Since this constant (the model intercept) was added to all BLUPs, the overall relationship between the them (in this example, potato clones) is unchanged.

# the model intercept
intercept = fixef(potato_m1)[1]

# all random effects from the model
random_effects <- REextract(potato_m1)

# filter to clone BLUPs and add the intercept
clone_re <- filter(random_effects, groupFctr == "clone") |> 
  rename(clone = groupID, BLUP = "(Intercept)", SE = "(Intercept)_se") |> 
  mutate(blup_adj = BLUP + intercept)

Histogram of Clone BLUPs

Below are the BLUPs (in red) and standard errors for each clone, arranged from lowest to highest length-to-width ratio. The horizontal gray dotted line indicates the average clone effect.

ggplot(clone_re, aes(x = reorder(clone, blup_adj), y = blup_adj)) +
  geom_hline(yintercept = intercept, color = "gray", linetype = 2) + 
  geom_linerange(aes(ymax = blup_adj + SE, ymin = blup_adj - SE)) +
    geom_point(size = 0.7, col = "violetred1") + 
  theme_classic(base_size = 14) +
  ggtitle("length-to-width ratio") + 
  ylab("BLUP + intercept") + 
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(), 
        axis.title.x = element_blank())

14.3 Nested variance components

It is possible to have a nested data set: learners in a classroom and classrooms in institutions, cows in dairies and dairies in counties or other regional indicators. These end up being truly hierarchical models, where we expect that there is some covariance between observations at the different levels. Cows in the same dairy are likely to be more correlated with each other than cows from another dairy. But how correlated? And how much does each grouping level influence the variation of whatever dependent variable is being studied.

This example is looking at pay at institutions of higher education in Washington. All annual earnings less than $25,000 were excluded from the data set, and only earners with a minimum of four years of data were included. Although this is public information, all individual names were anonymized.2

2 This data set (source) was incompletely curated and hence no substantive conclusions should be drawn from this particular analysis. This example is for teaching and demonstration purposes only.

For this data set, we are concerned with what is contributing to employee salaries over time - to what extent did year, the institution itself and the individual employee contribute to salary changes over time. While this reflects a complete data set for Washington State (all institutions, all employees), the interest is inference for other U.S. higher education institutions. This is an example where we will use random slopes to examine how an individual’s salary changed over time. The data set:

agency educational institution, 36 levels
id anonymized employee within each agency
salary annual salary
year year of salary, set to 0 for 2019

This is a rather large data set, and it requires quite a bit of computing power to analyze. Hence, the analysis was done locally, and the data set and model object were saved as a .RData file. Now let’s load this object into an R session.

load(here::here("data/salary_modeldata.RData"))

14.3.1 Model fitting

Salary is very skewed, where the 70% of salaries are less than $100,000, but there are a few very high salaries, up to a maximum value of $3.1 million. We will log transform the data in a log-linear model for analysis. This is a special case of generalized linear mixed model (GLMM) and the only GLMM model used because it still follow the assumption of normally-distributed residuals terms when a the dependent variable is log-transformed.

hist(log(wa_salary$salary), ylab=NULL, xlab = NULL, main = "Washington Higher Ed Salaries")

Histogram of Log of Salaries
boxplot(log(salary) ~ year, data = wa_salary, ylab=NULL, xlab = NULL, main = "Salaries by Year")

Boxpplot of Log Salaries by Year

The analysis is derived from this model:

\[y_ij = \mu + x_i + \]

m1 <- lmer(log(salary) ~ factor(year) + (year|agency/id), 
           data = wa_salary, REML = FALSE)

random_effects <- REextract(m1)

save(m1, wa_salary, random_effects, file = here::here("data", "salary_modeldata.RData"))
m1 <- lmer(log(salary) ~ factor(year) + (year|agency/id), 
           data = wa_salary, REML = FALSE)

Random intercepts and slopes were fit for agency and id nested within agency.

14.3.2 Inference

14.3.2.1 Look at Variance Components

VarCorr(m1)
 Groups    Name        Std.Dev. Corr  
 id:agency (Intercept) 0.473969       
           year        0.079368 -0.368
 agency    (Intercept) 0.087550       
           year        0.016624 -0.674
 Residual              0.158676       
var_comps_salary <- tidy(m1, "ran_pars", scales = "vcov") |> 
  filter(grepl("^var.*", term)) |> 
  dplyr::select(group, term, variance = "estimate") |> 
  mutate(`percent variance` = round(variance/sum(variance) * 100, 1))

var_comps_salary
# A tibble: 5 × 4
  group     term             variance `percent variance`
  <chr>     <chr>               <dbl>              <dbl>
1 id:agency var__(Intercept) 0.225                  85.1
2 id:agency var__year        0.00630                 2.4
3 agency    var__(Intercept) 0.00767                 2.9
4 agency    var__year        0.000276                0.1
5 Residual  var__Observation 0.0252                  9.5

14.3.2.2 Extract BLUPs

# the model intercept
intercept = fixef(m1)[1]

# all random effects from the model
random_effects <- REextract(m1)

14.3.2.3 Visualize Random Effects

(intercepts and slopes)

  • Filter BLUPs to a single factor, “agency” (i.e. institute of higher education).
agency_re <- filter(random_effects, groupFctr == "agency") |> 
  rename(Institution = groupID, BLUP = "(Intercept)", SE = "(Intercept)_se") |> 
  mutate(blup_adj = BLUP + intercept) |> 
  mutate(`salary slope` = ifelse(year > 0, "+", "-"))
  • Plot Salaries by Institution:
ggplot(agency_re, aes(x = reorder(Institution, blup_adj), y = blup_adj)) +
  geom_hline(yintercept = intercept, color = "gray", linetype = 2) + 
  geom_linerange(aes(ymax = blup_adj + SE, ymin = blup_adj - SE)) +
    geom_point(size = 3, col = "violetred", shape = 18, alpha = 0.6) + 
  labs(title = "Institutional Salaries, 2019 - 2024", subtitle = "Log of Salaries") + 
  ylab("BLUP + mean and standard error") + 
  theme_classic(base_size = 14) +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(), 
        axis.title.x = element_blank())

  • Plot Salary Changes by Institution:
ggplot(agency_re) +  # need sequence -0.4 to 0.4
  geom_abline(aes(slope = year, intercept = blup_adj, col = `salary slope`), alpha = 0.7, linewidth = 0.75) +
  geom_hline(yintercept = intercept, color = "black", linetype = 2) + 
  labs(title = "Changes in Institutional Salaries, 2019 - 2024", subtitle = "Log of Salaries") + 
  xlim(c(0, 4)) +
  ylim(c(10.75, 11.3)) + 
  scale_color_manual(values = c("springgreen3", "violetred")) +
  #ylim(c(-0.25, 0.3)) +
  ylab("log salary") + xlab("year") + 
  theme_bw(base_size = 14) +
  theme(legend.position = c(0.98, 0.98),
        legend.justification = c(1,1),
        legend.background = element_rect(color = "gray60", fill = rgb(1, 1, 1, alpha = 0.8)),
        legend.key = element_rect(fill = "transparent"))
Warning: A numeric `legend.position` argument in `theme()` was deprecated in ggplot2
3.5.0.
ℹ Please use the `legend.position.inside` argument of `theme()` instead.

  • Examine Relationship Between Employee Salary and Changes in Salary:
person_re <- filter(random_effects, groupFctr == "id:agency") |> 
  rename(employee = groupID, BLUP = "(Intercept)", SE = "(Intercept)_se") |> 
  mutate(blup_adj = BLUP + intercept) |> 
  mutate(`salary slope` = ifelse(year > 0, "+", "-"))
ggplot(person_re, aes(x = BLUP, y = year)) +
  geom_point(alpha = 0.4, color = "royalblue") +
  geom_hline(yintercept = 0, col = "black", linetype = 2) +
  geom_vline(xintercept = 0, col = "black", linetype = 2) +
  labs(title = "Changes in Employee Salaries, 2019 - 2024", subtitle = "(Log of Salaries)") +
  xlab("random intercept (BLUP) for Employee Salary") +
  ylab("random slope for Employee salary increase") +
  theme_bw()

There is a negative correlation between employee salary and employee salary change, which is also indicated when VarCorr(m1) was run.

There are other functions in merTools to explore random effects and variances from an lme4 analysis.

14.4 Variance estimation under heteroscedasticity

In the “general” linear model days, when a premium was placed on the i.i.d. error paradigm, if we did reject \(H_0\), it would set off a minor crisis in the form of a hunt for a variance stabilizing transformation. In contemporary modeling, we simply proceed with inference on estimable functions using the equal variance model.

– Walt Stroup, Marina Ptukhiuna and Julie Garai (2024), Generalized Linear Mixed Models, 2nd Ed, Section 7.2.3.1

In previous sections, we have assumed the error terms or residuals were “i.i.d.”, that is “independently and identically distributed. This means they were shared the same distribution (identical) and were uncorrelated (independent). Longitudinal studies, that is, those with repeated measures, do have correlated residuals, so we relax the independence assumption and model those correlations. However, residuals can be unexpectedly related to the their observations, particular treatments or the order data was gathered from the experimental units (among other causes). As mentioned in the previous quote, we now have tools for handling this rather than trying to transform the data. Below are examples on how to model heteroscedasticity.

14.4.1 Case 1: unequal variance due to a factor

This data set is from a set of canola variety trials conducted in a single year across multiple locations. The trials included 38 varieties that were evaluated at 9 locations using a RCBD design.

var_ex1 <- read.csv(here::here("data", "MET_trial_variance.csv")) |> 
  mutate(block = as.character(block)) |> 
  tidyr::drop_na()

Exploratory data visualizations indicate that the dependent variable, seed yield, varied greatly overall and certain locations had smaller variance compared to others.

hist(var_ex1$yield, main=NA)
boxplot(yield ~ site, data = var_ex1)

The study is not fully crossed; all sites did not include all varieties, although there is substantial overlap. As a result, only variety and the site-by-variety interaction are included in the statistical model.

While it may be possible to implement heterogenous error structures with lme4, it requires a good working knowledge of that R package. is requires extensive comfort programming in lme4. However, the package nlme contains relatively straightforward implementations that are illustrated below.

m1_a <- lme(yield ~ site:variety + variety, 
                random = ~ 1 |site/block, 
                na.action = na.exclude, 
                data = var_ex1)

The residual plot indicates some association between the residuals and fitted values, violating the i.i.d. model assumption.

plot(m1_a)

We can add a term to model the variance by site.

\[Var(\epsilon_{ij}) = \sigma^2 \delta^2_{s_{ij}} \] Details on the implementation can be found in (Pinheiro and Bates 2000).

m1_b <- update(m1_a, weights = varIdent(form = ~1|site))

The function varIdent() is used to set the stratifying variable at which to estimate variance. Like many functions in R, there are additional arguments to consider for more complex scenarios (type ?varIdent in an R console to check).

m1_b <- update(m1_a, weights = varIdent(form = ~1|site))

is equivalent to

m1_b <- lme(yield ~ site:variety + variety, 
                random = ~ 1 |site/block,
                weights = varIdent(form = ~1|site), 
                na.action = na.exclude, 
                data = var_ex1)

The residual plot is now much cleaner. The result is a better-fitting model and with that, better inference for variety at the site level.

plot(m1_b)

anova(m1_a, m1_b)
     Model  df      AIC      BIC    logLik   Test  L.Ratio p-value
m1_a     1 345 14826.99 16524.62 -7068.493                        
m1_b     2 353 14499.68 16236.68 -6896.840 1 vs 2 343.3059  <.0001
m1_c <- lmer(yield ~ site:variety + variety + (1|site/block),
                #weights = varIdent(form = ~1|site), 
                na.action = na.exclude, 
                data = var_ex1)

14.4.2 Case 2: Variance is related to the fitted values

Below is the infamous ‘horn’ pattern in the residuals-vs-fitted values plot. This analysis is from a canola variety trial of 38 cultivars conducted at single location for one field season.

var_ex2 <- read.csv(here::here("data", "single_trial_variance.csv")) |> 
  dplyr::mutate(block = as.character(block))

A histogram does indicate there is anything unusual about the dependent variable (seed yield), except that is varies quite a bit, ranging from 24 to nearly 950 units.

hist(var_ex2$yield, main = NA)

Since this experiment has a single fixed effect and is arranged using a RCBD, the model is same as described in the RCBD chapter.

m2_a <- lme(yield ~ variety, 
               random = ~ 1 |block, 
               na.action = na.exclude, 
               data = var_ex2)

An inspection of the residual plot indicates a clear mean-variance relationship.

plot(m2_a)

This mean-variance relationship can be mitigated by modelling the variance directly as a function of any covariate in the model using a power function.

\[ Var(\epsilon_{ij}) = \sigma^2|\nu_{ij}|^{2\delta}\]

We can accomplish this using the nlme function varPower(). This function can take other covariates, but when there is no argument provided, it defaults to using the fitted values.

m2_b <- update(m2_a, weights = varPower())

The model fit is substantially improved according to visual inspection of the residuals and the results of a likelihood ratio test.

plot(m2_b)

anova(m2_a, m2_b)
     Model df      AIC      BIC    logLik   Test  L.Ratio p-value
m2_a     1 40 1545.596 1655.044 -732.7978                        
m2_b     2 41 1519.199 1631.383 -718.5996 1 vs 2 28.39636  <.0001

There are many other ways of using these functions for modeling heteroscedasticity. For example, varIdent() can include a covariate, and varPower() can include a stratifying variable. All or some of the parameters can be fixed at set values. It’s worth reading the documentation to understand what is possible.

14.4.3 Case 3: Variance is related to a factor under other complex circumstances

Recall the repeated measures/split-split plot example that indicated some evidence of heteroscedasticity.

check_model(fit1, check = c('qq', 'linearity'), detrend=FALSE)

Figure 14.1: Boxplot of P concentration by stage

We can model the variance as a function of time point by using the varIdent() function shown earlier.

fit1_b <- update(fit1, weights = varIdent(form = ~1|time))
check_model(fit1_b, check = c('qq', 'linearity'), line_size = 0, detrend=FALSE)

A comparison of models indicates that ’fit1_b` (with heteroscedascity modelled) is a better fit.

anova(fit1, fit1_b)
       Model df      AIC      BIC    logLik   Test  L.Ratio p-value
fit1       1 66 2511.987 2722.722 -1189.993                        
fit1_b     2 68 2508.314 2725.435 -1186.157 1 vs 2 7.672945  0.0216

The main impact of this change are the standard error for time:

emmeans(fit1, ~ time)
NOTE: Results may be misleading due to involvement in interactions
 time emmean   SE df lower.CL upper.CL
 PT1    3096 47.2  3   2945.3     3246
 PT2    2270 47.2  3   2119.3     2420
 PT3     198 47.2  3     47.4      348

Results are averaged over the levels of: Ptrt, Inoc, Cv 
Degrees-of-freedom method: containment 
Confidence level used: 0.95 
emmeans(fit1_b, ~ time)
NOTE: Results may be misleading due to involvement in interactions
 time emmean   SE df lower.CL upper.CL
 PT1    3096 39.9  3   2968.5     3223
 PT2    2270 39.5  3   2143.9     2395
 PT3     198 37.6  3     78.1      317

Results are averaged over the levels of: Ptrt, Inoc, Cv 
Degrees-of-freedom method: containment 
Confidence level used: 0.95 

14.4.3.1 Inference

anova(fit1, type = "marginal")
                  numDF denDF   F-value p-value
(Intercept)           1   120 1458.2396  <.0001
time                  2   120  518.4432  <.0001
Ptrt                  1     3    3.3677  0.1638
Inoc                  1     6    1.7697  0.2317
Cv                    4    48    7.6028  0.0001
time:Ptrt             2   120    0.6790  0.5091
time:Inoc             2   120    1.9811  0.1424
Ptrt:Inoc             1     6    2.4919  0.1655
time:Cv               8   120    2.4113  0.0189
Ptrt:Cv               4    48    0.5082  0.7299
Inoc:Cv               4    48    2.1349  0.0909
time:Ptrt:Inoc        2   120    0.8410  0.4338
time:Ptrt:Cv          8   120    0.2223  0.9863
time:Inoc:Cv          8   120    0.9462  0.4816
Ptrt:Inoc:Cv          4    48    0.4761  0.7530
time:Ptrt:Inoc:Cv     8   120    0.3886  0.9249
emmeans(fit1_b, ~ Inoc|Cv)
NOTE: Results may be misleading due to involvement in interactions
Cv = ALPOWA:
 Inoc emmean   SE df lower.CL upper.CL
 myco   1832 45.2  3     1688     1976
 none   1904 45.2  3     1760     2048

Cv = BlancaG:
 Inoc emmean   SE df lower.CL upper.CL
 myco   1919 45.2  3     1775     2063
 none   1899 45.2  3     1755     2043

Cv = LOUISE:
 Inoc emmean   SE df lower.CL upper.CL
 myco   1876 45.2  3     1732     2020
 none   1898 45.2  3     1754     2042

Cv = OTIS:
 Inoc emmean   SE df lower.CL upper.CL
 myco   1855 45.2  3     1711     1998
 none   1956 45.2  3     1812     2099

Cv = WALWORTH:
 Inoc emmean   SE df lower.CL upper.CL
 myco   1667 45.2  3     1523     1811
 none   1737 45.2  3     1593     1881

Results are averaged over the levels of: time, Ptrt 
Degrees-of-freedom method: containment 
Confidence level used: 0.95 
emmeans(fit1_b, ~ time|Cv)
NOTE: Results may be misleading due to involvement in interactions
Cv = ALPOWA:
 time emmean   SE df lower.CL upper.CL
 PT1    3201 53.5  3   3031.1     3371
 PT2    2225 51.8  3   2059.7     2389
 PT3     178 44.1  3     37.7      319

Cv = BlancaG:
 time emmean   SE df lower.CL upper.CL
 PT1    3183 53.5  3   3013.1     3353
 PT2    2334 51.8  3   2169.3     2499
 PT3     210 44.1  3     69.2      350

Cv = LOUISE:
 time emmean   SE df lower.CL upper.CL
 PT1    3121 53.5  3   2950.9     3291
 PT2    2366 51.8  3   2201.4     2531
 PT3     174 44.1  3     33.8      315

Cv = OTIS:
 time emmean   SE df lower.CL upper.CL
 PT1    3228 53.5  3   3058.2     3398
 PT2    2253 51.8  3   2088.5     2418
 PT3     234 44.1  3     93.1      374

Cv = WALWORTH:
 time emmean   SE df lower.CL upper.CL
 PT1    2744 53.5  3   2573.9     2914
 PT2    2170 51.8  3   2004.7     2334
 PT3     193 44.1  3     52.1      333

Results are averaged over the levels of: Ptrt, Inoc 
Degrees-of-freedom method: containment 
Confidence level used: 0.95 

14.5 Coefficient of Variation

The coefficient of variation can be manually calculated as thus:

\[ \frac {\sigma}{\mu} * 100 \]

m1_ave <- mean(var_ex1$yield, na.rm = TRUE)
m1_cv = sigma(m1_b)/m1_ave*100
round(m1_cv, 1)
[1] 23.3

However, in cases of unequal variance, the overall error term can be larger than expected under homoscedasticity (with varIdent()) or much much smaller (e.g. with varPower()). Interpret the coefficient of variation from mixed models with caution.

Pinheiro, José C., and Douglas M. Bates. 2000. Mixed-Effects Models in s and s-PLUS. New York: Springer. https://doi.org/10.1007/b98882.
13  Marginal Means and Contrasts
15  Troubleshooting

© 2025

Uidaho Logo

  • View source
  • Report an issue