Beyond GLMs: How GLMs relate to other types of models
In this chapter you will learn:
how GLMs relate to other types of models (i.e. classical models)
1 Beyond GLMs
When to use GLMs (vs. these other methods):
- when you have a linear shape assumption
When to use these other methods (vs. a GLM):
- when you need to relate to previous research (e.g. interpreting coefficients, comparing results across studies)
1.1 Other types of linear models
As we have discussed in class, you are testing your hypothesis by building a model of your hypothesis so that you can fit the model to data and determine the evidence for your hypothesis. Remember that models are an approximation of what is going on in the real world and that “all models are wrong, but some are useful”. It follows then that there could be more than one useful model that can represent your hypothesis.
Here we will talk about classical models or tests. These are other types of models that can be used to test your hypothesis when you have a linear shape assumption. By having these other tests “on your radar” you will be better able to communicate with those who might be using different methods than you.
A few things to note about these “classical” models:
With classical models, assessing if your model is valid often comes before you apply the model (e.g. checking to see that your response is normal before you apply the model)
Some classical models are mathematically identical to a GLM, e.g.
a simple linear model (lm()) is the same as a GLM when your error distribution assumption is normal
an Analysis of Variance or ANOVA is the same as a GLM when your error distribution assumption is normal, and your predictors are all categorical
For some types of observations, there are no equivalent “classical” models - you need a GLM. This is particularly the case when you have non-normal error distribution assumptions that are discrete, such as poisson (count data) or binomial.
When the effect of your predictor on your response is big, your results from a GLM will agree with the classical model equivalent. When the effect of your predictor on your response is small, you might conclude different things about your hypothesis depending on the model you use to test it. This can be frustrating/confusing, but it is just the nature of the process: you are using models that are only approximations of the real world.
Here we will compare the method we have been using (GLM) with a classical model for a number of situations:
Resp1 ~ Cat1 + 1 (where you have a categorical predictor with only two factor levels)
Resp2 ~ Cat2 + 1 (where you have a categorical predictor with more than two factor levels)
Resp3 ~ Num3 + 1 (where you have a numeric predictor)
Resp4 ~ Cat4 + Num5 + 1 (where you have a categorical and a numeric predictor)
Resp5 ~ Cat6 + 1 (where you have a categorical predictor with only two factor levels AND observation dependence)
We will also include one more example that we haven't talked about yet - how you test if the mean of your response is equal to a constant value (i.e. a “one-sample test”):
Resp6 ~ constant
This section may seem overwhelming, but remember that you already know how to address every hypothesis discussed here (with a GLM). This section is to make you aware of the other options available to you.
Resp1 ~ Cat1 + 1 (where Cat1 has two factor levels)
Resp1 ~ Cat1 + 1 (where Cat1 has two factor levels) with a normal error distribution assumption - GLM vs. two-sample t-test
An example: monthly weight change is different between North and South populations or WtChange ~ Pop
Using our GLM method, we would set up our starting model as:
startMod.glm <- glm(WtChange ~ Pop + 1,
                    data = myDat,
                    family = gaussian(link = "identity"))
summary(startMod.glm)
Call:
glm(formula = WtChange ~ Pop + 1, family = gaussian(link = "identity"),
data = myDat)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 33.9000 0.5352 63.34 <2e-16 ***
PopSouth 15.0000 0.7569 19.82 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 4.297143)
Null deviance: 1807.82 on 29 degrees of freedom
Residual deviance: 120.32 on 28 degrees of freedom
AIC: 132.81
Number of Fisher Scoring iterations: 2
and would go on with our statistical modelling framework.
Using a classical model, we can choose the two-sample t-test:
# first check to see the response is normally distributed with a shapiro test for each population:
samp1 <- subset(myDat, Pop == "South")$WtChange
samp2 <- subset(myDat, Pop == "North")$WtChange
shapiro.test(samp1)
Shapiro-Wilk normality test
data: samp1
W = 0.9388, p-value = 0.3675
shapiro.test(samp2)
Shapiro-Wilk normality test
data: samp2
W = 0.97638, p-value = 0.9388
# then check to see if the variances of the observations for the two populations are equal:
var.test(samp1, samp2)
F test to compare two variances
data: samp1 and samp2
F = 0.94693, num df = 14, denom df = 14, p-value = 0.9202
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.3179109 2.8205007
sample estimates:
ratio of variances
0.9469256
# finally we can apply our two-sample t-test:
t.test(samp1,            # one population
       samp2,            # another population
       var.equal = TRUE) # the variances were equal. If they aren't equal, set to FALSE to fit a Welch's two-sample test
Two Sample t-test
data: samp1 and samp2
t = 19.817, df = 28, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
13.44949 16.55051
sample estimates:
mean of x mean of y
48.9 33.9
## Note you can also use the syntax:
# t.test(WtChange ~ Pop,    # formula
#        data = myDat,      # data
#        var.equal = TRUE)  # the variances were equal. If they aren't equal, set to FALSE to fit a Welch's two-sample test
Notice that the two methods lead to identical results based on P-values.
Resp1 ~ Cat1 + 1 (where Cat1 has two factor levels) with a non-normal (but continuous) error distribution assumption - GLM vs. Wilcoxon Rank Sum test
Example: tree height variation is explained by population (North vs. South) or Height ~ Pop
Using our GLM method we would set up our starting model as:
startMod.glm <- glm(Height ~ Pop + 1,                 # hypothesis
                    data = myDat,                     # data
                    family = Gamma(link = "inverse")) # error distribution assumption
summary(startMod.glm)
Call:
glm(formula = Height ~ Pop + 1, family = Gamma(link = "inverse"),
data = myDat)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.049242 0.002856 17.24 < 2e-16 ***
PopSouth -0.038930 0.002918 -13.34 1.18e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Gamma family taken to be 0.05047313)
Null deviance: 17.9989 on 29 degrees of freedom
Residual deviance: 1.2768 on 28 degrees of freedom
AIC: 222.93
Number of Fisher Scoring iterations: 4
and would then go on with our statistical modelling framework.
Using a classical model, we can choose the Wilcoxon Rank Sum test¹:
wilcox.test(Height ~ Pop + 1, # hypothesis
            data = myDat)     # data
Wilcoxon rank sum exact test
data: Height by Pop
W = 0, p-value = 1.289e-08
alternative hypothesis: true location shift is not equal to 0
Notice that the two methods lead to similar results (both methods show evidence that the mean height of the trees is different across the populations).
Resp2 ~ Cat2 + 1 (where Cat2 has more than two factor levels)
Resp2 ~ Cat2 + 1 (where Cat2 has more than two factor levels) with a normal error distribution assumption - GLM vs. ANOVA
An example: monthly weight change is explained by species (Sp1, Sp2 and Sp3) or WtChange ~ Sp
Using our GLM method we would set up our starting model as:
startMod.glm <- glm(WtChange ~ Sp + 1,
                    data = myDat,
                    family = gaussian(link = "identity"))
summary(startMod.glm)
Call:
glm(formula = WtChange ~ Sp + 1, family = gaussian(link = "identity"),
data = myDat)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 32.3533 0.7854 41.19 <2e-16 ***
SpSp1 16.2267 1.1107 14.61 <2e-16 ***
SpSp3 28.0000 1.1107 25.21 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 9.252349)
Null deviance: 6318.2 on 44 degrees of freedom
Residual deviance: 388.6 on 42 degrees of freedom
AIC: 232.72
Number of Fisher Scoring iterations: 2
and would go on with our statistical modelling framework. Here we can get a p-value associated with our predictor with:
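# p-value for the Sp term from the GLM (analysis of deviance, F-test):
anova(startMod.glm, test = "F")

# a sketch of the classical equivalent, a one-way ANOVA, assuming the same myDat as above:
summary(aov(WtChange ~ Sp, data = myDat))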
Df Sum Sq Mean Sq F value Pr(>F)
Sp 2 5930 2964.8 320.4 <2e-16 ***
Residuals 42 389 9.3
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Notice that the ANOVA and GLM methods lead to identical results.
Other notes about ANOVAs:
With an ANOVA, you assess your model’s validity by inspecting residuals, i.e. after you fit the model (as we do with the GLM).
You might see people mentioning one-way vs. two-way ANOVA. This just describes if you have one categorical predictor in your model (one-way ANOVA) or two (two-way ANOVA).
You might see people mentioning Type I vs. Type II vs. Type III ANOVAs. These are different ANOVA methods used to handle predictor collinearity when selecting which predictors should be in your best-specified model. You can change the type of ANOVA you are fitting with the Anova() function in the car package.
Resp2 ~ Cat2 + 1 (where Cat2 has more than two factor levels) with a non-normal (but continuous) error distribution assumption - GLM vs. Kruskal-Wallis
Example: tree height variation is explained by species (Sp1, Sp2 and Sp3) or Height ~ Sp + 1
Using our GLM method we would set up our starting model as:
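# a sketch, assuming the MuMIn package is used to produce the model selection table below
library(MuMIn)                                        # for dredge()
options(na.action = "na.fail")                        # dredge() requires this
startMod.glm <- glm(Height ~ Sp + 1,                  # hypothesis
                    data = myDat,                     # data
                    family = Gamma(link = "inverse")) # error distribution assumption
dredge(startMod.glm)                                  # compare all sub-models by AICc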
Global model call: glm(formula = Height ~ Sp + 1, family = Gamma(link = "inverse"),
data = myDat)
---
Model selection table
(Intrc) Sp df logLik AICc delta weight
1 0.04688 2 -92.095 188.6 0.00 0.875
2 0.05029 + 4 -91.459 192.5 3.88 0.125
Models ranked by AICc(x)
and would then go on with our statistical modelling framework.
Using a classical model, we can use the Kruskal-Wallis model:
kruskal.test(Height ~ Sp + 1, data = myDat)
Kruskal-Wallis rank sum test
data: Height by Sp
Kruskal-Wallis chi-squared = 0.90794, df = 2, p-value = 0.6351
Notice that the two methods lead to similar results (both conclude that the mean height of the trees is NOT explained by species).
Resp3 ~ Num3 + 1
Resp3 ~ Num3 + 1 with a normal error distribution assumption - GLM vs. LM
An example: monthly weight change is explained by temperature or WtChange ~ Temp + 1
Using our GLM method we would set up our starting model as:
startMod.glm <- glm(WtChange ~ Temp + 1,
                    data = myDat,
                    family = gaussian(link = "identity"))
summary(startMod.glm)
Call:
glm(formula = WtChange ~ Temp + 1, family = gaussian(link = "identity"),
data = myDat)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.35091 1.28827 14.24 2.36e-14 ***
Temp 1.46301 0.09006 16.24 8.75e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 8.665215)
Null deviance: 2529.20 on 29 degrees of freedom
Residual deviance: 242.63 on 28 degrees of freedom
AIC: 153.85
Number of Fisher Scoring iterations: 2
Using a classical model, we can choose a simple linear model:
startMod.lm <- lm(WtChange ~ Temp + 1, data = myDat)
summary(startMod.lm)
Call:
lm(formula = WtChange ~ Temp + 1, data = myDat)
Residuals:
Min 1Q Median 3Q Max
-4.6278 -1.4179 -0.0812 0.9774 8.7171
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.35091 1.28827 14.24 2.36e-14 ***
Temp 1.46301 0.09006 16.24 8.75e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.944 on 28 degrees of freedom
Multiple R-squared: 0.9041, Adjusted R-squared: 0.9006
F-statistic: 263.9 on 1 and 28 DF, p-value: 8.754e-16
Notice that the simple linear model and GLM methods lead to identical results.
Resp3 ~ Num3 + 1 with a non-normal error distribution assumption - GLM!!
An example: plant height is explained by temperature or Height ~ Temp
If your error distribution assumption is continuous, you can try a normal error distribution assumption and fit a simple linear model but you need to carefully assess your residuals.
If your error distribution assumption is discrete (e.g. poisson, binomial), you need a GLM.
e.g.
startMod.glm <- glm(Height ~ Temp + 1,
                    data = myDat,
                    family = Gamma(link = "inverse"))
Resp4 ~ Cat4 + Num5 + 1
Resp4 ~ Cat4 + Num5 + 1 with a normal error distribution assumption - GLM vs. ANCOVA
An example: monthly weight change is explained by temperature and location or WtChange ~ Temp + Location + Temp:Location
Using our GLM method we would set up our starting model as:
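# a sketch, assuming myDat contains WtChange, Temp and Location
# GLM version:
startMod.glm <- glm(WtChange ~ Temp + Location + Temp:Location + 1, # hypothesis
                    data = myDat,                                   # data
                    family = gaussian(link = "identity"))           # error distribution assumption

# classical ANCOVA: a linear model with a numeric and a categorical predictor
startMod.ancova <- lm(WtChange ~ Temp + Location + Temp:Location + 1, data = myDat)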
Notice that the ANCOVA and GLM methods lead to identical results. These are the same model.
Resp4 ~ Cat4 + Num5 + 1 with a non-normal error distribution assumption - GLM!!
An example: plant height is explained by temperature and location or Height ~ Temp + Location + Temp:Location + 1
If your error distribution assumption is continuous, you can try a normal error distribution assumption and fit an ANCOVA to test your hypothesis but you need to carefully assess your residuals.
If your error distribution assumption is discrete (e.g. poisson, binomial), you need a GLM.
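e.g. (a sketch, assuming myDat contains Height, Temp and Location):
startMod.glm <- glm(Height ~ Temp + Location + Temp:Location + 1,
                    data = myDat,
                    family = Gamma(link = "inverse"))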
Resp5 ~ Cat6 + 1 (where Cat6 has two factor levels AND observation dependence)
Resp5 ~ Cat6 + 1 with a normal error distribution assumption - GLM vs. paired t-test
An example: monthly weight change before and after maturity or WtChange ~ State, but observations are dependent on one another as the weight change was measured from the same individual before and after maturity.
Using our GLM method, we could set up our starting model as:
startMod.glm <- glm(WtChange ~ State + ID + 1,           # hypothesis
                    data = myDat,                         # data
                    family = gaussian(link = "identity")) # error distribution assumption
where individual ID is included as a fixed effect. Note that we could also set up a mixed model including ID as a random effect, as sketched below.
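For example, a minimal sketch using the lme4 package (the package choice is an assumption; other mixed-model packages would also work):
library(lme4)                               # for lmer()
mixMod <- lmer(WtChange ~ State + (1 | ID), # ID as a random intercept
               data = myDat)
summary(mixMod)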
Here we can get a p-value associated with our predictor with:
anova(startMod.glm, test = "F")
Analysis of Deviance Table
Model: gaussian, link: identity
Response: WtChange
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev F Pr(>F)
NULL 29 22618.5
State 1 19405.6 28 3212.8 163.5332 5.712e-13 ***
ID 1 8.9 27 3203.9 0.0748 0.7866
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Using a classical model, we can fit a paired two-sample model. We first need to change the format of our data from long to wide, so that the Immature and Mature measurements for each individual sit in the same row:
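# a sketch, mirroring the reshaping shown in the next example:
library(reshape2)                             # for dcast()
myDat.paired <- dcast(myDat, ID ~ State,      # one row per individual, one column per State
                      value.var = "WtChange") # values to spread into the State columns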
# test if we meet the assumption of normal distribution for the **difference** between the two measures:
shapiro.test(myDat.paired$Immature - myDat.paired$Mature)
Shapiro-Wilk normality test
data: myDat.paired$Immature - myDat.paired$Mature
W = 0.96462, p-value = 0.772
# we meet the assumption, so can go on to apply the paired two-sample model:
t.test(myDat.paired$Immature, myDat.paired$Mature, paired = TRUE)
Paired t-test
data: myDat.paired$Immature and myDat.paired$Mature
t = 14.563, df = 14, p-value = 7.522e-10
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
43.37517 58.35816
sample estimates:
mean difference
50.86667
Again, results between the GLM and paired two-sample test are similar (i.e. both say there is evidence of an effect of State on WtChange, while accounting for observation dependence due to ID).
Resp5 ~ Cat6 + 1 with a non-normal error distribution assumption - GLM vs. Wilcoxon Signed Rank
An example: Abundance is explained by treatment as before and after rewilding or Abund ~ Treat + 1, but observations are dependent on one another as the abundance was measured from the same location before and after rewilding.
Using our GLM method, we could set up our starting model as:
startMod.glm <- glm(Abund ~ Treat + ID + 1,        # hypothesis
                    data = myDat,                   # data
                    family = poisson(link = "log")) # error distribution assumption
where location ID is included as a fixed effect. Note that we could also set up a mixed model including ID as a random effect, as sketched below.
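As above, a minimal sketch of the mixed-model alternative, assuming the lme4 package:
library(lme4)                                   # for glmer()
mixMod <- glmer(Abund ~ Treat + (1 | ID),       # ID as a random intercept
                data = myDat,
                family = poisson(link = "log")) # error distribution assumption
summary(mixMod)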
Here we can get a p-value associated with our predictor with:
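# a sketch, assuming the startMod.glm fitted above;
# for a poisson GLM the usual term-wise test is a likelihood-ratio (Chisq) test:
anova(startMod.glm, test = "Chisq")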
Using a classical model, we can fit a paired two-sample model. We first need to change the format of our data from long:
head(myDat)
ID Treat Abund
1 1 Before 1
2 2 Before 8
3 3 Before 4
4 4 Before 3
5 5 Before 2
6 6 Before 2
to wide:
library(reshape2)                        # load the reshape2 package
myDat.paired <- dcast(myDat, ID ~ Treat) # change orientation from long to wide
Using Abund as value column: use value.var to override.
head(myDat.paired)
ID After Before
1 1 16 1
2 2 9 8
3 3 14 4
4 4 9 3
5 5 11 2
6 6 10 2
Then we can fit a Wilcoxon signed rank model:
wilcox.test(myDat.paired$Before, # observations before
            myDat.paired$After,  # observations after
            paired = TRUE)       # the observations are paired by rows
Warning in wilcox.test.default(myDat.paired$Before, myDat.paired$After, :
cannot compute exact p-value with ties
Wilcoxon signed rank test with continuity correction
data: myDat.paired$Before and myDat.paired$After
V = 0, p-value = 0.0007211
alternative hypothesis: true location shift is not equal to 0
Again, results between the GLM and paired two-sample test are similar (i.e. both say there is evidence of an effect of Treat on Abund, while accounting for observation dependence due to ID).
Resp ~ constant
The formula notation may look like the null model but it is slightly different. This hypothesis states that our mean value is equal to a known value. In classical testing, this is sometimes called a “one sample” test. Note that there are no predictors here.
Resp ~ constant with a normal error distribution assumption - GLM vs. one-sample t-test
An example: monthly weight change is 50 g or WtChange ~ 50
Using our GLM method we need to subtract the constant from the response and would set up our starting model as:
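startMod.glm <- glm(WtChange - 50 ~ 1,                   # hypothesis: mean weight change is 50 g
                    data = myDat,                         # data
                    family = gaussian(link = "identity")) # error distribution assumption
summary(startMod.glm)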
Call:
glm(formula = WtChange - 50 ~ 1, family = gaussian(link = "identity"),
data = myDat)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.008 2.007 1.001 0.339
(Dispersion parameter for gaussian family taken to be 48.34629)
Null deviance: 531.81 on 11 degrees of freedom
Residual deviance: 531.81 on 11 degrees of freedom
AIC: 83.551
Number of Fisher Scoring iterations: 2
and would go on with our statistical modelling framework.
Using a classical model, we can choose the one-sample t-test:
shapiro.test(myDat$WtChange) # first check to see the response is normally distributed with a shapiro test
Shapiro-Wilk normality test
data: myDat$WtChange
W = 0.93208, p-value = 0.4027
t.test(myDat$WtChange, # response
       mu = 50)        # constant to check
One Sample t-test
data: myDat$WtChange
t = 1.0006, df = 11, p-value = 0.3385
alternative hypothesis: true mean is not equal to 50
95 percent confidence interval:
47.59051 56.42615
sample estimates:
mean of x
52.00833
Notice that the two methods lead to identical results.
Resp ~ constant with a non-normal (but continuous) error distribution assumption - GLM vs. one-sample Wilcoxon test
An example: tree height is 30m or Height ~ 30
Using our GLM method we need to subtract the constant from the response and would set up our starting model as:
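startMod.glm <- glm(Height - 30 ~ 1,                     # hypothesis: mean height is 30 m
                    data = myDat,                         # data
                    family = gaussian(link = "identity")) # error distribution assumption
summary(startMod.glm)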
Call:
glm(formula = Height - 30 ~ 1, family = gaussian(link = "identity"),
data = myDat)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.1444 1.1864 -0.122 0.906
(Dispersion parameter for gaussian family taken to be 12.66778)
Null deviance: 101.34 on 8 degrees of freedom
Residual deviance: 101.34 on 8 degrees of freedom
AIC: 51.332
Number of Fisher Scoring iterations: 2
and would go on with our statistical modelling framework. Note that the error distribution assumption here is still normal because our response is Height - 30 which could be negative or positive, and is continuous.
Using a classical model, we can choose the one-sample Wilcoxon test:
wilcox.test(myDat$Height, # response
            mu = 30)      # constant to check
Warning in wilcox.test.default(myDat$Height, mu = 30): cannot compute exact
p-value with ties
Wilcoxon signed rank test with continuity correction
data: myDat$Height
V = 22.5, p-value = 1
alternative hypothesis: true location is not equal to 30
Notice that the two methods lead to similar results (neither method finds evidence that the average tree height differs from 30 m).
Footnotes
¹ There is also a classical model called prop.test() for binomially distributed data.