- how to report your hypothesis testing results (including visualizations) to communicate what your model says about your hypothesis.
Reporting your hypothesis testing results
In an upcoming section, we will discuss communicating your hypothesis test (including its results) more formally - i.e. the text and visuals you will use to write your scientific reports and papers. In the current chapter, we will gather many of the pieces needed for that task.
Remember that testing your hypothesis means looking for evidence of an effect of each predictor on your response. To report your model, you want to communicate this evidence, including:
You will report your best-specified model by reporting the terms (the predictors and any interactions) that are in your model, comparing that to the terms from your starting model.
You will report how much variability (or deviance) in your response your best-specified model is able to explain, and the relative importance of each model term (each predictor and interaction) in explaining that variability (deviance).
You will report your modelled effects - i.e. the effect of each predictor on your response. Remember that model effects are captured in your coefficients. This section will include:
how big the effect of a predictor is on your response,
how certain you are that there is an effect of the predictor on your response (is the effect different from zero?), and
(for categorical predictors) whether the effects differ among the levels of the predictor.
Introducing the examples:
In this chapter, we will report model results for four example best-specified models that showcase a range of different hypotheses.
If you are re-reading this chapter while testing your own hypothesis, look for the example that is most similar to yours.
The examples are:
Example 1: Resp1 ~ Cat1 + 1
Example 2: Resp2 ~ Cat2 + Cat3 + Cat2:Cat3 + 1
Example 3: Resp3 ~ Num4 + 1
Example 4: Resp4 ~ Cat5 + Num6 + Cat5:Num6 + 1
where
Resp# indicates a response variable,
Cat# indicates a categorical predictor, and
Num# indicates a numeric predictor.
Note that each example is testing a different hypothesis by fitting a different model to a different data-set. The four examples are not related to one another.
Note also that the examples all involve a normal error distribution assumption. We will point out (and give examples of) where your process would differ under a different error distribution assumption (e.g. Gamma, Poisson, or binomial).
Finally, note that we are not going through all the steps that got us to these example best-specified models, but assume we went through the previous steps (Responses, Predictors, Hypothesis, Starting model, Model validation and Hypothesis testing) as outlined in this Handbook to choose each best-specified model.
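Although we will not repeat those earlier steps here, the kind of workflow that produced each best-specified model can be sketched in R. This is a minimal illustration, assuming the MuMIn package is installed; the data below are simulated stand-ins (this is not the Handbook's actual data set):

```r
# A minimal sketch of the fit-then-select workflow behind each example
# (simulated stand-in data, not the Handbook's actual myDat1)
library(MuMIn)

set.seed(1)
myDat1 <- data.frame(
  Cat1  = factor(rep(c("Sp1", "Sp2", "Sp3"), length.out = 100)),
  Resp1 = rnorm(100, mean = 20, sd = 8)
)

# Fit the starting (global) model with a normal error distribution
startMod1 <- glm(Resp1 ~ Cat1 + 1,
                 family = gaussian(link = "identity"),
                 data = myDat1, na.action = "na.fail")

# Rank all sub-models by AICc; the top row is the best-specified model
dredgeOut1 <- dredge(startMod1)
bestMod1   <- get.models(dredgeOut1, subset = 1)[[1]]
```

Note that dredge() requires na.action = "na.fail" in the global model call, so that every sub-model is fit to exactly the same rows of data.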
Example 1: Resp1 ~ Cat1 + 1
For example #1, assume your best-specified model says that your response variable (Resp1) is explained by a categorical predictor (Cat1):
Resp1 ~ Cat1 + 1
Your best-specified model for example #1 is in an object called bestMod1:
Call: glm(formula = Resp1 ~ Cat1 + 1, family = gaussian(link = "identity"),
data = myDat1)
Coefficients:
(Intercept) Cat1Sp2 Cat1Sp3
20.955 5.877 -4.510
Degrees of Freedom: 99 Total (i.e. Null); 97 Residual
Null Deviance: 7083
Residual Deviance: 5210 AIC: 687.1
that was fit to data in myDat1:
'data.frame': 100 obs. of 2 variables:
$ Cat1 : Factor w/ 3 levels "Sp1","Sp2","Sp3": 1 1 1 2 3 3 2 3 2 1 ...
$ Resp1: num 14.2 11.2 32.9 26.4 20.6 10.9 29 13.6 25.2 29 ...
and the dredge() table you used to pick your bestMod1 is in dredgeOut1:
Global model call: glm(formula = Resp1 ~ Cat1 + 1, family = gaussian(link = "identity"),
data = myDat1)
---
Model selection table
(Intrc) Cat1 df logLik AICc delta weight
2 20.95 + 4 -339.547 687.5 0.00 1
1 21.79 2 -354.905 713.9 26.42 0
Models ranked by AICc(x)
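To see what the bestMod1 coefficients mean, recall that for a categorical predictor the (Intercept) is the modelled mean of the reference level (here Sp1), and each other coefficient is the difference between that level and the reference. Using the coefficient values copied from the bestMod1 output above:

```r
# Group means implied by the bestMod1 coefficients (Gaussian, identity link)
b <- c(Intercept = 20.955, Cat1Sp2 = 5.877, Cat1Sp3 = -4.510)

meanSp1 <- unname(b["Intercept"])                 # reference level: 20.955
meanSp2 <- unname(b["Intercept"] + b["Cat1Sp2"])  # 20.955 + 5.877 = 26.832
meanSp3 <- unname(b["Intercept"] + b["Cat1Sp3"])  # 20.955 - 4.510 = 16.445
```

Note that Cat1Sp3 is negative: Sp3 has a lower modelled mean than the reference level, which is why we ask whether an effect is different from zero, not bigger than zero.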
Example 2: Resp2 ~ Cat2 + Cat3 + Cat2:Cat3 + 1
For Example #2, assume your best-specified model says that your response variable (Resp2) is explained by two categorical predictors (Cat2 & Cat3) as well as the interaction between the predictors:
Resp2 ~ Cat2 + Cat3 + Cat2:Cat3 + 1
Your best-specified model for example #2 is in an object called bestMod2:
Call: glm(formula = Resp2 ~ Cat2 + Cat3 + Cat2:Cat3 + 1, family = gaussian(link = "identity"),
data = myDat2)
Coefficients:
(Intercept) Cat2TypeB Cat2TypeC
364.91 37.37 -75.94
Cat2TypeD Cat3Treat2 Cat3Control
-141.97 -43.63 64.12
Cat2TypeB:Cat3Treat2 Cat2TypeC:Cat3Treat2 Cat2TypeD:Cat3Treat2
42.71 73.61 66.02
Cat2TypeB:Cat3Control Cat2TypeC:Cat3Control Cat2TypeD:Cat3Control
81.25 -19.08 -71.72
Degrees of Freedom: 99 Total (i.e. Null); 88 Residual
Null Deviance: 1249000
Residual Deviance: 377000 AIC: 1133
that was fit to data in myDat2:
'data.frame': 100 obs. of 3 variables:
$ Resp2: num 278 335 566 341 313 ...
$ Cat2 : Factor w/ 4 levels "TypeA","TypeB",..: 3 2 2 3 4 1 1 3 4 1 ...
$ Cat3 : Factor w/ 3 levels "Treat1","Treat2",..: 3 2 3 3 3 1 1 2 1 2 ...
and the dredge() table you used to pick your bestMod2 is in dredgeOut2:
Global model call: glm(formula = Resp2 ~ Cat2 + Cat3 + Cat2:Cat3, family = gaussian(link = "identity"),
data = myDat2)
---
Model selection table
(Int) Ct2 Ct3 Ct2:Ct3 R^2 df logLik AICc delta weight
8 364.9 + + + 0.69800 13 -553.634 1137.5 0.00 0.964
4 354.9 + + 0.62540 7 -564.419 1144.1 6.55 0.036
2 388.5 + 0.55300 5 -573.243 1157.1 19.63 0.000
1 348.3 0.00000 2 -613.508 1231.1 93.64 0.000
3 331.1 + 0.03933 4 -611.502 1231.4 93.92 0.000
Models ranked by AICc(x)
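With an interaction between two categorical predictors, the modelled mean for a non-reference combination adds the intercept, both main-effect coefficients, and the matching interaction coefficient. For instance, the modelled mean for Cat2 = TypeB under Cat3 = Treat2, using values copied from the bestMod2 output above:

```r
# Modelled mean for the Cat2 = TypeB, Cat3 = Treat2 cell of bestMod2
b <- c(Intercept = 364.91, Cat2TypeB = 37.37, Cat3Treat2 = -43.63,
       `Cat2TypeB:Cat3Treat2` = 42.71)

meanTypeB_Treat2 <- sum(b)  # 364.91 + 37.37 - 43.63 + 42.71 = 401.36
```

The interaction coefficient (42.71) is the part of that cell's mean that the two main effects alone cannot explain, which is exactly what the Cat2:Cat3 term in your hypothesis captures.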
Example 3: Resp3 ~ Num4 + 1
For Example #3, assume your best-specified model says that your response variable (Resp3) is explained by one numeric predictor (Num4):
Resp3 ~ Num4 + 1
Your best-specified model for example #3 is in an object called bestMod3:
Call: glm(formula = Resp3 ~ Num4 + 1, family = gaussian(link = "identity"),
data = myDat3)
Coefficients:
(Intercept) Num4
-226.9 260.1
Degrees of Freedom: 99 Total (i.e. Null); 98 Residual
Null Deviance: 276100000
Residual Deviance: 64930000 AIC: 1628
that was fit to data in myDat3:
'data.frame': 100 obs. of 2 variables:
$ Resp3: num 3668 2774 4594 1094 2100 ...
$ Num4 : num 13.93 12.15 15.4 4.87 6.64 ...
and the dredge() table you used to pick your bestMod3 is in dredgeOut3:
Global model call: glm(formula = Resp3 ~ Num4 + 1, family = gaussian(link = "identity"),
data = myDat3)
---
Model selection table
(Intrc) Num4 df logLik AICc delta weight
2 -226.9 260.1 3 -811.080 1628.4 0.00 1
1 2358.0 2 -883.457 1771.0 142.63 0
Models ranked by AICc(x)
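For a numeric predictor, the coefficient is a slope: the change in the response per one-unit increase in the predictor. A predicted value of Resp3 is therefore just the intercept plus the slope times Num4; here we use Num4 = 10 as an arbitrary illustrative value, with the coefficients copied from the bestMod3 output above:

```r
# Predicted Resp3 from the bestMod3 coefficients (identity link)
intercept <- -226.9
slope     <- 260.1   # change in Resp3 per one-unit increase in Num4

pred_at_10 <- intercept + slope * 10  # -226.9 + 2601 = 2374.1
```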
Example 4: Resp4 ~ Cat5 + Num6 + Cat5:Num6 + 1
For Example #4, assume your best-specified model says that your response variable (Resp4) is explained by one categorical predictor (Cat5) and one numeric predictor (Num6) as well as the interaction between the predictors:
Resp4 ~ Cat5 + Num6 + Cat5:Num6 + 1
Your best-specified model for example #4 is in an object called bestMod4:
Call: glm(formula = Resp4 ~ Cat5 + Num6 + Cat5:Num6 + 1, family = gaussian(link = "identity"),
data = myDat4)
Coefficients:
(Intercept) Cat5Urban Cat5Wild Num6 Cat5Urban:Num6
98.346793 -0.236424 -0.867336 -0.002931 0.015117
Cat5Wild:Num6
0.030858
Degrees of Freedom: 99 Total (i.e. Null); 94 Residual
Null Deviance: 4454
Residual Deviance: 642.8 AIC: 483.9
that was fit to data in myDat4:
'data.frame': 100 obs. of 3 variables:
$ Resp4: num 102 102 104 110 107 ...
$ Cat5 : Factor w/ 3 levels "Farm","Urban",..: 1 2 2 3 2 1 3 2 3 2 ...
$ Num6 : num 341 490 643 412 444 ...
and the dredge() table you used to pick your bestMod4 is in dredgeOut4:
Global model call: glm(formula = Resp4 ~ Cat5 + Num6 + Cat5:Num6 + 1, family = gaussian(link = "identity"),
data = myDat4)
---
Model selection table
(Int) Ct5 Nm6 Ct5:Nm6 R^2 df logLik AICc delta weight
8 98.35 + -0.002931 + 0.8557 7 -234.930 485.1 0.00 1
4 92.25 + 0.009650 0.8166 5 -246.915 504.5 19.39 0
2 96.93 + 0.7923 4 -253.133 514.7 29.61 0
3 94.92 0.017780 0.0848 3 -327.278 660.8 175.73 0
1 104.00 0.0000 2 -331.708 667.5 182.46 0
Models ranked by AICc(x)
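With a categorical-by-numeric interaction, each level of Cat5 gets its own slope for Num6: the reference level (Farm) uses the Num6 coefficient directly, and each other level adds its interaction coefficient to that baseline slope. Using values copied from the bestMod4 output above:

```r
# Per-level slopes of Resp4 on Num6 implied by bestMod4
slope_Farm  <- -0.002931             # reference level uses Num6 directly
slope_Urban <- -0.002931 + 0.015117  # Farm slope + Urban:Num6 = 0.012186
slope_Wild  <- -0.002931 + 0.030858  # Farm slope + Wild:Num6  = 0.027927
```

The interaction coefficients thus tell you that Resp4 barely changes with Num6 on farms, but increases with Num6 in urban and (more steeply) wild settings.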
Up next
In the next chapter, we will discuss how you can use your model to make predictions.
Data Skills Portfolio Program CC BY-NC-SA 4.0