Statistical Modelling: Reporting

In this chapter you will learn:
  • how to report your hypothesis testing results (including visualizations) to communicate what your model says about your hypothesis.

1 Reporting your hypothesis testing results

In an upcoming section, we will discuss communicating your hypothesis testing and its results more formally, i.e. the text and visuals you will use to write your scientific reports and papers. In the current chapter, we will gather many of the pieces needed for that task.

Remember that testing your hypothesis means looking for evidence of an effect of each predictor on your response. To report your model, you want to communicate this evidence, including:

1.1 Reporting your best-specified model

You will report your best-specified model by listing the terms (the predictors and any interactions) that it contains, and comparing them to the terms in your starting model.

1.2 Reporting the explained deviance

You will report how much variability (or deviance) in your response your best-specified model is able to explain, and the relative importance of each model term (each predictor and interaction) in explaining that variability (deviance).
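For a GLM, one common way to quantify this is the proportional reduction from the null deviance to the residual deviance (sometimes called D², or deviance-explained). A minimal sketch, assuming a fitted glm object called mod (a hypothetical name for any of the best-specified models below):

```r
# Proportion of deviance explained by the model terms,
# relative to an intercept-only (null) model:
explainedDev <- (mod$null.deviance - mod$deviance) / mod$null.deviance
```

For instance, using the deviances printed for Example 1 below, (7083 - 5210) / 7083 ≈ 0.26, so that model explains roughly 26% of the deviance in its response.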

1.3 Reporting your modelled effects

You will report your modelled effects - i.e. the effect of each predictor on your response. Remember that model effects are captured in your coefficients. This section will include:

  • how big the effect of a predictor is on your response,

  • how certain you are that there is an effect of the predictor on your response (is the effect different from zero?), and

  • (for categorical predictors) whether the effects differ among the levels of the predictor.
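In R, these pieces typically come from the fitted model's coefficients and their uncertainty. A minimal sketch, again assuming a fitted glm object called mod:

```r
coef(mod)     # effect sizes: the fitted coefficients
summary(mod)  # standard errors and tests of whether each effect differs from zero
confint(mod)  # confidence intervals around each effect
```

We will work through what these numbers mean, and how to visualize them, for each of the examples below.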

2 Introducing the examples

In this chapter, we will report model results for four example best-specified models that showcase a range of different hypotheses.

If you are re-reading this chapter while testing a hypothesis of your own, look for the example that is most similar to your own hypothesis.

The examples are:

Example 1: Resp1 ~ Cat1 + 1

Example 2: Resp2 ~ Cat2 + Cat3 + Cat2:Cat3 + 1

Example 3: Resp3 ~ Num4 + 1

Example 4: Resp4 ~ Cat5 + Num6 + Cat5:Num6 + 1

where

  • Resp# indicates a response variable,
  • Cat# indicates a categorical predictor, and
  • Num# indicates a numeric predictor.

Note that each example is testing a different hypothesis by fitting a different model to a different dataset. The four examples are not related to one another.

Note also that the examples all involve a normal (gaussian) error distribution assumption. We will point out (and give examples) where your process would differ under a different error distribution assumption (e.g. Gamma, Poisson, or binomial).

Finally, note that we are not going through all the steps that got us to these example best-specified models; instead, assume we followed the previous steps (Responses, Predictors, Hypothesis, Starting model, Model validation and Hypothesis testing) outlined in this Handbook to choose each best-specified model.

Example 1: Resp1 ~ Cat1 + 1

For example #1, assume your best-specified model says that your response variable (Resp1) is explained by a categorical predictor (Cat1):

Resp1 ~ Cat1 + 1

Your best-specified model for example #1 is in an object called bestMod1:

bestMod1

Call:  glm(formula = Resp1 ~ Cat1 + 1, family = gaussian(link = "identity"), 
    data = myDat1)

Coefficients:
(Intercept)      Cat1Sp2      Cat1Sp3  
     20.955        5.877       -4.510  

Degrees of Freedom: 99 Total (i.e. Null);  97 Residual
Null Deviance:      7083 
Residual Deviance: 5210     AIC: 687.1

that was fit to data in myDat1:

str(myDat1)
'data.frame':   100 obs. of  2 variables:
 $ Cat1 : Factor w/ 3 levels "Sp1","Sp2","Sp3": 1 1 1 2 3 3 2 3 2 1 ...
 $ Resp1: num  14.2 11.2 32.9 26.4 20.6 10.9 29 13.6 25.2 29 ...

and the dredge() table you used to pick your bestMod1 is in dredgeOut1:

dredgeOut1
Global model call: glm(formula = Resp1 ~ Cat1 + 1, family = gaussian(link = "identity"), 
    data = myDat1)
---
Model selection table 
  (Intrc) Cat1 df   logLik  AICc delta weight
2   20.95    +  4 -339.547 687.5  0.00      1
1   21.79       2 -354.905 713.9 26.42      0
Models ranked by AICc(x) 
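For reference, output like the above could be produced along these lines. This is a sketch, assuming the MuMIn package and a starting-model object called globalMod1 (a hypothetical name); the exact calls in the earlier chapters may differ:

```r
library(MuMIn)

# Fit the starting (global) model
globalMod1 <- glm(Resp1 ~ Cat1 + 1,
                  family = gaussian(link = "identity"),
                  data = myDat1)

# dredge() requires na.fail so every candidate model uses the same rows
options(na.action = "na.fail")
dredgeOut1 <- dredge(globalMod1)

# Retrieve the top-ranked (best-specified) model
bestMod1 <- get.models(dredgeOut1, subset = 1)[[1]]
```

Here the dredge table compares the intercept-only model against the model with Cat1, and the Cat1 model wins decisively (delta AICc of 26.42 for the alternative).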

Example 2: Resp2 ~ Cat2 + Cat3 + Cat2:Cat3 + 1

For Example #2, assume your best-specified model says that your response variable (Resp2) is explained by two categorical predictors (Cat2 & Cat3) as well as the interaction between the predictors:

Resp2 ~ Cat2 + Cat3 + Cat2:Cat3 + 1

Your best-specified model for example #2 is in an object called bestMod2:

bestMod2

Call:  glm(formula = Resp2 ~ Cat2 + Cat3 + Cat2:Cat3 + 1, family = gaussian(link = "identity"), 
    data = myDat2)

Coefficients:
          (Intercept)              Cat2TypeB              Cat2TypeC  
               364.91                  37.37                 -75.94  
            Cat2TypeD             Cat3Treat2            Cat3Control  
              -141.97                 -43.63                  64.12  
 Cat2TypeB:Cat3Treat2   Cat2TypeC:Cat3Treat2   Cat2TypeD:Cat3Treat2  
                42.71                  73.61                  66.02  
Cat2TypeB:Cat3Control  Cat2TypeC:Cat3Control  Cat2TypeD:Cat3Control  
                81.25                 -19.08                 -71.72  

Degrees of Freedom: 99 Total (i.e. Null);  88 Residual
Null Deviance:      1249000 
Residual Deviance: 377000   AIC: 1133

that was fit to data in myDat2:

str(myDat2)
'data.frame':   100 obs. of  3 variables:
 $ Resp2: num  278 335 566 341 313 ...
 $ Cat2 : Factor w/ 4 levels "TypeA","TypeB",..: 3 2 2 3 4 1 1 3 4 1 ...
 $ Cat3 : Factor w/ 3 levels "Treat1","Treat2",..: 3 2 3 3 3 1 1 2 1 2 ...

and the dredge() table you used to pick your bestMod2 is in dredgeOut2:

dredgeOut2
Global model call: glm(formula = Resp2 ~ Cat2 + Cat3 + Cat2:Cat3, family = gaussian(link = "identity"), 
    data = myDat2)
---
Model selection table 
  (Int) Ct2 Ct3 Ct2:Ct3     R^2 df   logLik   AICc delta weight
8 364.9   +   +       + 0.69800 13 -553.634 1137.5  0.00  0.964
4 354.9   +   +         0.62540  7 -564.419 1144.1  6.55  0.036
2 388.5   +             0.55300  5 -573.243 1157.1 19.63  0.000
1 348.3                 0.00000  2 -613.508 1231.1 93.64  0.000
3 331.1       +         0.03933  4 -611.502 1231.4 93.92  0.000
Models ranked by AICc(x) 
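With two categorical predictors and their interaction, the predicted mean for any combination of levels is a sum of coefficients: the intercept (the reference combination, here TypeA/Treat1), plus the offsets for each non-reference level, plus the interaction offset. A sketch using the coefficients of bestMod2:

```r
cf <- coef(bestMod2)

# Predicted mean for Cat2 = "TypeB" and Cat3 = "Treat2":
cf["(Intercept)"] + cf["Cat2TypeB"] + cf["Cat3Treat2"] + cf["Cat2TypeB:Cat3Treat2"]
# 364.91 + 37.37 - 43.63 + 42.71 = 401.36
```

The interaction term is what lets the effect of Cat3 differ across the levels of Cat2 (and vice versa).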

Example 3: Resp3 ~ Num4 + 1

For Example #3, assume your best-specified model says that your response variable (Resp3) is explained by one numeric predictor (Num4):

Resp3 ~ Num4 + 1

Your best-specified model for example #3 is in an object called bestMod3:

bestMod3

Call:  glm(formula = Resp3 ~ Num4 + 1, family = gaussian(link = "identity"), 
    data = myDat3)

Coefficients:
(Intercept)         Num4  
     -226.9        260.1  

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:      276100000 
Residual Deviance: 64930000     AIC: 1628

that was fit to data in myDat3:

str(myDat3)
'data.frame':   100 obs. of  2 variables:
 $ Resp3: num  3668 2774 4594 1094 2100 ...
 $ Num4 : num  13.93 12.15 15.4 4.87 6.64 ...

and the dredge() table you used to pick your bestMod3 is in dredgeOut3:

dredgeOut3
Global model call: glm(formula = Resp3 ~ Num4 + 1, family = gaussian(link = "identity"), 
    data = myDat3)
---
Model selection table 
  (Intrc)  Num4 df   logLik   AICc  delta weight
2  -226.9 260.1  3 -811.080 1628.4   0.00      1
1  2358.0        2 -883.457 1771.0 142.63      0
Models ranked by AICc(x) 
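With a single numeric predictor, the model is just a fitted line: intercept plus slope times the predictor. A sketch of computing a fitted value by hand from the coefficients of bestMod3 (predict() would give the same answer):

```r
cf <- coef(bestMod3)

# Predicted Resp3 when Num4 = 10:
cf["(Intercept)"] + cf["Num4"] * 10
# -226.9 + 260.1 * 10 = 2374.1
```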

Example 4: Resp4 ~ Cat5 + Num6 + Cat5:Num6 + 1

For Example #4, assume your best-specified model says that your response variable (Resp4) is explained by one categorical predictor (Cat5) and one numeric predictor (Num6) as well as the interaction between the predictors:

Resp4 ~ Cat5 + Num6 + Cat5:Num6 + 1

Your best-specified model for example #4 is in an object called bestMod4:

bestMod4

Call:  glm(formula = Resp4 ~ Cat5 + Num6 + Cat5:Num6 + 1, family = gaussian(link = "identity"), 
    data = myDat4)

Coefficients:
   (Intercept)       Cat5Urban        Cat5Wild            Num6  Cat5Urban:Num6  
     98.346793       -0.236424       -0.867336       -0.002931        0.015117  
 Cat5Wild:Num6  
      0.030858  

Degrees of Freedom: 99 Total (i.e. Null);  94 Residual
Null Deviance:      4454 
Residual Deviance: 642.8    AIC: 483.9

that was fit to data in myDat4:

str(myDat4)
'data.frame':   100 obs. of  3 variables:
 $ Resp4: num  102 102 104 110 107 ...
 $ Cat5 : Factor w/ 3 levels "Farm","Urban",..: 1 2 2 3 2 1 3 2 3 2 ...
 $ Num6 : num  341 490 643 412 444 ...

and the dredge() table you used to pick your bestMod4 is in dredgeOut4:

dredgeOut4
Global model call: glm(formula = Resp4 ~ Cat5 + Num6 + Cat5:Num6 + 1, family = gaussian(link = "identity"), 
    data = myDat4)
---
Model selection table 
   (Int) Ct5       Nm6 Ct5:Nm6    R^2 df   logLik  AICc  delta weight
8  98.35   + -0.002931       + 0.8557  7 -234.930 485.1   0.00      1
4  92.25   +  0.009650         0.8166  5 -246.915 504.5  19.39      0
2  96.93   +                   0.7923  4 -253.133 514.7  29.61      0
3  94.92      0.017780         0.0848  3 -327.278 660.8 175.73      0
1 104.00                       0.0000  2 -331.708 667.5 182.46      0
Models ranked by AICc(x) 
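For a categorical-by-numeric interaction like this one, each level of the categorical predictor gets its own slope for the numeric predictor: the reference level's slope is the Num6 coefficient, and each other level's slope adds its interaction coefficient. A sketch using the coefficients of bestMod4:

```r
cf <- coef(bestMod4)

slopeFarm  <- cf["Num6"]                         # reference level (Farm)
slopeUrban <- cf["Num6"] + cf["Cat5Urban:Num6"]  # -0.002931 + 0.015117 ≈ 0.0122
slopeWild  <- cf["Num6"] + cf["Cat5Wild:Num6"]   # -0.002931 + 0.030858 ≈ 0.0279
```

This is exactly the sense in which the interaction term lets the effect of Num6 on Resp4 differ among the levels of Cat5.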

Up next

In the next chapter, we will discuss how you can use your model to make predictions.

Data Skills Portfolio Program CC BY-NC-SA 4.0