Statistical Modelling Handbook
Why statistics?
All science is explaining variability - explaining why observations are changing in time and/or space. These explanations1 lead us to mechanistic understandings of why the world is as we observe it.
As biologists, the variability that you are interested in relates to the biological world, but your job is really no different from other scientists/researchers/data analysts, etc. - you are all explaining variability.
You need your explanations of variability to be quantitative in order to:
- communicate how certain you are with your explanation,
- communicate how much variability still remains unexplained, and
- make useful predictions about the biological world.
Statistics provides the mathematical tools2 to accomplish these tasks. Statistics help you determine the evidence for causal3 mechanisms. And statistics help you make useful predictions4 about how a biological system might behave at a different time or location.
Statistics help you answer:
- can you explain the variability that you are seeing?
- given your hypothesis, how much variation can you explain?
- given your hypothesis, what would you predict to observe?
Why statistical modelling?
You can quantify how much variability you can explain with your research hypothesis through statistical modelling. Your statistical model represents your research hypothesis in a mathematical structure. This mathematical structure can be tested against your data to determine what evidence there is for your hypothesis:
- can I explain the variability that I am seeing? (Can I reject my hypothesis?)
- given my hypothesis, how much variation in the observations can I explain?
- given my hypothesis, what would I expect (predict) to observe under different times or locations?
Your job then is to explain observation variability in time and space by creating a “model” of what (you think) is going on - hence statistical modelling.
It is important to remember that any model is only an approximation of what is going on in the real world. As many have said before
All models are wrong but some are useful.
We will discuss how you can build useful models that you can use to test your hypotheses.
Introducing a statistical modelling framework

The focus of this section of the handbook is statistics that can be applied to your work as a biologist. For that reason, the motivation for what we are going to do together comes directly from your biological research hypotheses. As a biologist, you have a research hypothesis that you want to test. The method presented in this handbook will help you test it. You will learn how to move from biological theory to hypothesis to the statistical modelling process you will use to test your hypothesis.
You will learn this process of statistical modelling by walking through a “Statistical Modelling Framework”. This is a set of steps that you can use to go from your research hypothesis to designing a model, testing your hypothesis and communicating the results.
This handbook will walk through the parts of this framework one by one. During DSP modules throughout your degree, you will do the same in class while you practice applying the framework to case studies. In this way, you should see how the framework is generally applicable but also flexible. You will be able to apply the framework to help you in your statistical analysis in your courses, your thesis, and your future career - in every case, your process will take its starting point and focus from the hypothesis that is motivating you.
As we will discuss later in the handbook, working through this framework will also guide you in creating the “guts” of a paper or report. As well as clarifying how and why you made your analysis choices, it will guide you in describing your motivation behind your research question (why is it worthwhile to spend time explaining this variation?) as well as the mechanisms behind your research hypothesis (why do I think X is responsible for the variation I’m trying to explain?). Once through, you’ll have a solid draft that can be the basis of a report, thesis chapter or scientific paper. We will talk about how this works in the section on Communicating.
A note about our Statistical Modelling Framework: The steps in the image to the right as a linear process but it is not actually a linear process. As you will see in the examples, sometimes you will need to make best guesses5 as to what model might be useful and only after confronting the model with your data will you know if your guesses were reasonable (and useful!) - i.e. the model is a valid one that you can use to test your hypothesis. We will talk about how to find a useful model, how to choose when there are multiple options, and how to communicate your choices.
Steps in the statistical modelling framework:
Response(s)
Here you will define your research question by identifying your response variable(s).
What variability are you trying to explain?
And why is it worth explaining? (your motivation)
Note: we’ll begin by discussing how to model hypotheses with just one response variable before discussing multiple response variable(s).
Predictor(s)
Here you will choose your predictor variables.
what could explain the variability in your response?
what are the possible mechanisms behind your argument?
Hypothesis
Here you will see how your response and predictor variables come together to define your research hypothesis. And we will discuss how to write this hypothesis to begin building your statistical modelling.
Starting model
Here you will choose and fit the starting model that will be used to test your hypothesis. You will do this by choosing and communicating two key assumptions that will help you pick a useful modelling starting point. Then you will fit your model to your data (i.e. confronting your model with your data).
Model validation
Here you will investigate whether your model will be a useful one to test your hypothesis. Your steps here will include considering if you have correlated predictors or problems with observation dependence.
You will also considering if your starting model assumptions were realistic. After this step, you will have a model that you can confidently use to test your hypothesis.
Hypothesis testing
Here you will test your hypothesis by assessing the evidence supporting your model. We will discuss a number of different methods to do this, but will focus on the model selection method as a robust way to evaluate what your model is telling you about your hypothesis.
Reporting
Here you will report the results of your hypothesis testing.
You will report:
your best-specified model identified in the hypothesis testing
the effects (patterns) described by your model (including visualizing your model effects)
how well your model explains variability in your response.
Predicting
Here you will use your model to make predictions of your response under different conditions (while considering prediction limits).
Where we will begin: generalized linear models (GLMs)
To begin with, we will be discussing generalized linear models (GLMs) as models that can be useful to test many different hypotheses. Also, understanding how GLMs can be used to test your hypothesis will help you understand other, including more advanced, statistical models (Pongpipat_et_al_PracticalExtensionStatisticsForPsychology).
Beyond GLMs
Remember: the model you choose is just an approximation of the real world. This means that often times alternative models would be possible (models like t-tests, ANOVAs, ANCOVAs, etc.6). In fact, you may be collaborating with someone who wants to model your hypothesis with a different method. In this handbook, we compare GLMs to alternative model types here. And remember: the steps in the statistical modelling framework are generally applicable. Regardless of the method you apply, you need to ground your choices in good biological and statistical theory.
The examples
Here we’ve gathered TBA_examples following our statistical modelling framework structure. You can request/contribute new examples here
Communicating your results: From statistical modelling to scientific report writing
Here you can see how you can use the steps in the Statistical Modelling Framework to outline your communication of your hypothesis testing in reports and paper.
Copyright 2025, DSP Taskforce
Footnotes
here, we can also use the term “research hypothesis” instead of explanation and you’ll see we quickly switch to using this term↩︎
Math skills have often been underemphasized in many Biology educations. This has not been helpful or necessary. Biologists need math and are very capable at applying math to solve problems, but the math needs to be useful; that is, math that you can apply to your own needs as you research biology. The DSP Program aims at providing statistical modelling tools that will be useful to you as a biologist. It is a happy coincidence that the same skills can be applied to a lot of other situations as well. The programming and statistics skills you are learning here will be useful to you in the future - in your future courses, thesis-writing and a wide range of careers.↩︎
more on causal vs. correlative explanations coming soon↩︎
useful predictions will always include uncertainty↩︎
and these will be educated guesses!↩︎
Don’t worry if these terms don’t mean anything to you yet - more to come!↩︎