Data Management Handbook
The DSP Program Handbook for Data Management
A note on tools:
In general we will encourage you to carry out all data management tasks using a scripted programming language like R. See here for more discussion about this, but in short, a scripted programming language will allow you to:
keep your management separate from the data (and thereby safeguard the data),
keep a record of all your management steps (and thereby more easily find errors and make your methods reproducible and transparent), and
use powerful tools to accomplish your management tasks.
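As a rough illustration of the first two points, here is a minimal sketch in R. The data frame and its column names (`site`, `mass_g`) are made up for this example; in practice the raw data would typically be read from a file, which the script itself never edits.

```r
# Hypothetical raw data, invented for illustration only.
# In a real project this might come from a file that is read but never edited.
raw <- data.frame(
  site   = c("A", "A", "B", "B"),
  mass_g = c(12.1, NA, 9.8, 10.4)
)

# Step 1: drop rows with a missing mass.
# The step is written down here, so it can be checked or changed later.
clean <- raw[!is.na(raw$mass_g), ]

# Step 2: add a derived column instead of overwriting the original one.
clean$mass_kg <- clean$mass_g / 1000

# The raw object is untouched; every change lives in `clean`.
nrow(raw)
nrow(clean)
```

Because `raw` is never modified, rerunning the script from the top rebuilds `clean` from the same starting point every time.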
That said, there may be times when you need to do management in a spreadsheet editor, like Excel. You will find the ideas presented below are still relevant. And if you do need to analyse data in a program like Excel, remember to:
keep your management separate from your data (e.g. in a separate sheet), and
keep a record of all your management steps. This is done by:
carefully and completely adding comments explaining your steps,
and ensuring each cell contains only one step/piece of information.
A note on learning your first programming language:
A bit on strategy when learning any programming language (we’ll also cover more strategies in class): it’s important to struggle, but not for too long. Learning programming means learning computational thinking, or the logic behind breaking a problem down for a computer to solve. Struggling helps us learn this logic (ensuring we truly see the patterns in the code), but struggling too long wastes energy and time and may make us lose motivation for the process. The most successful path forward is a middle way: read through this document, try to reproduce the examples and try the exercises, but if you’ve been staring at a problem for hours, it’s time to ask for help. Ask Google, ask another R user, ask me, and if you don’t understand the answers you are given, ask again.
Don’t worry about memorizing the details of this document or our discussions in class. You will always have reference material available to you (e.g. this document, the class notes, R’s help files, the internet). You can let memorization happen organically: Depending on your individual research adventures, you will use some of these tools more often than others and they will likely become committed to memory. Other tools will prove less useful to you. Memorizing this latter group would be a waste of time.
So you want to…
Here we will provide examples of how to accomplish data management tasks common to biology. These are meant to quickly connect you with programming strategies and code to handle your tasks as a biologist (and beyond).