Data Management Handbook

The DSP Program Handbook for Data Management

A note on tools:

In general, we encourage you to do all data management tasks using a scripted programming language like R. See here for more discussion about this, but in short, a scripted programming language will allow you to:

  • keep your management separate from the data (and thereby safeguard the data),

  • keep a record of all your management steps (and thereby more easily find errors and make your methods reproducible and transparent), and

  • use powerful tools to accomplish your management tasks.
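As a minimal sketch of what these points can look like in practice, the short R script below reads a raw file, performs two management steps, and writes the result to a new file. The file paths, column names, and values are hypothetical and are created inside the script only so that it runs on its own; your own data and steps will differ.

```r
# Hypothetical raw data file, created here only so the sketch is self-contained.
raw_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(site = c("A", "B", "C"), height_cm = c(12.5, NA, 20.1)),
  raw_path,
  row.names = FALSE
)

# Step 1: read the raw data. The file on disk is never edited.
raw <- read.csv(raw_path)

# Step 2: each management step is a line of code, so it is recorded.
clean <- raw[!is.na(raw$height_cm), ]    # drop rows with a missing height
clean$height_m <- clean$height_cm / 100  # convert centimetres to metres

# Step 3: write the result to a new file, leaving the raw file untouched.
clean_path <- tempfile(fileext = ".csv")
write.csv(clean, clean_path, row.names = FALSE)
```

Because every step lives in the script rather than in the data file, rerunning the script reproduces the cleaned data exactly, and anyone reading it can see what was done and in what order.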

That said, there may be times when you need to do management in a spreadsheet editor, like Excel. You will find the ideas presented below are still relevant. And if you do need to manage data in a program like Excel, remember to:

  • keep your management separate from your data (e.g. in a separate sheet), and

  • keep a record of all your management steps. This is done by:

    • carefully and completely adding comments explaining your steps,

    • and ensuring each cell contains only one step/piece of information.

A note on learning your first programming language:

A bit on strategy when learning any programming language (we’ll also cover more strategies in class): it’s important to struggle, but not for too long. Learning programming means learning computational thinking, or the logic behind breaking a problem down for a computer to solve. Struggling helps us learn this logic (ensuring we truly see the patterns in the code), but struggling too long can waste energy and time and may make us lose motivation for the process. The most successful path forward is a middle way: read through this document, try to reproduce the examples, and try the exercises, but if you’ve been staring at a problem for hours, it’s time to ask for help. Ask Google, ask another R user, ask me, and if you don’t understand the answers you are given, ask again.

Don’t worry about memorizing the details of this document or our discussions in class. You will always have reference material available to you (e.g. this document, the class notes, R’s help files, the internet). You can let memorization happen organically: depending on your individual research adventures, you will use some of these tools more often than others, and those will likely become committed to memory. Other tools will prove less useful to you, and memorizing them would be a waste of time.

The basics

TBA_Here we will go over essential topics for when you’re first learning a programming language, including:

  • installing R

  • scripts

  • basic syntax

  • getting help in R (how to read help files!)

  • objects and data structures

  • etc.

So you want to…

Here we will provide examples of how to accomplish common data management tasks.

TBA_… load your data into R

TBA_… validate and explore your data

TBA_… visualize your data (make a plot!)

… deal with missing values

TBA_… sort your data

TBA_… subset your data

… work with dates and times

TBA_… manipulate rows or columns in your data

TBA_… manipulate subsets of rows or columns

… merge your data with another data set

TBA_… control the flow of your program

TBA_… export your data back out of R

TBA_Best practices

Copyright 2025, DSP Taskforce