A first step in data analysis is exploring your data. Here you will take a first look at the data and identify any errors.
You can run through a series of checks to get a first look at your data. The examples below use TData as an example data set.
The first thing to do after reading in data is to check that it was read correctly. One of the simplest checks is to see whether it exists in your workspace (this goes for any object, not only imported data). You can see what is in your workspace with ls() and get a little more information with ls.str().
You can also check the size of an object. For example, dim() gives the dimensions (number of rows and number of columns) of a data frame:
dim(TData)
[1] 184 4
TData therefore has 184 rows and 4 columns.
You can also get the number of elements in a vector with the length() function.
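As a self-contained sketch of these checks, the code below creates a small vector V3 (with the values used later in this section) and a hypothetical data frame called exampleDat, then checks that they exist and looks at their size. exampleDat is made up for illustration and is not TData.

```r
# Example objects: V3 uses the values shown later in this section,
# exampleDat is a hypothetical data frame (not TData)
V3 <- c(3.20, 0.90, 2.34, 5.40)
exampleDat <- data.frame(Station = c("S27", "HL2", "HL2"),
                         Temp = c(6.18, -1.16, 1.52),
                         stringsAsFactors = FALSE)

exists("V3")      # TRUE if an object called V3 is in the workspace
ls()              # names of all objects in the workspace
ls.str()          # the same objects, with a short description of each
dim(exampleDat)   # number of rows and columns: 3 2
nrow(exampleDat)  # number of rows only: 3
length(V3)        # number of elements in the vector: 4
```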
Typing an object's name makes R print the entire object to the screen, which is impractical for a large data frame. Several functions let you look at an object without printing everything.
Try head() and tail() to look at the first and last six rows (respectively) of a data frame:
head(TData)
Station Year Month Temp
1 S27 2000 11 6.184429
2 HL2 2004 3 -1.155200
3 HL2 2000 2 1.524000
4 S27 2005 9 12.681000
5 HL2 2005 6 6.208400
6 S27 2003 8 11.737000
tail(TData)
Station Year Month Temp
179 S27 2003 12 3.923000
180 HL2 2005 3 0.189400
181 HL2 2007 10 13.511505
182 S27 2002 6 5.227500
183 S27 2005 3 -1.014667
184 HL2 2007 11 9.963482
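Both functions also take an optional n argument giving the number of rows to show (6 by default). A minimal sketch, using a hypothetical data frame tdat built from the first six rows of TData printed above:

```r
# Hypothetical stand-in: the first six rows of TData shown above
tdat <- data.frame(
  Station = c("S27", "HL2", "HL2", "S27", "HL2", "S27"),
  Year    = c(2000, 2004, 2000, 2005, 2005, 2003),
  Month   = c(11, 3, 2, 9, 6, 8),
  Temp    = c(6.184429, -1.155200, 1.524000, 12.681000, 6.208400, 11.737000),
  stringsAsFactors = FALSE
)

head(tdat, n = 3)  # first 3 rows
tail(tdat, n = 2)  # last 2 rows, keeping their original row names (5 and 6)
```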
You can also get a quick look at your data by asking for the summary and/or structure.
The summary() function gives you a summary of each column (for numeric columns: the minimum, quartiles, mean and maximum):
summary(TData)
Station Year Month Temp
Length:184 Min. :1999 Min. : 1.000 Min. :-1.551
Class :character 1st Qu.:2001 1st Qu.: 3.000 1st Qu.: 1.014
Mode :character Median :2003 Median : 6.000 Median : 5.167
Mean :2003 Mean : 6.457 Mean : 5.811
3rd Qu.:2005 3rd Qu.: 9.250 3rd Qu.:10.028
Max. :2007 Max. :12.000 Max. :16.472
The str() function describes the structure of the data frame, including the data type of each column:
str(TData)
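A small runnable sketch of both functions, on a hypothetical data frame exampleDat with one character column and two numeric columns (the values are made up for illustration):

```r
# Hypothetical data frame with one character and two numeric columns
exampleDat <- data.frame(
  Station = c("S27", "HL2", "HL2", "S27"),
  Month   = c(11L, 3L, 2L, 9L),
  Temp    = c(6.18, -1.16, 1.52, 12.68),
  stringsAsFactors = FALSE  # keep Station as character in any R version
)

summary(exampleDat)  # min, quartiles, mean and max for the numeric columns
str(exampleDat)      # 4 obs. of 3 variables, with each column's type

# The type of a single column can also be checked directly
class(exampleDat$Station)  # "character"
class(exampleDat$Month)    # "integer"
```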
An element is one data point. You can view an element by using the element’s position in the object. You do this using the object name and square brackets [ ].
Since a vector is a one-dimensional object (see also the section on Importing), you will only need to give the position of the element.
For example, take the vector V3:
V3
[1] 3.20 0.90 2.34 5.40
To get its 3rd element, you would use:
V3[3]
[1] 2.34
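A runnable version of this example, recreating V3 from the values printed above. Selecting several positions at once, or using length() to find the last position, are small extensions shown for illustration:

```r
V3 <- c(3.20, 0.90, 2.34, 5.40)  # the vector shown above

V3[3]            # 3rd element: 2.34
V3[c(1, 3)]      # 1st and 3rd elements: 3.20 2.34
V3[length(V3)]   # last element, whatever the vector's length: 5.40
```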
Since a data frame is a two-dimensional object, you need to give two positions: the element's row and column. You do this again with square brackets [ ], with a comma (,) separating the row and column numbers.
For example, in the data frame DF1, the Abundance value 24 is in row 2 and column 3, so you access it with:
DF1[2, 3]
You can also access the whole 2nd row by leaving the column position empty:
DF1[2, ]
and the whole 3rd column by leaving the row position empty:
DF1[, 3]
In data frames, you can also refer to a column by its name using the $ symbol. This is another way to get the 3rd (Abundance) column:
DF1$Abundance
and the Abundance value 24 is then the 2nd element of that column:
DF1$Abundance[2]
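Since the contents of DF1 are not shown here, the sketch below builds a hypothetical stand-in. Only the facts that column 3 is Abundance and that row 2 holds the value 24 come from the text; the other columns and values are made up so the code can run.

```r
# Hypothetical stand-in for DF1: Site, Species and the other Abundance
# values are invented for illustration
DF1 <- data.frame(
  Site      = c("A", "B", "C"),
  Species   = c("cod", "haddock", "hake"),
  Abundance = c(10, 24, 7),
  stringsAsFactors = FALSE
)

DF1[2, 3]         # row 2, column 3: 24
DF1[2, ]          # the whole 2nd row (still a data frame)
DF1[, 3]          # the whole 3rd column (a vector): 10 24 7
DF1$Abundance     # the same column, selected by name
DF1$Abundance[2]  # 2nd element of that column: 24
```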
Sometimes you do not know the location (row and column) of the data you want. Instead, you have a condition the data must meet, e.g. which observations have a plant height greater than 10 cm?
Let’s look at some example data, stored in the data frame myDat, to answer this question:
myDat
PlotID PlantHeight
1 F 8.3
2 F 6.7
3 B 10.2
4 D 8.2
5 D 11.4
6 F 6.8
7 F 10.7
8 C 10.4
9 F 9.8
10 F 11.2
There are two ways of finding the observations with plant heights greater than 10 cm. The first is to locate the row numbers where the condition is true (plant height greater than 10 cm) with the which() function, and then use those row numbers to select the rows:
inds <- which(myDat$PlantHeight > 10) # the row numbers that fulfill the condition
myDat[inds, ] # rows meeting the condition
PlotID PlantHeight
3 B 10.2
5 D 11.4
7 F 10.7
8 C 10.4
10 F 11.2
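To see what which() does at each step, here is a runnable sketch that recreates myDat from the table above. The comparison gives one TRUE or FALSE per row, and which() turns that into the positions of the TRUE values:

```r
# myDat recreated from the table above
myDat <- data.frame(
  PlotID      = c("F", "F", "B", "D", "D", "F", "F", "C", "F", "F"),
  PlantHeight = c(8.3, 6.7, 10.2, 8.2, 11.4, 6.8, 10.7, 10.4, 9.8, 11.2),
  stringsAsFactors = FALSE
)

myDat$PlantHeight > 10                 # one TRUE/FALSE per row
inds <- which(myDat$PlantHeight > 10)  # positions of the TRUE values
inds                                   # 3 5 7 8 10
myDat[inds, ]                          # rows meeting the condition
```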
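Another way to find observations with plant heights greater than 10 cm is to use the subset() function. It returns only the rows of the data frame that meet the condition, without the intermediate step of finding row numbers, and gives the same five rows as above:
subset(myDat, PlantHeight > 10)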
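A short sketch comparing the two approaches on myDat (recreated from the table above). Inside subset(), column names such as PlantHeight can be used directly, without myDat$:

```r
# myDat recreated from the table above
myDat <- data.frame(
  PlotID      = c("F", "F", "B", "D", "D", "F", "F", "C", "F", "F"),
  PlantHeight = c(8.3, 6.7, 10.2, 8.2, 11.4, 6.8, 10.7, 10.4, 9.8, 11.2),
  stringsAsFactors = FALSE
)

tall_subset <- subset(myDat, PlantHeight > 10)         # column name used directly
tall_which  <- myDat[which(myDat$PlantHeight > 10), ]  # the which() approach

all.equal(tall_subset, tall_which)  # TRUE: same rows and row names
```

Both approaches also skip rows where the condition is NA (for example, a missing PlantHeight), whereas indexing with the bare condition, myDat[myDat$PlantHeight > 10, ], would return a row of NAs for each missing value.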
Useful Logical Functions (operators):
These operators help you write conditional statements: each one compares values and returns TRUE or FALSE. Note that they do not look like “regular” functions in R, which is why they are sometimes called operators instead.
| Operator | Description |
|---|---|
| > | greater than |
| >= | greater than or equal to |
| < | less than |
| <= | less than or equal to |
| == | exactly equal to |
| != | not equal to |
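A minimal sketch of each operator applied to the first four plant heights from myDat. Each comparison is applied element by element and returns one TRUE or FALSE per element:

```r
heights <- c(8.3, 6.7, 10.2, 8.2)  # the first four plant heights from myDat
plots   <- c("F", "F", "B", "D")   # the matching PlotIDs

heights >  10    # FALSE FALSE  TRUE FALSE
heights >= 8.3   #  TRUE FALSE  TRUE FALSE
heights <  8.3   # FALSE  TRUE FALSE  TRUE
heights <= 8.2   # FALSE  TRUE FALSE  TRUE
heights == 8.2   # FALSE FALSE FALSE  TRUE
heights != 8.2   #  TRUE  TRUE  TRUE FALSE
plots   == "F"   #  TRUE  TRUE FALSE FALSE (works for text too)

# "Exactly equal" is literal: decimals from arithmetic are stored
# approximately, so this is FALSE
0.1 + 0.2 == 0.3
```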
Note that you can combine more than one condition by joining them with either & (AND) or | (OR).
For example, you might want to find observations with plant heights greater than 10 cm that are also from Plot F. This can be done with:
subset(myDat, PlantHeight > 10 & PlotID == "F")
As another example, you might want to find observations of either large plants (greater than 10 cm) or small plants (less than 7 cm). Since no plant can be both, you need the observations that meet one condition OR the other:
subset(myDat, PlantHeight > 10 | PlantHeight < 7)
PlotID PlantHeight
2 F 6.7
3 B 10.2
5 D 11.4
6 F 6.8
7 F 10.7
8 C 10.4
10 F 11.2
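A runnable sketch of both combinations on myDat (recreated from the table above), also showing that the same combined condition works with which() and square brackets:

```r
# myDat recreated from the table above
myDat <- data.frame(
  PlotID      = c("F", "F", "B", "D", "D", "F", "F", "C", "F", "F"),
  PlantHeight = c(8.3, 6.7, 10.2, 8.2, 11.4, 6.8, 10.7, 10.4, 9.8, 11.2),
  stringsAsFactors = FALSE
)

# AND: both conditions must be TRUE
tallF <- subset(myDat, PlantHeight > 10 & PlotID == "F")
tallF  # rows 7 and 10

# OR: at least one condition must be TRUE
extremes <- subset(myDat, PlantHeight > 10 | PlantHeight < 7)
nrow(extremes)  # 7

# The same AND condition with which() and square brackets
myDat[which(myDat$PlantHeight > 10 & myDat$PlotID == "F"), ]
```

Use the single & and | here. The doubled && and || only work with single TRUE/FALSE values and are meant for if() statements, not for comparing whole columns.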
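Another helpful operator is %in%, which checks for membership: x %in% y is TRUE for each element of x that appears in y. For example, you might want to know which observations are in Plot B, D or C:
subset(myDat, PlotID %in% c("B", "D", "C"))
Note that this gives the same result as joining several == conditions with OR (|).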
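A sketch of that equivalence on myDat (recreated from the table above). With many plots to match, %in% stays short, while the OR version needs one == per plot:

```r
# myDat recreated from the table above
myDat <- data.frame(
  PlotID      = c("F", "F", "B", "D", "D", "F", "F", "C", "F", "F"),
  PlantHeight = c(8.3, 6.7, 10.2, 8.2, 11.4, 6.8, 10.7, 10.4, 9.8, 11.2),
  stringsAsFactors = FALSE
)

inBDC <- subset(myDat, PlotID %in% c("B", "D", "C"))
inBDC  # rows 3, 4, 5 and 8

# The same rows using several == conditions joined with OR
inBDC_or <- subset(myDat, PlotID == "B" | PlotID == "D" | PlotID == "C")
all.equal(inBDC, inBDC_or)  # TRUE
```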