Objectives
Introduce the following concepts:
- Object-orientedness
- Vectors
- Functions
Working through the script
Object-oriented
\({\bf\textsf{R}}\) is “object-oriented”, which here means that values and results can be stored in named objects and then referred to by name. We have two options when writing script to define objects:
<- assigns the result of the operation on the right to the named object on the left.
= will do the same, and is shorter.
For programming purposes I like the idea of moving one side to the other, especially when the right side has many entries or is large.
I tend to reserve = for defining variables or long file paths in my script, and use <- when creating data objects or storing statistical results.
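As a minimal sketch of the two assignment operators, the lines below store the same calculation under two names and check that the results match. The object names via.arrow and via.equals are made up for this illustration and do not appear elsewhere in the script.

```r
# Hypothetical object names, used only to compare the two operators.
via.arrow <- 2 + (2 * 20)   # assignment with <-
via.equals = 2 + (2 * 20)   # assignment with =

# identical() reports whether the two objects hold exactly the same value.
identical(via.arrow, via.equals)
## [1] TRUE
```

Inside a function call, = is read as naming an argument rather than creating an object, which is one reason to keep <- for assignment.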
Note that \({\bf\textsf{R}}\) creates objects on the fly and does not need them to be defined at the beginning of the script or session as in C++.
answer <- 2+(2*20)
Note that you should now see answer in the global environment pane of RStudio.
Calling the object will print its content in the console:
answer
## [1] 42
This object can now be used for additional operations…
answer*2
## [1] 84
…and the creation of new objects:
new.answer <- answer*2
new.answer
## [1] 84
Functions
Functions are a special type of \({\bf\textsf{R}}\) object that, instead of containing data, contain a series of operations. Functions are essentially shortcuts for common sets of operations.
For example, researchers often want to find the mean of data. Say we have the following five observations: 24, 13, 12, 22, and 15.
The arithmetic mean is defined as the sum of the observations divided by the number of observations, which in \({\bf\textsf{R}}\) looks like:
(24 + 13 + 12 + 22 + 15) / 5
## [1] 17.2
Alternatively, we can assign the data to an object using the c function, which stands for concatenate.
It joins everything between the parentheses, separated by commas, into a vector that we’ll call data:
data <- c(24, 13, 12, 22, 15)
data
## [1] 24 13 12 22 15
To find the mean of data, one might first think we can simply divide the object by 5…
data / 5
## [1] 4.8 2.6 2.4 4.4 3.0
…but this is obviously incorrect. Here, \({\bf\textsf{R}}\) has applied the “divide by five” operation to each value in the vector. This is an example of how \({\bf\textsf{R}}\) is vectorized: it is designed to perform its operations along vectors. Although it will be a while before you feed \({\bf\textsf{R}}\) large enough datasets to notice the difference, vectorization optimizes performance and makes \({\bf\textsf{R}}\) computations quick.
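To see vectorization with other operators, here is a small sketch that reuses the data vector from above. Each arithmetic operator is applied element by element, and two vectors of the same length are combined position by position.

```r
data <- c(24, 13, 12, 22, 15)

data * 2       # every value doubled
## [1] 48 26 24 44 30

data + data    # values added position by position
## [1] 48 26 24 44 30

data - 10      # 10 subtracted from every value
## [1] 14  3  2 12  5
```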
Calculating the mean is a two-step process, and we need to define both.
Thus, we must first find the sum of the data, for which we can use the shortcut function sum:
sum(data)
## [1] 86
Then we divide the sum by 5 to calculate the mean:
sum(data) / 5
## [1] 17.2
This is an example of hard-coding: we’ve specified the divisor in this operation as a fixed value (5). But what if the value varies – say your technician (definitely not you!) inadvertently lost or failed to enter some data, and a given set of replicates does not have the number of observations you expect? Hard-coding your count creates problems:
data2 <- c(24, 13, 12, 22)
sum(data2) / 5
## [1] 14.2
The calculated mean is too low, because our hard-coded operation divided the sum of four observations by five.
It is preferable to have \({\bf\textsf{R}}\) determine the count for each operation, so if counts differ, \({\bf\textsf{R}}\) can automatically account for it.
We can use the length function to determine how many observations are in the set:
length(data)
## [1] 5
If length sounds odd, remember data is a vector composed of individual values.
The number of entries determines how long the vector is, and so length is a convenient way to count the number of observations.
This is a core concept in \({\bf\textsf{R}}\) that we will return to frequently.
Let’s see how this combination of functions performs:
sum(data) / length(data)
## [1] 17.2
length(data2)
## [1] 4
sum(data2) / length(data2)
## [1] 17.75
Of course, calculating the mean of a vector is a very common operation, and \({\bf\textsf{R}}\) has a built-in function that combines the sum, length, and / operations into one shortcut:
mean(data)
## [1] 17.2
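As a quick check that the shortcut matches the two-step calculation, the sketch below compares mean() against sum() / length() for both vectors used so far. all.equal() is a base function that tests numeric equality within a small tolerance.

```r
data  <- c(24, 13, 12, 22, 15)
data2 <- c(24, 13, 12, 22)

# Compare the built-in shortcut with the manual calculation.
all.equal(mean(data), sum(data) / length(data))
## [1] TRUE
all.equal(mean(data2), sum(data2) / length(data2))
## [1] TRUE

# mean() counts the observations itself, so the shorter vector is handled correctly.
mean(data2)
## [1] 17.75
```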
Custom functions
\({\bf\textsf{R}}\) has a lot of functions built in, and thousands of packages supply additional functions. But one often still encounters a situation where one’s life, or at least one’s script, is made simpler with a custom function.
Writing your own functions is easy.
They are a special type of object in \({\bf\textsf{R}}\) that can be defined and added to the global environment.
The function() function helps create them: one simply lists the arguments between the ( ) and specifies the operation between curly brackets { }.
Even though \({\bf\textsf{R}}\) already has mean(), let’s make our own alternative, called Meaner():
Meaner <- function(x) { sum(x) / length(x) }
We can type its name without parentheses to see what is stored in the object:
Meaner
## function(x) { sum(x) / length(x) }
Then we can call it on our data:
Meaner(data)
## [1] 17.2
Our custom Meaner() function performs the same as the base mean().
Let’s make it truly custom, and add a little excitement to the operation:
Meaner <- function(x) {
  m = sum(x) / length(x)
  m1 = paste(m, "!", sep = "")
  return(m1)
}
Meaner(data)
## [1] "17.2!"
Notice how the function created two objects, m and m1, that were not added to the global environment but instead only existed while the operation was running on your computer’s processor.
These two objects existed only temporarily during the calculation; return() specified what should be returned to \({\bf\textsf{R}}\) when the operation was complete.
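One way to confirm that m and m1 stay inside the function is to ask \({\bf\textsf{R}}\) whether an object with that name can be found after the call. This sketch assumes a fresh session in which nothing else named m1 has been created. The name result is made up for the illustration, and exists() is a base function that looks up an object by name.

```r
Meaner <- function(x) {
  m = sum(x) / length(x)
  m1 = paste(m, "!", sep = "")
  return(m1)
}

# Hypothetical name: store what the function returns.
result <- Meaner(c(24, 13, 12, 22, 15))
result
## [1] "17.2!"

# m1 lived only inside the function call, so it cannot be found afterwards.
exists("m1")
## [1] FALSE
```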
Comments
Note that anything preceded by # is ignored by \({\bf\textsf{R}}\). We call it a comment operator, and it is useful for adding explanation to the script.
At a very basic level \({\bf\textsf{R}}\) is a fancy calculator. It will chug arithmetic operations:
\({\bf\textsf{R}}\) follows proper order of operations, including parentheses:
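As an illustrative sketch (the numbers are arbitrary), the lines below show basic arithmetic, the order of operations, and comments in use.

```r
# Anything after a # is a comment and is not evaluated.
2 + 2 * 20      # multiplication happens before addition
## [1] 42
(2 + 2) * 20    # parentheses are evaluated first
## [1] 80
2^3 / 4 - 1     # exponent, then division, then subtraction
## [1] 1
```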