Chapter 3 Objects, Functions and Packages - Week 3A
You should aim to complete this chapter in Week 3, Semester 1
3.1 Creating objects
As we saw in Chapter 2, R can be used to perform calculations. However, we can use R to perform tasks that are much more complex. In order to do this we are going to have to learn about objects.
Log in to your posit Cloud account and go into your testing_R
project. Create a new R script and save it under the name objects
. You should see it appear as object.R
under the Files tab in the bottom right panel. Run the following command using your script.
<- 50 A
What you have essentially done here, is told R to create an object called A
and to store the number 50
inside it using the syntax <-
. After running the command you should see your object A
and 50
pop up under Environment and Values in the top right hand panel, so you can see that 50
has been stored in A
.
Lets create some more objects
<- 6 B
Now see what happens if you run the following command
* b A
Oh dear… we have an error…
> A * b
Error: object 'b' not found
This is because R has absolutely no flexibility with typos. It is looking for an object called b
when there is no object called b
in your environment. There is an object called B
, but to R, B
and b
are completely different things. Try running this instead.
* B A
Hopefully you should now have an output that looks like this
> A * B
[1] 300
This is a very simple example. But lets try to demonstrate why saving values within objects may be useful. Lets say you are interested in looking at the number of students who get freshers flu in Week 1 of teaching at UEA. You have a class size of 200 students and 15 of them report that they have contracted freshers flu. Use the following script to calculate the percentage of students with freshers flu.
# Create two objects, one for your total class and one for those that have flu
# Notice that neither of my object names contain spaces
# If you need a space always use an underscore or full stop
<- 200
total_class <- 15
with_flu
# Calculate the percentage of students with flu in week 1
<- (with_flu/total_class)*100 percentage_with_flu
If you look in your Environment in the top right panel you should see the percentage of students with flu calculated and stored under percentage_with_flu
. You can use the following command to see it printed in the console.
print(percentage_with_flu)
Some students have been late in reporting their symptoms. A week later you hear that 7
more students also had freshers flu in Week 1. If you had been typing into the console and not using objects you would have to type this script out all over again. But you don’t need to. You just need to change your entry for with_flu
from 15
to 22
. Do that now and re-run the script. By editing the 15
, R will overwrite the object with_flu
to represent the new value of 22
. If you run the last line of code then the percentage of students with flu will also update.
This may seem like a very simple example, and it is, but we are still building up your knowledge of R so you will have to take my word for it that objects will be very helpful to you as we progress :). You will also see, that objects can contain a range of different types of data. We will cover some of these in lectures, but for now lets leave objects and have a think about functions and packages.
3.2 What are functions and packages?
If you have the coding skills it is possible to do pretty much whatever you like with your data in R. However, why reinvent the wheel trying to write your own complex scripts, when a lot of very clever coders have already written lots of functional bits of code, known as functions, for general use. Functions can be thought of as a piece of code that is designed to perform a set task. R comes with lots of functions already built in, but there are also lots of additional functions that are stored in packages.
Packages are containers that can hold sets of functions or data and as the course progresses you will use a range of packages that contain useful functions and data. For example the data visualisation package (which you will become very familiar with later on) ggplot2
contains ranges of functions which allow you to define how your plot or graph will look, for example geom_bar
is a function within the ggplot2
package that contains the instructions required to build a bar chart.
3.2.1 Using functions
We actually have already had some exposure to functions. You used the print function earlier when you asked R to print out the value contained within the percentage_with_flu
object. But lets have a go at using another function.
# First of all we need some data
# Notice the syntax, by using c() I have told R to prepare for a list of values
# Each value is separated by a comma
<- c(2,4,6,8,10,12,14,16,18,20)
data
# Now I want to know what the total value of data is if I add all the values stored within it together
<- sum(data) total
In the above piece of code we have created some data, as a list of values, and stored them under the object name data. We have then used the function sum()
. We have essentially told R that we wish it to perform the function sum()
on everything contained within the object data
and store it in another object called total
. You can see the value within total
in your environment, can you work out how to get R to print the value of total
out?
If you ever need help working out how to use a function you can use another function help()
. Try inputing the code below into your console.
help(sum)
This should bring up a help file in your bottom right hand panel that describes and explains the use of the function sum()
.
3.3 Installing packages
The sum
function is part of the base R package that comes with any R installation. However there are lots of other useful functions that are held in additional packages. In order to use these functions, you must have the necessary packages installed in your work space. I already mentioned that we would be using a package called ggplot2
, lets try installing and loading it in your work space.
With any package you wish to use in posit Cloud you will need to go through a two step process, first of all you need to install the package using the install.packages()
function (see below for usage) and then you will need to load it using the library()
function. We only need to install and load a package once per posit Cloud project but I would generally keep the commands for installing and loading packages at the top of the script I’m working on.
Try copying and running the following lines in your script;
install.packages("ggplot2") # Install the ggplot package
library(ggplot2) # Load the ggplot package
Now we can try to make a simple plot using some functions from the ggplot2
package and using an example data set called iris
that is stored in the base R datasets
package.
Copy the following command into your script and run it
# First load the base R datasets in your workspace
library("datasets")
# Then call the ggplot function and tell it that the data set you wish to use is called iris
# We then need to tell ggplot how to map our variables
# The aes function is an aesthetic tool that allows us to define variables for the x and y axes
# We then use the geom_point function to instruct R to built a scatter plot
# A second use of the aes function tells R to color points by species
ggplot(data = iris,aes(x = Sepal.Length, y = Sepal.Width)) +
geom_point(aes(colour=Species))
Those sharped eyed among you may have noticed that the short script defining our plot goes over two lines and contains a +
and an indentation on the second line. The +
and indentation is essentially allowing you to pipe one line of code into the next. This allows you to add more detailed instructions on how you would like your graph to be built and linking together multiple functions. Don’t worry if this is a lot to take in at this stage, as we progress with ggplot2
we will revisit this in more detail.
3.4 What do I do if I get an error message?
Error messages are almost guaranteed when working regularly with R, they are super common, don’t panic when you eventually run into one. In fact you may remember that we already encountered one earlier in this chapter, that particular error was caused by a typo. Generally the most common causes of error messages are;
- A typo (you can reduce risk of this by consistently labeling your files and objects)
- A mistake in syntax (missing or un-closed;
()
,""
, or missing,
) - Missing dependencies (maybe you didnt install or load a package correctly, or maybe if your running a longer script, you are trying to call an object that you haven’t created yet)
This is by no means an exhaustive list but when I run into errors its normally because of one of the above. The trick is don’t panic and read the error message carefully, it often tells you what is wrong. Then look over your script slowly to troubleshoot. And if you are genuinely stuck and the error doesn’t make sense, run it through Google, I can almost guarantee you wont have been the first person to have that error message.
Well done everyone. I hope you have enjoyed this weeks session. Next week is our first workshop and we will start to use some of these new found skills in tandem and start to look at using them on some new data.
3.5 References
Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, and Dewey Dunnington. 2021. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.