Chapter 8 Lost the plot? Some more data visualisations - Week 9 - Workshop 2

This is our second workshop hopefully you are gaining some confidence with how the underlying functionality of R works. For this workshop we will be building plots to explore possible relationships and differences within the data. Don’t forget to ask for help if you are stuck on any of the material covered so far.

8.1 Looking at differences

As we discussed in the Chapter 4 wing case length is a good proxy for overall beetle size in Tribolium. So we will be comparing beetle wing case length in male and female beetles. Because we are looking for possible differences we know that we probably want to build a box plot. Login to your posit Cloud account and copy over the following chunk of code into your tribolium_parasites.r script. Run it and see what happens.

# Making a box plot and saving it in an object
# Call the ggplot function and direct it to your data and define your x axis and y axis
size_differences_by_sex_boxplot <- ggplot(data = tribolium_combo,aes(x = sex, y wing_case_size)) + 
  geom_boxplot() # Tell ggplot that you want it to build a boxplot
print(size_differences_by_sex_boxplot) # Print your new figure

Oh dear… Looks like there may be something wrong with the script…

geom_boxplot: outlier.colour = NULL, outlier.fill = NULL, outlier.shape = 19, outlier.size = 1.5, outlier.stroke = 0.5, outlier.alpha = NULL, notch = FALSE, notchwidth = 0.5, varwidth = FALSE, na.rm = FALSE, orientation = NA
stat_boxplot: na.rm = FALSE, orientation = NA
position_dodge2 

When you get errors like this it generally means that ggplot is missing something essential from the script. There is a typo in the original script, can you spot it and fix it? Once you have made the figure, take a look at it, do you think that there is a difference in size between male and female beetles? How would you interpret this figure? Make sure you save your box plot to your figures folder, you may need to adjust the sizing variables using the some of the arguments we learned in Chapter 7.3.

# Saving outputs
ggsave("figures/size_differences_by_sex_boxplot.pdf", # Give R a path to save to and a file name
       plot = size_differences_by_sex_boxplot, # Tell R what to save - in this case your object
       device = "pdf") # Tell R what file type to create, in this case a pdf

Our other categorical variable is treatment. Using the chunk of code within your script try to write a new chunk to compare the differences between wing_case_size and treatment in our sample of Tribolium.

Oh dear… R still isn’t happy… You have probably been presented with the following message

Warning message:
Continuous x aesthetic -- did you forget aes(group=...)? 

This essentially means that R is expecting a categorical variable (e.g. blue, red, green or as our example is 30 or 35 degrees centigrade) but what it sees is a continuous variable (e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9). This has come about because although our treatment is categorical it is expressed as a numeric and has confused R. Use the glimpse() command to see how R has interpreted the treatment variable. This should confirm that R sees treatment as a double (dbl) vector. We missed this when we were checking our data in Chapter 4, but thankfully it is an easy fix. We want to tell R to treat the variable treatment as a factor (categorical data) and not as continuous data. Try adding this to your script;

# Instruct R to treat the treatment variable as a factor 
tribolium_combo$treatment <- as.factor(tribolium_combo$treatment) # Note this will edit your data frame

Now run glimpse() again. Can you spot the difference?

Try to make your box plot for wing_case_size and treatment again. What does this plot tell you about the data set? Make sure you save it to your figures folder.

It would be really useful to show both sex and treatment on the x axis so that you can compare between both categorical variables and size. We can do this with a fairly simple addition to our chunk. Add the expression fill = sex to the aes function of your last chunk. This tells R to colour your box plot by sex. You should therefore have 4 boxes. Save this figure under a different name in your figures folder.

You should have five figures (2 histogram and 3 box plots) saved there now. Look over your box plots, can you see how your interpretation of the data might change depending on which plot you look at? Which plot gives you the most complete picture?

We will discus this further in lectures. But for now lets have a look at visualising a relationship.

8.2 Looking at relationships

To look for possible relationships we need two continuous variables. In our data set we have wing_case_size and parasite_burden. There are several theories in the field of parasitology that suggest larger organisms tend to harbor more parasites. We can see if this is a relationship that persists in our Tribolium. Copy, complete and run the following chunk. You may notice that x and y axis variables have been left blank, which variable do you think belongs on which axis?

# Making a scatter plot and saving it in an object
# Call the ggplot function and direct it to your data and define your x axis and y axis
size_vs_parasites_point <- ggplot(data = tribolium_combo,aes(x = , y = )) + 
  geom_point() # geom_point is ggplots scatter plot
print(size_vs_parasites_point)

What do you think this plot is showing you? Try adding another function to this chunk, add geom_smooth(method="lm"). This will add a simple regression line to your plot, I suggest you add a suitable comment to that effect to your chunk. Does this new addition to your plot support any initial observations you may have made from the plot? Make sure you save your plot in your figures folder.

Well done, you have started building two very commonly used plots using R. These plots are functionally sound, but lets be honest, they’re not pretty. A plot should always be pretty. Next week we will start learning how to make plots visually pleasing as well as functional.

8.3 Before you leave!

You should have two histograms from last week (Chapter 7) along with three boxplots (Chapter 8.1) and one scatter plot (Chapter 8.2) from today saved in your figures folder. Check that they are saved and look as you would expect. Make sure to log out of posit Cloud and make sure you save your script!

8.4 References

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, and Dewey Dunnington. 2021. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.