
Introducing R Markdown

You are expected to prepare all homework assignments in R Markdown. These are files with the extension .Rmd that are easily created, edited, and processed in RStudio; the basics for doing so are described below. Basic proficiency in R Markdown is a specific objective of this course: it enhances your ability to fully integrate \({\bf\textsf{R}}\) into a workflow that includes interpreting, presenting, and sharing your results.

Resources

The web is full of support for R Markdown. Here are a few of the most important links:

R Markdown: What and why

R Markdown combines text and \({\bf\textsf{R}}\) script such that one can create documents that include tables and graphs generated from raw data chugged in \({\bf\textsf{R}}\). Tables are not typed in by hand or pasted from Excel; graphs are not saved elsewhere and inserted or pasted into the final document.

The central tenet of R Markdown is reproducibility: When the content is compiled from scratch, it is fairly easy to trace the steps taken to get a particular result. It helps keep track of which data went into each graph or table, and when, rather than leaving you with a folder of saved output and no record of where it came from.

Likewise, once an R Markdown file has been developed, it is very easy to update information simply by re-compiling the document. Say your advisor wants you to add or subtract a year from a multi-year dataset, or a journal wants you to convert your color graphs to greyscale. Make the necessary edits to the \({\bf\textsf{R}}\) script and re-compile. It really is as easy as that.

Generating an output file from one’s .Rmd file is a process called knitting. RStudio provides a button for it and takes care of all the work behind the scenes for you.1 R Markdown supports a wide array of products in three file formats: .html, .docx, and .pdf. Out of the box, RStudio can create .html and .docx files. To create .pdf files, one must also have the typesetting program \(\LaTeX\) installed. This is a level of technical proficiency beyond the scope of this course, but once students become comfortable with creating .docx and .html files, I encourage them to take a swing at installing \(\LaTeX\) and knitting to .pdf. Each term, at least one student in this course has successfully transitioned to submitting homework as .pdf files created from RStudio.

Templates add diversity to the appearance and functionality of R Markdown products. This entire blog was created in R Markdown. So was my wildland fire science website FireScienceDIY. .html files can be published for free to RPubs; such products can be simple ways to share preliminary graphs and tables, research briefs for stakeholders that summarize a scientific paper, or even an entire scientific paper shared in accordance with journal copyright restrictions. One can even create slideshows and make presentations without PowerPoint. And don’t forget that .pdf files can be created from .Rmd files as well.

Getting started

Video on YouTube

This video on YouTube walks through the three ways you can get started with R Markdown.

Opening a new file

While there are generally two options, this class offers a third:

  • Use RStudio’s template: File > New File > R Markdown... Enter a title, select Word. Save, edit, and knit.
    • Advantages: Straightforward and always opens and saves with the .Rmd file extension.
    • Disadvantages: The example script. It is helpful for getting started, but once you are familiar with initiating .Rmd files, removing all the default examples becomes a pain.
  • Open a blank file and convert it to .Rmd: File > New File > Text File, then in the lower right-hand corner of the editor pane, click the little Text File label and select R Markdown from the menu that pops up. Paste or create a yaml header.
    • Advantages: By far the quickest and most straightforward way to get a fresh .Rmd file.
    • Disadvantages: For a beginner, pretty much every step in that description, especially if yaml sounds like gibberish to you. Even veterans can make mistakes in yaml headers, whose indentation and spacing requirements are quite specific.
  • The third option available to you in this class is to use the homework template. Download it, save it locally, open it in RStudio, and simply rename it for each assignment. The yaml header is all set up and a few example headings and code chunks are included, but it’s much more bare-bones than the template RStudio pops up and thus easier to clean out.

Producing output

One really just needs to click the Knit button at the top of the editor pane, or choose File > Knit Document, and a Word document will pop up. Save it as-is, or edit it in Word and save it as either .docx or .pdf.

If your \({\bf\textsf{R}}\) script has errors that prevent a chunk from completing, you’ll get an error message in the R Markdown tab of RStudio’s console pane. This will happen to you, and it will be really frustrating, and you’re going to learn how to handle such errors like a boss.
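Knitting can also be run from the console, which can be handy when tracking down an error. The following is a minimal sketch rather than a required workflow: it assumes the rmarkdown package and Pandoc are available (RStudio bundles Pandoc), and the file name is a made-up stand-in written to a temporary folder so the example is self-contained.

```r
# Sketch: knit from the console instead of clicking the Knit button.
# Assumes the rmarkdown package and Pandoc are available.

# Write a tiny stand-in .Rmd file (hypothetical name) to a temporary folder
rmd_file <- file.path(tempdir(), "homework-example.Rmd")
writeLines(c(
  "---",
  "output: word_document",
  "---",
  "",
  "Some regular text.",
  "",
  "```{r}",
  "summary(cars)",
  "```"
), rmd_file)

# render() knits the file using the output format named in its yaml header
# and returns the path of the document it created
out_file <- rmarkdown::render(rmd_file)
out_file
```

For a real assignment, replace `rmd_file` with the path to your own .Rmd file. If a chunk fails, `render()` stops with the same sort of error message the Knit button reports.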

Components of an R markdown file

There are two necessary components: the yaml header and code chunks.

yaml header

At a minimum, any R Markdown file must tell RStudio what sort of file should be created (.html, .docx, or .pdf). This information is included right at the top of the .Rmd file, in a header that contains yaml script (YAML originally stood for Yet Another Markup Language and now officially stands for YAML Ain’t Markup Language2).

Thus the minimum yaml header would appear as such:

---
output: word_document
---
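The same minimal header works for the other two formats by swapping the value of output. A sketch, using the html_document and pdf_document format names from the rmarkdown package; the comment lines are only there for explanation and can be deleted:

```yaml
---
# Swap in one of: word_document, html_document, pdf_document
# (pdf_document only works once LaTeX is installed)
output: html_document
---
```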

Here is the yaml header for the homework template:

---
title: "Assignment title"
author: "your name goes here"
date: '03 September 2020'
output: word_document
---

A lot of customization can be fed in via the yaml header. Here’s the header for the write-up of my research on fracking in South Africa:

---
title: Local perceptions of hydraulic fracturing ahead of exploratory drilling in eastern South Africa 
author: |
  | Devan Allen McGranahan $*$ & Kevin P. Kirkman $**$ 
  |
  | $*$ North Dakota State University, Fargo, ND USA
  | $**$ University of KwaZulu-Natal, Pietermaritzburg
  |
date: |
  | Version date: 03 September 2020
output: 
   tufte::tufte_html:
    tufte_variant: "envisioned"
header-includes:
  - \renewcommand{\themarginnote}{\roman{footnote}}
bibliography: KZN_fracking.bib
link-citations: no
csl: superscripted.csl
---

Code chunks

For RStudio et al.3 to know you want some \({\bf\textsf{R}}\) script chugged, you need to identify the script as such. This is done with what’s called a code chunk, which absolutely must have the following setup:

```{r}

```
  • Both sets of triple backticks must be flush to the left. On most US-layout keyboards, the backtick key is above Tab, to the left of 1.
  • The {r} is what identifies it as an \({\bf\textsf{R}}\) code chunk.

All other text outside of the code chunk appears in the final document just like these words here: as regular old text.

Note that some symbols have different meanings inside vs. outside code chunks. For example, in a code chunk, # marks a comment and will be ignored when the \({\bf\textsf{R}}\) script is chugged; outside of a code chunk, # is the Markdown shortcut for a top-level section heading,4 followed by ## and ### for sub-section and sub-subsection headings, respectively.
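To make the two meanings of # concrete, here is a sketch of a complete, very small .Rmd file. The heading text and the numbers are made up purely for illustration:

````markdown
---
output: word_document
---

# Results

Outside a chunk, the line above becomes a top-level heading.

## Seedling counts

```{r}
# Inside a chunk, this line is an R comment and is skipped
counts <- c(12, 7, 9)
mean(counts)
```
````

When knit, "Results" and "Seedling counts" appear as a heading and sub-heading, while the comment inside the chunk is shown only as part of the script and has no effect on the result.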

Important

If you need to use a function from a package in a code chunk, be sure to include script for loading the package in the R Markdown file, even if you already have it loaded in your session. An R Markdown file is knit in its own self-contained \({\bf\textsf{R}}\) session, so it only knows what you tell it.
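As a sketch of what that looks like in practice, the chunk below loads a package before using it. The choice of ggplot2 is just an assumption for illustration (any package works the same way, provided it is installed), and mtcars is one of the datasets that ships with \({\bf\textsf{R}}\):

```{r}
# Load the package inside the document, even if it is already
# loaded in your interactive session
library(ggplot2)

# ggplot() is now available to this and all later chunks
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
```

Without the library() line, knitting would stop with an error saying the function ggplot could not be found.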

Chunk options

There are a lot of options one can add to the code chunk to fine-tune its behavior and output. At this point, you really don’t need to worry about any of them. Eventually, you will need to know how to use at least two:

  • echo= specifies whether the script in the chunk will appear in the document (TRUE) or not (FALSE). For the first half of the course, always set echo = TRUE so we can see the script used to create a graph, etc. But you wouldn’t do this for your thesis or paper (yes, you can create both in R Markdown). Later on, we will set echo = FALSE and learn how to combine all the script together in an appendix at the end of the document.

  • eval= determines whether to chug (TRUE) or ignore (FALSE) the script. Use eval=FALSE when you want to demonstrate how to make a graph, say, but not produce output:

```{r eval = FALSE}
plot(1:10)
```

Two other handy options are fig.height= and fig.width=, which adjust the size of your figures and help prevent axis labels from getting cut off, etc.
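Several options can be combined in one chunk header, separated by commas. A minimal sketch, assuming a figure 6 inches wide and 4 inches tall is wanted (knitr measures both in inches) and using the cars dataset that ships with \({\bf\textsf{R}}\):

```{r echo = TRUE, fig.height = 4, fig.width = 6}
# echo = TRUE shows this script in the knitted document;
# fig.height and fig.width set the size of the plot below
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)")
```

The specific sizes are only a starting point; adjust them until the axis labels fit.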


  1. Knitting refers to the knitr package, which does the work. The name is an esoteric play on words: before the open-source \({\bf\textsf{R}}\) there was a proprietary environment called S, and one would weave S script into documents in a process still known in \({\bf\textsf{R}}\) as Sweave. So knitr is the new (and much more straightforward) Sweave. Get it?

  2. Markup languages convey information about what sort of formatting and style should be used to display plain text when it appears on a screen or in a document. HTML, or Hypertext Markup Language, is common for (basic) webpages. Markdown is itself a play on the term markup, in that the idea of Markdown is to simplify the formatting commands (thus, marking down, not up); R Markdown is just a special flavor of Markdown that works really well for \({\bf\textsf{R}}\).

  3. Behind-the-curtain applications like Pandoc actually do the work to produce your Word doc or whatever; RStudio just passes your .Rmd file off to them using system connections it established automatically upon installation.

  4. As in <h1> in HTML and \section{} in \(\LaTeX\).