5 Pipes

Suppose we want to run several functions in turn, for example to find the mean bill length for each penguin species with group_by and summarise (you will find out more about these functions in Chapter 7). We could nest one function inside the other.

summarise(group_by(penguins, species), mean = mean(bill_length_mm, na.rm = TRUE))
## # A tibble: 3 × 2
##   species    mean
##   <fct>     <dbl>
## 1 Adelie     38.8
## 2 Chinstrap  48.8
## 3 Gentoo     47.5

With larger problems that use more functions, this solution becomes almost impossible to read, and it is very easy to make mistakes and forget which brackets belong to which function.

Another strategy is to make and use intermediate objects.

penguins_grouped <- group_by(penguins, species)
summarise(penguins_grouped, mean = mean(bill_length_mm, na.rm = TRUE))
## # A tibble: 3 × 2
##   species    mean
##   <fct>     <dbl>
## 1 Adelie     38.8
## 2 Chinstrap  48.8
## 3 Gentoo     47.5

This works better, but it can generate many intermediate objects, and it can be difficult to ensure that the correct one is used.

A popular alternative is to use pipes. The R code for a pipe is |>. Pipes pass the result of one function directly into the next function.

penguins |> 
  group_by(species) |>
  summarise(mean = mean(bill_length_mm, na.rm = TRUE))
## # A tibble: 3 × 2
##   species    mean
##   <fct>     <dbl>
## 1 Adelie     38.8
## 2 Chinstrap  48.8
## 3 Gentoo     47.5

You never need to use pipes, but they can make code more readable.

The old pipe %>%

The |> pipe was introduced in R version 4.1. Previously, the magrittr package pipe %>% was widely used, especially with tidyverse functions. You will see %>% in a lot of code on Stack Overflow and other help sites. In most cases the old and new pipes work in exactly the same way. The advantages of the |> pipe are that it

  • is slightly faster
  • doesn’t need any packages to be loaded
  • is easier to debug

5.1 A recipe for mashed potato

The pipe can be read as “and then”. This recipe for mashed potato can be read as buy 1kg potatoes, and then peel them, and then boil them, and so on.

buy("potatoes", kg = "1") |> 
  peel() |> 
  boil(minutes = "15") |> 
  drain() |> 
  mash(add = list("salt", "milk", "butter")) |> 
  serve(decorate = "parsley")

5.2 The native R pipe |>

The pipe passes the result of the code on its left to the function on its right, and puts it in the first available argument.

So

f <- "file.csv"
read_csv(file = f)

can be rewritten as

f |>
  read_csv()
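
To see what the pipe does with a call, it can help to ask R to show the code it will actually run. The sketch below assumes R 4.1 or later and uses only base R; x, f and y are placeholder names that are never evaluated.

```r
# The native pipe is rewritten into an ordinary function call when the code
# is parsed, so quoting a piped expression shows the call R will run.
rewritten <- quote(x |> f(y))
print(rewritten)

# The same applies to a chain: each result becomes the first argument
# of the next function.
piped  <- c(1, 4, 9) |> sqrt() |> sum()
nested <- sum(sqrt(c(1, 4, 9)))
identical(piped, nested)
```

The quoted expression prints as f(x, y), which is the nested form of the piped code.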

If you want to put the object passed through the pipe into the second argument, you need to name the first, so that it is not available. So if we want to pipe penguins into lm to fit a linear model, we need penguins to be put into the data argument, which is the second argument of lm. We can force this by naming the formula argument, so that data is the first available argument.

# un-named formula: penguins is put into the formula argument, so this fails
penguins |>
  lm(bill_length_mm ~ species)

# named formula: penguins is put into the second argument, data
penguins |>
  lm(formula = bill_length_mm ~ species)
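
The same behaviour can be checked without loading any packages. This is a minimal sketch that assumes the built-in mtcars data as a stand-in for penguins, with mpg and wt as the model variables; tryCatch is used only so that the failing call does not stop the script.

```r
# Un-named formula: mtcars lands in `formula` and the formula lands in `data`,
# so lm() raises an error. tryCatch() captures it instead of stopping.
bad <- tryCatch(
  mtcars |> lm(mpg ~ wt),
  error = function(e) e
)
inherits(bad, "error")

# Named formula: `formula` is taken, so mtcars goes to `data`.
fit <- mtcars |> lm(formula = mpg ~ wt)
names(coef(fit))
```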

More complex arrangements, for example piping the same object to two separate commands, can be done by writing a function. You probably won’t have to do this very often.
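
As an illustration of the function approach, the sketch below wraps two separate commands in one small helper and pipes a vector into it. The name mean_and_sd is hypothetical, chosen only for this example, and the code needs R 4.1 or later for the pipe.

```r
# Hypothetical helper: applies two separate commands to the same input
# and returns both results as a named vector.
mean_and_sd <- function(x) {
  c(mean = mean(x), sd = sd(x))
}

result <- c(2, 4, 6) |>
  mean_and_sd()
result
```

Because the helper takes the piped object as its only argument, it can be used inside a longer pipeline like any other function.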

5.3 Making a pipe

You can make a pipe either by typing it directly, or by using the RStudio keyboard shortcut Ctrl + Shift + M (on a Mac, Cmd + Shift + M). If the shortcut inserts %>% instead of |>, you need to change the RStudio options: go to Tools > Global Options > Code and tick Use native pipe operator, |>. To make your code readable, put a line break after each pipe.

Contributors

Richard J. Telford