16 Using functions and packages
16.1 Functions
Everything that does something in R
is a function.
For most functions the function name is followed by brackets. Within the brackets are zero or more arguments separated by commas.
Missing commas out is a common mistake and will give an error.
rnorm(n = 10, mean = 5 sd = 1)
## Error: <text>:1:24: unexpected symbol
## 1: rnorm(n = 10, mean = 5 sd
## ^
To get help on a function, use a ?
.
?rnorm
The examples at the bottom of the help file can be useful to understand how to use the function.
You can run the examples either by copying and pasting them, or using example()
.
example(rnorm)
16.1.1 infix functions
There is a special type of function called an infix function that does not use brackets but is placed between two objects.
For example, in
5 < 3
## [1] FALSE
<
is the infix function that tests if the first number is smaller than the second and returns a TRUE or FALSE.
Some other infix functions are
5 > 3 # Greater than
## [1] TRUE
## [1] TRUE FALSE
7 %% 4 # modulus (finds the remainder)
## [1] 3
7 %/% 4 # integer division
## [1] 1
To get help of an infix function surround it with backticks.
?`%in%`
16.2 Packages
All functions arranged into packages.
Some R
packages, for example, stats
and utils
, are automatically loaded when you start R
.
There are also several recommended packages that are installed by default, for example the mgcv
package for fitting generalised additive models.
16.3 Loading packages
If you want to use the functions in a package, you need to load the package with the function library()
.
library("mgcv")
## Loading required package: nlme
##
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
##
## collapse
## This is mgcv 1.8-36. For overview type 'help("mgcv-package")'.
16.4 Installing extra packages
There are lots of extra R
packages available from CRAN (R
s homepage).
You can install or update a package from CRAN with install.packages()
install.packages("vegan")
You only need to do this once (unless you need to install a new version of a package), so you should run this directly in the console and not keep it in your script (otherwise it will install the package every time you run the code).
Once you have installed the package, you can use library()
to load it.
You need to do this every time use use it.
Often you know what package you want to install. If you don’t know the name of the package you need for your analyses the task views can help. For example, the Environmetrics task view describes packages for the analysis of ecological and environmental data.
Not all R packages are available on CRAN.
Some packages in development are only available on github.com.
Packages on github can be installed with the remotes
package.
install.packages("remotes")
#ggvegan for plotting ordinations is only on github
remotes::install_github("gavinsimpson/ggvegan")
16.5 Debugging failed package installation
Sometimes packages fail to install properly. This can be frustrating.
Some recommendations
- Check exactly which package won’t install. It may be a dependency of the package you want. Try to install it again.
- Restart
R
(in Session menu in RStudio) and try again. - Google any error message. Someone else may have had the same problem.
16.6 Name conflicts
16.6.1 The problem
Sometimes two packages have functions with the same names.
For example, both MASS
and dplyr
have a select
function which does completely different things.
If both packages are loaded at the same time there is a conflict and the function that was loaded last takes priority.
This can cause big problems, with difficult to interpret error messages.
library(palmerpenguins)#load data
library(dplyr)
library(MASS) # R will report that select is being masked
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
penguins |> select(species)
## Error in select(penguins, species): unused argument (species)
If you have code that worked one day and fails the next with a weird error messages, it might be because of a name conflict. If you start typing a function name into RStudio, it will show which package the function comes from.
There are three solutions.
16.6.2 Loading order
Be very careful about the order in which packages are loaded.
If the example above had loaded MASS
before dplyr
the select
function in MASS
would have been masked and the code would have worked.
This solution can work in a script that you source, but is fragile in interactive sessions when it is easy to load packages in the wrong order.
16.6.3 package::function
Use the package::function
notation to specify which package a function comes from.
This is safe and can make code easier to understand by showing which packages the functions are from. The code above could be written safely as
penguins |> dplyr::select(species)
## # A tibble: 344 × 1
## species
## <fct>
## 1 Adelie
## 2 Adelie
## 3 Adelie
## # … with 341 more rows
This gets ugly fairly quickly, so is best used with packages that you only need a few functions from once or twice, not functions you need many times.
16.6.4 conflicted
package
The safest solution is to use the conflicted
package.
The conflicted
package converts any conflicts between packages into errors.
This might seem like a bad idea, but it is much easier to diagnose an error from conflicted
than the weird error of a masked function.
## Error: [conflicted] `select` found in 2 packages.
## Either pick the one you want with `::`
## * MASS::select
## * dplyr::select
## Or declare a preference with `conflict_prefer()`
## * conflict_prefer("select", "MASS")
## * conflict_prefer("select", "dplyr")
As the error message suggests, we can resolve the error either by using the package::function
notation, or use the function conflict_prefer
to say which function we want to use by default.
library(dplyr)
library(MASS)
library(conflicted)
conflict_prefer("select", "dplyr")
## [conflicted] Will prefer dplyr::select over any other package
peguins |> select(Species)
## Error in select(peguins, Species): object 'peguins' not found
If there are many functions that need preferences recording, for example if you have loaded tidylog
, you can iterate over them with purrr::map
.
getNamespaceExports("tidylog") |>
map(~conflict_prefer(.x, winner = "tidylog"))
16.7 Citing packages
When you use a package it is important to cite it in a manuscript or thesis both to acknowledge the author’s work in making the package and to increase reproducibility.
The correct citation can be seen with the function citation
.
citation("lme4")
##
## To cite lme4 in publications use:
##
## Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015).
## Fitting Linear Mixed-Effects Models Using lme4. Journal of
## Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
##
## A BibTeX entry for LaTeX users is
##
## @Article{,
## title = {Fitting Linear Mixed-Effects Models Using {lme4}},
## author = {Douglas Bates and Martin M{\"a}chler and Ben Bolker and Steve Walker},
## journal = {Journal of Statistical Software},
## year = {2015},
## volume = {67},
## number = {1},
## pages = {1--48},
## doi = {10.18637/jss.v067.i01},
## }
It is also important to cite the version used.
packageVersion("lme4")
## [1] '1.1.27.1'
You should also cite R
.
Again, the citation
function can be used
citation()
##
## To cite R in publications use:
##
## R Core Team (2021). R: A language and environment for statistical
## computing. R Foundation for Statistical Computing, Vienna, Austria.
## URL https://www.R-project.org/.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {R: A Language and Environment for Statistical Computing},
## author = {{R Core Team}},
## organization = {R Foundation for Statistical Computing},
## address = {Vienna, Austria},
## year = {2021},
## url = {https://www.R-project.org/},
## }
##
## We have invested a lot of time and effort in creating R, please cite it
## when using it for data analysis. See also 'citation("pkgname")' for
## citing R packages.
The R
version can be obtained with R.version.string
.
This is a variable not a function so it does not take brackets.
R.version.string
## [1] "R version 4.1.1 (2021-08-10)"
16.8 Packages change over time
R
packages often get updated.
This is good as functions get improved and bugs get fixed.
However, it also means that code written last year might not work next year with all the latest packages.
This is a big problem for reproducibility.
The solution is to make sure you re-run your code with the same packages.
That is not easy to do by hand.
The renv
package keeps track of all the packages you are using (and all the packages they depend on).
I use renv
for my analyses.
The workflow when working with renv
is:
Call
renv::init()
to initialise a private R library for the projectWork in the project as normal, installing R packages as needed in the project
Call
renv::snapshot()
to save the state of the project libraryContinue working on your project, installing and updating R packages as needed. Use
renv::install()
to install packages from CRAN or github.If the changes were successful, call
renv::snapshot()
again. If the updated packages introduced problems, callrenv::restore()
to revert to the previous state of the library.
16.9 Writing your own function/packages
If you find that you need to run the same code several times, it can be useful to write a function.
To make a function, you need to use the reserved word function
followed by brackets with zero or more arguments. After the brackets, braces encompass the body of the function.
Here is a function that multiples two numbers together.
mutliply <- function(x, y = 1){ #The default value of y is 1
x * y
}
multiply(x = 6, y = 7)
Once you have written a function, it can be useful to make your own package.
This makes it easy to use in your own analysis and easy to share with other users.
Information on how to make a package using the usethis
and devtools
packages can be found in the package writing book or at https://r-pkgs.org/whole-game.html.