Functional programming and purrr

Dr. Alexander Fisher

Duke University

Announcements

  • check lab solutions on Sakai

  • teams for labs

    • see announcement on slack
    • message me by Friday, Feb 10 if you’d like to be pseudo-randomly assigned a team or
    • reach out to any member of the teaching team if you’d like to form a specific team.
  • quiz 03

Functionals

Functions as objects

Functions are first class objects (like vectors).

f = function(x) {
  x ^ 2
}
g = f
g(2)
[1] 4
l = list(f = f, g = g)
l$f(3)
[1] 9
l[[2]](4)
[1] 16
l[1](3)
Error in eval(expr, envir, enclos): attempt to apply non-function
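The error above arises because single brackets return a sub-list rather than the function stored inside it. A minimal sketch of the difference, using objects that mirror the ones defined above:

```r
f = function(x) x ^ 2
l = list(f = f, g = f)

# Single brackets keep the container: the result is a list of length 1
is.list(l[1])

# Double brackets (or $) extract the element itself, here a function
is.function(l[[1]])

# So extract first, then call
l[[1]](3)
```

The same distinction applies to any list element, not only functions.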

Functions as arguments

A functional is a function that takes a function as an input and returns a vector as output.

Example: lapply() and sapply() accept function arguments.

  • lapply(), as the name suggests, applies a function over a list.
x = list( c(1,2,3), b = c(10, 20, 30, 40, 50))
lapply(x, mean) # output is a list
[[1]]
[1] 2

$b
[1] 30
  • sapply() works the same way but returns a simpler output when possible.
sapply(x, mean) # output is a vector of doubles
    b 
 2 30 
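To make the idea of a functional concrete, here is a rough sketch of how an lapply()-like function could be written with a for loop. The name my_lapply is hypothetical and this is not how base R implements lapply(); it also ignores edge cases such as f returning NULL.

```r
# my_lapply is a hypothetical, simplified stand-in for lapply()
my_lapply = function(x, f, ...) {
  out = vector("list", length(x))   # pre-allocate the output list
  for (i in seq_along(x)) {
    out[[i]] = f(x[[i]], ...)       # apply f to each element in turn
  }
  names(out) = names(x)             # carry over any names
  out
}

x = list(c(1, 2, 3), b = c(10, 20, 30, 40, 50))
res = my_lapply(x, mean)
str(res)
```

The function argument f is treated like any other object: it is passed in and then called inside the loop.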

Functions as output

We can make a function return another function.

f = function (n) {
  # returns a function that
  # raises its argument to the nth power
  g = function(x) {
    return(x ^ n)
  }
  return(g)
}

f(3)(2) # 2 ^ 3
[1] 8
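A function that returns functions can be used to build a family of related functions. The sketch below is one way to do this; power_factory, square and cube are illustrative names, and force(n) is added so that n is evaluated when the function is created rather than when it is first called.

```r
# power_factory is a hypothetical name for the pattern shown above
power_factory = function(n) {
  force(n)              # evaluate n now, not lazily at first use
  function(x) x ^ n     # the returned function remembers n
}

square = power_factory(2)
cube = power_factory(3)

square(4)
cube(2)
sapply(1:3, cube)       # the returned function can itself be passed to a functional
```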

Anonymous functions (lambdas)

These are short functions that are created without ever being assigned a name.

function(x) {x + 1}
function(x) {x + 1}
(function(y) {y - 1})(10)
[1] 9

Idea: we avoid creating an object we don’t need. This is especially useful for passing a function as an argument.

Example: numerical integration

integrate(function(x) x, 0, 1)
0.5 with absolute error < 5.6e-15
integrate(function(x) (x * x) - (2 * x) + 1, 0, 1)
0.3333333 with absolute error < 3.7e-15

Base R lambda shorthand

Along with the base pipe (|>), R v4.1.0 introduced a shortcut for anonymous functions: \(). We won’t be using it, for the same reason we won’t be using the base pipe, but it is useful to know that it exists.

f = \(x) {1 + x}
f(1:5)
[1] 2 3 4 5 6
(\(x) x ^ 2)(10)
[1] 100
integrate(\(x) sin(x) ^ 2, 0, 1)
0.2726756 with absolute error < 3e-15

Using this shorthand with the base pipe is meant to avoid the need for the . placeholder, e.g.

data.frame(x = runif(10), y = runif(10)) |>
  {\(d) lm(y ~ x, data = d)}()

Call:
lm(formula = y ~ x, data = d)

Coefficients:
(Intercept)            x  
     0.5540      -0.3096  

apply (base R)

apply functions

The apply functions are a collection of tools for functional programming in base R. They are variations of the map function found in many other languages: each applies a function over the elements of its input (e.g. a vector or list).

??base::apply
## 
## Help files with alias or concept or title matching ‘apply’ using fuzzy
## matching:
## 
## base::apply             Apply Functions Over Array Margins
## base::.subset           Internal Objects in Package 'base'
## base::by                Apply a Function to a Data Frame Split by Factors
## base::eapply            Apply a Function Over Values in an Environment
## base::lapply            Apply a Function over a List or Vector
## base::mapply            Apply a Function to Multiple List or Vector Arguments
## base::rapply            Recursively Apply a Function to a List
## base::tapply            Apply a Function Over a Ragged Array
  • each one applies a function iteratively, one element at a time

lapply and sapply

lapply(1:4, function(x, pow) x ^ pow, pow = 2) %>% str()
List of 4
 $ : num 1
 $ : num 4
 $ : num 9
 $ : num 16
lapply(1:4, function(x, pow) x ^ pow, x = 2) %>% str()
List of 4
 $ : num 2
 $ : num 4
 $ : num 8
 $ : num 16
sapply(1:8, function(x) (x + 1) ^ 2)
[1]  4  9 16 25 36 49 64 81
sapply(1:8, function(x) c(x, x ^ 2, x ^ 3))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    2    3    4    5    6    7    8
[2,]    1    4    9   16   25   36   49   64
[3,]    1    8   27   64  125  216  343  512

What happens if the returned lengths don’t match?

sapply(1:3, seq) %>% str()
List of 3
 $ : int 1
 $ : int [1:2] 1 2
 $ : int [1:3] 1 2 3
lapply(1:3, seq) %>% str()
List of 3
 $ : int 1
 $ : int [1:2] 1 2
 $ : int [1:3] 1 2 3

What happens if the types don’t match?

Type coercion! Or, when the results cannot be combined into an atomic vector (as below), a list.

l = list(a = 1:3, b = 4:6, c = 7:9, d = list(10, 11, "A"))
sapply(l, function(x) x[1]) %>% str()
List of 4
 $ a: int 1
 $ b: int 4
 $ c: int 7
 $ d: num 10
  • type consistency issue: the return type of sapply() depends on the input, so you can’t tell what it will be just by reading the call
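One base R way around this is vapply(), which requires the expected type and length of each result to be declared up front. A small sketch, with illustrative inputs:

```r
l = list(a = 1:3, b = 4:6, c = 7:9)

# Declare that every result must be a single integer
first = vapply(l, function(x) x[1], FUN.VALUE = integer(1))
first

# When a result does not match the declaration, vapply() stops with an error
# instead of silently returning a list as sapply() would
res = tryCatch(
  vapply(1:3, seq, FUN.VALUE = integer(1)),
  error = function(e) "vapply() raised an error"
)
res
```

The return type is now visible in the call itself.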

*apply and data frames

Common use case: data frames

  • recall: a data frame is just a fancy list
df = data.frame(
  a = 1:6, 
  b = letters[1:6], 
  c = c(TRUE,FALSE)
)
lapply(df, class) %>% str()
List of 3
 $ a: chr "integer"
 $ b: chr "character"
 $ c: chr "logical"
sapply(df, class)
          a           b           c 
  "integer" "character"   "logical" 

A more useful example

Penalized regression: the lasso

\[ \min_{\beta \in \mathbb{R}^p} \|y - X\beta\|_2^2 + \lambda\|\beta\|_1 \]

It only makes sense to “shrink” the \(\beta_i\)s if the predictors are on the same scale. Therefore we want to standardize the data in matrix X, e.g.

for each column j in X: 
  for each row i:
    recompute x[i, j] = (x[i, j] - mean(x[,j])) / sd(x[,j])
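For comparison, a runnable version of this double loop might look like the following. It is only a sketch on a small illustrative data frame; note that the column mean and standard deviation are computed once, before any entry of that column is overwritten.

```r
X = data.frame(height = c(72, 60, 64),
               bpm = c(82, 55, 60))
Z = as.matrix(X)   # work on a numeric matrix copy

for (j in seq_len(ncol(Z))) {
  col_mean = mean(Z[, j])   # computed from the original column
  col_sd = sd(Z[, j])
  for (i in seq_len(nrow(Z))) {
    Z[i, j] = (Z[i, j] - col_mean) / col_sd
  }
}
Z
```

Each standardized column has mean 0 and standard deviation 1.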
    

We can solve this elegantly with an *apply.

X = data.frame(height = c(72, 60, 64),
               bpm = c(82, 55, 60))
apply(X, 2, function(x) (x - mean(x)) / sd(x)) # returns matrix
         height        bpm
[1,]  1.0910895  1.1370777
[2,] -0.8728716 -0.7425813
[3,] -0.2182179 -0.3944963
lapply(X, function (x) (x - mean(x)) / sd(x)) %>% 
  as.data.frame() 
      height        bpm
1  1.0910895  1.1370777
2 -0.8728716 -0.7425813
3 -0.2182179 -0.3944963

other less common apply functions

  • apply() - applies a function over the rows or columns of a data frame, matrix or array

  • vapply() - is similar to sapply, but has an enforced return type and size

  • mapply() - like sapply but will iterate over multiple vectors at the same time.

  • rapply() - a recursive version of lapply, behavior depends largely on the how argument

  • eapply() - apply a function over an environment.
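A brief sketch of two of these, using small illustrative inputs: mapply() iterating over two vectors in parallel, and apply() working over the rows (margin 1) or columns (margin 2) of a matrix.

```r
# mapply(): the function receives one element from each vector per iteration
sums = mapply(function(x, y) x + y, 1:3, 4:6)
sums

# apply(): margin 1 = rows, margin 2 = columns
m = matrix(1:6, nrow = 2)
row_sums = apply(m, 1, sum)
col_sums = apply(m, 2, sum)
row_sums
col_sums
```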

purrr

Map functions

  • replacements for lapply/sapply/vapply

  • map() - returns a list (same as lapply)

  • map_lgl() - returns a logical vector.

  • map_int() - returns an integer vector.

  • map_dbl() - returns a double vector.

  • map_chr() - returns a character vector.

  • map_dfr() - returns a data frame by row binding.

  • map_dfc() - returns a data frame by column binding.

  • walk() - returns its input invisibly, used exclusively for function side effects
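A short sketch of a few of these variants side by side, assuming purrr is installed; the input list is illustrative.

```r
library(purrr)

x = list(1:3, 4:6)

map(x, sum)                                        # list of results
n = map_int(x, length)                             # integer vector
big = map_lgl(x, function(v) all(v > 3))           # logical vector
lab = map_chr(x, function(v) paste(v, collapse = "-"))  # character vector

# walk() is called for its side effect (here, printing);
# it hands back its input invisibly
out = walk(x, print)
```

The suffix states the type of the result, so a mismatch produces an error rather than a silently different type.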

Type consistency

R is a weakly / dynamically typed language, which means there is no syntactic way to define a function that enforces argument or return types. This flexibility can be useful at times, but it often makes it hard to reason about your code and requires more verbose code to handle edge cases.

set.seed(123)
x = list(rnorm(1e3), rnorm(1e3), rnorm(1e3))
map_dbl(x, mean)
[1]  0.01612787  0.04246525 -0.02011253
map_chr(x, mean)
Warning: Automatic coercion from double to character was deprecated in purrr 1.0.0.
ℹ Please use an explicit call to `as.character()` within `map_chr()` instead.
[1] "0.016128"  "0.042465"  "-0.020113"
map_int(x, mean)
Error in `map_int()`:
ℹ In index: 1.
Caused by error:
! Can't coerce from a double vector to an integer vector.
map(x, mean) %>% str()
List of 3
 $ : num 0.0161
 $ : num 0.0425
 $ : num -0.0201
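As the deprecation warning above suggests, the conversion can be made explicit inside the mapped function. The sketch below shows one possible way; the choice of format() with three significant digits and of rounding before as.integer() is an assumption made for illustration.

```r
library(purrr)

set.seed(123)
x = list(rnorm(1e3), rnorm(1e3), rnorm(1e3))

# Convert to character explicitly, so map_chr() receives strings
means_chr = map_chr(x, function(v) format(mean(v), digits = 3))
means_chr

# Convert to integer explicitly, so map_int() receives integers
means_int = map_int(x, function(v) as.integer(round(mean(v))))
means_int
```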

Working with Data Frames

map_dfr and map_dfc are particularly useful when working with and/or creating data frames. Example:

X = data.frame(height = c(72, 60, 64),
               bpm = c(82, 55, 60),
               age = c(25, 30, 35))
standardize = function(x) (x - mean(x)) / sd(x)
map_dfc(X, standardize)
# A tibble: 3 × 3
  height    bpm   age
   <dbl>  <dbl> <dbl>
1  1.09   1.14     -1
2 -0.873 -0.743     0
3 -0.218 -0.394     1
map_dfr(X, function(x) x[1:2])
# A tibble: 2 × 3
  height   bpm   age
   <dbl> <dbl> <dbl>
1     72    82    25
2     60    55    30
map_dfr(X, function(x) x)
# A tibble: 3 × 3
  height   bpm   age
   <dbl> <dbl> <dbl>
1     72    82    25
2     60    55    30
3     64    60    35

Shortcut - purrr style lambdas

purrr lets us write anonymous functions using one-sided formulas, where the argument is given by . or .x for map and related functions.

map_dbl(1:5, function(x) x / (x + 1))
[1] 0.5000000 0.6666667 0.7500000 0.8000000 0.8333333

Read ~ as “function” and . or .x as “input”

map_dbl(1:5, ~ . / (. + 1))
[1] 0.5000000 0.6666667 0.7500000 0.8000000 0.8333333
map_dbl(1:5, ~ .x / (.x + 1))
[1] 0.5000000 0.6666667 0.7500000 0.8000000 0.8333333

Generally, the latter option (.x) is preferred to avoid confusion with magrittr’s . placeholder.
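To see that a one-sided formula really does stand for a function, it can be converted with purrr's as_mapper(). A small sketch, assuming purrr is installed:

```r
library(purrr)

# as_mapper() turns the formula into an ordinary function
f = as_mapper(~ .x / (.x + 1))
is.function(f)
f(1)

# The formula, the converted function and a written-out function agree
a = map_dbl(1:5, ~ .x / (.x + 1))
b = map_dbl(1:5, f)
d = map_dbl(1:5, function(x) x / (x + 1))
all.equal(a, b)
all.equal(a, d)
```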

Multiargument anonymous functions

Functions with the map2 prefix work the same as the map functions but they iterate over two objects instead of one. Arguments in an anonymous function are given by .x and .y (or ..1 and ..2) respectively.

map2_dbl(1:5, 1:5, function(x,y) x / (y+1))
[1] 0.5000000 0.6666667 0.7500000 0.8000000 0.8333333
map2_dbl(1:5, 1:5, ~ .x/(.y+1))
[1] 0.5000000 0.6666667 0.7500000 0.8000000 0.8333333
map2_dbl(1:5, 1:5, ~ ..1/(..2+1))
[1] 0.5000000 0.6666667 0.7500000 0.8000000 0.8333333
map2_chr(LETTERS[1:5], letters[1:5], paste0)
[1] "Aa" "Bb" "Cc" "Dd" "Ee"

Prioritize readability of your code! For complicated functions, use syntax like the first example.
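A brief sketch of how map2 lines up its two inputs, assuming purrr is installed: elements are paired by position (much like base R's mapply()), and an input of length 1 is recycled to match the other.

```r
library(purrr)

x = c(1, 2, 3)
y = c(4, 5, 6)

# Elements are paired by position: (1, 4), (2, 5), (3, 6)
prod_map2 = map2_dbl(x, y, function(x, y) x * y)
prod_mapply = mapply(function(x, y) x * y, x, y)
prod_map2

# A length-1 input is reused for every element of the other input
shifted = map2_dbl(x, 10, function(x, y) x + y)
shifted
```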

Lookups (sw_people)

library(repurrrsive)

sw_people from the repurrrsive package

str(sw_people[1:5])
List of 5
 $ :List of 16
  ..$ name      : chr "Luke Skywalker"
  ..$ height    : chr "172"
  ..$ mass      : chr "77"
  ..$ hair_color: chr "blond"
  ..$ skin_color: chr "fair"
  ..$ eye_color : chr "blue"
  ..$ birth_year: chr "19BBY"
  ..$ gender    : chr "male"
  ..$ homeworld : chr "http://swapi.co/api/planets/1/"
  ..$ films     : chr [1:5] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/" ...
  ..$ species   : chr "http://swapi.co/api/species/1/"
  ..$ vehicles  : chr [1:2] "http://swapi.co/api/vehicles/14/" "http://swapi.co/api/vehicles/30/"
  ..$ starships : chr [1:2] "http://swapi.co/api/starships/12/" "http://swapi.co/api/starships/22/"
  ..$ created   : chr "2014-12-09T13:50:51.644000Z"
  ..$ edited    : chr "2014-12-20T21:17:56.891000Z"
  ..$ url       : chr "http://swapi.co/api/people/1/"
 $ :List of 14
  ..$ name      : chr "C-3PO"
  ..$ height    : chr "167"
  ..$ mass      : chr "75"
  ..$ hair_color: chr "n/a"
  ..$ skin_color: chr "gold"
  ..$ eye_color : chr "yellow"
  ..$ birth_year: chr "112BBY"
  ..$ gender    : chr "n/a"
  ..$ homeworld : chr "http://swapi.co/api/planets/1/"
  ..$ films     : chr [1:6] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/" ...
  ..$ species   : chr "http://swapi.co/api/species/2/"
  ..$ created   : chr "2014-12-10T15:10:51.357000Z"
  ..$ edited    : chr "2014-12-20T21:17:50.309000Z"
  ..$ url       : chr "http://swapi.co/api/people/2/"
 $ :List of 14
  ..$ name      : chr "R2-D2"
  ..$ height    : chr "96"
  ..$ mass      : chr "32"
  ..$ hair_color: chr "n/a"
  ..$ skin_color: chr "white, blue"
  ..$ eye_color : chr "red"
  ..$ birth_year: chr "33BBY"
  ..$ gender    : chr "n/a"
  ..$ homeworld : chr "http://swapi.co/api/planets/8/"
  ..$ films     : chr [1:7] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/" ...
  ..$ species   : chr "http://swapi.co/api/species/2/"
  ..$ created   : chr "2014-12-10T15:11:50.376000Z"
  ..$ edited    : chr "2014-12-20T21:17:50.311000Z"
  ..$ url       : chr "http://swapi.co/api/people/3/"
 $ :List of 15
  ..$ name      : chr "Darth Vader"
  ..$ height    : chr "202"
  ..$ mass      : chr "136"
  ..$ hair_color: chr "none"
  ..$ skin_color: chr "white"
  ..$ eye_color : chr "yellow"
  ..$ birth_year: chr "41.9BBY"
  ..$ gender    : chr "male"
  ..$ homeworld : chr "http://swapi.co/api/planets/1/"
  ..$ films     : chr [1:4] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
  ..$ species   : chr "http://swapi.co/api/species/1/"
  ..$ starships : chr "http://swapi.co/api/starships/13/"
  ..$ created   : chr "2014-12-10T15:18:20.704000Z"
  ..$ edited    : chr "2014-12-20T21:17:50.313000Z"
  ..$ url       : chr "http://swapi.co/api/people/4/"
 $ :List of 15
  ..$ name      : chr "Leia Organa"
  ..$ height    : chr "150"
  ..$ mass      : chr "49"
  ..$ hair_color: chr "brown"
  ..$ skin_color: chr "light"
  ..$ eye_color : chr "brown"
  ..$ birth_year: chr "19BBY"
  ..$ gender    : chr "female"
  ..$ homeworld : chr "http://swapi.co/api/planets/2/"
  ..$ films     : chr [1:5] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/" ...
  ..$ species   : chr "http://swapi.co/api/species/1/"
  ..$ vehicles  : chr "http://swapi.co/api/vehicles/30/"
  ..$ created   : chr "2014-12-10T15:20:09.791000Z"
  ..$ edited    : chr "2014-12-20T21:17:50.315000Z"
  ..$ url       : chr "http://swapi.co/api/people/5/"

Lookups

Very often we want to extract only certain (named) values from a list. purrr provides a shortcut for this operation: if, instead of a function, you provide a character or numeric vector (or a list mixing the two), those values will be used to sequentially subset the elements being iterated.

purrr::map_chr(sw_people, "name") %>% head()
[1] "Luke Skywalker" "C-3PO"          "R2-D2"          "Darth Vader"   
[5] "Leia Organa"    "Owen Lars"     
purrr::map_chr(sw_people, 1) %>% head()
[1] "Luke Skywalker" "C-3PO"          "R2-D2"          "Darth Vader"   
[5] "Leia Organa"    "Owen Lars"     
purrr::map_chr(sw_people, list("films", 1)) %>% head(n=10)
 [1] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
 [3] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/6/"
 [5] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
 [7] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/1/"
 [9] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/5/"
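The lookup shortcut is roughly equivalent to writing the subsetting out by hand. A sketch on a small made-up list (people and its contents are hypothetical, standing in for sw_people), assuming purrr is installed:

```r
library(purrr)

# people is a hypothetical miniature of sw_people
people = list(
  list(name = "A", films = c("f1", "f2")),
  list(name = "B", films = c("f3"))
)

# A character lookup behaves like [["name"]] on each element
by_name = map_chr(people, "name")
by_hand = map_chr(people, function(p) p[["name"]])

# A list of lookups is applied in sequence: first "films", then position 1
first_film = map_chr(people, list("films", 1))
first_film_by_hand = map_chr(people, function(p) p[["films"]][[1]])
first_film
```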

Length coercion?

purrr::map_chr(sw_people, list("starships", 1))
Error in `purrr::map_chr()`:
ℹ In index: 2.
Caused by error:
! Result must be length 1, not 0.
sw_people[[2]]$name
[1] "C-3PO"
sw_people[[2]]$starships
NULL
purrr::map(sw_people, list("starships", 1)) %>% head(n = 3) %>% str()
List of 3
 $ : chr "http://swapi.co/api/starships/12/"
 $ : NULL
 $ : NULL
purrr::map_chr(sw_people, list("starships", 1), .default = NA) %>% head()
[1] "http://swapi.co/api/starships/12/" NA                                 
[3] NA                                  "http://swapi.co/api/starships/13/"
[5] NA                                  NA                                 
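The same behavior can be reproduced on a small made-up list, assuming purrr is installed. Here people is hypothetical, and NA_character_ is used as the default so that the fill value already has the type map_chr() expects.

```r
library(purrr)

# people is a hypothetical list in which the second entry has no starships
people = list(
  list(name = "A", starships = c("s1", "s2")),
  list(name = "B")
)

# map() tolerates the missing element and returns NULL for it
ships_list = map(people, list("starships", 1))
str(ships_list)

# map_chr() needs exactly one string per element, so supply a default
first_ship = map_chr(people, list("starships", 1), .default = NA_character_)
first_ship
```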

manual unnesting

  • how many starships does each character have?
(chars = tibble(
  name = purrr::map_chr(sw_people, "name"),
  starships = purrr::map(sw_people, "starships")
))
# A tibble: 87 × 2
   name               starships
   <chr>              <list>   
 1 Luke Skywalker     <chr [2]>
 2 C-3PO              <NULL>   
 3 R2-D2              <NULL>   
 4 Darth Vader        <chr [1]>
 5 Leia Organa        <NULL>   
 6 Owen Lars          <NULL>   
 7 Beru Whitesun lars <NULL>   
 8 R5-D4              <NULL>   
 9 Biggs Darklighter  <chr [1]>
10 Obi-Wan Kenobi     <chr [5]>
# … with 77 more rows
chars %>%
  mutate(n = map_int(starships, length))
# A tibble: 87 × 3
   name               starships     n
   <chr>              <list>    <int>
 1 Luke Skywalker     <chr [2]>     2
 2 C-3PO              <NULL>        0
 3 R2-D2              <NULL>        0
 4 Darth Vader        <chr [1]>     1
 5 Leia Organa        <NULL>        0
 6 Owen Lars          <NULL>        0
 7 Beru Whitesun lars <NULL>        0
 8 R5-D4              <NULL>        0
 9 Biggs Darklighter  <chr [1]>     1
10 Obi-Wan Kenobi     <chr [5]>     5
# … with 77 more rows
  • much more efficient if you only need a subset of the columns to be “unnested”

Exercises

Exercise 1

draw_points = function(n) {
  list(
    x = runif(n, -1, 1),
    y = runif(n, -1, 1)
  )
}
  • Use the function above to draw n = 1000 points from a box of area 4. Save your output as an object called points.

  • Use map or an appropriate version to determine which points \((x, y)\) are within the unit circle centered at the origin.

  • What proportion of points are within the unit circle?

  • Can you approximate \(\pi\) like this? How?

  • How can you make your estimate more precise?

Exercise 2

Use mtcars and a single map or map variant to

  • get the type of each variable

  • get the fourth row such that result is a character vector

  • compute the mean of each variable

  • compute the mean and median for each variable such that the result is a data frame with the mean values in row 1 and the median values in row 2.