Functional programming and purrr

Dr. Alexander Fisher

Duke University

Announcements

check lab solutions on Sakai
teams for labs
- see announcement on slack
- message me by Friday, Feb 10 if you’d like to be pseudo-randomly assigned a team or
- reach out to any member of the teaching team if you’d like to form a specific team.
quiz 03

Functionals

Functions as objects

Functions are first class objects (like vectors).

f = function(x) {
  x ^ 2
}
g = f
g(2)

[1] 4

l = list(f = f, g = g)
l$f(3)

[1] 9

l[[2]](4)

[1] 16

l[1](3)

Error in eval(expr, envir, enclos): attempt to apply non-function
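The last call fails because single brackets return a sub-list of length one rather than the function stored inside it. A minimal sketch of the difference, using the made-up names sq and fs so the objects above are not overwritten:

```r
# sq and fs are hypothetical names used only for this illustration
sq = function(x) x ^ 2
fs = list(sq = sq)

class(fs[1])     # "list": single brackets keep the list wrapper
class(fs[[1]])   # "function": double brackets extract the element

# extract the function from the sub-list first, then call it
fs[1][[1]](3)
```

Only the extracted element can be called; the wrapper list cannot.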

Functions as arguments

A functional is a function that takes a function as an input and returns a vector as output.

Example: lapply() and sapply() accept function arguments.

lapply(), as the name suggests, applies a function over a list.

x = list( c(1,2,3), b = c(10, 20, 30, 40, 50))
lapply(x, mean) # output is a list

[[1]]
[1] 2

$b
[1] 30

sapply() works the same but returns a simpler output

sapply(x, mean) # output is a vector of doubles

    b 
 2 30

Functions as output

We can make a function return another function.

f = function (n) {
  # function returns 
  # function that raises its argument to the n power
  g = function(x) {
    return(x ^ n)
  }
  return(g)
}

f(3)(2) # 2 ^ 3

[1] 8
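Because the returned function remembers the n it was created with, the same idea can be used to build several related functions. A small sketch, where power, square and cube are hypothetical names chosen for this illustration:

```r
# power() is a hypothetical function factory, equivalent in spirit to f above
power = function(n) {
  function(x) x ^ n
}

square = power(2)   # remembers n = 2
cube = power(3)     # remembers n = 3

square(4)   # 4 ^ 2
cube(2)     # 2 ^ 3
```

Each returned function carries its own copy of n, so square and cube do not interfere with each other.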

Anonymous functions (lambdas)

These are short functions that are created without ever being assigned a name.

function(x) {x + 1}

function(x) {x + 1}

(function(y) {y - 1})(10)

[1] 9

Idea: we don’t create an object we don’t need. This is especially useful for passing a function as an argument.

Example: numerical integration

integrate(function(x) x, 0, 1)

0.5 with absolute error < 5.6e-15

integrate(function(x) (x * x) - (2 * x) + 1, 0, 1)

0.3333333 with absolute error < 3.7e-15

Base R lambda shorthand

Along with the base pipe (|>), R v4.1.0 introduced a shorthand for anonymous functions, \(). We won’t be using it, for the same reason, but it is useful to know that it exists.

f = \(x) {1 + x}
f(1:5)

[1] 2 3 4 5 6

(\(x) x ^ 2)(10)

[1] 100

integrate(\(x) sin(x) ^ 2, 0, 1)

0.2726756 with absolute error < 3e-15

Use of this with the base pipe is meant to avoid the need for ., e.g.

data.frame(x = runif(10), y = runif(10)) |>
  {\(d) lm(y ~ x, data = d)}()


Call:
lm(formula = y ~ x, data = d)

Coefficients:
(Intercept)            x  
     0.5540      -0.3096
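The same pattern works with parentheses instead of braces around the lambda. The sketch below assumes R 4.1.0 or later; the input vector and the name res are made up for illustration.

```r
# requires R >= 4.1.0 for both |> and \(v)
# the anonymous function is wrapped in parentheses and then called with (),
# so the piped value becomes its first argument
res = c(1, 4, 9) |> (\(v) sqrt(v) + 1)()
res
```

Here the piped vector is bound to v, so the result is sqrt(v) + 1 evaluated elementwise.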

apply (base R)

apply functions

The apply functions are a collection of tools for functional programming in base R. They are variations of the map function found in many other languages and apply a function over the elements of the input (vector).

??base::apply
---
## 
## Help files with alias or concept or title matching ‘apply’ using fuzzy
## matching:
## 
## base::apply             Apply Functions Over Array Margins
## base::.subset           Internal Objects in Package 'base'
## base::by                Apply a Function to a Data Frame Split by Factors
## base::eapply            Apply a Function Over Values in an Environment
## base::lapply            Apply a Function over a List or Vector
## base::mapply            Apply a Function to Multiple List or Vector Arguments
## base::rapply            Recursively Apply a Function to a List
## base::tapply            Apply a Function Over a Ragged Array

Each of these applies a function iteratively over the pieces of its input.

`lapply` and `sapply`

lapply(1:4, function(x, pow) x ^ pow, pow = 2) %>% str()

List of 4
 $ : num 1
 $ : num 4
 $ : num 9
 $ : num 16

lapply(1:4, function(x, pow) x ^ pow, x = 2) %>% str()

List of 4
 $ : num 2
 $ : num 4
 $ : num 8
 $ : num 16

In the second call the named argument x = 2 is matched first, so each element of 1:4 is passed to the remaining argument, pow.

sapply(1:8, function(x) (x + 1) ^ 2)

[1]  4  9 16 25 36 49 64 81

sapply(1:8, function(x) c(x, x ^ 2, x ^ 3))

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    2    3    4    5    6    7    8
[2,]    1    4    9   16   25   36   49   64
[3,]    1    8   27   64  125  216  343  512

What happens if the returned lengths don’t match?

sapply(1:3, seq) %>% str()

List of 3
 $ : int 1
 $ : int [1:2] 1 2
 $ : int [1:3] 1 2 3

lapply(1:3, seq) %>% str()

List of 3
 $ : int 1
 $ : int [1:2] 1 2
 $ : int [1:3] 1 2 3

What happens if the types don’t match?

type coercion!

l = list(a = 1:3, b = 4:6, c = 7:9, d = list(10, 11, "A"))

sapply(l, function(x) x[1]) %>% str()

List of 4
 $ a: int 1
 $ b: int 4
 $ c: int 7
 $ d: num 10

type consistency issue: sapply()’s return type depends on its input, so we can’t quickly see what type a call will return
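In base R, vapply() addresses this by making the expected type and length of each result part of the call. A minimal sketch using the same list l as above; ok and res are hypothetical names:

```r
l = list(a = 1:3, b = 4:6, c = 7:9, d = list(10, 11, "A"))

# every result must be a single integer; the first three elements comply
ok = vapply(l[1:3], function(x) x[1], integer(1))
ok

# the fourth element yields a list, so vapply() stops with an error
# instead of silently changing the return type
res = tryCatch(
  vapply(l, function(x) x[1], integer(1)),
  error = function(e) "error"
)
res
```

The call either returns an integer vector or fails, so the return type is visible in the code itself.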

*apply and data frames

Common use case: data frames

recall: a data frame is just a fancy list

df = data.frame(
  a = 1:6, 
  b = letters[1:6], 
  c = c(TRUE,FALSE)
)

lapply(df, class) %>% str()

List of 3
 $ a: chr "integer"
 $ b: chr "character"
 $ c: chr "logical"

sapply(df, class)

          a           b           c 
  "integer" "character"   "logical"

A more useful example

Penalized regression: the lasso

\[ \min_{\beta \in \mathbb{R}^p} \|y - X\beta\|_2^2 + \lambda\|\beta\|_1 \]

It only makes sense to “shrink” the \(\beta_i\)s if the predictors are on the same scale. Therefore we want to standardize the data in matrix X, e.g.

for each column j in X: 
  for each row i:
    recompute x[i, j] = (x[i, j] - mean(x[, j])) / sd(x[, j])

We can solve this elegantly with an *apply.

X = data.frame(height = c(72, 60, 64),
               bpm = c(82, 55, 60))

apply(X, 2, function(x) (x - mean(x)) / sd(x)) # returns matrix

         height        bpm
[1,]  1.0910895  1.1370777
[2,] -0.8728716 -0.7425813
[3,] -0.2182179 -0.3944963

lapply(X, function (x) (x - mean(x)) / sd(x)) %>% 
  as.data.frame()

      height        bpm
1  1.0910895  1.1370777
2 -0.8728716 -0.7425813
3 -0.2182179 -0.3944963
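
As a sanity check, the result can be compared with base R's scale(), which by default centers each column and divides by its standard deviation. This is a sketch; Z is a hypothetical name for the standardized matrix.

```r
X = data.frame(height = c(72, 60, 64),
               bpm = c(82, 55, 60))

Z = apply(X, 2, function(x) (x - mean(x)) / sd(x))

# scale() also stores the centers and scales as attributes,
# so attributes are ignored in the comparison
all.equal(Z, scale(X), check.attributes = FALSE)

# after standardizing, each column has mean 0 and standard deviation 1
colMeans(Z)
apply(Z, 2, sd)
```

The column means are zero up to floating point error, and the column standard deviations are one.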

other less common apply functions

apply() - applies a function over the rows or columns of a data frame, matrix or array
vapply() - is similar to sapply, but has an enforced return type and size
mapply() - like sapply but will iterate over multiple vectors at the same time.
rapply() - a recursive version of lapply, behavior depends largely on the how argument
eapply() - apply a function over an environment.
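
A short sketch of two of these, using a made-up matrix m: apply() with its margin argument, and mapply() iterating over two vectors in parallel.

```r
m = matrix(1:6, nrow = 2)   # rows are (1, 3, 5) and (2, 4, 6)

apply(m, 1, sum)   # margin 1 = rows
apply(m, 2, sum)   # margin 2 = columns

# mapply() pairs up the elements of its vector arguments:
# (1, 4), (2, 5), (3, 6)
mapply(function(x, y) x + y, 1:3, 4:6)
```

Note that mapply() takes the function first and the vectors afterwards, the reverse of lapply() and sapply().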

purrr

Map functions

replacements for lapply/sapply/vapply
map() - returns a list (same as lapply)
map_lgl() - returns a logical vector.
map_int() - returns an integer vector.
map_dbl() - returns a double vector.
map_chr() - returns a character vector.
map_dfr() - returns a data frame by row binding.
map_dfc() - returns a data frame by column binding.
walk() - returns its input invisibly, used exclusively for function side effects
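
A minimal sketch of a few of these variants on a made-up list x, assuming purrr is installed:

```r
library(purrr)

x = list(1:3, letters[1:5], c(TRUE, FALSE))

map_int(x, length)         # one integer per element
map_lgl(x, is.character)   # one logical per element
map_chr(x, class)          # one string per element

# walk() calls the function for its side effect (here, printing)
# and hands back its input invisibly
out = walk(1:2, print)
```

The suffix of each function tells you the type of the result before the code is run.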

Type consistency

R is a weakly / dynamically typed language, which means there is no syntactic way to define a function that enforces argument or return types. This flexibility can be useful at times, but it often makes it hard to reason about your code and requires more verbose code to handle edge cases.

set.seed(123)
x = list(rnorm(1e3), rnorm(1e3), rnorm(1e3))

map_dbl(x, mean)

[1]  0.01612787  0.04246525 -0.02011253

map_chr(x, mean)

Warning: Automatic coercion from double to character was deprecated in purrr 1.0.0.
ℹ Please use an explicit call to `as.character()` within `map_chr()` instead.

[1] "0.016128"  "0.042465"  "-0.020113"

map_int(x, mean)

Error in `map_int()`:
ℹ In index: 1.
Caused by error:
! Can't coerce from a double vector to an integer vector.

map(x, mean) %>% str()

List of 3
 $ : num 0.0161
 $ : num 0.0425
 $ : num -0.0201
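
Following the advice in the deprecation warning above, the conversion to character can be made explicit inside the mapped function. A sketch, assuming purrr is installed; rounding to three digits is an arbitrary choice for this illustration and means_chr is a hypothetical name:

```r
library(purrr)

set.seed(123)
x = list(rnorm(1e3), rnorm(1e3), rnorm(1e3))

# convert explicitly, so map_chr() receives a string from every call
means_chr = map_chr(x, function(v) as.character(round(mean(v), 3)))
means_chr
```

Because the function now returns a character value itself, map_chr() has nothing to coerce and no warning is raised.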

Working with Data Frames

map_dfr and map_dfc are particularly useful when working with and/or creating data frames. Example:

X = data.frame(height = c(72, 60, 64),
               bpm = c(82, 55, 60),
               age = c(25, 30, 35))

standardize = function(x) (x - mean(x)) / sd(x)

map_dfc(X, standardize)

# A tibble: 3 × 3
  height    bpm   age
   <dbl>  <dbl> <dbl>
1  1.09   1.14     -1
2 -0.873 -0.743     0
3 -0.218 -0.394     1

map_dfr(X, function(x) x[1:2])

# A tibble: 2 × 3
  height   bpm   age
   <dbl> <dbl> <dbl>
1     72    82    25
2     60    55    30

map_dfr(X, function(x) x)

# A tibble: 3 × 3
  height   bpm   age
   <dbl> <dbl> <dbl>
1     72    82    25
2     60    55    30
3     64    60    35

Shortcut - purrr style lambdas

purrr lets us write anonymous functions using one-sided formulas where the argument is given by . or .x for map and related functions.

map_dbl(1:5, function(x) x / (x + 1))

[1] 0.5000000 0.6666667 0.7500000 0.8000000 0.8333333

Read ~ as “function” and . or .x as “input”

map_dbl(1:5, ~ . / (. + 1))

[1] 0.5000000 0.6666667 0.7500000 0.8000000 0.8333333

map_dbl(1:5, ~ .x / (.x + 1))

[1] 0.5000000 0.6666667 0.7500000 0.8000000 0.8333333

Generally, the latter option is preferred to avoid confusion with magrittr’s . placeholder.
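
To see what a formula turns into, it can be converted by hand with purrr's as_mapper(). A small sketch, assuming purrr is installed; h is a hypothetical name:

```r
library(purrr)

# as_mapper() turns the one-sided formula into an ordinary function
h = as_mapper(~ .x / (.x + 1))

is.function(h)
h(1)   # 1 / (1 + 1)
```

The result is a regular function whose first argument is available as .x inside the formula.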

Multiargument anonymous functions

Functions with the map2 prefix work the same as the map functions but they iterate over two objects instead of one. Arguments in an anonymous function are given by .x and .y (or ..1 and ..2) respectively.

map2_dbl(1:5, 1:5, function(x,y) x / (y+1))

[1] 0.5000000 0.6666667 0.7500000 0.8000000 0.8333333

map2_dbl(1:5, 1:5, ~ .x/(.y+1))

[1] 0.5000000 0.6666667 0.7500000 0.8000000 0.8333333

map2_dbl(1:5, 1:5, ~ ..1/(..2+1))

[1] 0.5000000 0.6666667 0.7500000 0.8000000 0.8333333

map2_chr(LETTERS[1:5], letters[1:5], paste0)

[1] "Aa" "Bb" "Cc" "Dd" "Ee"

Prioritize readability of your code! For complicated functions, use syntax like the first example.
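
For longer computations, one readable option is to give the function a name and pass it in. A sketch, assuming purrr is installed; ratio is a hypothetical helper:

```r
library(purrr)

# ratio() is a hypothetical named helper used in place of a lambda
ratio = function(x, y) {
  x / (y + 1)
}

map2_dbl(1:5, 1:5, ratio)
```

This gives the same values as the three map2_dbl() calls above, and the helper can be documented and tested on its own.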

Lookups (`sw_people`)

library(repurrrsive)

sw_people from the repurrrsive package

str(sw_people[1:5])

List of 5
 $ :List of 16
  ..$ name      : chr "Luke Skywalker"
  ..$ height    : chr "172"
  ..$ mass      : chr "77"
  ..$ hair_color: chr "blond"
  ..$ skin_color: chr "fair"
  ..$ eye_color : chr "blue"
  ..$ birth_year: chr "19BBY"
  ..$ gender    : chr "male"
  ..$ homeworld : chr "http://swapi.co/api/planets/1/"
  ..$ films     : chr [1:5] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/" ...
  ..$ species   : chr "http://swapi.co/api/species/1/"
  ..$ vehicles  : chr [1:2] "http://swapi.co/api/vehicles/14/" "http://swapi.co/api/vehicles/30/"
  ..$ starships : chr [1:2] "http://swapi.co/api/starships/12/" "http://swapi.co/api/starships/22/"
  ..$ created   : chr "2014-12-09T13:50:51.644000Z"
  ..$ edited    : chr "2014-12-20T21:17:56.891000Z"
  ..$ url       : chr "http://swapi.co/api/people/1/"
 $ :List of 14
  ..$ name      : chr "C-3PO"
  ..$ height    : chr "167"
  ..$ mass      : chr "75"
  ..$ hair_color: chr "n/a"
  ..$ skin_color: chr "gold"
  ..$ eye_color : chr "yellow"
  ..$ birth_year: chr "112BBY"
  ..$ gender    : chr "n/a"
  ..$ homeworld : chr "http://swapi.co/api/planets/1/"
  ..$ films     : chr [1:6] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/" ...
  ..$ species   : chr "http://swapi.co/api/species/2/"
  ..$ created   : chr "2014-12-10T15:10:51.357000Z"
  ..$ edited    : chr "2014-12-20T21:17:50.309000Z"
  ..$ url       : chr "http://swapi.co/api/people/2/"
 $ :List of 14
  ..$ name      : chr "R2-D2"
  ..$ height    : chr "96"
  ..$ mass      : chr "32"
  ..$ hair_color: chr "n/a"
  ..$ skin_color: chr "white, blue"
  ..$ eye_color : chr "red"
  ..$ birth_year: chr "33BBY"
  ..$ gender    : chr "n/a"
  ..$ homeworld : chr "http://swapi.co/api/planets/8/"
  ..$ films     : chr [1:7] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/" ...
  ..$ species   : chr "http://swapi.co/api/species/2/"
  ..$ created   : chr "2014-12-10T15:11:50.376000Z"
  ..$ edited    : chr "2014-12-20T21:17:50.311000Z"
  ..$ url       : chr "http://swapi.co/api/people/3/"
 $ :List of 15
  ..$ name      : chr "Darth Vader"
  ..$ height    : chr "202"
  ..$ mass      : chr "136"
  ..$ hair_color: chr "none"
  ..$ skin_color: chr "white"
  ..$ eye_color : chr "yellow"
  ..$ birth_year: chr "41.9BBY"
  ..$ gender    : chr "male"
  ..$ homeworld : chr "http://swapi.co/api/planets/1/"
  ..$ films     : chr [1:4] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
  ..$ species   : chr "http://swapi.co/api/species/1/"
  ..$ starships : chr "http://swapi.co/api/starships/13/"
  ..$ created   : chr "2014-12-10T15:18:20.704000Z"
  ..$ edited    : chr "2014-12-20T21:17:50.313000Z"
  ..$ url       : chr "http://swapi.co/api/people/4/"
 $ :List of 15
  ..$ name      : chr "Leia Organa"
  ..$ height    : chr "150"
  ..$ mass      : chr "49"
  ..$ hair_color: chr "brown"
  ..$ skin_color: chr "light"
  ..$ eye_color : chr "brown"
  ..$ birth_year: chr "19BBY"
  ..$ gender    : chr "female"
  ..$ homeworld : chr "http://swapi.co/api/planets/2/"
  ..$ films     : chr [1:5] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/" ...
  ..$ species   : chr "http://swapi.co/api/species/1/"
  ..$ vehicles  : chr "http://swapi.co/api/vehicles/30/"
  ..$ created   : chr "2014-12-10T15:20:09.791000Z"
  ..$ edited    : chr "2014-12-20T21:17:50.315000Z"
  ..$ url       : chr "http://swapi.co/api/people/5/"

Lookups

Very often we want to extract only certain (named) values from a list. purrr provides a shortcut for this operation: if instead of a function you provide either a character or numeric vector, those values will be used to sequentially subset the elements being iterated.

purrr::map_chr(sw_people, "name") %>% head()

[1] "Luke Skywalker" "C-3PO"          "R2-D2"          "Darth Vader"   
[5] "Leia Organa"    "Owen Lars"

purrr::map_chr(sw_people, 1) %>% head()

[1] "Luke Skywalker" "C-3PO"          "R2-D2"          "Darth Vader"   
[5] "Leia Organa"    "Owen Lars"

purrr::map_chr(sw_people, list("films", 1)) %>% head(n=10)

 [1] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
 [3] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/6/"
 [5] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
 [7] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/1/"
 [9] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/5/"
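
Each shortcut above corresponds to an explicit extraction function. A sketch on a small made-up list, people, that mimics the shape of sw_people (assuming purrr is installed):

```r
library(purrr)

# people is a hypothetical stand-in for sw_people
people = list(
  list(name = "A", films = c("f1", "f2")),
  list(name = "B", films = "f3")
)

# a string extracts the element with that name
map_chr(people, "name")
map_chr(people, function(p) p[["name"]])

# a list of indices is applied level by level
map_chr(people, list("films", 1))
map_chr(people, function(p) p[["films"]][[1]])
```

In each pair, both calls return the same character vector.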

Length coercion?

purrr::map_chr(sw_people, list("starships", 1))

Error in `purrr::map_chr()`:
ℹ In index: 2.
Caused by error:
! Result must be length 1, not 0.

sw_people[[2]]$name

[1] "C-3PO"

sw_people[[2]]$starships

NULL

purrr::map(sw_people, list("starships", 1)) %>% head(n = 3) %>% str()

List of 3
 $ : chr "http://swapi.co/api/starships/12/"
 $ : NULL
 $ : NULL

purrr::map_chr(sw_people, list("starships", 1), .default = NA) %>% head()

[1] "http://swapi.co/api/starships/12/" NA                                 
[3] NA                                  "http://swapi.co/api/starships/13/"
[5] NA                                  NA
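
For a single element, the lookup behaves roughly like purrr's pluck(), which also accepts a .default. A sketch with a made-up record, droid, that has no starships element (assuming purrr is installed):

```r
library(purrr)

# droid is a hypothetical record without a starships element
droid = list(name = "C-3PO")

pluck(droid, "starships", 1)                  # missing, so NULL
pluck(droid, "starships", 1, .default = NA)   # missing, so the default
```

Supplying a length-one default is what lets map_chr() build a vector even when some elements are missing.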

manual unnesting

how many starships does each character have?

(chars = tibble(
  name = purrr::map_chr(sw_people, "name"),
  starships = purrr::map(sw_people, "starships")
))

# A tibble: 87 × 2
   name               starships
   <chr>              <list>   
 1 Luke Skywalker     <chr [2]>
 2 C-3PO              <NULL>   
 3 R2-D2              <NULL>   
 4 Darth Vader        <chr [1]>
 5 Leia Organa        <NULL>   
 6 Owen Lars          <NULL>   
 7 Beru Whitesun lars <NULL>   
 8 R5-D4              <NULL>   
 9 Biggs Darklighter  <chr [1]>
10 Obi-Wan Kenobi     <chr [5]>
# … with 77 more rows

chars %>%
  mutate(n = map_int(starships, length))

# A tibble: 87 × 3
   name               starships     n
   <chr>              <list>    <int>
 1 Luke Skywalker     <chr [2]>     2
 2 C-3PO              <NULL>        0
 3 R2-D2              <NULL>        0
 4 Darth Vader        <chr [1]>     1
 5 Leia Organa        <NULL>        0
 6 Owen Lars          <NULL>        0
 7 Beru Whitesun lars <NULL>        0
 8 R5-D4              <NULL>        0
 9 Biggs Darklighter  <chr [1]>     1
10 Obi-Wan Kenobi     <chr [5]>     5
# … with 77 more rows

much more efficient if you only need a subset of the columns to be “unnested”
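
For this particular count, base R's lengths() gives the same answer as map_int(starships, length). A sketch on a made-up list column (the identifiers are invented; assuming purrr is installed):

```r
# starships is a hypothetical list column with invented identifiers
starships = list(c("s12", "s22"), NULL, "s13")

lengths(starships)                  # a NULL entry counts as length 0
purrr::map_int(starships, length)   # same counts via purrr
```

Both return an integer vector with one count per character.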

Exercises

Exercise 1

draw_points = function(n) {
  list(
    x = runif(n, -1, 1),
    y = runif(n, -1, 1)
  )
}

Use the function above to draw n = 1000 points from a box of area 4. Save your output as an object called points.
Use map or an appropriate version to determine which points \((x, y)\) are within the unit circle centered at the origin.
What proportion of points are within the unit circle?
Can you approximate \(\pi\) like this? How?
How can you make your estimate more precise?

Exercise 2

Use mtcars and a single map or map variant to

get the type of each variable
get the fourth row such that the result is a character vector
compute the mean of each variable
compute the mean and median for each variable such that the result is a data frame with the mean values in row 1 and the median values in row 2.