Dr. Alexander Fisher
Duke University
check lab solutions on Sakai
teams for labs
quiz 03
Functions are first-class objects (like vectors).
A functional is a function that takes a function as an input and returns a vector as output.
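Here is a minimal base R sketch of both ideas. It stores functions in a list like any other object, and it defines a small functional. The names funs and apply_to_data are hypothetical and are used only for illustration.

```r
# Functions are objects: they can be stored in a list like any other value
funs <- list(mean = mean, median = median, max = max)

# A hypothetical functional: it takes a function `f` as input
# and returns the vector produced by calling `f` on some fixed data
apply_to_data <- function(f) {
  x <- c(1, 3, 8)
  f(x)
}

apply_to_data(mean)          # 4
apply_to_data(max)           # 8

# sapply() is itself a functional; here it calls apply_to_data() on each stored function
sapply(funs, apply_to_data)  # named vector: mean = 4, median = 3, max = 8
```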
Example: lapply() and sapply() accept function arguments.
lapply(), as the name suggests, applies a function over a list.
sapply() works the same but returns a simpler output.
We can make a function return another function.
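The sketch below contrasts the two functions on hypothetical input values. It then defines a function that returns a function. The name make_power is hypothetical.

```r
x <- list(a = 1, b = 15)

# lapply() always returns a list, one element per input element
lapply(x, function(v) v * 2)   # list(a = 2, b = 30)

# sapply() simplifies the result to a vector when it can
sapply(x, function(v) v * 2)   # c(a = 2, b = 30)

# A function that returns another function (a "function factory");
# make_power is a hypothetical name used for illustration
make_power <- function(p) {
  function(v) v^p
}
square <- make_power(2)
cube   <- make_power(3)
square(4)   # 16
cube(2)     # 8
```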
Anonymous functions are short functions that are created without ever being assigned a name.
Idea: we don't create an object we don't need. This is especially useful when passing a function as an argument.
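As a minimal sketch, the following compares naming a one-off helper with passing an anonymous function directly to sapply(). The helper name square_it is hypothetical.

```r
# A named helper we only use once creates an extra object in the environment...
square_it <- function(v) v^2
sapply(1:3, square_it)

# ...whereas an anonymous function is defined right where it is used
sapply(1:3, function(v) v^2)   # 1 4 9
```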
Along with the base pipe (|>), R v4.1.0 introduced a shortcut for anonymous functions using \(). We won't be using this for the same reason, but it is useful to know that it exists.
Use of this with the base pipe is meant to avoid the need for the . placeholder.
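The following sketch shows the \() shorthand and its use with |>. It assumes R >= 4.1.0 and uses the built-in mtcars data. The base pipe always inserts its left-hand side as the first argument. lm() expects the formula first and the data second, so a parenthesized anonymous function can name the input instead.

```r
# Requires R >= 4.1.0 for both the |> pipe and the \(x) shorthand
sq <- \(x) x^2        # behaves the same as function(x) x^2
sq(4)                 # 16

# With magrittr's %>% one would write: mtcars %>% lm(mpg ~ wt, data = .)
# With |> an anonymous function receives the data and passes it by name:
fit <- mtcars |> (\(d) lm(mpg ~ wt, data = d))()
coef(fit)
```

Since R 4.2.0 the base pipe also accepts a _ placeholder for a named argument, e.g. mtcars |> lm(mpg ~ wt, data = _).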
The apply functions are a collection of tools for functional programming in base R. They are variations of the map function found in many other languages, and they apply a function over the elements of the input (vector).
??base::apply
##
## Help files with alias or concept or title matching ‘apply’ using fuzzy
## matching:
##
## base::apply Apply Functions Over Array Margins
## base::.subset Internal Objects in Package 'base'
## base::by Apply a Function to a Data Frame Split by Factors
## base::eapply Apply a Function Over Values in an Environment
## base::lapply Apply a Function over a List or Vector
## base::mapply Apply a Function to Multiple List or Vector Arguments
## base::rapply Recursively Apply a Function to a List
## base::tapply Apply a Function Over a Ragged Array
lapply and sapply
List of 4
$ : num 1
$ : num 4
$ : num 9
$ : num 16
List of 4
$ : num 2
$ : num 4
$ : num 8
$ : num 16
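The code that produced the two outputs above was not captured. The following is a plausible reconstruction, offered as an assumption rather than the original code. It is followed by the sapply() equivalents.

```r
# Squares of 1:4, returned as a list
squares <- lapply(1:4, function(x) x^2)
str(squares)

# Powers of 2, returned as a list
powers <- lapply(1:4, function(x) 2^x)
str(powers)

# sapply() simplifies the same computations to numeric vectors
sapply(1:4, function(x) x^2)   # 1 4 9 16
sapply(1:4, function(x) 2^x)   # 2 4 8 16
```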
What happens if the returned lengths don’t match?
What happens if the types don’t match?
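Both questions can be explored in base R with a short sketch. The inputs here are arbitrary.

```r
# Lengths differ (1, 2, 3): sapply() cannot simplify, so it returns a list
out_len <- sapply(1:3, function(x) seq_len(x))
class(out_len)   # "list"

# Lengths match but are > 1: sapply() returns a matrix (one column per input)
out_mat <- sapply(1:3, function(x) c(x, x^2))
dim(out_mat)     # 2 3

# Types differ: results are coerced to a common type (here character)
out_type <- sapply(1:2, function(x) if (x == 1) 1 else "a")
out_type         # "1" "a"
```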
Common use case: data frames
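A data frame is a list of columns, so the apply functions iterate over its columns. Here is a short sketch using the built-in mtcars and iris datasets.

```r
# Each column is passed to the function in turn
sapply(mtcars, mean)        # named numeric vector, one mean per column
sapply(iris, class)         # "numeric" for four columns, "factor" for Species
lapply(iris[1:2], range)    # list of (min, max) pairs for two columns
```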
Penalized regression: the lasso
\[ \min_{\beta \in \mathbb{R}^p} \|y - X\beta\|_2^2 + \lambda\|\beta\|_1 \]
It only makes sense to “shrink” the \(\beta_i\)s if the predictors are on the same scale. Therefore we want to standardize the data in matrix X, e.g.
for each column j in X:
    compute m = mean(x[, j]) and s = sd(x[, j])
    for each row i:
        recompute x[i, j] = (x[i, j] - m) / s
We can solve this elegantly with one of the *apply functions.
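The following sketches the standardization step with apply() on a hypothetical simulated matrix X. The anonymous function receives one column at a time, so mean(col) and sd(col) are computed from that column's original values. Base R's scale() does the same centering and scaling, which makes it a convenient check.

```r
# Hypothetical design matrix: 100 rows, 3 predictors on very different scales
set.seed(1)
X <- cbind(a = rnorm(100, 0, 1), b = rnorm(100, 50, 10), c = runif(100, 0, 1000))

# MARGIN = 2 applies the function to each column; each result is a standardized column
X_std <- apply(X, 2, function(col) (col - mean(col)) / sd(col))

round(colMeans(X_std), 10)   # all approximately 0
apply(X_std, 2, sd)          # all 1

# Same values as base R's scale(), which also attaches centering/scaling attributes
all.equal(X_std, scale(X), check.attributes = FALSE)
```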
apply() - applies a function over the rows or columns of a data frame, matrix or array
vapply() - is similar to sapply, but has an enforced return type and size
mapply() - like sapply, but iterates over multiple vectors at the same time.
rapply() - a recursive version of lapply; behavior depends largely on the how argument
eapply() - applies a function over an environment.
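The sketch below runs vapply(), mapply() and apply() on small made-up inputs.

```r
# vapply(): like sapply(), but every result must match FUN.VALUE (here one double)
vapply(1:3, function(x) x^2, FUN.VALUE = numeric(1))   # 1 4 9

# A result of the wrong length is an error rather than a silently different output
try(vapply(1:3, function(x) c(x, x), FUN.VALUE = numeric(1)))

# mapply(): iterate over several vectors in parallel
mapply(function(x, y) x + y, 1:3, 4:6)                 # 5 7 9

# apply(): MARGIN = 1 for rows, 2 for columns
m <- matrix(1:6, nrow = 2)
apply(m, 1, sum)   # row sums: 9 12
apply(m, 2, sum)   # column sums: 3 7 11
```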
purrr: replacements for lapply/sapply/vapply
map() - returns a list (same as lapply)
map_lgl() - returns a logical vector.
map_int() - returns an integer vector.
map_dbl() - returns a double vector.
map_chr() - returns a character vector.
map_dfr() - returns a data frame by row binding.
map_dfc() - returns a data frame by column binding.
walk() - returns its input invisibly; used exclusively for function side effects
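The following sketch applies several map variants to a small hypothetical list. It requires the purrr package.

```r
library(purrr)

x <- list(a = 1:3, b = 4:6)

map(x, sum)                                    # list: a = 6, b = 15
map_int(x, sum)                                # integer vector: a = 6, b = 15
map_dbl(x, mean)                               # double vector:  a = 2, b = 5
map_lgl(x, function(v) any(v > 4))             # logical vector: a = FALSE, b = TRUE
map_chr(x, function(v) paste(v, collapse = "-"))  # "1-2-3" "4-5-6"

# walk() is called for its side effects (here printing); it returns x invisibly
walk(x, print)
```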
R is a weakly / dynamically typed language which means there is no syntactic way to define a function which enforces argument or return types. This flexibility can be useful at times, but often it makes it hard to reason about your code and requires more verbose code to handle edge cases.
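Here is a concrete base R sketch of why this matters. The output type of sapply() depends on its input. vapply(), like the typed map_*() variants, fixes the output type in advance.

```r
# With an empty input there is nothing to simplify, so sapply() returns an empty list
sapply(integer(0), function(x) x^2)                # list()

# vapply() declares the return type up front, so an empty input
# still gives a (zero-length) double vector
vapply(integer(0), function(x) x^2, numeric(1))    # numeric(0)
```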
map_dfr and map_dfc are particularly useful when working with and/or creating data frames. Example:
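The sketch below assumes purrr and dplyr are installed, because map_dfr() uses dplyr to bind the rows. The helper col_summary is hypothetical. As of purrr 1.0.0, map_dfr() and map_dfc() are superseded in favor of map() followed by list_rbind() or list_cbind(). That alternative is shown at the end for comparison.

```r
library(purrr)

# Hypothetical helper: a one-row data frame summarizing one column of mtcars
col_summary <- function(col) {
  data.frame(variable = col, mean = mean(mtcars[[col]]), sd = sd(mtcars[[col]]))
}

# Each call returns one row; map_dfr() binds the rows into a single data frame
summaries <- map_dfr(c("mpg", "hp", "wt"), col_summary)
summaries

# purrr >= 1.0.0 alternative: map() gives a list of data frames, list_rbind() stacks them
summaries2 <- list_rbind(map(c("mpg", "hp", "wt"), col_summary))
```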
purrr lets us write anonymous functions using one-sided formulas, where the argument is given by . or .x, for map and related functions.
Read ~ as “function” and . or .x as “input”.
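The following sketch sets the formula shorthand beside the equivalent anonymous function. It requires purrr.

```r
library(purrr)

# Three equivalent ways to square each element
map_dbl(1:4, function(x) x^2)   # anonymous function
map_dbl(1:4, ~ .x^2)            # formula: ~ is "function", .x is "input"
map_dbl(1:4, ~ .^2)             # . also works for single-argument formulas
```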
Functions with the map2 prefix work the same as the map functions, but they iterate over two objects instead of one. Arguments in an anonymous function are given by .x and .y (or ..1 and ..2) respectively.
Prioritize readability of your code! For complicated functions, use syntax like the first example.
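Here is a sketch of map2_dbl() on two made-up vectors of equal length, written three ways. It requires purrr.

```r
library(purrr)

x <- c(1, 2, 3)
y <- c(10, 20, 30)

# Formula shorthand: .x is the element of x, .y the matching element of y
map2_dbl(x, y, ~ .x * .y)        # 10 40 90
map2_dbl(x, y, ~ ..1 * ..2)      # same, using positional names

# The same computation with a regular anonymous function, which is
# often easier to read once the body gets longer
map2_dbl(x, y, function(a, b) a * b)
```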
sw_people from the repurrrsive package
List of 5
$ :List of 16
..$ name : chr "Luke Skywalker"
..$ height : chr "172"
..$ mass : chr "77"
..$ hair_color: chr "blond"
..$ skin_color: chr "fair"
..$ eye_color : chr "blue"
..$ birth_year: chr "19BBY"
..$ gender : chr "male"
..$ homeworld : chr "http://swapi.co/api/planets/1/"
..$ films : chr [1:5] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/" ...
..$ species : chr "http://swapi.co/api/species/1/"
..$ vehicles : chr [1:2] "http://swapi.co/api/vehicles/14/" "http://swapi.co/api/vehicles/30/"
..$ starships : chr [1:2] "http://swapi.co/api/starships/12/" "http://swapi.co/api/starships/22/"
..$ created : chr "2014-12-09T13:50:51.644000Z"
..$ edited : chr "2014-12-20T21:17:56.891000Z"
..$ url : chr "http://swapi.co/api/people/1/"
$ :List of 14
..$ name : chr "C-3PO"
..$ height : chr "167"
..$ mass : chr "75"
..$ hair_color: chr "n/a"
..$ skin_color: chr "gold"
..$ eye_color : chr "yellow"
..$ birth_year: chr "112BBY"
..$ gender : chr "n/a"
..$ homeworld : chr "http://swapi.co/api/planets/1/"
..$ films : chr [1:6] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/" ...
..$ species : chr "http://swapi.co/api/species/2/"
..$ created : chr "2014-12-10T15:10:51.357000Z"
..$ edited : chr "2014-12-20T21:17:50.309000Z"
..$ url : chr "http://swapi.co/api/people/2/"
$ :List of 14
..$ name : chr "R2-D2"
..$ height : chr "96"
..$ mass : chr "32"
..$ hair_color: chr "n/a"
..$ skin_color: chr "white, blue"
..$ eye_color : chr "red"
..$ birth_year: chr "33BBY"
..$ gender : chr "n/a"
..$ homeworld : chr "http://swapi.co/api/planets/8/"
..$ films : chr [1:7] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/" ...
..$ species : chr "http://swapi.co/api/species/2/"
..$ created : chr "2014-12-10T15:11:50.376000Z"
..$ edited : chr "2014-12-20T21:17:50.311000Z"
..$ url : chr "http://swapi.co/api/people/3/"
$ :List of 15
..$ name : chr "Darth Vader"
..$ height : chr "202"
..$ mass : chr "136"
..$ hair_color: chr "none"
..$ skin_color: chr "white"
..$ eye_color : chr "yellow"
..$ birth_year: chr "41.9BBY"
..$ gender : chr "male"
..$ homeworld : chr "http://swapi.co/api/planets/1/"
..$ films : chr [1:4] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/"
..$ species : chr "http://swapi.co/api/species/1/"
..$ starships : chr "http://swapi.co/api/starships/13/"
..$ created : chr "2014-12-10T15:18:20.704000Z"
..$ edited : chr "2014-12-20T21:17:50.313000Z"
..$ url : chr "http://swapi.co/api/people/4/"
$ :List of 15
..$ name : chr "Leia Organa"
..$ height : chr "150"
..$ mass : chr "49"
..$ hair_color: chr "brown"
..$ skin_color: chr "light"
..$ eye_color : chr "brown"
..$ birth_year: chr "19BBY"
..$ gender : chr "female"
..$ homeworld : chr "http://swapi.co/api/planets/2/"
..$ films : chr [1:5] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/" ...
..$ species : chr "http://swapi.co/api/species/1/"
..$ vehicles : chr "http://swapi.co/api/vehicles/30/"
..$ created : chr "2014-12-10T15:20:09.791000Z"
..$ edited : chr "2014-12-20T21:17:50.315000Z"
..$ url : chr "http://swapi.co/api/people/5/"
Very often we want to extract only certain (named) values from a list. purrr provides a shortcut for this operation: if instead of a function you provide a character or numeric vector, those values are used to sequentially subset the elements being iterated over.
[1] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
[3] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/6/"
[5] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/5/"
[7] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/1/"
[9] "http://swapi.co/api/films/1/" "http://swapi.co/api/films/5/"
Error in `purrr::map_chr()`:
ℹ In index: 2.
Caused by error:
! Result must be length 1, not 0.
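The same extraction rules can be tried on a small hypothetical list shaped like sw_people. This requires purrr, and the field values are made up. Mixing a name and a position needs list(), because c("films", 1) would coerce 1 to the string "1". Using list("starships", 1) reproduces an error of the kind shown above, because the second person has no starships. The .default argument supplies a fallback value instead.

```r
library(purrr)

# A small hypothetical list shaped like sw_people (only a few fields)
people <- list(
  list(name = "Luke Skywalker", films = c("f6", "f3"), starships = c("s12", "s22")),
  list(name = "C-3PO",          films = c("f5", "f4"))   # no starships field
)

# A single name extracts that field from each element
map_chr(people, "name")              # "Luke Skywalker" "C-3PO"

# A list of a name and a position subsets sequentially: "films", then its first entry
map_chr(people, list("films", 1))    # "f6" "f5"

# A missing field yields NULL, which map() keeps but map_chr() rejects
map(people, "starships")             # second element is NULL
try(map_chr(people, list("starships", 1)))   # error at index 2: length 0

# .default fills in missing values
map_chr(people, list("starships", 1), .default = NA_character_)   # "s12" NA
```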
library(tibble)
library(repurrrsive)
(chars = tibble(
  name = purrr::map_chr(sw_people, "name"),
  starships = purrr::map(sw_people, "starships")
))
# A tibble: 87 × 2
name starships
<chr> <list>
1 Luke Skywalker <chr [2]>
2 C-3PO <NULL>
3 R2-D2 <NULL>
4 Darth Vader <chr [1]>
5 Leia Organa <NULL>
6 Owen Lars <NULL>
7 Beru Whitesun lars <NULL>
8 R5-D4 <NULL>
9 Biggs Darklighter <chr [1]>
10 Obi-Wan Kenobi <chr [5]>
# … with 77 more rows
# A tibble: 87 × 3
name starships n
<chr> <list> <int>
1 Luke Skywalker <chr [2]> 2
2 C-3PO <NULL> 0
3 R2-D2 <NULL> 0
4 Darth Vader <chr [1]> 1
5 Leia Organa <NULL> 0
6 Owen Lars <NULL> 0
7 Beru Whitesun lars <NULL> 0
8 R5-D4 <NULL> 0
9 Biggs Darklighter <chr [1]> 1
10 Obi-Wan Kenobi <chr [5]> 5
# … with 77 more rows
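The code that added the n column above was not captured. One way to compute such counts is sketched below on a hypothetical stand-in for the starships list column. Inside a data frame pipeline, the same call would typically sit in something like dplyr::mutate(chars, n = purrr::map_int(starships, length)), but that line is an assumption and is not run here.

```r
library(purrr)

# Hypothetical stand-in for the starships list column:
# one character vector (or NULL) per character
starships <- list(c("s12", "s22"), NULL, NULL, "s13")

# length(NULL) is 0, so characters without starships get a count of 0
n <- map_int(starships, length)
n              # 2 0 0 1

# Base R's lengths() gives the same result
lengths(starships)
```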
Use the function above to draw n = 1000 points from a box of area 4. Save your output as an object called points.
Use map or an appropriate version to determine which points \((x, y)\) are within the unit circle centered at the origin.
What proportion of points are within the unit circle?
Can you approximate \(\pi\) like this? How?
How can you make your estimate more precise?
Use mtcars and a single map or map variant to
get the type of each variable
get the fourth row such that result is a character vector
compute the mean of each variable
compute the mean and median for each variable such that the result is a data frame with the mean values in row 1 and the median values in row 2.