Lab 1: if it’s a function, fix it

Due Monday January 30 at 5:00pm

By the end of the lab, you will…

Getting started

  • If you did not complete lab 0, then go back and do so to set up your SSH key, git, and find instructions on how to clone a project.

  • Next in the terminal tab, type cd ~ to navigate to your home directory. Next git clone git@github.com:sta323-sp23/lab-1-username.git where username is replaced with your username (this is the copied SSH URL from your lab-1 GitHub repo).

  • Navigate to your lab-1 folder and open the .Rproj file.

  • Open the Quarto (.qmd) file, change the author name to your name.

Exercises

For all exercises, you should respond in the space provided in the template lab-1.qmd and show all your work. In all answers of type double, three significant figures suffices.

1. Robust quadratic root finding

The quadratic formula finds \(x\) that satisfy the following equation:

\[ ax^2 + bx + c = 0. \]

In general, \(x^*\) such that \(f(x^*) = 0\), then \(x^*\) is said to be a root of \(f(x)\). You can find an example of basic implementation of the quadratic formula from lecture 2: control flow, loops and functions.

Write a function named robustQuadraticRoots() that accepts three arguments: a, b and c and returns both roots (even if they are repeated) while satisfying the additional cases below. Show the output of your function robustQuadraticRoots() on each of the cases below.

# case i 
robustQuadraticRoots(a = 0, b = 3, c = 2)

# case ii
robustQuadraticRoots(a = 0, b = 0, c = 0)

# case iii
robustQuadraticRoots(a = 1, b = 2, c = 3)

# case iv
robustQuadraticRoots(a = 4.3162933554e-11,
                       b = 1.0361118521e6,
                       c = -4.5813932360e5)

# case v
robustQuadraticRoots(a = 0, b = 5, c = 0)
Hint
  • for case (ii) you should return an error. Since errors will stop your .qmd from rendering, you should include in your code chunk #| error: true. Read quarto docs here for a full list of code chunk execution options.
  • for case (iii) you need to deal with complex numbers read here
  • for case (iv) the correct answer is \(x_1 = -2.4e16\) and \(x_2 = .44\) when rounded to two significant figures.
  • for case (v), you should use warning() to warn the user that both a and c are 0.

In each case above, describe why quadraticRoots() from the lecture fails.

2. Stirling numbers of the second kind

Stirling numbers of the second kind count the number of possible partitions of a set of \(n\) objects into \(k\) disjoint blocks. For example, the number of ways to partition a set of 3 objects in 2 disjoint blocks is \(S(3, 2) = 3\) because the set \(\{1, 2, 3\}\) can be partitioned into two disjoint blocks in 3 ways:

\[ \begin{aligned} &\{1, 2\} \cup \{3\},\\ &\{1, 3\} \cup \{2\},\\ &\{2, 3\} \cup \{1\} \end{aligned} \]

We can generate Stirling numbers of the second kind, \(S(n,k)\), recursively:

\[ S(n,k) = S(n-1, k-1) + kS(n-1, k) \]

where we start the sequence with \(S(n, 1) = 1\) and the boundary condition \(S(n, k) = 0\) for \(k>n\).

Create a function ssk() with arguments n and k that computes Stirling numbers of the second kind. Print the result of your function for the following inputs:

# case i
ssk(n = 3, k = 2)

# case ii
ssk(n = 10, k = 1)

# case iii
ssk(n = 10, k = 10)

# case iv
ssk(n = 15, k = 8)

# case v
ssk(n = 6, k = 7)

3. Pareto distribution

Introduction

R provides functions that return useful characteristics of many common probability distributions. The naming convention for these functions is a prefix, which identifies what the function does, followed by an abbreviation of the probability distribution’s name. These prefixes are:

  • p for “probability”, the cumulative distribution function (CDF)
  • q for “quantile”, the inverse CDF
  • d for “density”, the density function (PDF)
  • r for “random”, to sample a random variable having the specified distribution.

For the normal distribution, these functions are pnorm, qnorm, dnorm, and rnorm, where the norm portion reminds us this is for the normal distribution. For the binomial distribution, these functions are pbinom, qbinom, dbinom, and rbinom. Click here for a list of probability distributions in the Base R package.

The Pareto distribution is not available in base R, so we’re going to code it ourselves. For this lab, we’ll just code the quantile function, i.e., qpareto(). Here’s a bit of background on deriving the Pareto’s quantile function.

The Pareto family of distributions is parameterized by \(\alpha\) and \(x_0\), and has probability density function

\[ f(x) = \begin{cases} \frac{(\alpha - 1)x_0^{\alpha - 1}}{x^{\alpha}}, &x > x_0,\\ 0, &x \leq x_0. \end{cases} \]

From the PDF it is relatively easy to compute the CDF, which is given by

\[ F(x) = \begin{cases} 0 & x < x_0\\ 1 - \left(\frac{x_0}{x} \right)^{\alpha - 1} & x \geq x_0. \end{cases} \]

The quantile function is defined for \(0 \le p \le 1\), and it returns the value \(x_p\) such that \(F(x_p) = p\). For the Pareto distribution, the quantile function is given by

\[ Q(p) = Q(p, \alpha, x_0) = {x_0}{(1-p)^{-\frac{1}{\alpha - 1}}}. \]

Using the definition of \(Q(p)\), we can compute the \(p\)th quantile for specific values of \(p\) manually. For example, to get the median (\(p = 0.5\)) of Pareto distributions with \(x_0 = 1, \alpha = 3.5\):

1 * (1 - 0.5) ^ (-1/(3.5 - 1))

It would be helpful to have a function that automated this process, both so we don’t have to remember the form of the quantile function for the Pareto distribution, and so we avoid making mistakes.

We will build our function, qpareto(), in a sequence of steps.

Step 1 qpareto_1

Write a function called qpareto_1() that takes arguments p, alpha, and x0 and returns \(Q(p, \alpha, x_0)\) as defined above.

Hint

check the result of your function manually (as in the example above) to ensure you have the correct answer.

qpareto_1(p = 0.5, alpha = 3.5, x0 = 1)
qpareto_1(p = 0.5, alpha = 2.34, x0 = 6e8)
qpareto_1(p = 0.92, alpha = 2.5, x0 = 1e6)

Step 2

qpareto_2()

Most of the quantile functions in R have an argument lower.tail that is either TRUE or FALSE. If TRUE, the function returns the \(p\)th quantile. If FALSE, the function returns the \((1-p)\)th quantile, i.e., returns the value \(x_p\) such that \(F(x_p) = 1 - p\).

Create a function qpareto_2() that has an additional argument lower.tail which is by default set to TRUE. Your qpareto_2 function should test whether lower.tail is FALSE. If it is FALSE, the function should replace \(p\) by \(1-p\). Then pass either \(p\) or \(1-p\) to qpareto_1() to compute the appropriate quantile, i.e., qpareto_1() is called from inside of qpareto_2(). Test your function with the two function calls below.

qpareto_2(p = 0.5, alpha = 3.5, x0 = 1)
qpareto_2(p = 0.08, alpha = 2.5, x0 = 1e6, lower.tail = FALSE)

There is a downside to writing the function the way we have. We need qpareto_1() to be in the work space when qpareto_2() is called, but there is a big advantage. If we discover a better way to calculate quantiles of the Pareto distribution, we can rewrite qpareto_1() and the new version will automatically be used in qpareto_2().

Step 3

qpareto()

Next, let’s add some check with regards to the function’s arguments. In the case of the Pareto quantile function, we need \(0\leq p\leq 1\), \(\alpha > 1\), and \(x_0 > 0\).

Write a function named qpareto() that adds these checks to your code from function qpareto_2().

Quarto markdown will not compile if your R function stops due to the stopifnot() function. Include #| error: true to allow your document to render. See the hint to exercise 1 for more details.

Test your function on the five function calls below.

qpareto(p = 0.5, alpha = 3.5, x0 = 1)
qpareto(p = 0.08, alpha = 2.5, x0 = 1e6, lower.tail = FALSE)
qpareto(p = 1.08, alpha = 2.5, x0 = 1e6, lower.tail = FALSE)
qpareto(p = 0.5, alpha = 0.5, x0 = -4)
qpareto(p = 0.5, alpha = 2, x0 = -4)

Is your function vectorized? For which parameters is it vectorized?

Style guidelines

All assignments in this course must employ proper coding style, as outlined below:

  • All code should obey the 80 character limit per line (i.e. no code should run off the page when rendering or require scrolling). To enable a vertical line in the RStudio IDE that helps guide this, see the style guidelines from lab 0 or ask a member of the teaching team for help.

  • All commas should be followed by a space.

  • All binary operators should be surrounded by space. For example x + y is appropriate. x+y is not.

  • All pipes %>% or |> as well as ggplot layers + should be followed by a new line.

  • You should be consistent with stylistic choices, e.g. only use 1 of = vs <- and %>% vs |>

  • Your name should be at the top (in the YAML) of each document under “author:”

  • All code chunks should be named (with names that don’t have spaces, e.g. ex-1, ex-2 etc.)

  • File names in your GitHub repo such as lab-x.qmd must not be changed and left as provided. Additionally, your repo must pass certain basic checks. The results of these checks are visible on GitHub via the badges at the top of your README and the actions tab. These are meant to give you feedback around the structure and reproducibility of your repository and assignment - they do not assess the correctness of your work. You should consider them a necessary but not sufficient condition when turning in your work - passing all of the checks simply means your have met a minimum standard of reproducibility for the assignment.

Fundamentally, the check is making sure 1) you only have the files you should in your repository, 2) your .qmd renders.

If you have any questions about style, please ask a member of the teaching team.

Submitting your lab

To submit your assignment, simply commit and push your completed lab-x.qmd to your GitHub repo. Your most recent commit 48 hours after the assignment deadline will be graded, and any applicable late penalty will be applied (see the syllabus). For this reason, do not push commits after you are satisfied with your work, or a late penalty will be applied.