Lab 5: Where’s zero?

Due Friday, March 10 5:00pm

By the end of the lab, you will…

Warning

Make sure everyone in your team knows when to stop committing! After 5pm on the due date will result in late penalty applying for the team, even if commit is “accidentally” pushed.

Getting started

  • In the terminal tab, type cd ~ to navigate to your home directory. Next git clone git@github.com:sta323-sp23/lab-5-team_name.git where team_name is replaced with your team name (see the excel signup sheet in box or your github).

  • Navigate to your lab-5 folder and open the .Rproj file.

  • Open the Quarto (.qmd) file, change the author name to your team name followed by a colon and then the names of the team members. Note that to make sure your teammates names all render within the 80 character code limit, you can write

author: |
    | author 1 
    | author 2
    | author 3

Exercises

For all exercises, you should respond in the space provided in the template lab-5.qmd and show all your work. In all answers of type double, three significant figures suffices.

1. Weibull MLE

x = c(1.811328, 0.9210481, 1.753609, 0.1971982, 1.957823, 0.9607706, 0.3611815,
      0.9710981, 1.684854, 1.49811, 0.4599071, 1.18149, 0.04501055, 0.2688005,
      0.9070787, 0.1067991, 0.8979944, 0.1442192, 0.3690044, 1.250377,
      0.7919656, 0.2785386, 1.312061, 1.794021, 1.867813, 1.374013, 0.7282234,
      0.6067082, 1.062187, 0.08889603, 0.5668522, 0.743226, 0.6237324,
      0.2951361, 0.2185725, 0.8271643, 0.4682848, 0.7791434, 0.2655503,
      0.5545753, 0.7419484, 0.1427752, 0.3778655, 1.327472, 0.93939,
      0.9402052, 0.8784793, 0.605618, 0.7390297, 0.7402008)

Assume

\[ x_i \overset{\mathrm{iid}}{\sim} Weibull(\lambda, k) \]

Find the maximum likelihood estimates \(\hat{k}_{MLE}\) and \(\hat{\lambda}_{MLE}\) using Newton’s method. In your solution, code Newton’s method and all derivatives yourself. Do not use a package or library for this. Show all your work (including writing down any relevant math).

Initialize your algorithm with \(k = 2\). Set your stopping tolerance to 0.001 and maximum iterations to 500.

Report your estimates \(\hat{k}_{MLE}\) and \(\hat{\lambda}_{MLE}\).

Hint
  • You only need to perform Newton’s method on the parameter \(k\) here.

2. visualize Weibull gradient w.r.t \(k\)

Following up on the previous exercise, let’s gain insight into when and how Newton’s method can fail.

  • Set your starting \(k = 4\). What value \(k\) converge to? Is this a valid value of \(k\)?

  • To gain insight into this, plot the derivative of the log-likelihood of \(k\) from 0.01 to 5. Why does starting at 4 change the result?

  • Next plot the derivative of the log-likelihood from -5 to -0.01. Be sure to appropriately label your plots, add relevant titles, etc. Combine your plots using patchwork.

  • Finally, edit your Newton-Raphson algorithm to “reflect” negative values, i.e. if the updated iterate is negative, force it to be positive. Re-run your Newton’s method starting at \(k=4\). What does \(k\) converge to? Why?

Hint

Write the gradient with respect to \(k\) as a function of \(k\) alone (as in lecture).

3. zero-inflated Poisson

Zero-inflated models allow for frequent zero-valued observations. You observe 100 people at the beach fishing. The number of fish each individual catches is reported in the data fishing_count below.

fishing_count = c(0, 0, 0, 3, 0, 3, 5, 0, 8, 1,
                  0, 0, 0, 0, 3, 4, 0, 4, 8, 0,
                  0, 0, 4, 0, 0, 6, 4, 0, 4, 0,
                  0, 0, 0, 0, 4, 8, 4, 0, 0, 0,
                  7, 3, 0, 2, 0, 0, 7, 7, 5, 0,
                  0, 0, 7, 6, 0, 3, 0, 0, 0, 5,
                  4, 1, 0, 0, 0, 0, 0, 0, 0, 8,
                  0, 5, 0, 3, 0, 0, 3, 0, 0, 2,
                  0, 5, 0, 7, 7, 0, 1, 0, 2, 7,
                  0, 2, 4, 0, 0, 1, 5, 0, 0, 0)
  • Visualize the distribution of the data using an appropriate visualization. Label axes and describe your plot.

The data can be described as being generated from a zero-inflated mixture model. Let \(Y_i\) be the number of fish an individual catches at the beach,

\[ \begin{aligned} p(Y_i = 0) &= p + (1-p) e^{-\lambda}\\ p(Y_i = y_i) &= \frac{(1 - p)\lambda^{y_i}e^{-\lambda}}{y_i!}, \ \ y_i = 1, 2, 3, \ldots \end{aligned} \]

  • Assume observations \(\{Y_i\}\) are independent. Write down the log-likelihood and visualize the log-likelihood as a function of \(p\) while fixing \(\lambda = 5\). Repeat for \(\lambda = 4\) (on the same plot). Based on your plot, which value of \(\lambda\) is more likely? Also, assuming these \(\lambda\) are sufficiently close to the true \(\lambda\), what is (approximately) the MLE of \(p\) based on your plot?

Using this mixture model, compute the maximum likelihood estimates \(\hat{p}_{MLE}\) and \(\hat{\lambda}_{MLE}\) using Newton’s method. In your solution, code Newton’s method and all derivatives yourself. Do not use a package or library for this. Show all your work (including writing down any relevant math).

Initialize your algorithm with \(\lambda = 2\). Set your stopping tolerance to 0.001 and maximum iterations to 500.

Hint
  • To begin, write the likelihood of the data. This is your objective function that you wish to maximize.

  • You only need to perform Newton’s method on the parameter \(\lambda\) here.

Style guidelines

All assignments in this course must employ proper coding style, as outlined below:

  • All code should obey the 80 character limit per line (i.e. no code should run off the page when rendering or require scrolling). To enable a vertical line in the RStudio IDE that helps guide this, see the style guidelines from lab 0 or ask a member of the teaching team for help.

  • All commas should be followed by a space.

  • All binary operators should be surrounded by space. For example x + y is appropriate. x+y is not.

  • All pipes %>% or |> as well as ggplot layers + should be followed by a new line.

  • You should be consistent with stylistic choices, e.g. only use 1 of = vs <- and %>% vs |>

  • Your name should be at the top (in the YAML) of each document under “author:”

  • All code chunks should be named (with names that don’t have spaces, e.g. ex-1, ex-2 etc.)

  • File names in your GitHub repo such as lab-x.qmd must not be changed and left as provided. Additionally, your repo must pass certain basic checks. The results of these checks are visible on GitHub via the badges at the top of your README and the actions tab. These are meant to give you feedback around the structure and reproducibility of your repository and assignment - they do not assess the correctness of your work. You should consider them a necessary but not sufficient condition when turning in your work - passing all of the checks simply means your have met a minimum standard of reproducibility for the assignment.

Fundamentally, the check is making sure 1) you only have the files you should in your repository, 2) your .qmd renders.

If you have any questions about style, please ask a member of the teaching team.

Submitting your lab

To submit your assignment, simply commit and push your completed lab-x.qmd to your GitHub repo. Your most recent commit 48 hours after the assignment deadline will be graded, and any applicable late penalty will be applied (see the syllabus). For this reason, do not push commits after you are satisfied with your work, or a late penalty will be applied.