Dr. Alexander Fisher
Duke University
January 13, 2023
Broadly, it’s turning data into knowledge using the computer.
Scrape data off the web
Interact with databases
Extract useful parts of massive datasets in the blink of an eye using regular expressions
Optimize code in R
Model data with complicated likelihood functions and then write algorithms to maximize the likelihood
Build shiny web apps
By the end of this course you will be able to…
write efficient R code to (1) wrangle, explore, and analyze data and (2) program algorithms for inference under a variety of data-generating models
conduct independent data analysis and subsequently write and present results effectively
Assignment | Description |
---|---|
Labs (45%) | Biweekly lab assignments. |
Exams (35%) | Two take-home open-notes exams. |
Final Project (15%) | Written report and presentation. |
Quizzes (5%) | In-class pop quizzes. |
Uphold the Duke Community Standard:
I will not lie, cheat, or steal in my academic endeavors;
I will conduct myself honorably in all my endeavors; and
I will act if the Standard is compromised.
Any violations in academic honesty standards as outlined in the Duke Community Standard and those specific to this course will automatically result in a 0 for the assignment and will be reported to the Office of Student Conduct for further action.
The final project and several labs will be completed in teams. All group members are expected to participate equally. Commit history may be used to give individual team members different grades. Your grade may differ from the rest of your group.
Unless explicitly stated otherwise, this course’s policy is that you may make use of any online resources (e.g. Google, existing StackOverflow answers, etc.) but you must explicitly cite where you obtained any code you directly use or use as inspiration in your solution(s).
Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism, regardless of source.
Narrative (non-code solutions) should always be entirely your own.
Homework and labs can be turned in up to 48 hours after the deadline for a grade penalty (5% off per day).
Exams and the final project cannot be turned in late and can only be excused under exceptional circumstances.
The Duke policy for illness requires a short-term illness report or a letter from the Dean; except in emergencies, all other absenteeism must be approved in advance (e.g., an athlete who must miss class may be excused by prior arrangement for specific days). For emergencies, email notification is needed at the first reasonable time.
Extensions will not be granted for last-minute coding or rendering issues.
Resource | Description |
---|---|
course website | course notes, deadlines, assignments, office hours, syllabus |
Sakai | class recordings, solutions and announcements |
course organization | assignments, collaboration |
slack | primary communication |
RStudio containers* | online coding platform |
You are welcome to install R and RStudio locally on your computer. If working locally you should make sure that your environment meets the following requirements:
latest R version
latest RStudio
working git installation
ability to create ssh keys (for GitHub authentication)
All R packages updated to their latest version from CRAN
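A few of the requirements above can be checked from the R console. The following is a minimal sketch, not an official setup script: the variable names are illustrative, and the commented-out lines assume network access and the `https://cloud.r-project.org` CRAN mirror.

```r
# Report the version of R that is currently running.
r_version <- R.version.string
print(r_version)

# Sys.which() returns the path to an executable, or "" if it is not found.
git_path <- Sys.which("git")
has_git <- nzchar(git_path)
print(has_git)

# Listing and updating outdated packages needs network access, so these
# calls are left commented out here.
# old.packages(repos = "https://cloud.r-project.org")
# update.packages(ask = FALSE, repos = "https://cloud.r-project.org")
```

If `has_git` is `FALSE`, git is either not installed or not on the search path that R sees.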
If you have questions about homework/lab exercises, debugging, or any question about course materials
When you miss a class:
Check your email / Sakai announcements for slack invite.
Post on slack
Create a GitHub account (unless you already have one) on https://github.com/
Tell me your username by taking this survey. This is essential to receive credit on future assignments.
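The fundamental building block of data in R is the vector (a collection of related values, objects, other data structures, etc.).
R has two types of vectors: atomic vectors, whose components all share a single type, and generic vectors (lists), whose components can be of any type.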
I will use the term component or element when referring to a value inside a vector.
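As a small illustration of the two kinds of vectors, the sketch below builds one atomic vector and one generic vector (list). The variable names `x` and `y` are arbitrary.

```r
# An atomic vector: every component has the same type.
x <- c(1.5, 2, 3)
typeof(x)      # "double"
is.atomic(x)   # TRUE

# A generic vector (list): components may have different types,
# and a component may itself be a vector or another list.
y <- list(1L, "a", c(TRUE, FALSE))
typeof(y)      # "list"
is.atomic(y)   # FALSE
length(y)      # 3
```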
R has six atomic vector types:
logical, integer, double, character, complex, raw
In this course we will mostly work with the first four. You will rarely work with the last two types - complex and raw.
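One way to see the six atomic types is to ask `typeof()` about a single value of each. This is only a quick sketch; the example values are arbitrary.

```r
t_logical   <- typeof(TRUE)          # "logical"
t_integer   <- typeof(1L)            # "integer" (the L suffix makes an integer)
t_double    <- typeof(1)             # "double"  (numbers are double by default)
t_character <- typeof("a")           # "character"
t_complex   <- typeof(1 + 2i)        # "complex"
t_raw       <- typeof(as.raw(255))   # "raw"

print(c(t_logical, t_integer, t_double, t_character, t_complex, t_raw))
```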
If you try to combine components of different types into a single atomic vector, R will coerce all of them to the simplest type that can represent every component. The ordering is logical < integer < double < character, where logical is considered the “simplest”.
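The coercion ordering can be seen by mixing types inside `c()`. The sketch below is illustrative only, and the variable names are arbitrary.

```r
a <- c(TRUE, 2L)              # logical + integer   -> integer
typeof(a)                     # "integer"
a                             # 1 2

b <- c(1L, 2.5)               # integer + double    -> double
typeof(b)                     # "double"

d <- c(1.5, "a")              # double + character  -> character
d                             # "1.5" "a"

e <- c(TRUE, 1L, 2.5, "a")    # all four types      -> character
e                             # "TRUE" "1" "2.5" "a"
```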
Operator | Definition | Vectorized? |
---|---|---|
x \| y | or | yes |
x & y | and | yes |
!x | not | yes |
x \|\| y | or | no |
x && y | and | no |
xor(x,y) | exclusive or | yes |
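The difference between the vectorized and non-vectorized operators in the table is easiest to see side by side. In this sketch `x` and `y` are arbitrary logical vectors. The `&&` and `||` lines use length-1 operands only, because from R 4.3.0 onward a longer operand is an error.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

# Vectorized operators work component by component.
or_xy  <- x | y       # TRUE  TRUE  TRUE
and_xy <- x & y       # TRUE  FALSE FALSE
not_x  <- !x          # FALSE TRUE  FALSE
xor_xy <- xor(x, y)   # FALSE TRUE  TRUE

# && and || compare single values and return a single value.
and_scalar <- TRUE && FALSE   # FALSE
or_scalar  <- TRUE || FALSE   # TRUE

# They also short-circuit: the right-hand side is not evaluated here.
short_circuit <- FALSE && stop("never evaluated")   # FALSE
```

The single-value forms are the usual choice inside `if ()` conditions, where exactly one `TRUE` or `FALSE` is expected.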
Operator | Definition | Vectorized? |
---|---|---|
x < y | less than | yes |
x <= y | less than or equal to | yes |
x != y | not equal to | yes |
x == y | equal to | yes |
x %in% y | is x contained in y | yes (over x) |
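The comparison operators in the table can be tried on two short numeric vectors. This is a minimal sketch with arbitrary values. Note that `%in%` returns one result per component of its left-hand side.

```r
x <- c(1, 5, 10)
y <- c(2, 5, 3)

lt  <- x < y      # TRUE  FALSE FALSE
le  <- x <= y     # TRUE  TRUE  FALSE
ne  <- x != y     # TRUE  FALSE TRUE
eq  <- x == y     # FALSE TRUE  FALSE

# %in% asks, for each component of x, whether it appears anywhere in y.
inn <- x %in% y         # FALSE TRUE FALSE
in2 <- c(1, 5) %in% y   # FALSE TRUE (length follows the left-hand side)
```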
The shorter of two atomic vectors in an operation is recycled until it is the same length as the longer atomic vector.
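Recycling can be sketched with a length-6 vector and shorter operands. The values are arbitrary. The last line shows that R still recycles, but warns, when the longer length is not a multiple of the shorter one.

```r
x <- 1:6

# c(10, 20) is recycled to c(10, 20, 10, 20, 10, 20).
sum_xy <- x + c(10, 20)     # 11 22 13 24 15 26

# A single value is recycled to every component.
double_x <- x * 2           # 2 4 6 8 10 12

# Recycling applies to comparisons too: c(2, 4, 6) becomes c(2, 4, 6, 2, 4, 6).
gt <- x > c(2, 4, 6)        # FALSE FALSE FALSE TRUE TRUE FALSE

# Lengths 5 and 2: the result is still computed, with a warning.
partial <- suppressWarnings(1:5 + c(10, 20))   # 11 22 13 24 15
```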
What does each of the following return? Run the code to check your answer.
Exercise 1.
Exercise 2.
Exercise 3.