Class Number (Section): 26313 (3DTA)
Meets: MWF 3:00 – 3:50 PM (Period 8)
Location: FLO 100
Web: https://ufl.instructure.com/courses/498971
Instructor: Dr. Brett Presnell
Teaching Assistant: Dipshi Roychowdhury

Contact Information

Dr. Brett Presnell
Email:
Web: https://www.stat.ufl.edu/~presnell/
Office: FLO 225
Virtual Office: Zoom 940 1233 3509
Office Hrs: MW 4-5 PM

Dipshi Roychowdhury
Email:
Office: FLO 117D
Virtual Office: Zoom 658 792 6980
Office Hrs: Tue 2-3:30 (in person), Fri 1-2:30 (online)

Course Description

An introduction to statistical computing and programming with data. Topics include basic programming in R; data types and data structures in R; importing and cleaning data; specifying statistical models in R; statistical graphics; statistical simulation using pseudo-random numbers; reproducible research and the documentation of statistical analyses.

Prerequisites

STA 3032 (B-) or STA 2023 (B) or AP Statistics (4).

Course Objectives

You will learn to do the following:

  1. Import data into R and prepare the data for analysis.

  2. Write functions in R making effective use of data structures and control structures.

  3. Formulate statistical models in the R language.

  4. Perform, document, and interpret common statistical analyses.

  5. Carry out statistical/probabilistic simulations.

  6. Determine statistical graphics appropriate to a statistical analysis and produce them using R.

  7. Document and report the results of data analyses and simulations in a reproducible way.

Text Books and Other Source Materials

We will use a variety of on-line texts and other resources. Class notes and other materials will be made available on the course website. Most readings will be taken from the following (free, on-line) texts, which students are encouraged to peruse on their own:

Bibliography

Chang, Winston. 2018. R Graphics Cookbook: Practical Recipes for Visualizing Data. 2nd ed. Sebastopol, California: O’Reilly Media, Inc. https://r-graphics.org/.
Grolemund, Garrett. 2014. Hands-on Programming with R: Write Your Own Functions and Simulations. Sebastopol, California: O’Reilly Media, Inc. https://rstudio-education.github.io/hopr/.
Healy, Kieran. 2018. Data Visualization: A Practical Introduction. Princeton University Press. https://socviz.co/.
Peng, Roger D. 2016. R Programming for Data Science. 5+ ed. Lulu.com. https://bookdown.org/rdpeng/rprogdatascience/.
Wickham, Hadley. 2019. Advanced R. 2nd ed. Boca Raton, Florida: Chapman & Hall/CRC. https://adv-r.hadley.nz/.
Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. 2nd ed. Sebastopol, California: O’Reilly Media, Inc. https://r4ds.hadley.nz/.
Wickham, Hadley, Danielle Navarro, and Thomas Lin Pedersen. 2022. ggplot2: Elegant Graphics for Data Analysis. 3rd ed. Springer. https://ggplot2-book.org/.

Course Policies

Grading

There will be regular online quizzes to help you refine your knowledge and understanding of the course material. Homework assignments and projects will put this knowledge to use. These will be weighted in the final course average (percentage) as follows:

  • 80% Homework/Projects
  • 20% Quizzes

Letter grades in the course will be determined from the final course average according to the following scale (after rounding to the nearest integer):

Grade:    A       A-      B+      B       B-      C+      C       D       E
Average:  94-100  90-93   87-89   84-86   80-83   77-79   67-76   60-66   0-59
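
As a rough illustration of how the weights and the scale above combine, the following R sketch computes a course average and looks up the corresponding letter grade. The function names course_average and letter_grade are hypothetical, and the tie-breaking rule (an average ending in .5 rounds up) is an assumption: the syllabus says only that averages are rounded to the nearest integer.

```r
# Weighted course average: 80% homework/projects, 20% quizzes.
# Both inputs are percentages on a 0-100 scale.
course_average <- function(homework_pct, quiz_pct) {
  0.80 * homework_pct + 0.20 * quiz_pct
}

# Letter grade from the course average.
# ASSUMPTION: ties (x.5) round up. Base R's round() rounds ties to even,
# so floor(avg + 0.5) is used here instead.
letter_grade <- function(avg) {
  rounded <- floor(avg + 0.5)
  lower_bounds <- c(0, 60, 67, 77, 80, 84, 87, 90, 94)
  grades <- c("E", "D", "C", "C+", "B-", "B", "B+", "A-", "A")
  # findInterval() returns the index of the last lower bound <= rounded
  grades[findInterval(rounded, lower_bounds)]
}

avg <- course_average(homework_pct = 91, quiz_pct = 82)  # about 89.2
letter_grade(avg)                                        # "B+"
```

Because findInterval() is vectorized, letter_grade() can also be applied to a whole vector of averages at once.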

Further information may be found in the university’s grades and grading policies.

Late Submissions

Homework and projects must be submitted on time, and it is the student’s responsibility to allocate sufficient time to complete each assignment by the due date.

Late assignments will be accepted in cases of documented emergency or illness, but you must inform the instructor in advance of any illness which may lead to a late submission.

In all other cases, acceptance of late assignments will be at the discretion of the instructor. Scores on late submissions which are accepted will be reduced by 10% plus an additional 5% for each additional day between the due date and the time of submission.
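
The arithmetic of the late penalty can be sketched in R under one possible reading of the rule above. This sketch makes three assumptions that the syllabus does not state: any part of a day counts as a full day late, the first day late costs 10% with each further day adding 5%, and the reduction is taken as a percentage of the score earned. The function names late_penalty and adjusted_score are hypothetical, and the instructor's interpretation governs in practice.

```r
# Fraction of the score deducted for a late submission.
# ASSUMPTIONS: days_late is a whole number of days (partial days already
# rounded up); day 1 costs 10%, each additional day adds 5%; the total
# deduction is capped at 100%.
late_penalty <- function(days_late) {
  ifelse(days_late <= 0, 0, pmin(1, 0.10 + 0.05 * (days_late - 1)))
}

# Score after the late penalty is applied to the earned score.
adjusted_score <- function(score, days_late) {
  score * (1 - late_penalty(days_late))
}

late_penalty(c(0, 1, 3))               # 0.0, 0.1, 0.2
adjusted_score(90, days_late = 3)      # 72
```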

Nota bene: it is the student’s responsibility to correctly submit their work for every assignment, so always double-check that you have submitted the correct file(s) for each assignment. Similarly, losing the internet connection in your residence at the last minute is not an acceptable excuse for a late submission. (If you insist on submitting your assignments at the last hour, then be sure that you know how to use your mobile phone as a Wi-Fi hotspot.)

If you have not submitted the correct file(s) by the due date, then any subsequent submission will be treated as a late submission.

Grade Appeals

If you feel that an error has been made in grading an assignment, please first contact your TA during their office hours or by email. If, after consulting with the TA, you still feel that your assignment has been graded incorrectly, you may submit a written (typed, not handwritten) appeal to the instructor detailing precisely how your assignment was misgraded.

Academic Misconduct

Students will be held accountable to the UF Honor Code.

Unless otherwise specified in writing by the instructor, students are expected to work independently or in assigned groups. General discussion of the course material is encouraged, but offering solutions to others or accepting solutions from them is plagiarism. When in doubt, direct your questions to the instructor or TA.

Recording of Class Sessions

As in all courses at UF, unauthorized recording and unauthorized sharing of recorded materials by students or any other party are prohibited.

Except in special circumstances, class sessions will not be recorded by the instructor. In case a class session is recorded, students who participate with their camera engaged or who utilize a profile image are agreeing to have their video or image recorded. If you are unwilling to consent to have your profile or video image recorded, be sure to keep your camera off and do not use a profile image. Likewise, students who un-mute during class and participate orally are agreeing to have their voice recorded. If you are not willing to consent to have your voice recorded during class, you will need to keep your mute button activated and communicate exclusively using the “chat” feature, which allows students to type questions and comments live. The chat will not be recorded or shared.

Accommodations for Students with Disabilities

Students requesting accommodation for disabilities must register with UF’s Disability Resource Center. The DRC will provide documentation to the student, who must then provide this documentation to the instructor when requesting accommodation. You must submit this documentation prior to submitting any assignments or taking any exam or quiz for which you are requesting accommodation.

Course Evaluations

Students are expected to provide feedback on the quality of instruction in this course by completing course evaluations online via GatorEvals. Guidance on how to give feedback in a professional and respectful manner is available at https://gatorevals.aa.ufl.edu/students/. Students will be notified when the evaluation period opens, and can complete evaluations through the email they receive from GatorEvals, in their Canvas course menu under GatorEvals, or via https://ufl.bluera.com/ufl/. Summaries of course evaluation results are available to students at https://gatorevals.aa.ufl.edu/public-results/.

Class Schedule

Tentative Outline

This is an aspirational schedule for the course. It may be altered or rearranged to adapt to the backgrounds, abilities, and interests of the students in the class. There are 43 scheduled class meetings.

Week 1 (Jan 8 – Jan 12)

  • Getting started

  • Vectors and Vectorized Operations

  • Introduction to R Markdown

Week 2 (Jan 17 – Jan 19)

  • Distributions and Descriptive Statistics

  • Writing Your Own Functions

Week 3 (Jan 22 – Jan 26)

  • Matrices and Arrays
  • Lists

Week 4 (Jan 29 – Feb 2)

  • Data frames (and tibbles)

  • Importing and Exporting Data

Week 5 (Feb 5 – Feb 9)

  • Column and row operations on data frames

  • Pipes and more operations on data frames

  • Joining/Merging Data Frames

Week 6 (Feb 12 – Feb 16)

  • Dates and times in base R

  • The lubridate package

Week 7 (Feb 19 – Feb 23)

  • Tidy Data and Pivoting
  • Character strings and the stringr package

  • String matching with regular expressions

Week 8 (Feb 26 – Mar 1)

  • Detecting string matches

  • Extracting string matches

Week 9 (Mar 4 – Mar 8)

  • String replacement and string splitting

  • An extended example with character strings

  • Introduction to Data Scraping

Week 10 (Mar 18 – Mar 22)

  • Factors in base R

  • The forcats package

Week 11 (Mar 25 – Mar 29)

  • Elementary statistical inference

  • Simple linear regression

  • Multiple regression

Week 12 (Apr 1 – Apr 5)

  • Factors and dummy variables in regression

  • Interactions

  • Simple logistic regression

Week 13 (Apr 8 – Apr 12)

  • Multiple logistic regression

  • More graphics in R

Week 14 (Apr 15 – Apr 19)

  • Working with lists: the purrr package

Week 15 (Apr 22 – Apr 24)

  • More on data scraping and/or simulation.

The Daily Record

Links to slides and reading assignments for each lecture will be added here throughout the semester.

  • Day 8 (Fri, Jan 26):
    • Matrices (PDF) (R code)
      • New R functions: matrix, is.matrix, attributes, dim, dimnames, crossprod, %*%, apply, t, diag, solve, cbind, rbind, attr, colnames, list, rownames, drop
    • Readings
  • Day 9 (Mon, Jan 29):
    • Finish Matrices.
  • Day 11 (Fri, Feb 2):
  • Day 15 (Mon, Feb 12):
    • Discussion and practice with apply() and anonymous functions.
    • Finish Random Numbers and Simulation.
  • Day 17 (Fri, Feb 16):
    • Evaluating Simulations (PDF) (R code)
      • New R functions: binom.test, prop.test, simtosses, seq_along, curve, repeat, invisible, readline, as.numeric, break, paste0, dt
  • Day 25 (Wed, Mar 6):
    • Finish “Importing and Exporting Data”
  • Day 28 (Wed, Mar 20):
  • Day 32 (Fri, Mar 29):
    • Finish Regular Expressions
    • In-class regular expression exercises
  • Day 34 (Wed, Apr 3):
  • Day 35 (Fri, Apr 5):
    • Finish Meta String
    • Work on web scraping example.
  • Day 36 (Mon, Apr 8):
    • Factors (PDF) (R code)
      • Data: hcv.csv
      • New R functions: levels, options, count, fct_rev, saveRDS, relevel, ordered, cumsum, optim, cut, qweibull, labs, rweibull, as_factor, fct_collapse, fct_lump_min, fct_lump_prop, fct_lump_n, fct_infreq, fct_lump_lowfreq, fct_relevel, ggplot, geom_function
    • Readings
  • Day 37 (Wed, Apr 10):
    • Continue Factors
  • Day 38 (Fri, Apr 12):
    • Finish Factors
    • Linear Regression (PDF) (R code)
      • Data: houseSalesGNV2020.rds
      • New R functions: source, knitr::include_graphics, coef, confint, anova, =, readRDS, geom_histogram, geom_density, formatC, update, scale_colour_viridis_d, expand_grid, predict, str_to_upper, aes, nrow, fmt, geom_line, geom_point, bind_cols
  • Day 39 (Mon, Apr 15):
    • Continue Linear Regression
  • Day 40 (Wed, Apr 17):
    • Continue Linear Regression
  • Day 41 (Fri, Apr 19):
    • Finish Linear Regression
    • Discuss Assignment 080
  • Day 43 (Wed, Apr 24):
    • Finish Logistic Regression