STA 3100: Programming with Data
Class Number (Section): 26313 (3DTA) |
---|
Meets: MWF 3:00 – 3:50 PM (Period 8) |
Location: FLO 100 |
Web: https://ufl.instructure.com/courses/498971 |
Instructor: Dr. Brett Presnell |
Teaching Assistant: Dipshi Roychowdhury |
Contact | Dr. Brett Presnell | Dipshi Roychowdhury |
---|---|---|
Email | presnell@ufl.edu | droychowdhury@ufl.edu |
Web | https://www.stat.ufl.edu/~presnell/ | |
Office | FLO 225 | FLO 117D |
Virtual Office | Zoom 940 1233 3509 | Zoom 658 792 6980 |
Office Hrs | MW 4-5 PM | Tue 2-3:30 (in person); Fri 1-2:30 (online) |
An introduction to statistical computing and programming with data. Topics include basic programming in R; data types and data structures in R; importing and cleaning data; specifying statistical models in R; statistical graphics; statistical simulation using pseudo-random numbers; reproducible research and the documentation of statistical analyses.
Prerequisite: STA 3032 (B-) or STA 2023 (B) or AP Statistics (4).
You will learn to do the following:
Import data into R and prepare the data for analysis.
Write functions in R making effective use of data structures and control structures.
Formulate statistical models in the R language.
Perform, document, and interpret common statistical analyses.
Carry out statistical/probabilistic simulations.
Determine statistical graphics appropriate to a statistical analysis and produce them using R.
Document and report the results of data analyses and simulations in a reproducible way.
We will use a variety of online texts and other resources. Class notes and other materials will be made available on the course website. Most readings will be taken from the following (free, online) texts, which students are encouraged to peruse on their own:
r4ds2e : R for Data Science (2e): Visualize, Model, Transform, Tidy, and Import Data
rp4ds : R Programming for Data Science
hopr : Hands-On Programming with R : Write Your Own Functions and Simulations
advr : Advanced R (2nd Ed)
rgraphics : R Graphics Cookbook, 2nd edition
There will be regular online quizzes to help you refine your knowledge and understanding of the course material. Homework assignments and projects will put this knowledge to use. These will be weighted in the final course average (percentage) as follows:
Letter grades in the course will be determined from the final course average according to the following scale (after rounding to the nearest integer):
A | A- | B+ | B | B- | C+ | C | D | E |
---|---|---|---|---|---|---|---|---|
94-100 | 90-93 | 87-89 | 84-86 | 80-83 | 77-79 | 67-76 | 60-66 | 0-59 |
Further information may be found in the university’s grades and grading policies.
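As an illustration only, the grading scale above can be written as a small lookup. This is a minimal sketch, not an official calculator: the function name `letter_grade` is hypothetical, and it assumes that exact halves round up (93.5 becomes 94), which the scale does not specify.

```python
from decimal import Decimal, ROUND_HALF_UP

# Lower bound of each letter-grade band, copied from the scale above.
GRADE_FLOORS = [
    (94, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (67, "C"), (60, "D"), (0, "E"),
]


def letter_grade(average):
    """Map a final course average (0 to 100) to a letter grade."""
    if not 0 <= average <= 100:
        raise ValueError("average must be between 0 and 100")
    # Assumption: exact halves round up; the syllabus only says
    # "rounding to the nearest integer".
    rounded = int(Decimal(str(average)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    # The bands are listed from highest to lowest, so the first match wins.
    for floor, letter in GRADE_FLOORS:
        if rounded >= floor:
            return letter


print(letter_grade(93.4))  # A-
print(letter_grade(93.5))  # A
```

Under this sketch a final average of 93.4 rounds to 93 (A-), while 93.5 rounds to 94 (A).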
Homework and projects must be submitted on time, and it is the student’s responsibility to allocate sufficient time to complete each assignment by the due date.
Late assignments will be accepted in cases of documented emergency or illness, but you must inform the instructor in advance of any illness which may lead to a late submission.
In all other cases, acceptance of late assignments will be at the discretion of the instructor. Scores on late submissions which are accepted will be reduced by 10% plus an additional 5% for each additional day between the due date and the time of submission.
Nota bene: it is the student’s responsibility to correctly submit their work for every assignment, so always double-check that you have submitted the correct file(s) for each assignment. Similarly, losing the internet connection in your residence at the last minute is not an acceptable excuse for a late submission. (If you insist on submitting your assignments at the last hour, then be sure that you know how to use your mobile phone as a Wi-Fi hotspot.)
If you have not submitted the correct file(s) by the due date, then any subsequent submission will be treated as a late submission.
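To make the late-penalty arithmetic concrete, here is a minimal sketch under one reading of the policy above. The assumptions are mine, not the syllabus's: lateness is counted in whole days, the first late day costs 10%, each further day adds 5%, and the reduction is taken as a percentage of the score earned. The function name `late_score` is hypothetical, and the instructor's application of the policy governs.

```python
def late_score(raw_score, days_late):
    """Score after the late penalty, under the assumptions stated above."""
    if raw_score < 0 or days_late < 0:
        raise ValueError("raw_score and days_late must be non-negative")
    if days_late == 0:
        return raw_score  # submitted on time, no penalty
    # Assumption: 10% for the first late day plus 5% for each additional
    # day, capped at 100% so the score never goes negative.
    penalty_pct = min(100, 10 + 5 * (days_late - 1))
    return raw_score * (100 - penalty_pct) / 100


print(late_score(80, 1))  # 72.0
print(late_score(80, 3))  # 64.0
```

Under this reading, an accepted submission that earned 80 points would score 72 if one day late and 64 if three days late.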
If you feel that an error has been made in grading an assignment, please first contact your TA during their office hours or by email. If, after consulting with the TA, you still feel that your assignment has been graded incorrectly, you may submit a written (typed, not handwritten) appeal to the instructor detailing precisely how your assignment was misgraded.
Students will be held accountable to the UF Honor Code.
Unless otherwise specified in writing by the instructor, students are expected to work independently or in assigned groups. General discussion of the course material is encouraged, but offering or accepting solutions from others is plagiarism. When in doubt, direct your questions to the instructor or TA.
As in all courses at UF, unauthorized recording and unauthorized sharing of recorded materials by students or any other party is prohibited.
Except in special circumstances, class sessions will not be recorded by the instructor. In case a class session is recorded, students who participate with their camera engaged or who utilize a profile image are agreeing to have their video or image recorded. If you are unwilling to consent to have your profile or video image recorded, be sure to keep your camera off and do not use a profile image. Likewise, students who un-mute during class and participate orally are agreeing to have their voice recorded. If you are not willing to consent to have your voice recorded during class, you will need to keep your mute button activated and communicate exclusively using the “chat” feature, which allows students to type questions and comments live. The chat will not be recorded or shared.
Students requesting accommodation for disabilities must register with UF’s Disability Resource Center. The DRC will provide documentation to the student, who must then provide this documentation to the instructor when requesting accommodation. You must submit this documentation prior to submitting any assignments or taking any exam or quiz for which you are requesting accommodation.
Students are expected to provide feedback on the quality of instruction in this course by completing course evaluations online via GatorEvals. Guidance on how to give feedback in a professional and respectful manner is available at https://gatorevals.aa.ufl.edu/students/. Students will be notified when the evaluation period opens, and can complete evaluations through the email they receive from GatorEvals, in their Canvas course menu under GatorEvals, or via https://ufl.bluera.com/ufl/. Summaries of course evaluation results are available to students at https://gatorevals.aa.ufl.edu/public-results/.
This is an aspirational schedule for the course. It may be altered or rearranged to adapt to the backgrounds, abilities, and interests of the students in the class. There are 43 scheduled class meetings.
Getting started
Vectors and Vectorized Operations
Distributions and Descriptive Statistics
Writing Your Own Functions
Data frames (and tibbles)
Importing and Exporting Data
Column and row operations on data frames
Pipes and more operations on data frames
Joining/Merging Data Frames
Dates and times in base R
The lubridate package
Character strings and the stringr package
String matching with regular expressions
Detecting string matches
Extracting string matches
String replacement and string splitting
An extended example with character strings
Factors in base R
The forcats package
Elementary statistical inference
Simple linear regression
Multiple regression
Factors and dummy variables in regression
Interactions
Simple logistic regression
Multiple logistic regression
More graphics in R
Links to slides and reading assignments for each lecture will be added here throughout the semester.
+, -, *, /, ^, %/%, %%, c, <-, length, abs, round, log, log10, log2, sin, asin, factorial, choose, lfactorial, exp, gamma, incr, sqrt, function
print, writeLines, is.double, is.integer, is.character, is.logical, is.list, get, typeof, as.double, sum, mean, >, paste, as.character, as.logical, rnorm
:, seq, rep
[, names, sample, sort
dnorm, pnorm, qnorm, set.seed
summary, median, min, max, range, quantile, sd, IQR, t.test, ecdf, qt
<, <=, >=, ==, !=, !, |, &, xor, table, all, any, runif
matrix, is.matrix, attributes, dim, dimnames, crossprod, %*%, apply, t, diag, solve, cbind, rbind, attr, colnames, list, rownames, drop
[[, $, str, lm, as.Date
getS3method, methods
sample.int, %in%, for, noquote, replicate, rle, vector, unclass, rmax_run_len
apply() and anonymous functions.
system.time, fmaxrl, if, is.null
binom.test, prop.test, simtosses, seq_along, curve, repeat, invisible, readline, as.numeric, break, paste0, dt
data.frame, factor
library, tibble, as_tibble, I
subset, order, head, tail, tapply, with, aggregate, transform, union, setdiff
filter, select, arrange, slice, slice_head, slice_tail, slice_max, slice_sample, summarise, group_vars, is.array, summarize, mutate, pivot_wider, rename, relocate, desc, group_by, n, ungroup, xtabs
merge, inner_join, anti_join, full_join, left_join, right_join, tribble, as.data.frame, ifelse, join_by, case_match
cat, glimpse, write_csv, write_rds, identical, readLines, read.csv, read_csv, read.table, read_excel, fill, read_sheet, read_rds, which
problems, spec, cols_condense, cols, hour, minute, second, cols_only, col_character, pick, col_integer, col_datetime, col_double, vapply, character
ISOdate, Sys.timezone, as.POSIXct, difftime, as.difftime, is.numeric, Sys.Date, Sys.time, as.POSIXlt
as_date, mdy, dmy, make_date, pull, ymd_hm, strftime, year, quarter, month, day, wday, mday, qday, yday, is.factor, is.ordered, dyears, years, dhours, hours, isS4, int_overlaps, intersect, today, now, ymd_hms, interval, ymd_h, %--%, int_shift, int_end, str_replace, days, str_c, weeks
separate_wider_regex, rows_update, str_replace_all, dminutes, coalesce
Sys.getlocale, locale, distinct, read_tsv, pivot_longer, janitor::clean_names, rename_with
str_length, str_flatten, str_sub, rev
str_view, str_view_all, rphone
str_detect, str_which, str_subset, str_count, slice_min, str_extract, str_extract_all, str_match, str_match_all, str_split, regex, boundary, list_c, str_to_lower
separate_wider_position, unique, apropos, semi_join, separate_wider_delim, read_lines, unnest_longer, str_split_i, as.integer, map_int
levels, options, count, fct_rev, saveRDS, relevel, ordered, cumsum, optim, cut, qweibull, labs, rweibull, as_factor, fct_collapse, fct_lump_min, fct_lump_prop, fct_lump_n, fct_infreq, fct_lump_lowfreq, fct_relevel, ggplot, geom_function
source, knitr::include_graphics, coef, confint, anova, =, readRDS, geom_histogram, geom_density, formatC, update, scale_colour_viridis_d, expand_grid, predict, str_to_upper, aes, nrow, fmt, geom_line, geom_point, bind_cols
contr.treatment, contr.poly, glm