Exploration

Loading required packages and cleaned data.

#load required packages
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.0     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.1     ✔ tibble    3.1.8
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(here)
here() starts at C:/Users/Katie/Documents/2022-2023/MADA/katiewells-MADA-portfolio
#load data
flu2 <- readRDS(here("fluanalysis", "data", "flu2.rds"))

Summary statistics for BodyTemp and Nausea

#provide summary data for important variables
flu2 %>% pull(Nausea) %>% summary()
 No Yes 
475 255 
flu2 %>% pull(BodyTemp) %>% summary()
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  97.20   98.20   98.50   98.94   99.30  103.10 
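
The Nausea counts are easier to compare as proportions. Here is a minimal sketch, assuming a small made-up data frame `flu_toy` standing in for `flu2`. Both the name and the values are illustrative only and do not come from the study data.

```r
library(dplyr)

# Toy stand-in for flu2; these values are made up for illustration only
flu_toy <- tibble(
  Nausea   = factor(c("No", "Yes", "No", "No", "Yes", "No")),
  BodyTemp = c(98.2, 99.1, 98.5, 97.9, 100.4, 98.6)
)

# Count each level of Nausea and convert the counts to proportions
nausea_props <- flu_toy %>%
  count(Nausea) %>%
  mutate(prop = n / sum(n))

nausea_props
```

For the real counts shown above (475 No, 255 Yes), the same arithmetic gives roughly 65% without nausea and 35% with nausea.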

Distribution of BodyTemp

#look at the distribution of BodyTemp
flu2 %>% ggplot(aes(x=BodyTemp)) + geom_histogram() 
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Looks like most of the body temperatures cluster around 98.5 degrees, with some right skew. That matches the summary above, where the mean (98.94) sits above the median (98.50).
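
The `stat_bin()` message suggests picking an explicit `binwidth`. Here is a rough sketch of how that could look, again on made-up toy data rather than `flu2`. The 0.2 degree bin width is an arbitrary assumption, not a value from the original analysis.

```r
library(ggplot2)

# Toy stand-in for flu2; values are made up for illustration only
flu_toy <- data.frame(
  BodyTemp = c(97.9, 98.2, 98.3, 98.5, 98.5, 98.6, 98.8, 99.1, 99.6, 100.4, 101.8)
)

# binwidth = 0.2 is an arbitrary choice; try a few values and compare the shapes
temp_hist <- ggplot(flu_toy, aes(x = BodyTemp)) +
  geom_histogram(binwidth = 0.2) +
  labs(x = "Body temperature", y = "Count")

temp_hist
```

Setting `binwidth` explicitly also silences the message, since ggplot2 no longer has to fall back on its default of 30 bins.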

Let's take a look at the relationship between BodyTemp and some predictors.

flu2 %>% ggplot(aes(x=Nausea, y=BodyTemp)) + geom_boxplot()

Seems like median body temperature is just slightly higher in those with nausea than without.
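
Small differences between boxplot medians are hard to judge by eye, so it can help to compute them directly. Here is a minimal sketch on a made-up `flu_toy` data frame (hypothetical values, not the study data); the same pipeline should work on `flu2`, assuming `Nausea` and `BodyTemp` are stored as shown earlier.

```r
library(dplyr)

# Toy stand-in for flu2; these values are made up for illustration only
flu_toy <- tibble(
  Nausea   = factor(c("No", "Yes", "No", "No", "Yes", "No")),
  BodyTemp = c(98.2, 99.1, 98.5, 97.9, 100.4, 98.6)
)

# Group size, median, and mean body temperature for each level of Nausea
temp_by_nausea <- flu_toy %>%
  group_by(Nausea) %>%
  summarise(
    n           = n(),
    median_temp = median(BodyTemp),
    mean_temp   = mean(BodyTemp)
  )

temp_by_nausea
```

Swapping `Nausea` for another predictor gives the same table for that variable, and the `n` column shows whether any group is small enough to make its median unreliable.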

flu2 %>% ggplot(aes(x=CoughIntensity, y=BodyTemp)) + geom_boxplot()

Median body temperature appears to increase as cough intensity increases.

flu2 %>% ggplot(aes(x=Sneeze, y=BodyTemp)) + geom_boxplot()

Looks like people who did not report sneezing have a higher median body temperature. Weird.

flu2 %>% ggplot(aes(x=Weakness, y=BodyTemp)) + geom_boxplot()

Median body temperature seems to increase as weakness increases.
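
The boxplots above all follow the same template, so they could be generated from one small function. Here is a sketch under stated assumptions: `plot_temp_by` is a hypothetical helper, not part of the original analysis, and `flu_toy` with its factor levels is made up for illustration.

```r
library(ggplot2)

# Toy stand-in for flu2; values and factor levels are made up for illustration
flu_toy <- data.frame(
  BodyTemp = c(98.2, 99.1, 98.5, 97.9, 100.4, 98.6, 99.3, 98.4),
  Nausea   = factor(c("No", "Yes", "No", "No", "Yes", "No", "Yes", "No")),
  Weakness = factor(
    c("Mild", "Severe", "Mild", "None", "Severe", "None", "Mild", "None"),
    levels = c("None", "Mild", "Severe")
  )
)

# Hypothetical helper: boxplot of BodyTemp against one categorical predictor,
# where `predictor` is the column name given as a string
plot_temp_by <- function(data, predictor) {
  ggplot(data, aes(x = .data[[predictor]], y = BodyTemp)) +
    geom_boxplot() +
    labs(x = predictor, y = "BodyTemp")
}

# Build one plot per predictor and keep them in a named list
predictors <- c("Nausea", "Weakness")
temp_plots <- lapply(predictors, plot_temp_by, data = flu_toy)
names(temp_plots) <- predictors

temp_plots$Weakness
```

With `flu2`, the `predictors` vector could list `"Nausea"`, `"CoughIntensity"`, `"Sneeze"`, and `"Weakness"` to reproduce the four plots in this section.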