Homework

Question 1:

I will use a study in which frog eggs of a single species were taken from a pond and raised in containers at three different temperatures (Hot = 25°C, Moderate = 19°C, Cool = 13°C). The researchers hypothesized that eggs of this species hatch fastest in warmer water and slowest in colder water. The study asked whether hatching time (days) differs across the three temperature treatments.

Question 2:

Here I specify reasonable sample sizes, means, and standard deviations for each temperature treatment, assuming the hypothesis is true. A sample size of 20 eggs per treatment was chosen to give enough replicates in each group. The means and standard deviations reflect the expected result that eggs hatch fastest in warmer water.

Hot: Sample Size = 20 eggs, Mean = 16 days, Standard Deviation = 1.25 days
Moderate: Sample Size = 20 eggs, Mean = 21 days, Standard Deviation = 1.1 days
Cool: Sample Size = 20 eggs, Mean = 26 days, Standard Deviation = 1.1 days

Question 3:

nGroup <- 3 # number of groups
nName <- c("Hot","Moderate", "Cool") # names of groups
nSize <- c(20,20,20) # number of observations in each group
nMean <- c(16,21,26) # mean of each group
nSD <- c(1.25,1.1,1.1) # standard deviation of each group

ID <- 1:(sum(nSize)) # create id vector for each row
Hatch_time <- c(rnorm(n=nSize[1],mean=nMean[1],sd=nSD[1]),
            rnorm(n=nSize[2],mean=nMean[2],sd=nSD[2]),
            rnorm(n=nSize[3],mean=nMean[3],sd=nSD[3]))
Temperature <- rep(nName,nSize)
EggHatchData <- data.frame(ID,Temperature,Hatch_time)
EggHatchData

##    ID Temperature Hatch_time
## 1   1         Hot   13.58011
## 2   2         Hot   16.99417
## 3   3         Hot   15.95344
## 4   4         Hot   17.61499
## 5   5         Hot   13.95419
## 6   6         Hot   17.48118
## 7   7         Hot   16.64046
## 8   8         Hot   18.06704
## 9   9         Hot   15.00726
## 10 10         Hot   15.82201
## 11 11         Hot   15.97374
## 12 12         Hot   16.89383
## 13 13         Hot   14.41048
## 14 14         Hot   14.79469
## 15 15         Hot   14.33632
## 16 16         Hot   14.03215
## 17 17         Hot   17.42459
## 18 18         Hot   17.18788
## 19 19         Hot   17.06558
## 20 20         Hot   16.67917
## 21 21    Moderate   23.14714
## 22 22    Moderate   22.03043
## 23 23    Moderate   20.03120
## 24 24    Moderate   21.30403
## 25 25    Moderate   22.62368
## 26 26    Moderate   21.22625
## 27 27    Moderate   21.89163
## 28 28    Moderate   20.72427
## 29 29    Moderate   21.95276
## 30 30    Moderate   21.86992
## 31 31    Moderate   20.05354
## 32 32    Moderate   21.55825
## 33 33    Moderate   21.23760
## 34 34    Moderate   21.29300
## 35 35    Moderate   21.23048
## 36 36    Moderate   20.73168
## 37 37    Moderate   19.81526
## 38 38    Moderate   20.36147
## 39 39    Moderate   22.20317
## 40 40    Moderate   22.44482
## 41 41        Cool   27.20292
## 42 42        Cool   24.18945
## 43 43        Cool   25.91652
## 44 44        Cool   25.94768
## 45 45        Cool   24.24685
## 46 46        Cool   28.30268
## 47 47        Cool   25.15520
## 48 48        Cool   25.31582
## 49 49        Cool   26.41716
## 50 50        Cool   27.59574
## 51 51        Cool   24.29964
## 52 52        Cool   25.32316
## 53 53        Cool   25.07848
## 54 54        Cool   25.70093
## 55 55        Cool   25.12378
## 56 56        Cool   25.56388
## 57 57        Cool   27.05987
## 58 58        Cool   27.08247
## 59 59        Cool   24.04286
## 60 60        Cool   23.73452

Question 4:

EggHatchModel <- aov(Hatch_time~Temperature,data=EggHatchData)
print(EggHatchModel)

## Call:
##    aov(formula = Hatch_time ~ Temperature, data = EggHatchData)
## 
## Terms:
##                 Temperature Residuals
## Sum of Squares     939.0816   85.0212
## Deg. of Freedom           2        57
## 
## Residual standard error: 1.221311
## Estimated effects may be unbalanced

summary(EggHatchModel)

##             Df Sum Sq Mean Sq F value Pr(>F)    
## Temperature  2  939.1   469.5   314.8 <2e-16 ***
## Residuals   57   85.0     1.5                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

z <- summary(EggHatchModel)
list(Fval=unlist(z)[7],probF=unlist(z)[9])

## $Fval
## F value1 
## 314.7901 
## 
## $probF
##      Pr(>F)1 
## 1.573007e-31

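library(ggplot2) # ggplot() requires the ggplot2 package
ggplot(data=EggHatchData,aes(x=Temperature,y=Hatch_time,fill=Temperature)) +
  geom_boxplot() +
  scale_fill_manual(values= c("blue","red","green")) # colors follow alphabetical order: Cool, Hot, Moderate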

Here we find a significant difference in hatching times among the temperature groups (F = 314.8 on 2 and 57 degrees of freedom, p < 2e-16).
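The F-value and p-value above are pulled out by position with `unlist(z)[7]` and `unlist(z)[9]`. A minimal sketch of an alternative, assuming the same model structure, is to read them from the ANOVA table by column name, which does not depend on the order of the unlisted elements. The seed is arbitrary and is only there so the example is repeatable; it is not part of the original analysis.

```r
# Simulate one data set with the original parameters (arbitrary seed, for repeatability only)
set.seed(1)
Temperature <- rep(c("Hot", "Moderate", "Cool"), c(20, 20, 20))
Hatch_time <- c(rnorm(20, mean = 16, sd = 1.25),
                rnorm(20, mean = 21, sd = 1.1),
                rnorm(20, mean = 26, sd = 1.1))
EggHatchData <- data.frame(Temperature, Hatch_time)

EggHatchModel <- aov(Hatch_time ~ Temperature, data = EggHatchData)

# summary() of an aov fit is a list; its first element is the ANOVA table
anova_table <- summary(EggHatchModel)[[1]]

# Row 1 is the Temperature term, row 2 is the residuals
Fval  <- anova_table[["F value"]][1]
probF <- anova_table[["Pr(>F)"]][1]
list(Fval = Fval, probF = probF)
```

Both approaches return the same numbers; the named version just makes it clearer which entry of the table is being used.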

Question 5:

First run: F = 467.53, p = 4.377084e-36
Second run: F = 398.39, p = 3.156156e-34
Third run: F = 344.89, p = 1.434916e-32

After rerunning my analysis several times with different random numbers, I consistently get a very large F-value and a very small p-value, which indicates a highly significant difference regardless of the particular random draw. Both values vary somewhat from run to run, but the conclusion is always the same: there is a significant difference in hatching times among the temperature groups.
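Rerunning the analysis by hand can be replaced by a small wrapper function. The sketch below assumes a hypothetical helper, `run_anova()`, that simulates one data set and returns the F-value and p-value; the function name, the seed, and the use of `replicate()` are illustrative choices, not part of the original analysis.

```r
# Hypothetical helper: simulate one data set and return F and p for the Temperature term
run_anova <- function(nSize = c(20, 20, 20),
                      nMean = c(16, 21, 26),
                      nSD   = c(1.25, 1.1, 1.1),
                      nName = c("Hot", "Moderate", "Cool")) {
  # rnorm() is vectorized over mean and sd, so one call draws all groups
  Hatch_time  <- rnorm(n = sum(nSize), mean = rep(nMean, nSize), sd = rep(nSD, nSize))
  Temperature <- rep(nName, nSize)
  tab <- summary(aov(Hatch_time ~ Temperature))[[1]]
  c(Fval = tab[["F value"]][1], probF = tab[["Pr(>F)"]][1])
}

set.seed(2024)                         # arbitrary seed so the runs can be repeated
runs <- t(replicate(3, run_anova()))   # one row per run, columns Fval and probF
runs
```

Each row of `runs` corresponds to one rerun, so the variation in F and p across random draws can be read off directly.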

Question 6:

Now I will begin adjusting the means to see how small the differences between the groups can be (the “effect size”) for me to still detect a significant difference. I will do this by making the means for each group closer and closer together, until we no longer get a significant result.

Hot: mean = 18 days
Moderate: mean = 21 days
Cool: mean = 23 days

nGroup <- 3
nName <- c("Hot","Moderate", "Cool")
nSize <- c(20,20,20)
nMean <- c(18,21,23) # changed the means
nSD <- c(1.25,1.1,1.1)

ID <- 1:(sum(nSize))
Hatch_time <- c(rnorm(n=nSize[1],mean=nMean[1],sd=nSD[1]),
            rnorm(n=nSize[2],mean=nMean[2],sd=nSD[2]),
            rnorm(n=nSize[3],mean=nMean[3],sd=nSD[3]))
Temperature <- rep(nName,nSize)
EggHatchData <- data.frame(ID,Temperature,Hatch_time)
EggHatchModel <- aov(Hatch_time~Temperature,data=EggHatchData)
z <- summary(EggHatchModel)
list(Fval=unlist(z)[7],probF=unlist(z)[9])

## $Fval
## F value1 
## 142.0062 
## 
## $probF
##      Pr(>F)1 
## 7.219152e-23

After adjusting the means once and rerunning my analysis several times, I still consistently find a significant difference, although the F-value is smaller than before. Let’s try adjusting the means again:

Hot: mean = 20.5 days
Moderate: mean = 21 days
Cool: mean = 21.5 days

nGroup <- 3
nName <- c("Hot","Moderate", "Cool")
nSize <- c(20,20,20)
nMean <- c(20.5,21,21.5) # changed the means again
nSD <- c(1.25,1.1,1.1)

ID <- 1:(sum(nSize))
Hatch_time <- c(rnorm(n=nSize[1],mean=nMean[1],sd=nSD[1]),
            rnorm(n=nSize[2],mean=nMean[2],sd=nSD[2]),
            rnorm(n=nSize[3],mean=nMean[3],sd=nSD[3]))
Temperature <- rep(nName,nSize)
EggHatchData <- data.frame(ID,Temperature,Hatch_time)
EggHatchModel <- aov(Hatch_time~Temperature,data=EggHatchData)
z <- summary(EggHatchModel)
list(Fval=unlist(z)[7],probF=unlist(z)[9])

## $Fval
## F value1 
## 1.501956 
## 
## $probF
##   Pr(>F)1 
## 0.2313747

My original means were:

Hot: Mean = 16 days, Moderate: Mean = 21 days, Cool: Mean = 26 days

My final adjusted means: Hot: mean = 20.5 days, Moderate: mean = 21 days, Cool: mean = 21.5 days

After changing the means several times, a 0.5-day difference between adjacent group means seems to be about the smallest difference (the “effect size”) at which I still detect a significant result (p < 0.05) in most runs. It does not do so every time: the run shown above, for example, gave p = 0.23. Most reruns, however, produce a significant p-value.
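One way to check how often the 0.5-day spacing yields p < 0.05 is to repeat the simulation many times and record the proportion of significant runs. The sketch below assumes a hypothetical helper, `sim_p()`, along with an arbitrary seed and number of repetitions; none of these come from the original analysis.

```r
# Hypothetical helper: simulate one data set and return the ANOVA p-value
sim_p <- function(nMean, nSize = c(20, 20, 20), nSD = c(1.25, 1.1, 1.1)) {
  Hatch_time  <- rnorm(n = sum(nSize), mean = rep(nMean, nSize), sd = rep(nSD, nSize))
  Temperature <- rep(c("Hot", "Moderate", "Cool"), nSize)
  summary(aov(Hatch_time ~ Temperature))[[1]][["Pr(>F)"]][1]
}

set.seed(7)      # arbitrary seed, for repeatability only
nRuns <- 1000    # assumed number of repetitions

# p-values for the final adjusted means (0.5-day spacing)
p_small <- replicate(nRuns, sim_p(nMean = c(20.5, 21, 21.5)))

# Proportion of runs that reach significance at the 0.05 level
prop_significant <- mean(p_small < 0.05)
prop_significant
```

The resulting proportion is a simulation-based estimate of how reliably this effect size is detected with 20 eggs per group, which gives a firmer basis than a handful of manual reruns.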

Question 7:

Now, using my original means, I will adjust the sample size to find the minimum sample size that I need to still detect a significant difference.

nGroup <- 3 # number of groups
nName <- c("Hot","Moderate", "Cool") # names of groups
nSize <- c(2,2,2) # number of observations in each group
nMean <- c(16,21,26) # mean of each group
nSD <- c(1.25,1.1,1.1) # standard deviation of each group

ID <- 1:(sum(nSize)) # create id vector for each row
Hatch_time <- c(rnorm(n=nSize[1],mean=nMean[1],sd=nSD[1]),
            rnorm(n=nSize[2],mean=nMean[2],sd=nSD[2]),
            rnorm(n=nSize[3],mean=nMean[3],sd=nSD[3]))
Temperature <- rep(nName,nSize)
EggHatchData <- data.frame(ID,Temperature,Hatch_time)
EggHatchModel <- aov(Hatch_time~Temperature,data=EggHatchData)
z <- summary(EggHatchModel)
list(Fval=unlist(z)[7],probF=unlist(z)[9])

## $Fval
## F value1 
## 216.6112 
## 
## $probF
##      Pr(>F)1 
## 0.0005703216

After changing the sample sizes several times, I found that reducing them all the way down to 2 per group still results in a significant difference. As I decreased the sample size, the p-value rose toward 0.05 (p = 0.00057 in the run shown above) but stayed below it. This indicates that the study did not need such a large sample size to detect a difference of this magnitude. Larger sample sizes do, however, make the test more reliable, as the much smaller p-values at n = 20 show.
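The sample-size search can also be run as a sweep instead of editing `nSize` by hand. The sketch below assumes a hypothetical helper, `prop_sig()`, a set of candidate sample sizes, and 200 repetitions per size; these are illustrative choices, not values from the original analysis.

```r
# Hypothetical helper: proportion of simulated runs with p < 0.05 for n eggs per group
prop_sig <- function(n, nRuns = 200,
                     nMean = c(16, 21, 26), nSD = c(1.25, 1.1, 1.1)) {
  pvals <- replicate(nRuns, {
    Hatch_time  <- rnorm(n = 3 * n, mean = rep(nMean, each = n), sd = rep(nSD, each = n))
    Temperature <- rep(c("Hot", "Moderate", "Cool"), each = n)
    summary(aov(Hatch_time ~ Temperature))[[1]][["Pr(>F)"]][1]
  })
  mean(pvals < 0.05)
}

set.seed(11)                        # arbitrary seed, for repeatability only
sizes <- c(2, 3, 5, 10, 20)         # assumed candidate per-group sample sizes
power_by_n <- vapply(sizes, prop_sig, numeric(1))
names(power_by_n) <- sizes
power_by_n
```

Reading across `power_by_n` shows how the chance of detecting the original 5-day spacing changes with the number of eggs per group, rather than relying on a single run at each sample size.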

Question 8:

See text in above sections.

Homework_7

Kyle Grasso

2024-02-28