I will use a study in which frog eggs of one species were collected from a pond and raised in containers at three temperatures (Hot = 25°C, Moderate = 19°C, Cool = 13°C). The researchers hypothesized that eggs of this species hatch fastest in warm water and slowest in cold water. The study asked whether hatching time (days) differed across the three temperature treatments.
Here I specify plausible sample sizes, means, and standard deviations for each temperature treatment, assuming the researchers' hypothesis is true. I chose a sample size of 20 eggs per treatment so that each group has enough replicates. The means and standard deviations reflect the expected result that eggs hatch fastest in warm water, with a 5-day gap in mean hatching time between adjacent treatments.
Hot: Sample Size = 20 eggs, Mean = 16 days, Standard Deviation = 1.25 days
Moderate: Sample Size = 20 eggs, Mean = 21 days, Standard Deviation = 1.1 days
Cool: Sample Size = 20 eggs, Mean = 26 days, Standard Deviation = 1.1 days
nGroup <- 3 # number of groups
nName <- c("Hot","Moderate", "Cool") # names of groups
nSize <- c(20,20,20) # number of observations in each group
nMean <- c(16,21,26) # mean of each group
nSD <- c(1.25,1.1,1.1) # standard deviation of each group
ID <- 1:(sum(nSize)) # create id vector for each row
Hatch_time <- c(rnorm(n=nSize[1],mean=nMean[1],sd=nSD[1]),
rnorm(n=nSize[2],mean=nMean[2],sd=nSD[2]),
rnorm(n=nSize[3],mean=nMean[3],sd=nSD[3]))
Temperature <- rep(nName,nSize)
EggHatchData <- data.frame(ID,Temperature,Hatch_time)
EggHatchData
## ID Temperature Hatch_time
## 1 1 Hot 13.58011
## 2 2 Hot 16.99417
## 3 3 Hot 15.95344
## 4 4 Hot 17.61499
## 5 5 Hot 13.95419
## 6 6 Hot 17.48118
## 7 7 Hot 16.64046
## 8 8 Hot 18.06704
## 9 9 Hot 15.00726
## 10 10 Hot 15.82201
## 11 11 Hot 15.97374
## 12 12 Hot 16.89383
## 13 13 Hot 14.41048
## 14 14 Hot 14.79469
## 15 15 Hot 14.33632
## 16 16 Hot 14.03215
## 17 17 Hot 17.42459
## 18 18 Hot 17.18788
## 19 19 Hot 17.06558
## 20 20 Hot 16.67917
## 21 21 Moderate 23.14714
## 22 22 Moderate 22.03043
## 23 23 Moderate 20.03120
## 24 24 Moderate 21.30403
## 25 25 Moderate 22.62368
## 26 26 Moderate 21.22625
## 27 27 Moderate 21.89163
## 28 28 Moderate 20.72427
## 29 29 Moderate 21.95276
## 30 30 Moderate 21.86992
## 31 31 Moderate 20.05354
## 32 32 Moderate 21.55825
## 33 33 Moderate 21.23760
## 34 34 Moderate 21.29300
## 35 35 Moderate 21.23048
## 36 36 Moderate 20.73168
## 37 37 Moderate 19.81526
## 38 38 Moderate 20.36147
## 39 39 Moderate 22.20317
## 40 40 Moderate 22.44482
## 41 41 Cool 27.20292
## 42 42 Cool 24.18945
## 43 43 Cool 25.91652
## 44 44 Cool 25.94768
## 45 45 Cool 24.24685
## 46 46 Cool 28.30268
## 47 47 Cool 25.15520
## 48 48 Cool 25.31582
## 49 49 Cool 26.41716
## 50 50 Cool 27.59574
## 51 51 Cool 24.29964
## 52 52 Cool 25.32316
## 53 53 Cool 25.07848
## 54 54 Cool 25.70093
## 55 55 Cool 25.12378
## 56 56 Cool 25.56388
## 57 57 Cool 27.05987
## 58 58 Cool 27.08247
## 59 59 Cool 24.04286
## 60 60 Cool 23.73452
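The same block of simulation code is copied for every scenario in this report. As a sketch, it could be wrapped in a small function so that each scenario only changes the means, SDs, or sample sizes. `simulate_hatch_data()` is a hypothetical helper, not part of any package, and it assumes normally distributed hatching times exactly as the `rnorm()` calls above do.

```r
# Hypothetical helper (not from any package): simulate one experiment with
# normally distributed hatching times, like the rnorm() calls above
simulate_hatch_data <- function(means, sds, sizes,
                                group_names = c("Hot", "Moderate", "Cool")) {
  # rnorm() accepts vectors for mean and sd, so repeating each group's
  # parameters sizes[i] times draws all groups in a single call
  Hatch_time <- rnorm(n = sum(sizes),
                      mean = rep(means, sizes),
                      sd = rep(sds, sizes))
  data.frame(ID = seq_len(sum(sizes)),
             Temperature = rep(group_names, sizes),
             Hatch_time = Hatch_time)
}

set.seed(1)  # arbitrary seed, only so the example is reproducible
EggHatchData <- simulate_hatch_data(means = c(16, 21, 26),
                                    sds = c(1.25, 1.1, 1.1),
                                    sizes = c(20, 20, 20))
head(EggHatchData)
```

Each later scenario then becomes a single call, for example `simulate_hatch_data(c(20.5, 21, 21.5), c(1.25, 1.1, 1.1), c(20, 20, 20))`.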
EggHatchModel <- aov(Hatch_time~Temperature,data=EggHatchData)
print(EggHatchModel)
## Call:
## aov(formula = Hatch_time ~ Temperature, data = EggHatchData)
##
## Terms:
## Temperature Residuals
## Sum of Squares 939.0816 85.0212
## Deg. of Freedom 2 57
##
## Residual standard error: 1.221311
## Estimated effects may be unbalanced
summary(EggHatchModel)
## Df Sum Sq Mean Sq F value Pr(>F)
## Temperature 2 939.1 469.5 314.8 <2e-16 ***
## Residuals 57 85.0 1.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
z <- summary(EggHatchModel)
list(Fval=unlist(z)[7],probF=unlist(z)[9])
## $Fval
## F value1
## 314.7901
##
## $probF
## Pr(>F)1
## 1.573007e-31
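Indexing `unlist(z)[7]` and `unlist(z)[9]` works because `unlist()` flattens the ANOVA table column by column, but it relies on remembering each value's position. A sketch of a more readable alternative pulls the values out by column name. It assumes a one-factor model like `EggHatchModel`, so the first row of the table is the `Temperature` term; the small simulated data set is only there so the example runs on its own.

```r
# Simulate a data set with the original means so the example is self-contained
set.seed(2)
EggHatchData <- data.frame(
  Temperature = rep(c("Hot", "Moderate", "Cool"), each = 20),
  Hatch_time = rnorm(60, mean = rep(c(16, 21, 26), each = 20),
                     sd = rep(c(1.25, 1.1, 1.1), each = 20))
)
EggHatchModel <- aov(Hatch_time ~ Temperature, data = EggHatchData)

# summary() of an aov fit is a list holding one ANOVA table (a data frame);
# row 1 is the Temperature term, row 2 the residuals
anova_table <- summary(EggHatchModel)[[1]]
Fval  <- anova_table[["F value"]][1]
probF <- anova_table[["Pr(>F)"]][1]
list(Fval = Fval, probF = probF)
```

The result is the same pair of numbers as `unlist(z)[7]` and `unlist(z)[9]`, but the code no longer depends on how the table happens to be flattened.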
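library(ggplot2) # load ggplot2 for ggplot(); fill colors follow the alphabetical level order Cool, Hot, Moderate
ggplot(data=EggHatchData,aes(x=Temperature,y=Hatch_time,fill=Temperature)) + geom_boxplot() + scale_fill_manual(values= c("blue","red","green"))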
Here I find a significant difference in hatching times between temperature groups (F = 314.8, p < 2e-16). Rerunning the simulation with new random numbers gave:
First run: F = 467.53, p = 4.377084e-36
Second run: F = 398.39, p = 3.156156e-34
Third run: F = 344.89, p = 1.434916e-32
After rerunning the analysis several times with different random numbers, I consistently get a very large F-value and a very small p-value, so the difference between groups is detected regardless of which random sample is drawn. Both values vary from run to run, but the conclusion is always the same: hatching time differs significantly between temperature groups.
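Rather than rerunning by hand, the simulation can be repeated many times automatically; the share of runs with p < 0.05 is an estimate of the test's statistical power. This is a minimal sketch: `run_once()` is a hypothetical helper, it assumes normally distributed hatching times as in the simulation above, and the seed and the 500 repetitions are arbitrary choices.

```r
# Hypothetical helper: simulate one experiment and return its ANOVA p-value
run_once <- function(means, sds = c(1.25, 1.1, 1.1), n = 20) {
  Temperature <- rep(c("Hot", "Moderate", "Cool"), each = n)
  Hatch_time <- rnorm(3 * n, mean = rep(means, each = n),
                      sd = rep(sds, each = n))
  fit <- aov(Hatch_time ~ Temperature)
  summary(fit)[[1]][["Pr(>F)"]][1]  # p-value for the Temperature term
}

set.seed(3)  # arbitrary seed for reproducibility
p_values <- replicate(500, run_once(means = c(16, 21, 26)))

mean(p_values < 0.05)  # proportion of runs with a significant result
range(p_values)        # spread of p-values across runs
```

Passing `means = c(20.5, 21, 21.5)` instead estimates how often the smallest effect size tried below would be detected.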
Now I will adjust the means to see how small the differences between groups (the “effect size”) can be while I can still detect a significant difference. I will do this by moving the group means closer together until the result is no longer significant.
Hot: mean = 18 days
Moderate: mean = 21 days
Cool: mean = 23 days
nGroup <- 3
nName <- c("Hot","Moderate", "Cool")
nSize <- c(20,20,20)
nMean <- c(18,21,23) # changed the means
nSD <- c(1.25,1.1,1.1)
ID <- 1:(sum(nSize))
Hatch_time <- c(rnorm(n=nSize[1],mean=nMean[1],sd=nSD[1]),
rnorm(n=nSize[2],mean=nMean[2],sd=nSD[2]),
rnorm(n=nSize[3],mean=nMean[3],sd=nSD[3]))
Temperature <- rep(nName,nSize)
EggHatchData <- data.frame(ID,Temperature,Hatch_time)
EggHatchModel <- aov(Hatch_time~Temperature,data=EggHatchData)
z <- summary(EggHatchModel)
list(Fval=unlist(z)[7],probF=unlist(z)[9])
## $Fval
## F value1
## 142.0062
##
## $probF
## Pr(>F)1
## 7.219152e-23
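After adjusting the means once and rerunning the analysis several times, I still consistently find a significant difference, but the F-value is smaller (F = 142.0 in the run shown, compared with 314.8 for the original means), reflecting the smaller differences between groups. Let’s try adjusting the means again: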
Hot: mean = 20.5 days
Moderate: mean = 21 days
Cool: mean = 21.5 days
nGroup <- 3
nName <- c("Hot","Moderate", "Cool")
nSize <- c(20,20,20)
nMean <- c(20.5,21,21.5) # changed the means again
nSD <- c(1.25,1.1,1.1)
ID <- 1:(sum(nSize))
Hatch_time <- c(rnorm(n=nSize[1],mean=nMean[1],sd=nSD[1]),
rnorm(n=nSize[2],mean=nMean[2],sd=nSD[2]),
rnorm(n=nSize[3],mean=nMean[3],sd=nSD[3]))
Temperature <- rep(nName,nSize)
EggHatchData <- data.frame(ID,Temperature,Hatch_time)
EggHatchModel <- aov(Hatch_time~Temperature,data=EggHatchData)
z <- summary(EggHatchModel)
list(Fval=unlist(z)[7],probF=unlist(z)[9])
## $Fval
## F value1
## 1.501956
##
## $probF
## Pr(>F)1
## 0.2313747
My original means were Hot = 16 days, Moderate = 21 days, Cool = 26 days.
My final adjusted means were Hot = 20.5 days, Moderate = 21 days, Cool = 21.5 days.
After changing the means several times, a difference of 0.5 days between adjacent groups (1 day between Hot and Cool) appears to be about the smallest effect size I can detect at p < 0.05, and detection at this size is not reliable. The run shown above gave F = 1.50 and p = 0.23, which is not significant; across my reruns this spacing produced a mix of significant and non-significant results, with most runs significant.
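The mixed results at the smallest spacing can be checked analytically with `power.anova.test()` from base R's `stats` package, which gives the probability that a one-way ANOVA detects a given set of group means. In this sketch `between.var` is the variance of the hypothesized means; using the average of the three squared SDs as a single `within.var` is my assumption, since the ANOVA itself assumes equal variances across groups.

```r
# Pool the three group SDs into one within-group variance (an assumption)
within_var <- mean(c(1.25, 1.1, 1.1)^2)

mean_sets <- list(original = c(16, 21, 26),
                  first_adjustment = c(18, 21, 23),
                  final = c(20.5, 21, 21.5))

# Power of the ANOVA with 20 eggs per group for each set of means
power_by_set <- sapply(mean_sets, function(m) {
  power.anova.test(groups = 3, n = 20,
                   between.var = var(m), within.var = within_var,
                   sig.level = 0.05)$power
})
round(power_by_set, 3)
```

A power close to 1 for the first two sets but clearly below 1 for the final set would match the pattern above: the larger differences are always detected, while the 0.5-day spacing is detected in only some runs.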
Now, using my original means, I will adjust the sample size to find the minimum sample size that I need to still detect a significant difference.
nGroup <- 3 # number of groups
nName <- c("Hot","Moderate", "Cool") # names of groups
nSize <- c(2,2,2) # number of observations in each group
nMean <- c(16,21,26) # mean of each group
nSD <- c(1.25,1.1,1.1) # standard deviation of each group
ID <- 1:(sum(nSize)) # create id vector for each row
Hatch_time <- c(rnorm(n=nSize[1],mean=nMean[1],sd=nSD[1]),
rnorm(n=nSize[2],mean=nMean[2],sd=nSD[2]),
rnorm(n=nSize[3],mean=nMean[3],sd=nSD[3]))
Temperature <- rep(nName,nSize)
EggHatchData <- data.frame(ID,Temperature,Hatch_time)
EggHatchModel <- aov(Hatch_time~Temperature,data=EggHatchData)
z <- summary(EggHatchModel)
list(Fval=unlist(z)[7],probF=unlist(z)[9])
## $Fval
## F value1
## 216.6112
##
## $probF
## Pr(>F)1
## 0.0005703216
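After changing the sample sizes several times, I found that even 2 eggs per group still gives a significant difference (F = 216.6, p = 0.00057 in the run shown). Two per group is the smallest sample size this ANOVA can use: with one egg per group there are no residual degrees of freedom left to estimate within-group variation. As I decreased the sample size, the p-value increased, although even at 2 eggs per group it stayed well below 0.05. This suggests that, with differences as large as the original means, the study did not need 20 eggs per group to detect an effect. Larger samples still make the test more robust: they give more precise estimates of each group's mean and variance and more power to detect smaller differences.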
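The sample-size result can be sketched the same way with `power.anova.test()` from base R's `stats` package, holding the original means fixed and varying the number of eggs per group. As before, pooling the three SDs into one within-group variance is an assumption. The last lines fit an ANOVA with one egg per group to show why 2 is the floor.

```r
within_var  <- mean(c(1.25, 1.1, 1.1)^2)  # pooled within-group variance (assumption)
between_var <- var(c(16, 21, 26))         # variance of the original means

# Power at the original means for 2 to 5 eggs per group
n_values <- 2:5
power_by_n <- sapply(n_values, function(n) {
  power.anova.test(groups = 3, n = n, between.var = between_var,
                   within.var = within_var)$power
})
names(power_by_n) <- n_values
round(power_by_n, 3)

# One egg per group: 3 observations and 3 group means leave no residual
# degrees of freedom, so the within-group variance cannot be estimated
one_each <- data.frame(Temperature = c("Hot", "Moderate", "Cool"),
                       Hatch_time = c(16, 21, 26))
resid_df_one_each <- df.residual(aov(Hatch_time ~ Temperature, data = one_each))
resid_df_one_each  # 0
```

Power rises with each added egg per group, which matches the p-values shrinking as the sample size grows.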