12  Inference II

A confidence interval (CI) provides a range of plausible values for a population parameter, along with a confidence level indicating that if the same procedure were repeated many times, that proportion of intervals would contain the true parameter. For example, a 95% CI means that if we repeated the sampling process 100 times, about 95 of the intervals would contain the true population parameter and 5 would not.

We can now define two important concepts. First, the confidence level is the proportion of times that the constructed interval would contain the true population parameter if the sampling procedure were repeated indefinitely. Second, the significance level is the probability of rejecting a true null hypothesis, also known as a Type I error (a false positive: you conclude there is an effect when there isn't one). The significance level is denoted by the Greek letter \(\alpha\), while the confidence level is given by \(1-\alpha\).


CIs account for sampling variability and are wider for higher confidence levels or smaller samples. There are two competing goals when building a confidence interval:

Choosing a very low \(\alpha\) (like 0.01) prioritizes accuracy at the expense of precision, giving you a wide interval that is highly likely to include the true parameter but tells you very little about where exactly it is. Conversely, choosing a high \(\alpha\) (like 0.10) prioritizes precision at the expense of accuracy, giving you a narrow interval that pins the parameter down more tightly but is more likely to miss it entirely.

A lower significance level does not necessarily mean a better estimate — it means a safer but vaguer one. The goal is to choose an α that balances the risk of error with the need for a practically useful, precise estimate. That’s why \(\alpha=0.05\) has become the conventional standard in most fields — it strikes a reasonable balance between the two.
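The trade-off is visible in the critical values themselves. As a quick R sketch (the fixed standard error of 1 is an arbitrary assumption so that widths are comparable), the critical value, and hence the interval width, grows as \(\alpha\) shrinks:

```r
# Two-sided critical values for three common significance levels
alphas <- c(0.10, 0.05, 0.01)
crit   <- qnorm(1 - alphas / 2)   # approx. 1.645, 1.960, 2.576
round(2 * crit, 2)                # full interval widths (SE = 1): wider as alpha falls
```

A 99% interval is roughly 1.6 times as wide as a 90% interval built from the same data.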

12.1 Constructing Confidence Intervals for Means

For a population mean, assuming normality (via CLT), the CI is:

\[\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\] or, if \(\sigma\) is unknown (the common case), use the t-distribution:

\[\bar{x} \pm t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}\]

where \(\bar{x}\) is the sample mean, \(z_{\alpha/2}\) or \(t_{\alpha/2}\) is the critical value from the standard normal or t-distribution (e.g., 1.96 for a 95% CI with large \(n\)), \(s\) is the sample standard deviation, and \(n\) is the sample size. The term after \(\pm\) is the margin of error.

Example: Suppose the average life expectancy sample mean is \(\bar{x} = 78.1\) years, with population standard deviation \(\sigma = 4.5\), and \(n = 50\). For a 90% CI \(z_{0.05} \approx 1.645\):

\[\frac{\sigma}{\sqrt{n}} = \frac{4.5}{\sqrt{50}} \approx 0.637\]

\[78.1 \pm 1.645 \times 0.637 \approx 78.1 \pm 1.05\]

Lower limit (LL): 77.05
Upper limit (UL): 79.15

We are 90% confident that the true population mean life expectancy is between 77.05 and 79.15 years.
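This calculation can be verified in R with qnorm() (a sketch following the z-interval formula above):

```r
# 90% z-interval for the life-expectancy example
xbar  <- 78.1
sigma <- 4.5
n     <- 50
z     <- qnorm(0.95)          # alpha/2 = 0.05 in each tail
me    <- z * sigma / sqrt(n)  # margin of error, about 1.05
c(LL = xbar - me, UL = xbar + me)
```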

Example: A random sample of \(n=25\) college students reports an average weekly study time of \(\bar{x} = 14.3\) hours, with a sample standard deviation of \(s = 4.2\) hours. The population standard deviation is unknown. To construct a 95% confidence interval for the true mean weekly study time we start by identifying the degrees of freedom as \(df=n−1=24\).

At \(\alpha = 0.05\) the critical value is \(t_{0.025, \, 24} \approx 2.064\). The standard error is given by:

\[\frac{s}{\sqrt{n}} = \frac{4.2}{\sqrt{25}} = \frac{4.2}{5} = 0.84\]

Hence the confidence interval is given by: \[14.3 \pm 2.064 \times 0.84 \approx 14.3 \pm 1.73\]

Lower Limit (LL): 12.57
Upper Limit (UL): 16.03

We are 95% confident that the true mean weekly study time is between 12.57 and 16.03 hours.
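The same calculation can be checked in R with qt() (a sketch following the t-interval formula above):

```r
# 95% t-interval for the study-time example
xbar  <- 14.3
s     <- 4.2
n     <- 25
tcrit <- qt(0.975, df = n - 1)  # about 2.064
me    <- tcrit * s / sqrt(n)    # margin of error, about 1.73
c(LL = xbar - me, UL = xbar + me)
```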

12.2 Constructing Confidence Intervals for Proportions

For a population proportion \(p\), the CI is:

\[\bar{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}\]

where \(\bar{p}\) is the sample proportion (successes / \(n\)). The interval assumes approximate normality via the CLT for proportions.

Example: A random sample of 100 yields 40 successes, so \(\bar{p} = 0.4\). For a 90% CI \(z_{0.05} \approx 1.645\):

\[SE = \sqrt{\frac{0.4 \times 0.6}{100}} = \sqrt{0.0024} \approx 0.049\]

\[0.4 \pm 1.645 \times 0.049 \approx 0.4 \pm 0.081\]

Lower Limit (LL): 0.319
Upper Limit (UL): 0.481

We are 90% confident that the true population proportion is between 31.9% and 48.1%.
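The proportion interval can be reproduced directly in R (a sketch using the formula above):

```r
# 90% CI for a proportion: 40 successes out of 100
phat <- 40 / 100
n    <- 100
z    <- qnorm(0.95)
se   <- sqrt(phat * (1 - phat) / n)  # about 0.049
c(LL = phat - z * se, UL = phat + z * se)
```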

12.3 P-Hacking

A researcher wants to estimate the average body length of an ant (in mm) for a species found in a backyard. The true population mean is unknown.

The researcher collects a sample of 30 ants and computes a 95% CI:

CI: [1.1, 2.9] mm

This interval is very wide and includes values the researcher doesn’t find interesting. They were hoping to show the ants are unusually large (say, larger than 2.5 mm). The result is not significant. So instead of reporting this, they start manipulating the process:

Hack 1: Keep re-sampling They go out and collect new samples repeatedly until they happen to get a sample of ants that looks bigger than average — maybe they unknowingly sampled near a food source where ants are larger.

Hack 2: Selectively remove data They notice 5 ants in their sample are very small. They label them “outliers” without a real scientific reason and remove them, which pulls the mean upward.

Hack 3: Change the confidence level Their 95% CI includes 2.5 mm in its range, meaning the result isn’t significant. So they switch to a 90% CI, which is narrower, and now the interval is:

CI: [2.6, 3.4] mm

Now it excludes the reference value of 2.5 mm and looks significant!

Hack 4: Measure something else Their body length result wasn’t impressive, so they start measuring head width, leg length, antenna length — 10 different things — until one of them gives a significant result, then report only that one.

Any one of these manipulations can produce a CI that looks clean and legitimate in a published report, but was actually the result of repeated tries and selective reporting. The reader has no idea.

The Honest Version Would Say “We collected one random sample, measured body length, and found a 95% CI of [1.1, 2.9] mm. The wide interval reflects natural variability in ant size and our sample size of 30.” That is far less exciting — but scientifically valid.

P-hacking with ant sizes illustrates that the problem isn’t always malicious. Researchers can convince themselves their re-sampling or outlier removal is justified. But every extra “try” inflates the chance of a false result — and the published CI ends up being a lucky shot dressed up as a reliable estimate.
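The cost of Hack 1 can be quantified with a small simulation (a sketch: the true mean of 2.0 mm, standard deviation of 1.2 mm, and 20 re-sampling tries are illustrative assumptions, not values from the ant example):

```r
set.seed(1)
n_tries <- 20   # number of times the researcher "tries again"
hacked <- replicate(1000, {
  any(replicate(n_tries, {
    x  <- rnorm(30, mean = 2.0, sd = 1.2)   # true mean is 2.0 mm
    ci <- mean(x) + c(-1, 1) * qt(0.975, 29) * sd(x) / sqrt(30)
    ci[1] > 2.0 || ci[2] < 2.0              # CI misses the true mean
  }))
})
mean(hacked)  # well above the nominal 5% error rate
```

With 20 independent tries the chance that at least one 95% interval falsely excludes the true mean is roughly \(1 - 0.95^{20} \approx 64\%\), not 5%.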

12.4 Summary

Statistical inference bridges the gap between samples and populations through sampling distributions and the CLT. Confidence intervals offer a practical way to express uncertainty in estimates, essential for business decisions like market analysis or quality control. Remember, higher confidence levels widen intervals, and larger samples narrow them, improving precision.

12.5 Useful R Functions

The qnorm() and qt() functions calculate quantiles for the normal and \(t\) distributions, respectively.

The if() construct creates a conditional statement in R.
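A short sketch combining the two (the values in the comments are approximate):

```r
z90 <- qnorm(0.95)         # two-sided 90% critical value, about 1.645
t90 <- qt(0.95, df = 24)   # same tail probability from the t-distribution, about 1.711
if (t90 > z90) {
  print("t critical values exceed z, so t-intervals are wider")
}
```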

12.6 Exercises

The following exercises will help you test your knowledge on Statistical Inference. In particular, the exercises work on:

  • Simulating confidence intervals.
  • Estimating confidence intervals in R.
  • Estimating confidence intervals for proportions.

Try not to peek at the answers until you have formulated your own answer and double-checked your work for any mistakes.

Exercise 1

In this exercise you will be simulating confidence intervals.

  1. Set the seed to \(9\). Create a random sample of 1000 data points and store it in an object called Population. Use the exponential distribution with rate of \(0.02\) to generate the data. Calculate the mean and standard deviation of Population and call them PopMean and PopSD respectively. What are the mean and standard deviation of Population?

    Answer

    The mean of Population is 48.61. The standard deviation is 47.94.

    Start by generating values from the exponential distribution. You can use the rexp() function in R to do this. Setting the seed to 9 yields:

    set.seed(9)
    Population <- rexp(1000, 0.02)

    The population mean is:

    (PopMean <- mean(Population))
    [1] 48.61053

    The standard deviation is:

    (PopSD <- sd(Population))
    [1] 47.94411
  2. Create a for loop (with 10,000 iterations) that takes a sample of 50 points from Population, calculates the mean, and then stores the result in a vector called SampleMeans. What is the mean of the SampleMeans?

    Answer

    The mean of the sample means is 48.70, which is very close to the population mean of 48.61. The standard deviation is 6.83.

    In R you can use a for loop to create the vector of sample means.

    nrep <- 10000
    SampleMeans <- c()
    for (i in 1:nrep){
      x <- sample(Population, 50, replace = T)
      SampleMeans <- c(SampleMeans, mean(x))
    }

    The mean of SampleMeans is:

    (xbar <- mean(SampleMeans))
    [1] 48.7005

    The standard deviation is:

    (Standard <- sd(SampleMeans))
    [1] 6.827595
  3. Create a \(90\)% confidence interval using the first data point in the SampleMeans vector. Does the confidence interval include PopMean?

    Answer

    The confidence interval is [47.71, 70.17]. Since the population mean is equal to 48.61, the confidence interval does include the population mean.

    Let’s construct the upper and lower limits of the interval in R.

    (ll <- SampleMeans[1] + qnorm(0.05) * Standard)
    [1] 47.71385
    (ul <- SampleMeans[1] - qnorm(0.05) * Standard)
    [1] 70.17464
  4. Now take the minimum of the SampleMeans vector. Create a new \(90\)% confidence interval. Does the interval include PopMean? Out of the \(10,000\) intervals that you could construct with the vector SampleMeans, how many would you expect to include PopMean?

    Answer

    The confidence interval is [14.86, 37.32]. This interval does not include the population mean of 48.61. Out of the 10,000 confidence intervals, one would expect about 9,000 to include the population mean.

    Let’s find the confidence interval limits using R.

    (Minll <- min(SampleMeans) + qnorm(0.05) * Standard)
    [1] 14.85631
    (Minul <- min(SampleMeans) - qnorm(0.05) * Standard)
    [1] 37.31709

    We can confirm in R that about 9,000 of the intervals include PopMean. Once more, let’s use a for loop to construct confidence intervals for each element in SampleMeans and check whether the PopMean is included. The count variable keeps track of how many intervals include the population mean.

    count <- 0
    
    for (i in SampleMeans){
      ll <- i + qnorm(0.05) * Standard
      ul <- i - qnorm(0.05) * Standard
      if (PopMean <= ul & PopMean >= ll){
        count <- count + 1
      }
    }
    
    count
    [1] 8978

Exercise 2

  1. A random sample of \(24\) observations is used to estimate the population mean. The sample mean is \(104.6\) and the standard deviation is \(28.8\). The population is normally distributed. Construct a \(90\)% and \(95\)% confidence interval for the population mean. How does the confidence level affect the size of the interval?

    Answer

    The 90% confidence interval is [94.52, 114.68] and the 95% confidence interval is [92.44, 116.76]. The larger the confidence level, the larger the interval.

    Let’s construct the intervals using R. Since the population standard deviation is unknown we will use the t-distribution. The interval is constructed as \(\bar{x} \pm t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}\).

    (ul90 <- 104.6 - qt(0.05, 23) * 28.8 / sqrt(24))
    [1] 114.6755
    (ll90 <- 104.6 + qt(0.05, 23) * 28.8 / sqrt(24))
    [1] 94.52453

    For the 95% confidence interval we adjust the significance level accordingly.

    (ul95 <- 104.6 - qt(0.025, 23) * 28.8 / sqrt(24))
    [1] 116.7612
    (ll95 <- 104.6 + qt(0.025, 23) * 28.8 / sqrt(24))
    [1] 92.43883
  2. A random sample from a normally distributed population yields a mean of \(48.68\) and a standard deviation of \(33.64\). Compute a \(95\)% confidence interval assuming a) that the sample size is \(16\) and b) the sample size is \(25\). What happens to the confidence interval as the sample size increases?

    Answer

    The confidence interval for a sample size of 16 is [30.75, 66.61]. The confidence interval when the sample size is 25 is [34.79, 62.57]. As the sample size gets larger, the confidence interval gets narrower and more precise.

    Let’s use R again to calculate the confidence interval. For a sample size of 16 the interval is:

    (ul16 <- 48.68 - qt(0.025, 15) * 33.64 / sqrt(16))
    [1] 66.60549
    (ll16 <- 48.68 + qt(0.025, 15) * 33.64 / sqrt(16))
    [1] 30.75451

    Increasing the sample size to 25 yields:

    (ul25 <- 48.68 - qt(0.025, 24) * 33.64 / sqrt(25))
    [1] 62.56591
    (ll25 <- 48.68 + qt(0.025, 24) * 33.64 / sqrt(25))
    [1] 34.79409

Exercise 3

You will need the sleep data set for this problem. The data is built into R, and displays the effect of two sleep inducing drugs on students. Calculate a \(95\)% confidence interval for group 1 and for group 2. Which drug would you expect to be more effective at increasing sleeping times?

Answer

The 95% confidence interval for group 1 is [-0.53, 2.03]. Let’s first calculate the standard error for group 1.

(se1 <- sd(sleep$extra[sleep$group == 1]) / sqrt(length(sleep$extra[sleep$group == 1])))
[1] 0.5657345

We can now use the standard error to estimate the lower and upper limits of the confidence interval.

(ll1 <- mean(sleep$extra[sleep$group == 1]) + qt(0.025, 9) * se1)
[1] -0.5297804
(ul1 <- mean(sleep$extra[sleep$group == 1]) - qt(0.025, 9) * se1)
[1] 2.02978

The 95% confidence interval for group 2 is [0.90, 3.76]. Let’s repeat the procedure for group 2. Start by finding the standard error.

(se2 <- sd(sleep$extra[sleep$group == 2]) / sqrt(length(sleep$extra[sleep$group == 2])))
[1] 0.6331666

Using the standard error we can complete the confidence interval.

(ll2 <- mean(sleep$extra[sleep$group == 2]) + qt(0.025, 9) * se2)
[1] 0.8976775
(ul2 <- mean(sleep$extra[sleep$group == 2]) - qt(0.025, 9) * se2)
[1] 3.762322

Drug 2 is more effective. Its interval does not include zero and lies entirely to the right of zero, so it is unlikely that drug 2 has no effect on students’ sleeping time. Additionally, drug 2’s mean increase in sleeping hours is 2.33 vs. 0.75 for drug 1.

Exercise 4

  1. A random sample of \(100\) observations results in \(40\) successes. Construct a \(90\)% and \(95\)% confidence interval for the population proportion. Can we conclude at either confidence level that the population proportion differs from \(0.5\)?

    Answer

    The 90% and 95% confidence intervals are [0.319, 0.481], and [0.304, 0.496] respectively. Since they do not include 0.5, we can conclude that the population proportion is significantly different from 0.5.

    We can create an object that stores the sample proportion and sample in R:

    (p <- 0.4)
    [1] 0.4
    (n <- 100)
    [1] 100

    The 90% confidence interval is given by:

    (Ex1ll90 <- p + qnorm(0.05) * sqrt(p * (1 - p) / 100))
    [1] 0.319419
    (Ex1ul90 <- p - qnorm(0.05) * sqrt(p * (1 - p) / 100))
    [1] 0.480581

    The 95% confidence interval is:

    (Ex1ll95 <- p + qnorm(0.025) * sqrt(p * (1 - p) / 100))
    [1] 0.3039818
    (Ex1ul95 <- p - qnorm(0.025) * sqrt(p * (1 - p) / 100))
    [1] 0.4960182
  2. You will need the HairEyeColor data set for this problem. The data is built into R, and displays the distribution of hair and eye color for \(592\) statistics students. Construct a \(95\)% confidence interval for the proportion of students with Hazel eye color.

    Answer

    The 95% confidence interval is [0.128, 0.186].

    The data can easily be viewed by calling HairEyeColor in R.

    HairEyeColor
    , , Sex = Male
    
           Eye
    Hair    Brown Blue Hazel Green
      Black    32   11    10     3
      Brown    53   50    25    15
      Red      10   10     7     7
      Blond     3   30     5     8
    
    , , Sex = Female
    
           Eye
    Hair    Brown Blue Hazel Green
      Black    36    9     5     2
      Brown    66   34    29    14
      Red      16    7     7     7
      Blond     4   64     5     8

    Note that there are three dimensions to this table (Hair, Eye, Sex). We can calculate the proportion of Hazel eye colored students with the following command that makes use of indexing:

    (p <- (sum(HairEyeColor[, 3, 1]) + sum(HairEyeColor[, 3, 2])) / sum(HairEyeColor))
    [1] 0.1570946

    Now we can use this proportion to construct the intervals. Recall that for proportions the interval is calculated by \(\bar{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\bar{p}(1-\bar{p})}{n}}\). The 95% confidence interval is given by:

    (Ex2ll95 <- p + qnorm(0.025) * sqrt(p * (1 - p) / 592))
    [1] 0.1277818
    (Ex2ul95 <- p - qnorm(0.025) * sqrt(p * (1 - p) / 592))
    [1] 0.1864074

Exercise 5

The 2024 Formula 1 Crypto.com Miami Grand Prix was held on May 5, 2024 at the Miami International Autodrome. Lando Norris won the race for McLaren in his first-ever F1 victory. The table below shows the fastest lap time recorded by each of the 20 drivers during the race.

Driver Team Fastest Lap
Oscar Piastri McLaren Mercedes 1:30.634
Alexander Albon Williams Mercedes 1:30.849
Sergio Perez Red Bull Racing 1:30.855
Carlos Sainz Ferrari 1:30.928
Lando Norris McLaren Mercedes 1:30.980
Charles Leclerc Ferrari 1:31.084
Lewis Hamilton Mercedes 1:31.233
Max Verstappen Red Bull Racing 1:31.261
Lance Stroll Aston Martin 1:31.588
Yuki Tsunoda RB Honda RBPT 1:31.682
Fernando Alonso Aston Martin 1:31.727
Kevin Magnussen Haas Ferrari 1:31.774
George Russell Mercedes 1:31.921
Nico Hulkenberg Haas Ferrari 1:31.941
Zhou Guanyu Kick Sauber 1:31.991
Esteban Ocon Alpine Renault 1:32.037
Pierre Gasly Alpine Renault 1:32.055
Valtteri Bottas Kick Sauber 1:32.098
Daniel Ricciardo RB Honda RBPT 1:32.122
Logan Sargeant Williams Mercedes 1:33.452

Source: Formula 1 Official Results, formula1.com


We can store the data into a tibble and convert the lap times to seconds with the code below:

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
   miami_2024 <- tibble(
     position    = 1:20,
     driver      = c("Oscar Piastri", "Alexander Albon", "Sergio Perez",
                     "Carlos Sainz", "Lando Norris", "Charles Leclerc",
                     "Lewis Hamilton", "Max Verstappen", "Lance Stroll",
                     "Yuki Tsunoda", "Fernando Alonso", "Kevin Magnussen",
                     "George Russell", "Nico Hulkenberg", "Zhou Guanyu",
                     "Esteban Ocon", "Pierre Gasly", "Valtteri Bottas",
                     "Daniel Ricciardo", "Logan Sargeant"),
     team        = c("McLaren Mercedes", "Williams Mercedes", "Red Bull Racing",
                     "Ferrari", "McLaren Mercedes", "Ferrari",
                     "Mercedes", "Red Bull Racing", "Aston Martin",
                     "RB Honda RBPT", "Aston Martin", "Haas Ferrari",
                     "Mercedes", "Haas Ferrari", "Kick Sauber",
                     "Alpine Renault", "Alpine Renault", "Kick Sauber",
                     "RB Honda RBPT", "Williams Mercedes"),
     fastest_lap = c("1:30.634", "1:30.849", "1:30.855", "1:30.928", "1:30.980",
                     "1:31.084", "1:31.233", "1:31.261", "1:31.588", "1:31.682",
                     "1:31.727", "1:31.774", "1:31.921", "1:31.941", "1:31.991",
                     "1:32.037", "1:32.055", "1:32.098", "1:32.122", "1:33.452"),
     lap_seconds = c(90.634, 90.849, 90.855, 90.928, 90.980,
                     91.084, 91.233, 91.261, 91.588, 91.682,
                     91.727, 91.774, 91.921, 91.941, 91.991,
                     92.037, 92.055, 92.098, 92.122, 93.452)
   )
  1. What is the sample mean and sample standard deviation?

    Answer

    The sample mean is 91.611 seconds (1:31.611) and the sample standard deviation is 0.656 seconds. Since the true population standard deviation is unknown, we will use the t-distribution to construct confidence intervals.

The sample size is:

   (n <- nrow(miami_2024))
[1] 20

The mean:

   (xbar <- mean(miami_2024$lap_seconds))
[1] 91.6106

and the Standard Deviation

   (s <- sd(miami_2024$lap_seconds))
[1] 0.6559704
  2. Construct a 90% and a 95% confidence interval for the true mean fastest lap time at the Miami circuit. Use the t.test() function to verify your results.

    Answer

    The 90% confidence interval is [1:31.357, 1:31.864] and the 95% confidence interval is [1:31.304, 1:31.918]. Since \(n = 20\), we use the t-distribution with \(df = n - 1 = 19\) degrees of freedom.

    The standard error is:

   (se <- s / sqrt(n))
[1] 0.1466794

For the 90% CI (\(\alpha = 0.10\)), \(t_{0.05, \, 19} \approx 1.729\):

The lower bound is:

   (ll90 <- xbar + qt(0.05, 19) * se)
[1] 91.35697

and the upper:

   (ul90 <- xbar - qt(0.05, 19) * se)
[1] 91.86423

For the 95% CI (\(\alpha = 0.05\)), \(t_{0.025, \, 19} \approx 2.093\):

The lower:

   (ll95 <- xbar + qt(0.025, 19) * se)
[1] 91.3036

The upper:

   (ul95 <- xbar - qt(0.025, 19) * se)
[1] 91.9176

We can verify using t.test():

t.test(miami_2024$lap_seconds, conf.level = 0.95)

    One Sample t-test

data:  miami_2024$lap_seconds
t = 624.56, df = 19, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 91.3036 91.9176
sample estimates:
mean of x 
  91.6106 
  3. Logan Sargeant’s fastest lap of 1:33.452 (93.452 sec) is noticeably slower than the rest of the field. Remove his lap time from the data and recalculate the 95% confidence interval. How does removing this outlier affect the interval, and what does this illustrate about the sensitivity of confidence intervals to extreme values?

    Answer

    Without Sargeant’s lap time, the 95% CI narrows to approximately [91.270, 91.758], with a new mean of 91.514 seconds. The interval shifts downward and becomes narrower: the margin of error drops because the sample standard deviation falls once the outlier is removed. This illustrates that confidence intervals are sensitive to extreme observations, particularly in small samples. This is also a reminder of why removing data points should only be done with a clear justification — in this case, Sargeant set his fastest lap on lap 15 during an early pit phase, making it a genuinely unrepresentative time.

miami_no_sar <- miami_2024 |> filter(driver != "Logan Sargeant")

   t.test(miami_no_sar$lap_seconds, conf.level = 0.95)

    One Sample t-test

data:  miami_no_sar$lap_seconds
t = 788.53, df = 18, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 91.26986 91.75751
sample estimates:
mean of x 
 91.51368