13 Inference III

In this chapter, you will begin mastering the concepts of null and alternative hypotheses, equipping you with the skills to draw reliable conclusions from sample data, effectively manage risks, and confidently justify strategic decisions.

13.1 Hypothesis Testing Intuition

The null hypothesis is a statement about a population parameter, usually one that reflects the status quo. In research, it states that there is no effect or no relationship between variables. The null hypothesis always includes some form of equality (i.e., $\geq$, $\leq$, or $=$).

For instance, consider the hypothesis that the average annual income of an individual in the U.S. is 50,000 dollars. When we take a sufficiently large sample from the population, the sample mean can be treated as a random variable following a normal distribution (central limit theorem). The hypothesized value, 50,000, represents the mean of this distribution. Our goal is to determine whether the calculated sample mean is likely to have been drawn from this hypothesized normal distribution with the specified mean. If the sample mean is “close” to the hypothesized mean then we would conclude that our sample is likely to have been drawn from such a distribution. If it is “far away” then we question whether the population mean is the one hypothesized and we should pivot to an alternative.

The alternative hypothesis directly contradicts the null hypothesis. In research, it states the prediction of the effect or relationship. The alternative includes non-equality signs (i.e., $>$, $<$, or $\ne$).

Example: We would like to test the hypothesis that the average income in the U.S. is $50,000. We take a sample of 100 people, and record their annual income. For this sample the mean is $48,000.

Remember that any sample statistic is subject to sampling error. At first glance, a sample statistic close to the hypothesized value (e.g., a sample mean near $50,000) might suggest the null hypothesis is true. However, we must consider the standard error of the sampling distribution:

If the standard error is small (e.g., $1), a sample mean even slightly away from $50,000 is many standard errors away → evidence against the hypothesis.
If the standard error is large (e.g., $15,000), the same sample mean falls well within 2 standard errors → consistent with the hypothesis.

Formally, we measure how far the sample statistic is from the hypothesized value in standard error units using the test statistic: \[t_{calculated}=\frac{\bar{x}-\mu_o}{s/\sqrt{n}}\] where $\mu_o$ is the hypothesized value of the mean. If this test statistic is large in absolute value for the given level of significance, we reject the null hypothesis in favor of the alternative.
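To make the role of the standard error concrete, the sketch below evaluates the test statistic for the income example under the two standard errors mentioned above. The sample standard deviations are hypothetical values, chosen only so that the standard errors come out to 1 and 15,000 with a sample of 100; the example itself does not report a standard deviation.

```r
# Income example: sample mean, hypothesized mean, and sample size from the text
xbar <- 48000
mu0  <- 50000
n    <- 100

# Hypothetical sample standard deviations (assumptions for illustration only)
s <- c(small_se = 10, large_se = 150000)

se     <- s / sqrt(n)        # standard errors: 1 and 15000
t_calc <- (xbar - mu0) / se  # distance from mu0 in standard-error units

round(t_calc, 3)
```

With a standard error of 1, the sample mean sits 2,000 standard errors below the hypothesized value. With a standard error of 15,000, it sits only about 0.13 standard errors below it.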

13.2 Testing the Null Hypothesis

There are two equivalent ways to decide whether this observed difference is too large to be explained by random sampling error alone:

a) The Critical Value (Rejection Region) Approach

Choose a significance level (e.g., α = 0.05).
With degrees of freedom df = n − 1 = 99 and α = 0.05 (two-tailed test), look up the critical t-values in a t-table: approximately ±1.984.
If $|t_{calculated}| > 1.984$, reject the null hypothesis.
- This means the sample mean would fall in the most extreme 5% of the sampling distribution if the null were true.
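The table lookup can also be done in R with qt(). The sketch below reproduces the critical value for the income example and applies the rejection rule to a hypothetical test statistic; the sample standard deviation of 10,000 is an assumption made for illustration and is not given in the example.

```r
alpha <- 0.05
n     <- 100          # sample size from the income example
df    <- n - 1

# Upper critical value for a two-tailed test: alpha / 2 in each tail
t_crit <- qt(1 - alpha / 2, df)
round(t_crit, 3)      # about 1.984, matching the t-table

# Hypothetical test statistic (assumes s = 10000, which the text does not give)
t_calc <- (48000 - 50000) / (10000 / sqrt(n))   # equals -2

abs(t_calc) > t_crit  # TRUE: reject at the 5% level under this assumption
```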

b) The p-value Approach (most common today)

Calculate the t-statistic as above.
Using software or a t-distribution calculator, find the probability of observing a t-value at least as extreme as the one calculated, assuming the null is true. This is the p-value.
Decision rule:
- If p-value ≤ α (e.g., ≤ 0.05) → reject the null hypothesis.
- If p-value > α → fail to reject the null hypothesis.

Interpretation:
A small p-value (e.g., 0.003) means the data are very unlikely under the null hypothesis → evidence against H₀.
A large p-value (e.g., 0.42) means the data are consistent with H₀ → not enough evidence to reject it.

Both methods always lead to the same conclusion when using the same α level.
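As a small check of this equivalence, the sketch below computes a two-tailed p-value with pt() and compares the two decision rules. The test statistic of -2 is a hypothetical value used only for illustration.

```r
alpha  <- 0.05
df     <- 99
t_calc <- -2   # hypothetical test statistic, used only for illustration

# Two-tailed p-value: probability of a t-value at least as extreme as t_calc
p_value <- 2 * pt(-abs(t_calc), df)
p_value

# Both decision rules applied to the same statistic
reject_by_critical <- abs(t_calc) > qt(1 - alpha / 2, df)
reject_by_p_value  <- p_value <= alpha

reject_by_critical == reject_by_p_value   # TRUE: the two rules agree
```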

13.3 Steps to Perform Hypothesis Testing

To conduct hypothesis testing:

Specify the null and alternate hypothesis.

For means use:
- $H_o: \mu \leq \mu_o$; $H_a: \mu > \mu_o$ right-tail probability
- $H_o: \mu \geq \mu_o$; $H_a: \mu < \mu_o$ left-tail probability
- $H_o: \mu = \mu_o$; $H_a: \mu \ne \mu_o$ two-tail probability
For proportions use:
- $H_o: P \leq P_o$; $H_a: P > P_o$ right-tail probability
- $H_o: P \geq P_o$; $H_a: P < P_o$ left-tail probability
- $H_o: P = P_o$; $H_a: P \ne P_o$ two-tail probability

Specify the confidence level and the significance level. The significance level $\alpha$ is how likely you would be to see extreme data when the null is true, and so to reject it wrongly (the tolerance for false positives, or Type I errors). The confidence level is its complement, $1-\alpha$: how likely you would be to see non-extreme data when the null is true, and so correctly fail to reject it. Confidence levels are usually set at $0.90$, $0.95$, or $0.99$, which correspond to $10$%, $5$%, and $1$% significance levels, respectively.
Calculate the test statistic.
- For a test on means use $t_{df}= \frac {\bar x-\mu_o}{s/\sqrt{n}}$, where $df=n-1$, $\bar x$ is the sample mean, $\mu_o$ is the hypothesized value of $\mu$, $s$ is the sample standard deviation, and $n$ is the sample size.
- For a test on proportions use $z= \frac {\bar p- P_o}{\sqrt {P_o(1-P_o)/ n}}$, where $\bar p$ is the sample proportion, $P_o$ is the hypothesized value of the population proportion $P$, and $n$ is the sample size.
Find the p-value (i.e., the likelihood of getting the observed or more extreme data, assuming the null hypothesis is true). (Substitute $z$ for $t$, and the standard normal distribution for the $t$ distribution, if testing proportions.)
- For a right-tail test, the $p$-value is $P(T\geq t)$.
- For a left-tail test, the $p$-value is $P(T\leq t)$.
- For a two-tail test, the $p$-value is $2P(T\geq t)$ if $t>0$ or $2P(T\leq t)$ if $t<0$.
The decision rule is to reject the null hypothesis when the $p$-value $\leq \alpha$, and not to reject when the $p$-value $> \alpha$.
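The steps above for a test on means can be sketched as a small function that works from summary statistics. The function name t_test_summary and the numbers in the usage example are hypothetical and chosen for illustration; this is one possible arrangement of the steps, not a replacement for t.test().

```r
# Hypothetical helper (not part of base R): one-sample t-test from summary statistics
t_test_summary <- function(xbar, s, n, mu0,
                           alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  df     <- n - 1
  t_stat <- (xbar - mu0) / (s / sqrt(n))   # test statistic
  p_value <- switch(alternative,           # p-value for the chosen tail
    greater   = pt(t_stat, df, lower.tail = FALSE),
    less      = pt(t_stat, df),
    two.sided = 2 * pt(-abs(t_stat), df)
  )
  list(statistic = t_stat, df = df, p.value = p_value)
}

# Illustrative values: xbar = 103, s = 12, n = 36, testing H_a: mu > 100
res   <- t_test_summary(xbar = 103, s = 12, n = 36, mu0 = 100,
                        alternative = "greater")
alpha <- 0.05

res$statistic          # 1.5
res$p.value <= alpha   # decision rule: TRUE means reject the null
```

Using -abs(t_stat) in the two-tail case covers both signs of the statistic with a single expression.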

13.4 Useful R Functions

t.test() runs a $t$-test on a vector of values. Use the alternative argument to specify a "greater", "less", or "two.sided" test. The mu argument specifies the hypothesized value of the mean. The conf.level argument sets the confidence level of the reported confidence interval (0.9, 0.95, 0.99, etc.); it does not change the test statistic or the p-value.
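A minimal usage sketch follows. The data (mpg from the built-in mtcars data set) and the hypothesized mean of 20 are arbitrary choices made for illustration. The example also shows that changing conf.level widens or narrows the reported interval while leaving the p-value untouched.

```r
# Illustrative data: mpg from the built-in mtcars data set; mu = 20 is arbitrary
res90 <- t.test(mtcars$mpg, alternative = "two.sided", mu = 20, conf.level = 0.90)
res99 <- t.test(mtcars$mpg, alternative = "two.sided", mu = 20, conf.level = 0.99)

res90$statistic   # t-statistic
res90$p.value     # p-value

# conf.level changes the width of the confidence interval, not the p-value
identical(res90$p.value, res99$p.value)
res90$conf.int
res99$conf.int
```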

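prop.test() runs a test on a proportion when provided the number of successes and the sample size. By default it applies a continuity correction; set correct = FALSE to reproduce the $z$-test described in Section 13.3.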
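The sketch below relates the output of prop.test() to the $z$-statistic from Section 13.3. The counts (60 successes in 150 trials) and the hypothesized proportion of 0.5 are hypothetical. Without the continuity correction, the reported X-squared statistic is the square of $z$, and the two-tail p-values coincide.

```r
# Hypothetical counts for illustration: 60 successes in 150 trials, H_o: P = 0.5
x  <- 60
n  <- 150
p0 <- 0.5

res <- prop.test(x, n, p = p0, alternative = "two.sided", correct = FALSE)

# z-statistic computed by hand
z <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)

all.equal(unname(res$statistic), z^2)        # X-squared equals z squared
all.equal(res$p.value, 2 * pnorm(-abs(z)))   # same two-tail p-value
```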

13.5 Exercises

The following exercises will help you test your knowledge of hypothesis testing. In particular, the exercises work on:

Stating null and alternative hypotheses.
Determining the statistical validity of the null hypothesis.
Conducting t-tests in R.

Try not to peek at the answers until you have formulated your own answer and double-checked your work for any mistakes.

Exercise 1

Consider the following hypothesis: $H_{o}: \mu=50$, $H_{a}: \mu \neq 50$. A sample of $16$ observations yields a mean of $46$ and a standard deviation of $10$. Calculate the value of the test statistic. At a $5$% significance level, does the population mean differ from $50$?
Answer

The test statistic is -1.6. The null hypothesis can’t be rejected at a 5% significance level since the p-value is 13.04%. We conclude that the population mean is not statistically different from 50.

In R we can calculate the t-statistic.
```
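muEx1 <- 50   # hypothesized mean
sEx1 <- 10    # sample standard deviation
n <- 16       # sample size

# t-statistic: distance of the sample mean (46) from muEx1 in standard errors
(teststat <- (46 - muEx1) / (sEx1 / sqrt(n)))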
```
```
[1] -1.6
```
```
(tcrit <- qt(0.025, n - 1))
```
```
[1] -2.13145
```
Since the t-statistic is greater than the lower critical value of -2.13, it does not fall in the rejection region and we can’t reject the null. We can also calculate the p-value to confirm this finding. Recall that the p-value is the likelihood of obtaining a sample mean at least as extreme as the one derived from the given sample, assuming the null is true.
```
2 * pt(teststat, n - 1)
```
```
[1] 0.130445
```
Consider the following hypothesis: $H_{o}: \mu \geq 100$, $H_{a}: \mu < 100$. You take a sample from a normally distributed population that yields the values in the table below. Conduct a test at a $1$% significance level to evaluate the hypothesis.

96 102 93 87 92 82
Answer

The null hypothesis $H_{o}: \mu \geq 100$ can’t be rejected since the p-value of 1.9% is greater than the 1% significance level.

Let’s start by creating an object to store the values of our sample.
```
sample2 <- c(96, 102, 93, 87, 92, 82)
```
Now we can construct the t-statistic.
```
mean2 <- mean(sample2)
standard2 <- sd(sample2)
n2 <- length(sample2)
(tstat2 <- (mean2 - 100) / (standard2 / sqrt(n2)))
```
```
[1] -2.816715
```
Lastly, we can calculate the p-value.
```
pt(tstat2, n2 - 1)
```
```
[1] 0.0186262
```
We can also verify our result using the t.test() function in R.
```
t.test(sample2, alternative = "less", mu = 100, conf.level = 0.99)
```
```
    One Sample t-test

data:  sample2
t = -2.8167, df = 5, p-value = 0.01863
alternative hypothesis: true mean is less than 100
99 percent confidence interval:
    -Inf 101.557
sample estimates:
mean of x 
       92 
```
Consider the following hypothesis: $H_{o}: \mu \leq 210$, $H_{a}: \mu > 210$. You take a sample from a normally distributed population that yields the values in the table below. Conduct a test at a $10$% significance level to evaluate the hypothesis.

210 220 299 220 290 280 233 221 292 299
Answer

The null hypothesis $H_{o}: \mu \leq 210$ can be rejected since the p-value of 0.2% is less than the 10% significance level.

Let’s create the object in R with the data.
```
sample3 <- c(210, 220, 299, 220, 290, 280, 233, 221, 292, 299)
```
Using the t.test() function we find:
```
t.test(sample3, alternative = "greater", mu = 210, conf.level = 0.9)
```
```
    One Sample t-test

data:  sample3
t = 3.8333, df = 9, p-value = 0.002004
alternative hypothesis: true mean is greater than 210
90 percent confidence interval:
 239.6593      Inf
sample estimates:
mean of x 
    256.4 
```

Exercise 2

According to www.nps.gov, the period of time between Old Faithful’s eruptions is on average $92$ minutes. Use the built-in faithful R data set and a two-tail test to determine whether this claim is true.

Answer

The claim that the duration between eruptions is 92 minutes can be rejected at a 10%, 5%, and 1% significance level.

Once more, we run the t-test in R with the t.test() function.

```
t.test(faithful$waiting, alternative = "two.sided", mu = 92, conf.level = 0.99)
```
```
    One Sample t-test

data:  faithful$waiting
t = -25.601, df = 271, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 92
99 percent confidence interval:
 68.75871 73.03541
sample estimates:
mean of x 
 70.89706
```
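As a cross-check, the same t-statistic can be built by hand from the sample mean, standard deviation, and size, following the formula from Section 13.3. This is a sketch for verification only; the variable names are illustrative.

```r
x <- faithful$waiting   # waiting times between eruptions, in minutes
n <- length(x)

# t-statistic for H_o: mu = 92, built from summary statistics
t_manual <- (mean(x) - 92) / (sd(x) / sqrt(n))
t_manual

# Two-tail p-value from the t distribution with n - 1 degrees of freedom
2 * pt(-abs(t_manual), df = n - 1)
```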

Exercise 3

To test if the population proportion differs from $0.4$, a scientist draws a random sample of $100$ observations and obtains a sample proportion of $0.48$. Specify the competing hypothesis. At a $5$% significance level, does the population proportion differ from $0.4$?
Answer

The competing hypotheses are $H_{o}: p = 0.4$, $H_{a}: p \neq 0.4$. At a 5% significance level, we can’t reject the null hypothesis since the p-value of the test statistic (0.102) is greater than the significance level (0.05). We conclude that the population proportion is not significantly different from 0.4.

In R we can calculate the test statistic.
```
(pstat <- (0.48 - 0.4) / sqrt(0.4 * (1 - 0.4) / 100))
```
```
[1] 1.632993
```
Now we can use the pnorm() function in R to get the p-value. Since it is a two-tailed test, we multiply the probability by 2.
```
2 * pnorm(pstat, lower.tail = FALSE)
```
```
[1] 0.1024704
```
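The result can also be cross-checked with prop.test(). The call below assumes 48 successes, which is the sample proportion of 0.48 times the 100 observations, and turns off the continuity correction so that it matches the hand calculation.

```r
# 0.48 * 100 = 48 successes out of 100 observations
res <- prop.test(48, 100, p = 0.4, alternative = "two.sided", correct = FALSE)

res$p.value                 # agrees with 2 * pnorm(pstat, lower.tail = FALSE)
sqrt(unname(res$statistic)) # absolute value of the z-statistic
```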
When taking a sample of $320$ observations, $128$ result in success. Test the following hypothesis $H_{o}: p \geq 0.45$, $H_{a}: p < 0.45$ at a $5$% significance level.
Answer

From the sample, 40% are labeled as success. Testing the hypothesis reveals that we can reject the null at a 5% significance level. We conclude that the population proportion is less than 0.45.

We once again create the test statistic in R.
```
(pstat2 <- (0.4 - 0.45) / sqrt(0.45 * (1 - 0.45) / 320))
```
```
[1] -1.797866
```
With the statistic, we can now find the p-value:
```
pnorm(pstat2, lower.tail = TRUE)
```
```
[1] 0.0360991
```
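The same decision follows from the critical value approach. The sketch below, with illustrative variable names, compares the test statistic with the left-tail critical value of the standard normal distribution.

```r
alpha  <- 0.05
p_hat  <- 128 / 320
z_stat <- (p_hat - 0.45) / sqrt(0.45 * (1 - 0.45) / 320)

# Left-tail critical value from the standard normal distribution
z_crit <- qnorm(alpha)   # about -1.645

z_stat < z_crit          # TRUE: the statistic falls in the rejection region
```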
Determine if more than $50$% of the observations in a population are below $10$ with the sample data below. Conduct the test at a $1$% significance level.

8 12 5 9 14 11 9 3 7 12
Answer

The competing hypotheses are $H_{o}: p \leq 0.5$, $H_{a}: p > 0.5$. At a 1% significance level, we can’t reject the null hypothesis since the p-value of the test statistic (0.26) is greater than the significance level (0.01). We cannot conclude that more than 50% of the observations in the population are below 10.

Let’s create an object to store the values.
```
values <- c(8, 12, 5, 9, 14, 11, 9, 3, 7, 12)
```
Now, let’s count how many values are below 10 and calculate the proportion.
```
sum(values < 10) / length(values)
```
```
[1] 0.6
```
Lastly, we find the test statistic and p-value:
```
pstat3 <- (0.6 - 0.5) / sqrt(0.5 * (1 - 0.5) / 10)
pnorm(pstat3, lower.tail = FALSE)
```
```
[1] 0.2635446
```
We can also use the prop.test() function in R to confirm our result.
```
prop.test(6, 10, p = 0.5, alternative = "greater", conf.level = 0.99, correct = FALSE)
```
```
    1-sample proportions test without continuity correction

data:  6 out of 10, null probability 0.5
X-squared = 0.4, df = 1, p-value = 0.2635
alternative hypothesis: true p is greater than 0.5
99 percent confidence interval:
 0.2724654 1.0000000
sample estimates:
  p 
0.6 
```
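With only 10 observations, the normal approximation behind the $z$-test is rough. As an optional check that uses a different technique from the one in this chapter, the exact binomial test in base R (binom.test()) can be run on the same counts. It is shown here only as a sanity check under that stated substitution.

```r
# Exact binomial test: 6 of 10 values below 10, H_o: p <= 0.5 vs H_a: p > 0.5
exact <- binom.test(6, 10, p = 0.5, alternative = "greater")

exact$p.value           # about 0.377, the probability of 6 or more successes
exact$p.value <= 0.01   # FALSE: the null is not rejected here either
```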

Exercise 4

According to www.worldatlas.com, $5$% of the population has hazel color eyes. Use the built-in HairEyeColor R data set and a two-tail test to determine whether this claim is true.

Answer

We reject the null hypothesis that 5% of the population has hazel eyes with our sample.

The number of people with hazel eyes is calculated as:

```
(s <- sum(HairEyeColor[, 3, 1] + HairEyeColor[, 3, 2]))
```
```
[1] 93
```

The total number of people in the survey is given by:

```
(t <- sum(HairEyeColor))
```
```
[1] 592
```

We can use the prop.test() function once more:

```
prop.test(93, 592, p = 0.05, alternative = "two.sided", conf.level = 0.95, correct = FALSE)
```
```
    1-sample proportions test without continuity correction

data:  93 out of 592, null probability 0.05
X-squared = 142.94, df = 1, p-value < 2.2e-16
alternative hypothesis: true p is not equal to 0.05
95 percent confidence interval:
 0.1300037 0.1886070
sample estimates:
        p 
0.1570946
```
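The counts can also be extracted by name rather than by position, which makes the code easier to read, and the test statistic can be rebuilt by hand from the formula in Section 13.3. The variable names below are illustrative.

```r
# HairEyeColor is a Hair x Eye x Sex table; select the "Hazel" eye color by name
hazel <- sum(HairEyeColor[, "Hazel", ])   # all hair colors, both sexes
total <- sum(HairEyeColor)

# z-statistic for H_o: P = 0.05
z <- (hazel / total - 0.05) / sqrt(0.05 * (1 - 0.05) / total)

z^2                  # compare with X-squared in the prop.test() output
2 * pnorm(-abs(z))   # two-tail p-value
```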