Measures of Central Location determine where the center of a distribution lies.
The mean is the average value of a numerical variable. The sample mean is calculated as \(\bar{x}=\sum x_{i}/n\), where \(x_i\) is observation \(i\) and \(n\) is the number of observations in the sample. The population mean is defined as \(\mu=\sum x_{i}/N\), where \(N\) is the number of observations in the population.
The median is the value in the middle when the data are sorted in ascending order. When \(n\) is even, the median is the average of the two middle values.
The mode is the value with highest frequency from a set of observations.
The weighted mean uses weights to determine the importance of each data point of a variable. It is calculated by \(\frac{\sum w_{i}x_{i}}{\sum w_{i}}\), where \(w_{i}\) are the weights associated with the values \(x_{i}\).
The geometric mean is a multiplicative average that is less sensitive to outliers. It is used to average growth rates or rates of return. It is calculated by \(\sqrt[n]{(1+r_1)(1+r_2)\cdots(1+r_n)}-1\), where \(\sqrt[n]{}\) is the \(n\)th root, and \(r_i\) are the returns or growth rates expressed as decimals.
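As a minimal sketch of the geometric mean formula, the same quantity can be computed in R by averaging the logs of the growth factors, which is algebraically equivalent to taking the \(n\)th root of their product. The function name geometric_mean and the returns vector are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: geometric mean of growth rates given as decimals.
# Assumes every 1 + r is positive so that the log is defined.
geometric_mean <- function(r) {
  exp(mean(log(1 + r))) - 1
}

# Illustrative returns (not from the exercises): +10% followed by -10%
returns <- c(0.10, -0.10)
geometric_mean(returns)   # slightly negative, because 1.1 * 0.9 = 0.99 < 1
mean(returns)             # the arithmetic mean is 0
```

The example shows why the geometric mean suits returns: a gain of 10% followed by a loss of 10% leaves less than the starting amount, which the arithmetic mean does not reveal.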
Base R has a collection of functions that calculate measures of central location.
The mean() function calculates the average of a vector of values.
The median() function returns the median of a vector of values.
The table() function provides us with a frequency distribution. We can then identify the mode(s) of the vector provided.
The summary() function returns a collection of descriptive statistics for a vector or data frame.
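A short sketch of these four functions applied to a single vector. The object names x, freq, and s are illustrative choices only.

```r
x <- c(8, 10, 9, 12, 12)

mean(x)            # arithmetic average
median(x)          # middle value of the sorted data
freq <- table(x)   # frequency of each distinct value
freq
s <- summary(x)    # minimum, quartiles, median, mean, and maximum
s
```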
The following exercises will help you practice the measures of central location. In particular, the exercises work on:
Calculating the mean, median, and the mode.
Calculating the weighted average.
Applying the geometric mean for growth rates and returns.
Answers are provided below. Try not to peek until you have formulated your own answer and double-checked your work for any mistakes.
For the following exercises, make your calculations by hand and verify results using R functions when possible.
Use the following observations to calculate the mean, the median, and the mode.
8 | 10 | 9 | 12 | 12 |
Use the following observations to calculate the mean, the median, and the mode.
-4 | 0 | -6 | 1 | -3 | -4 |
Use the following observations to calculate the mean, the median, and the mode.
20 | 15 | 25 | 20 | 10 | 15 | 25 | 20 | 15 |
Install and load the ISLR2 package. You will need the OJ data set to answer this question.
Find the mean price for Citrus Hill (PriceCH) and Minute Maid (PriceMM).
Find the mean price of Citrus Hill (PriceCH) in store 1 and store 2 (StoreID). Which store had the better price?
Find the mean price paid by Citrus Hill purchasers (Purchase equal to "CH") in store 1 (StoreID). How about store 2? Which store had the better price?
Date | Price Per Share | Number of Shares |
---|---|---|
February | 250.34 | 80 |
April | 234.59 | 120 |
August | 270.45 | 50 |
Consider the following observations for the consumer price index (CPI). Calculate the inflation rate (Growth Rate of the CPI) for each period.
1.0 | 1.3 | 1.6 | 1.8 | 2.1 |
Suppose that you want to invest $1000 in a stock that is predicted to yield the following returns over the next four years. Calculate both the arithmetic mean and the geometric mean. Use the geometric mean to estimate how much money you would have by the end of year 4.
Year | Annual Return (%) |
---|---|
1 | 17.3 |
2 | 19.6 |
3 | 6.8 |
4 | 8.2 |
To find the mean we will use the formula \(\bar{x}=\frac{1}{n} \sum_{i=1}^{n} x_{i}\). The sum of the values is \(51\) and the number of observations is \(5\). The mean is \(51/5=10.2\).
The median is found by locating the middle value when the data are sorted in ascending order. Sorted, the observations are 8, 9, 10, 12, 12, so the median is \(10\).
The mode is the value with the highest frequency. In this example the mode is \(12\), since it appears twice and all other numbers appear only once.
The mean can be verified in R by using the mean() function:
mean(c(8,10,9,12,12))
[1] 10.2
Similarly, the median is verified by using the median() function:
median(c(8,10,9,12,12))
[1] 10
We can use the table() function to calculate frequencies and identify the mode.
table(c(8,10,9,12,12))
 8  9 10 12
 1  1  1  2
The mean is verified in R:
mean(c(-4,0,-6,1,-3,-4))
[1] -2.666667
The median in R:
median(c(-4,0,-6,1,-3,-4))
[1] -3.5
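Because this exercise has an even number of observations, the median is the average of the two middle values. A minimal sketch of that calculation done step by step, with illustrative object names:

```r
x <- c(-4, 0, -6, 1, -3, -4)

sorted <- sort(x)                        # -6 -4 -4 -3 0 1
n <- length(sorted)                      # 6 observations, an even count
middle <- sorted[c(n / 2, n / 2 + 1)]    # the two middle values: -4 and -3
manual_median <- mean(middle)
manual_median                            # -3.5, matching median(x)
```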
Finally, the mode in R:
table(c(-4,0,-6,1,-3,-4))
-6 -4 -3  0  1
 1  2  1  1  1
The mean is verified in R:
mean(c(20,15,25,20,10,15,25,20,15))
[1] 18.33333
The median in R:
median(c(20,15,25,20,10,15,25,20,15))
[1] 20
The frequency distribution identifies the modes:
table(c(20,15,25,20,10,15,25,20,15))
10 15 20 25
 1  3  3  2
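When several values share the highest frequency, as 15 and 20 do here, the modes can also be extracted from the table programmatically. This is one possible sketch; the object names freq and modes are illustrative.

```r
x <- c(20, 15, 25, 20, 10, 15, 25, 20, 15)

freq <- table(x)
# Keep every value whose count equals the largest count
modes <- as.numeric(names(freq)[freq == max(freq)])
modes   # 15 and 20 each appear three times
```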
The means can be found with the mean() function:
library(ISLR2)
mean(OJ$PriceCH)
[1] 1.867421
mean(OJ$PriceMM)
[1] 2.085411
The means for each store can be found by using indexing and a logical statement. The Citrus Hill mean price at store 1 is given by:
mean(OJ$PriceCH[OJ$StoreID==1])
[1] 1.803758
The Citrus Hill mean price at store 2 is given by:
mean(OJ$PriceCH[OJ$StoreID==2])
[1] 1.841216
The mean for Citrus Hill purchasers at store 1 is given by:
mean(OJ$PriceCH[OJ$StoreID==1 & OJ$Purchase=="CH"])
[1] 1.797176
The mean for Citrus Hill purchasers at store 2 is:
mean(OJ$PriceCH[OJ$StoreID==2 & OJ$Purchase=="CH"])
[1] 1.857383
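Group means like these can also be computed in one call with tapply(), which applies a function to a vector split by a grouping variable. The sketch below uses a small made-up data frame named toy that only mimics three OJ columns, so that it runs without the ISLR2 package; its values are not taken from the OJ data.

```r
# Toy data that mimics three OJ columns; the values are made up for illustration
toy <- data.frame(
  StoreID  = c(1, 1, 2, 2, 2),
  Purchase = c("CH", "MM", "CH", "CH", "MM"),
  PriceCH  = c(1.75, 1.80, 1.86, 1.90, 1.99)
)

# Mean PriceCH for each store in a single call
by_store <- tapply(toy$PriceCH, toy$StoreID, mean)
by_store

# Mean PriceCH among CH purchasers only, again split by store
ch <- toy[toy$Purchase == "CH", ]
by_store_ch <- tapply(ch$PriceCH, ch$StoreID, mean)
by_store_ch
```

With ISLR2 loaded, replacing toy with OJ should give the store-level means for every store at once.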
In R you can create two vectors. One holds the share price and the other one the number of shares bought.
PricePerShare<-c(250.34,234.59,270.45)
NumberOfShares<-c(80,120,50)
Next, multiply the PricePerShare and NumberOfShares vectors and sum the products to find the numerator, then apply the sum() function to NumberOfShares to find the denominator. The weighted average is:
(WeightedAverage<-sum(PricePerShare*NumberOfShares)/sum(NumberOfShares))
[1] 246.802
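The same result can be cross-checked with the weighted.mean() function from the stats package that ships with R. The vectors are redefined here so the snippet runs on its own.

```r
PricePerShare <- c(250.34, 234.59, 270.45)
NumberOfShares <- c(80, 120, 50)

# weighted.mean(x, w) computes sum(w * x) / sum(w)
weighted.mean(PricePerShare, NumberOfShares)   # 246.802
```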
In R you can use the mean() function on the PricePerShare vector.
(Average<-mean(PricePerShare))
[1] 251.7933
30% | 23.08% | 12.5% | 16.67% |
In R create an object to store the values of the CPI:
CPI<-c(1,1.3,1.6,1.8,2.1)
Next use the diff() function to find the difference between each end value and its start value. Divide the result by the vector of starting values and multiply by 100.
(Inflation<-100*diff(CPI)/CPI[1:4])
[1] 30.00000 23.07692 12.50000 16.66667
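An equivalent sketch that avoids hard-coding the index 1:4 pairs each value with its predecessor by using head() and tail(). The object names start, end, and inflation are illustrative.

```r
CPI <- c(1, 1.3, 1.6, 1.8, 2.1)

start <- head(CPI, -1)   # all values except the last
end   <- tail(CPI, -1)   # all values except the first
inflation <- 100 * (end / start - 1)
round(inflation, 2)      # 30.00 23.08 12.50 16.67
```

Written this way, the calculation works unchanged for a CPI vector of any length.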
In R include the annual rates in a vector:
growth<-c(0.173,0.196,0.068,0.082)
The arithmetic mean is:
100*mean(growth)
[1] 12.975
The geometric mean is:
(geom<-((prod(1+growth))^(1/4)-1)*100)
[1] 12.8384
At the end of the four years we would have:
1000*(1+geom/100)^4
[1] 1621.167
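As a check on this result, the sketch below compounds the $1000 year by year with cumprod() and compares the final balance with the balances implied by the geometric and the arithmetic mean. All object names are illustrative, and the vector is redefined so the snippet runs on its own.

```r
growth <- c(0.173, 0.196, 0.068, 0.082)

# Balance at the end of each year when $1000 compounds at the actual returns
balance <- 1000 * cumprod(1 + growth)
final_actual <- balance[length(balance)]

# Final balance implied by the geometric and the arithmetic mean
geom  <- prod(1 + growth)^(1 / length(growth)) - 1
arith <- mean(growth)
final_geom  <- 1000 * (1 + geom)^4
final_arith <- 1000 * (1 + arith)^4

c(actual = final_actual, geometric = final_geom, arithmetic = final_arith)
```

The geometric mean reproduces the compounded balance, while the arithmetic mean overstates it, which is why the geometric mean is the one used for the estimate.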