4  Descriptive Stats IV

4.1 Concepts

Measures of Dispersion

Measures of dispersion are used to determine the spread (variability) of the data.

  • The range is calculated by \(largest-smallest\). It ignores the variability of the data between the largest and smallest values.

  • The variance calculates the dispersion around the mean. It uses squared deviations. The population parameter is given by \(\sigma^2= \frac{\sum (x_i-\mu)^2}{N}\), while the sample statistic is \(s^2=\frac{\sum (x_i-\bar{x})^2}{n-1}\).

  • The standard deviation measures the typical deviation around the mean, expressed in the same units as the data. It is calculated as the square root of the variance. Use \(\sigma=\sqrt{\sigma^2}\) for the population parameter and \(s=\sqrt{s^2}\) for the sample statistic.

  • The Mean Absolute Deviation (\(MAD\)) measures the average deviation from the mean. This measure uses absolute deviations. It is calculated by \(MAD=\frac{\sum |x_i-\mu|}{N}\) for the population and \(mad=\frac{\sum |x_i-\bar{x}|}{n}\) for the sample.

  • The coefficient of variation \(CV=s/\bar{x}\) adjusts the standard deviation for differences in units of measure or scale.
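As a minimal sketch of how the formulas above translate into R, the block below computes each measure for a small made-up sample. The vector `x` and all variable names are hypothetical and are not taken from the exercises in this chapter.

```r
# Hypothetical sample, used only to illustrate the formulas above
x <- c(2, 4, 4, 6, 9)
n <- length(x)
dev <- x - mean(x)                 # deviations from the mean

range_x <- max(x) - min(x)         # largest - smallest
s2      <- sum(dev^2) / (n - 1)    # sample variance
sigma2  <- sum(dev^2) / n          # variance if x were the entire population
s       <- sqrt(s2)                # sample standard deviation
mad_x   <- sum(abs(dev)) / n       # mean absolute deviation
cv      <- s / mean(x)             # coefficient of variation
```

Note that base R's mad() function computes the median absolute deviation, which is a different statistic from the mean absolute deviation defined here, so the MAD is computed by hand in this sketch.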

Portfolio Assessment

To assess the performance of a portfolio, calculate:

  • The mean return of the portfolio \(\alpha\bar{R}_1+(1-\alpha)\bar{R}_2\), where \(\alpha\) is the weight of investment 1 in the portfolio and \(\bar{R}_i\) is the average return of investment \(i \in \{1,2\}\).

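  • The variance of the portfolio is given by \(\begin{bmatrix} \alpha \\ 1-\alpha \end{bmatrix}^T \begin{bmatrix} s_x^2 & s_{xy} \\ s_{xy} & s_y^2 \end{bmatrix} \begin{bmatrix} \alpha \\ 1-\alpha \end{bmatrix}\), where \(s_x^2\) and \(s_y^2\) are the sample variances of the two investments and \(s_{xy}\) is their sample covariance.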

  • The Sharpe ratio quantifies the excess return of an investment over the risk-free return per unit of risk. It is calculated by \(\frac{\bar{R}_p-R_f}{s}\), where \(\bar{R}_p\) is the mean return of the portfolio, \(R_f\) is the risk-free return, and \(s\) is the standard deviation of the portfolio's returns.
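The three portfolio measures above can be sketched in R as follows. The return vectors `R1` and `R2`, the weight `alpha`, and the risk-free return `Rf` are all made-up values chosen for illustration, not data from this chapter.

```r
# Hypothetical returns for two investments (not data from this chapter)
R1 <- c(1.2, -0.5, 0.8, 2.0, -1.1)
R2 <- c(0.4, 0.6, -0.2, 0.9, 0.3)
alpha <- 0.6   # assumed weight of investment 1
Rf    <- 0.1   # assumed risk-free return, in the same units as R1 and R2

w <- c(alpha, 1 - alpha)   # weight vector
S <- cov(cbind(R1, R2))    # sample covariance matrix

mean_p <- alpha * mean(R1) + (1 - alpha) * mean(R2)  # mean return
var_p  <- as.numeric(t(w) %*% S %*% w)               # quadratic form w' S w
sharpe <- (mean_p - Rf) / sqrt(var_p)                # excess return per unit of risk

# The quadratic form agrees with the variance of the weighted return series
all.equal(var_p, var(alpha * R1 + (1 - alpha) * R2))
```

The last line is a sanity check: the matrix expression is just the variance of the combined return series \(\alpha R_1+(1-\alpha)R_2\), so the comparison should return TRUE.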

Useful R Functions

The range() function returns the minimum and maximum of a vector of values, in that order.

The diff() function finds the first difference of a vector. Applied to the output of range(), it returns the range (maximum minus minimum).

The var() function calculates the sample variance for a vector of values. To calculate the population variance, adjust the result by a factor of \((n-1)/n\).

The sd() function calculates the sample standard deviation.

The matrix() function defines a matrix.

When dealing with matrices, the t() function transposes a vector or matrix, and the operator %*% performs matrix multiplication.
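A short sketch of these functions on made-up inputs is shown below. The vector `x` and the matrix `A` are hypothetical and serve only to show what each call returns.

```r
x <- c(3, 8, 5, 10)   # hypothetical values

range(x)              # minimum and maximum: 3 10
diff(range(x))        # maximum - minimum: 7
diff(x)               # successive differences: 5 -3 5
var(x)                # sample variance (divides by n - 1)
var(x) * (length(x) - 1) / length(x)   # population variance
sd(x)                 # sample standard deviation, equal to sqrt(var(x))

A <- matrix(c(1, 2, 3, 4), nrow = 2)   # filled column by column
t(A)                  # transpose of A
A %*% c(1, 1)         # matrix product: the row sums of A, as a 2 x 1 matrix
```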

4.2 Exercises

The following exercises will help you practice the measures of dispersion. In particular, the exercises work on:

  • Calculating the range, MAD, variance, and the standard deviation.

  • Using R to calculate measures of dispersion.

  • Calculating and using the Sharpe ratio to select investments.

Answers are provided below. Try not to peek until you have formulated your own answer and double-checked your work for any mistakes.

Exercise 1

For the following exercises, make your calculations by hand and verify results using R functions when possible. Make sure to calculate the deviations from the mean.

  1. Use the following observations to calculate the Range, MAD, Variance and Standard Deviation. Assume that the data below is the entire population.

    70 68 4 98

  2. Use the following observations to calculate the Range, MAD, Variance and Standard Deviation. Assume that the data below is a sample from the population.

    -4 0 -6 1 -3 0

Exercise 2

You will need the Stocks data set to answer this question. The data is available at https://jagelves.github.io/Data/Stocks.csv and is a sample of daily stock prices for ticker symbols TSLA (Tesla), VTI (S&P 500) and GBTC (Bitcoin).

  1. Calculate the standard deviations for each stock. Which stock had the lowest standard deviation?
  2. Calculate the MAD. Does your answer in 1. remain the same?
  3. Finally, calculate the coefficient of variation. Any changes to your conclusions?

Exercise 3

Install the ISLR2 package. You will need the Portfolio data set to answer this question. The data has 100 records of the returns of two stocks.

  1. Calculate the mean and standard deviation for each stock. Which investment has higher returns on average? Which investment is safest as measured by the standard deviation?
  2. Use a Risk Free rate of return of 3.5% to calculate the Sharpe ratio for each stock. Which stock would you recommend?
  3. Calculate the average return for a portfolio that has 30% of stock X and 70% of stock Y. What is the standard deviation of the portfolio?

4.3 Answers

Exercise 1

  1. The mean is \(60\), the Range is \(94\), the MAD is \(28\), the variance is \(1186\) and the standard deviation is \(34.44\).

Start by creating a vector to hold the values:

Ex1<-c(70,68,4,98)

The range can be calculated by using the range() and diff() functions in R.

(Range<-diff(range(Ex1)))
[1] 94

Next, we can create a table by hand that captures the deviations from the mean. Let’s calculate the mean first:

(Average1<-mean(Ex1))
[1] 60

Now we can use the mean to fill out a table of deviations:

\(x_{i}\) \(x_{i}-\bar{x}\) \((x_{i}-\bar{x})^2\) \(|x_{i}-\bar{x}|\)
70 10 100 10
68 8 64 8
4 -56 3136 56
98 38 1444 38

The variance averages out the squared deviations \((x_{i}-\bar{x})^2\), the MAD averages out the absolute deviations \(|x_{i}-\bar{x}|\), and the standard deviation is the square root of the variance.
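Summing the last two columns of the table gives the worked arithmetic below. The mean is written as \(\mu\) in the formulas since the data is treated as the entire population, so each sum is divided by \(N=4\).

```latex
\sigma^2 = \frac{100 + 64 + 3136 + 1444}{4} = \frac{4744}{4} = 1186, \qquad
\sigma = \sqrt{1186} \approx 34.44, \qquad
MAD = \frac{10 + 8 + 56 + 38}{4} = \frac{112}{4} = 28
```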

Let’s verify the variance in R:

SquaredDeviations1<-(Ex1-Average1)^2
AverageDeviations1<-mean(SquaredDeviations1)
var(Ex1)*3/4
[1] 1186

Note that var() calculates the sample variance, which divides by \(n-1\). Hence, we must multiply the result by \((n-1)/n=3/4\) to get the population variance. The standard deviation is just the square root of the variance:

sqrt(AverageDeviations1)
[1] 34.43835

Lastly, the MAD is calculated by averaging the absolute deviations \(|x_{i}-\bar{x}|\).

AbsoluteDeviations1<-abs(Ex1-Average1)
mean(AbsoluteDeviations1)
[1] 28

  2. The mean is \(-2\), the Range is \(7\), the MAD is \(2.33\), the variance is \(7.6\) and the standard deviation is \(2.76\).

Here is the table of deviations from the mean:

\(x_{i}\) \(x_{i}-\bar{x}\) \((x_{i}-\bar{x})^2\) \(|x_{i}-\bar{x}|\)
-4 -2 4 2
0 2 4 2
-6 -4 16 4
1 3 9 3
-3 -1 1 1
0 2 4 2
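
Summing the last two columns of the table gives the worked arithmetic below. Since the data is a sample of \(n=6\) observations, the squared deviations are divided by \(n-1=5\), while the absolute deviations are divided by \(n=6\).

```latex
s^2 = \frac{4 + 4 + 16 + 9 + 1 + 4}{6 - 1} = \frac{38}{5} = 7.6, \qquad
s = \sqrt{7.6} \approx 2.76, \qquad
mad = \frac{2 + 2 + 4 + 3 + 1 + 2}{6} = \frac{14}{6} \approx 2.33
```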

We can check the results in R. Let’s start with the variance:

Ex2<-c(-4,0,-6,1,-3,0)
var(Ex2)
[1] 7.6

The standard deviation can be found with the sd() function:

sd(Ex2)
[1] 2.75681

The MAD is given by:

(MAD<-mean(abs(Ex2-mean(Ex2))))
[1] 2.333333

Lastly, the range:

diff(range(Ex2))
[1] 7
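
The calculations for both parts of this exercise can be collected in one small function. The sketch below is only one possible way to organize them. The name `dispersion_summary` and its `population` argument are hypothetical helpers introduced here, not functions from base R.

```r
# Hypothetical helper: dispersion measures for a numeric vector.
# population = TRUE divides the squared deviations by n, otherwise by n - 1.
dispersion_summary <- function(x, population = FALSE) {
  n <- length(x)
  dev <- x - mean(x)                          # deviations from the mean
  denom <- if (population) n else n - 1
  variance <- sum(dev^2) / denom
  c(range = diff(range(x)),
    mad = mean(abs(dev)),
    variance = variance,
    sd = sqrt(variance))
}

part1 <- dispersion_summary(c(70, 68, 4, 98), population = TRUE)
part2 <- dispersion_summary(c(-4, 0, -6, 1, -3, 0))
part1
part2
```

Printing `part1` and `part2` should reproduce the range, MAD, variance and standard deviation reported for each part above.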

Exercise 2

  1. For the sample taken, GBTC has the least variation. The standard deviation of GBTC is \(9.43\), which is less than \(16.57\) for VTI or \(50.38\) for TSLA.

Start by loading the data set from the website. Since the file is in csv format, we will use the read.csv() function.

StockPrices<-read.csv("https://jagelves.github.io/Data/Stocks.csv")

Let’s start with the standard deviation of the Tesla stock. The standard deviation is given by:

sd(StockPrices$TSLA)
[1] 50.38092

Next, let’s find the standard deviation for the S&P 500 or VTI. The standard deviation is given by:

sd(StockPrices$VTI)
[1] 16.5731

Finally, let’s calculate the standard deviation for GBTC or Bitcoin.

sd(StockPrices$GBTC)
[1] 9.434213

  2. The answer is the same, since the MAD for GBTC is \(8.46\), which is lower than \(14.27\) for VTI or \(41.67\) for TSLA.

To calculate the MAD for TSLA we can use the following command:

(MADTSLA<-mean(abs(StockPrices$TSLA-mean(StockPrices$TSLA))))
[1] 41.67163

The MAD for VTI is:

(MADVTI<-mean(abs(StockPrices$VTI-mean(StockPrices$VTI))))
[1] 14.27169

The MAD for GBTC is:

(MADGBTC<-mean(abs(StockPrices$GBTC-mean(StockPrices$GBTC))))
[1] 8.458029

  3. Once the magnitudes of the stock prices are taken into account, VTI appears to be the least volatile stock. VTI has a CV of \(0.08\), which is lower than \(0.44\) for GBTC or \(0.18\) for TSLA. In fact, by CV, Bitcoin seems to be the riskiest asset.

The coefficients of variation are as follows. For TSLA the CV is:

(CVTSLA<-sd(StockPrices$TSLA)/mean(StockPrices$TSLA))
[1] 0.1793755

For VTI the CV is:

(CVVTI<-sd(StockPrices$VTI)/mean(StockPrices$VTI))
[1] 0.07970004

For GBTC we get:

(CVGBTC<-sd(StockPrices$GBTC)/mean(StockPrices$GBTC))
[1] 0.4442497
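
Since the same three measures are computed for every stock, the calculations can also be done in one pass with sapply(). The sketch below is one possible approach. The function name `dispersion_table` and the `prices` data frame are hypothetical, and the prices are made up so that the block runs without downloading the data.

```r
# Hypothetical helper: sd, MAD and CV for every column of a data frame
dispersion_table <- function(df) {
  sapply(df, function(x) {
    c(sd = sd(x),
      mad = mean(abs(x - mean(x))),
      cv = sd(x) / mean(x))
  })
}

# Made-up prices, used only to show the shape of the result
prices <- data.frame(A = c(10, 12, 11, 13), B = c(100, 101, 99, 100))
result <- dispersion_table(prices)
result   # a matrix with one row per measure and one column per stock
```

With the data loaded as above, a call such as `dispersion_table(StockPrices[c("TSLA", "VTI", "GBTC")])` should return the same values computed one at a time in this answer.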

Exercise 3

  1. The best performing stock on average is stock X. It has an average return of \(-0.077\)% vs. \(-0.097\)% for stock Y. The safest stock is stock X as well, since the standard deviation is \(1.062\) percentage points vs. \(1.144\) percentage points for stock Y.

Start by loading the ISLR2 package:

library(ISLR2)

Next, calculate the mean for stock X:

mean(Portfolio$X)
[1] -0.07713211

and stock Y.

mean(Portfolio$Y)
[1] -0.09694472

Then, calculate the standard deviation for stock X

sd(Portfolio$X)
[1] 1.062376

and stock Y.

sd(Portfolio$Y)
[1] 1.143782

  2. The Sharpe Ratio measures the excess return per unit of risk taken. Stock X has the better Sharpe Ratio: \(-0.106\) vs. \(-0.115\). Stock X is recommended since it provides a higher excess return per unit of risk taken.

To calculate the Sharpe Ratios, use both the average return and the standard deviation. For stock X, the Sharpe Ratio is:

(mean(Portfolio$X)-0.035)/sd(Portfolio$X)
[1] -0.1055484

The Sharpe Ratio for stock Y:

(mean(Portfolio$Y)-0.035)/sd(Portfolio$Y)
[1] -0.1153583

  3. The portfolio has an average return of \(-0.091\), which is worse than stock X but better than stock Y. The standard deviation is \(1.00\), which is lower than that of either stock on its own. However, the Sharpe ratio of \(-0.126\) is lower than that of either stock (\(-0.106\) for X and \(-0.115\) for Y): because the excess return is negative, dividing it by a smaller standard deviation makes the ratio more negative.

The mean of the portfolio is given by:

(mean_return<-0.3*mean(Portfolio$X)+0.7*mean(Portfolio$Y))
[1] -0.09100094

The covariance matrix is given by:

(risk<-cov(Portfolio))
          X         Y
X 1.1286424 0.6263583
Y 0.6263583 1.3082375

Using the matrix we can now calculate the standard deviation:

(standard<-sqrt(t(c(0.3,0.7)) %*% (risk %*% c(0.3,0.7))))
         [,1]
[1,] 1.002838

Finally, the Sharpe ratio for the portfolio subtracts the same risk-free return of \(0.035\) used for the individual stocks:

(mean_return-0.035)/standard[1]
[1] -0.1256443
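
As a cross-check on the matrix product used for the portfolio standard deviation, the quadratic form can be expanded by hand as \(\alpha^2 s_x^2+2\alpha(1-\alpha)s_{xy}+(1-\alpha)^2 s_y^2\). The sketch below evaluates this expansion using the entries copied from the covariance matrix printed above. The variable names are hypothetical.

```r
# Entries copied from the covariance matrix printed above
s2_x <- 1.1286424   # variance of stock X
s2_y <- 1.3082375   # variance of stock Y
s_xy <- 0.6263583   # covariance of X and Y
alpha <- 0.3        # weight of stock X

# alpha^2 * s_x^2 + 2 * alpha * (1 - alpha) * s_xy + (1 - alpha)^2 * s_y^2
var_p <- alpha^2 * s2_x + 2 * alpha * (1 - alpha) * s_xy + (1 - alpha)^2 * s2_y
sd_p  <- sqrt(var_p)
round(sd_p, 3)      # 1.003
```

The result agrees with the standard deviation obtained from the matrix product, up to the rounding of the copied covariance entries.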