10  Probability III

Studying continuous probability distributions is crucial for understanding and predicting outcomes in uncertain scenarios where variables can take on any value within a range, rather than being limited to distinct, countable outcomes. These distributions are widely used in fields such as engineering, economics, natural sciences, and machine learning to model phenomena like time to failure, stock prices, rainfall amounts, or human heights. By mastering continuous probability distributions, analysts and decision-makers can assess uncertainties, estimate probabilities of events, and develop data-driven strategies to address real-world challenges. Below, we introduce some popular continuous distributions and their practical applications.

10.1 Continuous Random Variables

Continuous random variables are characterized by their probability density function \(f(x)\). The probability density function does not directly provide probabilities!

The probability of a continuous random variable assuming a single value is zero. Instead, probabilities are defined for intervals. These are calculated by areas under the PDF curve (integral).

10.2 Uniform Distribution

The uniform probability density function is given by: \[f(x)= \frac {1}{b-a}\] when \(a \leq x \leq b\) and \(0\) otherwise. Here \(b\) is the upper limit of the distribution and \(a\) is the lower limit. The expected value of the uniform distribution is:

\[E(x)= \frac {a+b}{2}\]

The variance of the uniform distribution is

\[var(x)= \frac {(b-a)^2} {12}\] Example: Consider the travel time between NY and CHI. Typically, this trips can take anywhere from 120-140 minutes. The specific time of a particular flight is unpredictable, but assumed to be anywhere between this interval. We assume a uniform distribution with upper limit of 140 and lower limit of 120. The probability that the flight take 122 minutes is zero since the interval has an infinite amount of possible values. The probability of the flight taking 130 minutes or less is 0.5. \[f(x<=130)=\frac{130-120}{20}=0.5\] We can confirm this result in R using the punif() function. Below is the code:

punif(130,120,140)
[1] 0.5

10.3 Normal Distribution

The normal PDF is given by

\[f(x)= \frac {1}{\sigma \sqrt{2\pi}} e^{\frac {-1}{2} (\frac {x-\mu}{\sigma})^2}\] where \(\mu\) is the mean, \(\sigma\) is the standard deviation, \(\pi\) is 3.1415… , and \(e\) is 2.7282… . The normal distribution has the following properties:

  • It is symmetrical about the mean \(\mu\).

  • The mean is at the middle and divides the area of the distribution into halves.

  • The total area under the curve is equal to 1.

  • The distribution is completely determined by its mean and standard deviation.

The standard normal has a mean of \(0\) and a standard deviation of \(1\). Otherwise, it has the exact same properties as the normal distribution.

Example: Dr Tires is planning on offering a mileage guarantee on each set of tires they sell. They are considering a 40,000 mile guarantee. Given that the mean tire mileage is 36,500 miles with a standard deviation of 5,000 miles, we can estimate that the probability that a tire lasts more than 40,000 miles 24.2%. We can confirm this using R.

1-pnorm(40000,36500,5000)
[1] 0.2419637

10.4 Exponential Distribution

The exponential distribution is useful in computing probabilities for the time it takes to complete a task. It describes the time between events in a Poisson process.

The probability density function is given by \[f(x)=1/{\mu} \times e^{\frac{-x}{\mu}}=\lambda e^{-{\lambda} x}\] Example: Let \(x\) represent the loading time for a truck at the Dock Dash loading dock. If the average loading time is 15 minutes, the probability that loading a truck will take 6 minutes or less is 32.97%.

pexp(6,1/15)
[1] 0.32968

We can also estimate that the probability that it takes between 6 and 18 minutes is 36.91%.

pexp(18,1/15)-pexp(6,1/15)
[1] 0.3691258

10.5 Triangular Distribution

The triangular distribution is characterized by a single mode (the peak of the distribution) and two boundaries. It is often used in situations where the lower and upper bounds of a potential outcome are known, but the exact likelihood of the outcome is uncertain.

The probability density function is given by \[f(x)=\frac {2(x-a)}{(b-a)(c-a)}\] for \(a \leq x < c\)

\[f(x)=\frac {2}{(b-a)}\] for \(x=c\) and

\[f(x)=\frac {2(b-x)}{(b-a)(b-c)}\]

for \(c < x \leq b\), and \(f(x)=0\) otherwise. The expected value of the distribution is \[E(x)= \frac {a+b+c}{3}\] The variance of the triangular distribution is

\[var(x) = \frac {a^2+b^2+c^2-ab-ac-bc}{18}\]

Example: Bite Bliss is planning a new store in Williamsburg. It is estimated that the minimum weekly sales are 1000 and the maximum is 6000. They also estimate that the most likely outcome is around 3000. The probability that future sales will be between 2000 and 2500 is 12.5%.

library(extraDistr)
ptriang(2500,1000,6000,3000)-ptriang(2000,1000,6000,3000)
[1] 0.125

10.6 Useful R Functions

To calculate the density of continuous random variables use the dunif(), dnorm(), and dexp() functions. For the triangular distribution use the extraDistr package and the dtriang() function.

To calculate probabilities of continuous random variables use the punif(), pnorm(), pexp(), and ptriang() functions.

To calculate quartiles of continuous random variables use the qunif(), qnorm(),qexp(), and qtriang() functions.

To calculate generate random variables based on continuous random variables use the runif(), rnorm(), rexp(), and rtriang() functions.

10.7 Exercises

The following exercises will help you practice some probability concepts and formulas. In particular, the exercises work on:

  • Calculating probabilities for continuous random variables.

  • Calculating the expected value and standard deviation.

  • Applying the uniform, normal, and exponential distributions.

Answers are provided below. Try not to peak until you have a formulated your own answer and double checked your work for any mistakes.

Exercise 1

For the following exercises, make your calculations by hand and verify results with a calculator or R.

  1. A random variable \(X\) follows a continuous uniform distribution with minimum of \(-2\) and maximum of \(4\). Determine the height of the density function \(f(x)\), the mean, the standard deviation, and calculate \(P(X \leq -1)\).
Answer

The height of the density function \(f(x)=0.1667\), the mean is \(1\), standard deviation is \(1.73\), and \(P(X \leq -1)=0.1667\).

\(f(x)\) can be easily estimated by using the formula of the continuous uniform random variable. \(f(x)=\frac{1}{b-a}\). Using R as a calculator we find:

1/(4-(-2))
[1] 0.1666667

The mean is given by \(\mu = \frac{a+b}{2}\). In R we determine that the mean is:

(-2+4)/2
[1] 1

The standard deviation is \(\sigma = \sqrt {\frac{(b-a)^2}{12}}\). Using R we find:

sqrt((4-(-2))^2/12)
[1] 1.732051

Finally, we can find the probability of \(Z\) being less than \(-1\) by using the punif() function:

punif(-1,-2,4)
[1] 0.1666667
  1. Your internet provider will arrive sometime between 10:00 am and 12:00 pm. Suppose you have to run a quick errand at 10:00 am. If it takes \(15\) minutes to run the errand, what is the probability that you will be back before the internet provider arrives? What if you take \(30\) minutes?
Answer

The probability that you will arrive on time is \(0.875\). If the time of the errand is 30 minutes, then the probability goes down to \(0.75\).

There is a \(120\) minute interval in which the IP can arrive. The density function is given by \(f(x)=1/120\). Using R we can find \(P(X>15)\):

punif(15,0,120,lower.tail=F)
[1] 0.875

Once more we can find \(P(X>30)\):

punif(30,0,120,lower.tail=F)
[1] 0.75

Exercise 2

  1. A random variable \(Z\) follows a standard normal distribution. Find \(P(-0.67 \leq Z \leq -0.23)\), \(P(0 \leq Z \leq 1.96)\), \(P(-1.28 \leq Z \leq 0)\) and \(P(Z > 4.2)\).
Answer

\(P(-0.67 \leq Z \leq -0.23)=0.158\), \(P(0 \leq Z \leq 1.96)=0.475\), \(P(-1.28 \leq Z \leq 0)=0.4\) and \(P(Z > 4.2) \approx 0\).

Use the pnorm() function to find the probabilities. \(P(-0.67 \leq Z \leq -0.23)\):

pnorm(-0.23)-pnorm(-0.67)
[1] 0.157617

\(P(0 \leq Z \leq 1.96)\)

pnorm(1.96)-pnorm(0)
[1] 0.4750021

\(P(-1.28 \leq Z \leq 0)\)

pnorm(0)-pnorm(-1.28)
[1] 0.3997274

\(P(Z > 4.2)\)

options(scipen=999)
pnorm(4.2,lower.tail = F)
[1] 0.00001334575
  1. Let \(Y\) be normally distributed with \(\mu=2.5\) and \(\sigma=2\). Find \(P(Y>7.6)\), \(P(7.4 \leq Y \leq 10.6)\), a \(y\) such that \(P(Y>y)=0.025\), and a \(y\) such that \(P(y \leq Y \leq 2.5)=0.4943\).
Answer

\(P(Y>7.6)=0.005386\), \(P(7.4 \leq Y \leq 10.6)=0.0071\), a \(y\) such that \(P(Y>y)=0.025\) is \(6.42\), and a \(y\) such that \(P(y \leq Y \leq 2.5)\) is \(-2.56\).

Let’s use once more the pnorm() function in R.

\(P(Y>7.6)\)

pnorm(7.6,2.5,2,lower.tail = F)
[1] 0.005386146

\(P(7.4 \leq Y \leq 10.6)\)

pnorm(10.6,2.5,2)-pnorm(7.4,2.5,2)
[1] 0.007117202

\(y\) such that \(P(Y>y)=0.025\)

qnorm(0.025,2.5,2,lower.tail = F)
[1] 6.419928

\(y\) such that \(P(y \leq Y \leq 2.5)=0.4943\). Note that \(2.5\) is the mean. Hence we are looking for a \(y\) that has \(0.5-0.4943=0.0057\) on the left:

qnorm(0.0057,2.5,2)
[1] -2.560385
  1. Assume that football game times are normally distributed with a mean of \(3\) hours and a standard deviation of \(0.4\) hour. What is the probability that the game lasts at most \(2.5\) hours? Find the maximum value for a game to be in the bottom \(1\)% of the distribution.
Answer

The probability is \(10.56\)%. A game lasting no more than \(2.069\) hours would be in the bottom \(1\)%.

Let’s use pnorm() once more in R.

pnorm(2.5,3,0.4)
[1] 0.1056498

For the threshold we can use qnorm():

qnorm(0.01,3,0.4)
[1] 2.069461

Exercise 3

  1. Random variable \(S\) is exponentially distributed with mean of \(0.1\). What is the standard deviation of \(S\)? What is \(P(0.10 \leq S \leq 0.2)\)?
Answer

The standard deviation is equal to the mean \(0.1\). \(P(0.10 \leq S \leq 0.2)=0.2325\)

Let’s use pexp() in R:

pexp(0.2,rate = 10)-pexp(0.1,rate = 10)
[1] 0.2325442
  1. A tollbooth operator has observed that cars arrive randomly at a rate of \(360\) cars per hour. What is the mean time between car arrivals? What is the probability that the next car will arrive within ten seconds?
Answer

The mean time between car arrivals is \(1/360=0.002778\). The probability that the next car will arrive within the next 10 seconds is \(0.6321\).

Once more we use pexp() in R:

pexp(1/360,360)
[1] 0.6321206