Chapter 6 Probability Distributions (Part 2)

We cover measures of central tendency, like expected value, and measures of spread, like variance, because they allow us to recognize characteristics of a distribution. These are called the parameters of a distribution.

6.1 Expected Value

\[ \mathbb{E}[X] = \sum_{x \in S} x\,p(x) \tag{Discrete} \]

\[ \mathbb{E}[X] = \int_{S} x f(x)\,dx \tag{Continuous} \]
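
To make the definitions concrete, here is a minimal Python sketch (assuming NumPy is available; the fair die and the Exponential rate are illustrative choices, not from the text) that computes \(\mathbb{E}[X]\) in both cases:

```python
import numpy as np

# Discrete case: E[X] = sum of x * p(x), here for a fair six-sided die.
values = np.arange(1, 7)
pmf = np.full(6, 1 / 6)
print(np.sum(values * pmf))  # 3.5

# Continuous case: E[X] = integral of x * f(x) dx, here for an
# Exponential(rate = 2) density, approximated by a Riemann sum on a fine grid.
rate = 2.0
x = np.linspace(0.0, 50.0, 500_001)
dx = x[1] - x[0]
fx = rate * np.exp(-rate * x)
print(np.sum(x * fx) * dx)  # ~0.5, matching the known mean 1/rate
```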

Properties of Expected Value

\[\mathbb{E}[aX + bY] = a\mathbb{E}[X] + b\mathbb{E}[Y]\]

\[\mathbb{E}[c] = c \quad \text{(where } c \text{ is a constant)}\]

\[\mathbb{E}[X_1 + X_2 + \ldots + X_n] = \mathbb{E}[X_1] + \mathbb{E}[X_2] + \ldots + \mathbb{E}[X_n]\]
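
As a quick sanity check of linearity, here is a simulation sketch (the distributions, constants, and seed below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
a, b = 3.0, -2.0

# X ~ Uniform(0, 1) and Y ~ Exponential(mean 1). Linearity holds even for
# dependent X and Y; independent samples are simply easiest to generate.
x = rng.uniform(0.0, 1.0, n)
y = rng.exponential(1.0, n)

lhs = np.mean(a * x + b * y)           # sample estimate of E[aX + bY]
rhs = a * np.mean(x) + b * np.mean(y)  # a E[X] + b E[Y]
print(lhs, rhs)  # both close to 3(0.5) - 2(1) = -0.5
```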

Additional Notes

  • What if the discrete set is countably infinite? The sum must converge absolutely for \(\mathbb{E}[X]\) to exist.
  • Notation: \(\mathbb{E}[X]\) and \(\mu_X\) can be used interchangeably.
  • When deriving properties of expectation, it is crucial to understand that \(\mathbb{E}[X]\) is a fixed constant.

6.2 Variance

The variance of a random variable is the expected value (i.e. the average) of the squared deviation from the mean.

\[ \text{Var}(X) = \sum_{x \in S}(x-\mu)^2 p(x) = \mathbb{E}\big[(X - \mathbb{E}[X])^2\big] \tag{Discrete} \]

\[ \text{Var}(X) = \int_{-\infty}^{\infty}(x-\mu)^2 f(x)\,dx = \mathbb{E}\big[(X - \mathbb{E}[X])^2\big] \tag{Continuous} \]

Shortcut formula for variance

\[ \text{Var}(X) = \mathbb{E}[X^2]-\mathbb{E}[X]^2 \]
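
The definition and the shortcut can be checked against each other numerically; the fair-die PMF below is just an example:

```python
import numpy as np

values = np.arange(1, 7)   # a fair six-sided die, for illustration
pmf = np.full(6, 1 / 6)

mu = np.sum(values * pmf)                            # E[X]
var_def = np.sum((values - mu) ** 2 * pmf)           # E[(X - E[X])^2]
var_shortcut = np.sum(values ** 2 * pmf) - mu ** 2   # E[X^2] - E[X]^2
print(var_def, var_shortcut)  # both 35/12 ≈ 2.9167
```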

Properties of Variance

\[\text{Var}(aX + bY) = a^2\text{Var}(X) + b^2\text{Var}(Y) + 2ab\,\text{Cov}(X,Y)\]

\[\text{Var}(aX + b) = a^2\text{Var}(X)\]

\[\text{Var}(c) = 0 \quad \text{(where } c \text{ is a constant)}\]
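
A simulation sketch of the shift-and-scale property (the Normal parameters, constants, and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=2.0, size=1_000_000)  # Var(X) = 4
a, b = 3.0, 7.0

# The additive constant b drops out; the multiplicative constant comes out squared.
print(np.var(a * x + b))   # ~ a^2 Var(X) = 36
print(a ** 2 * np.var(x))  # ~ 36
```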

Additional Notes

  • Properties of variance are typically derived from the properties of expectation.

  • It is implicit that the expectation of \(X\) is defined; however, even if \(\mathbb{E}[X]\) exists, it is possible that \(\text{Var}(X)\) is infinite.

  • If \(X\) and \(Y\) are independent, then \(\text{Cov}(X,Y) = 0\) and the cross term vanishes.

  • Variance is quadratic, not linear: constants factor out squared, as in \(\text{Var}(aX) = a^2\text{Var}(X)\).


6.3 Law of the Unconscious Statistician (LOTUS)

LOTUS relates the expected value of a function of a random variable \(X\) to the distribution of \(X\) itself, letting us calculate \(\mathbb{E}[g(X)]\) directly.

\[ \mathbb{E}[g(X)] = \sum_{x \in R_X} g(x)p(x) \]

\[ \mathbb{E}[g(X)] = \int_{-\infty}^{\infty} g(x) \cdot f_X(x) \, dx \]
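
A minimal sketch of LOTUS for \(g(x) = x^2\), using a fair die as an illustrative distribution: we sum \(g(x)p(x)\) over the support of \(X\) and never construct the PMF of \(X^2\):

```python
import numpy as np

values = np.arange(1, 7)   # support of X: a fair six-sided die (example)
pmf = np.full(6, 1 / 6)

def g(x):
    return x ** 2

e_g = np.sum(g(values) * pmf)  # LOTUS: E[g(X)] = sum of g(x) p(x)
print(e_g)  # 91/6 ≈ 15.167, found without the distribution of X^2
```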

Additional Notes

  • Implies that the expected value of a transformed random variable can be found using the original random variable, i.e., there is no need to compute the transformed CDF or PDF to find its expected value.
  • Implies linearity of expectation.

6.4 Moments

Moments are quantitative measures used to describe the shape of a random variable's distribution.

  • The first moment is the expected value.

  • The second central moment is the variance.

  • The standardized third central moment of \(X\) is the skewness.

  • Kurtosis is a measure based on the fourth central moment and the variance of \(X\).

Additional Notes

  • The existence of higher moments implies the existence of lower moments.
  • \(\mathbb{E}[X^n]\) is the \(n\)th raw moment of \(X\).
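
To make these shape measures concrete, here is a sketch that builds each one from moments of an example PMF (the skewed three-point distribution is an arbitrary illustration):

```python
import numpy as np

values = np.array([0.0, 1.0, 10.0])   # an arbitrary right-skewed PMF
pmf = np.array([0.5, 0.4, 0.1])

def raw_moment(n):
    """n-th raw moment E[X^n]."""
    return np.sum(values ** n * pmf)

mu = raw_moment(1)                    # first moment: mean
var = raw_moment(2) - mu ** 2         # second central moment: variance
sigma = np.sqrt(var)
skew = np.sum((values - mu) ** 3 * pmf) / sigma ** 3  # standardized 3rd central moment
kurt = np.sum((values - mu) ** 4 * pmf) / sigma ** 4  # standardized 4th central moment
print(mu, var, skew, kurt)  # the positive skew reflects the long right tail
```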

6.5 Transformations of Random Variables

Given a random variable, we may want to understand how we can form new random variables from it. This is called a transformation.

If we know the mean and standard deviation of the original distribution, we can use that information to find the mean and standard deviation of the resulting distribution. Consider adding a constant to a random variable. The mean would change, but the variance would be unaffected. The distribution simply shifts by the amount that was added.

\[ Y = X + k \\ \mu_Y = \mu_X + k \\ \sigma^2_Y = \sigma^2_X \]
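
A quick simulation sketch of the shift rule (the Normal parameters, the constant \(k\), and the seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=1_000_000)  # mean 5, variance 4
k = 4.0
y = x + k

print(np.mean(y), np.mean(x) + k)  # mean shifts by k: both ~9
print(np.var(y), np.var(x))        # variance unchanged: both ~4
```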

Next, consider multiplying a random variable by a constant. This scales the mean by the factor and the variance by the square of the factor. Understand that the overall area under the distribution cannot change, as this would violate the definition of our PMF or PDF. So if you stretch the distribution horizontally by a factor of 3, you would triple the area; therefore you must compress it vertically by the same factor to preserve the total area of 1.

For example, suppose a random variable \(X\) takes the values 1, 2, 3, 4, 5 with equal probability, so \(\mathbb{E}[X] = 3\). If we scale \(X\) by 2, then \(Y = 2X\) takes the values 2, 4, 6, 8, 10, and \(\mathbb{E}[Y] = 2\,\mathbb{E}[X] = 6\).

\[ Y = kX \\ \mu_Y = k\mu_X \\ \sigma^2_Y = k^2\sigma^2_X \]
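
And a matching sketch of the scaling rule (again with arbitrary parameters and seed):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=2.0, size=1_000_000)  # mean 5, variance 4
k = 3.0
y = k * x

print(np.mean(y), k * np.mean(x))     # mean scales by k: both ~15
print(np.var(y), k ** 2 * np.var(x))  # variance scales by k^2: both ~36
```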