Chapter 7 Special Discrete Distributions

7.1 Bernoulli Random Variables

\[ X \sim Bern(p) \qquad R_x = \{0,1\} \]

A special case of the binomial distribution for a single trial, i.e., \(n=1\).

A Bernoulli trial is an experiment with exactly two possible outcomes, success and failure. A Bernoulli random variable maps these outcomes to numbers, defining \(X(\text{Success}) = 1\) and \(X(\text{Failure}) = 0\).

PMF

\[ \begin{equation} p(x) = P(X = x)= \begin{cases} p & \text{if } x = 1\\ 1 - p = q & \text{if } x = 0\\ 0 & \text{if } x \not\in R_x \end{cases} \end{equation} \]

Parameter Details

\[ 0 \le p \le 1 \]

Spread / Dispersion Equations

\[ E(X) = p \qquad Var(X) = p(1-p) \qquad \sigma_x = \sqrt{p(1-p)} \]

Additional Notes

  • A Bernoulli random variable is an indicator random variable for the event "success".

  • Alternative expressions for the Bernoulli PMF include:

\[ p(x)=\mathbb{P} (X = x) = p^x (1-p)^{(1-x)} \quad x \in \{0,1\} \]

  • The CDF of a Bernoulli random variable is a step function: it jumps from 0 to \(1-p\) at \(x = 0\) and from \(1-p\) to 1 at \(x = 1\). As with other discrete random variables, the CDF increases only through jumps at the points of the support.

  • The Bernoulli distribution can be generalized to more than two outcomes (the categorical distribution).
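
A minimal sketch, assuming standard-library Python, that checks the mean and variance formulas above against a Monte Carlo simulation; the value \(p = 0.3\) and the number of draws are arbitrary choices for illustration.

```python
import random

p = 0.3            # arbitrary success probability, for illustration only
n_sims = 100_000   # number of simulated Bernoulli trials

# Simulate Bern(p) draws: 1 with probability p, 0 otherwise
draws = [1 if random.random() < p else 0 for _ in range(n_sims)]

sample_mean = sum(draws) / n_sims
sample_var = sum((x - sample_mean) ** 2 for x in draws) / n_sims

print(f"E(X)   theory: {p:.4f}   simulated: {sample_mean:.4f}")
print(f"Var(X) theory: {p * (1 - p):.4f}   simulated: {sample_var:.4f}")
```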


7.2 Binomial Random Variables

\[ X \sim Binom(n,p) \quad R_x = \{0,1,2,...,n\} \]

A binomial random variable counts the number of successes in a fixed number of i.i.d. Bernoulli trials; equivalently, it is the sum of \(n\) independent Bernoulli (indicator) random variables, \(X = X_1 + X_2 + X_3 + \cdots + X_n\).

Here i.i.d. (independent and identically distributed) means that the random variables being summed to form \(X\) are mutually independent and all have the same distribution, \(X_i \sim Bern(p)\); in particular, every trial has the same fixed probability of success.

PMF

\[ \begin{equation} p(x) = P(X= x) = \begin{cases} \binom{n}{x}p^x(1-p)^{n-x} & \text{if } x = 0,1,2,...,n\\ 0 & \text{if } x \not\in R_x \end{cases} \end{equation} \]

Parameter Details

\(p\): probability of success on each trial, \(0 \le p \le 1\)

\(n\): number of trials, a fixed positive integer

Together, \(n\) and \(p\) specify the experiment: \(n\) independent Bernoulli trials, each with the same fixed probability of success \(p\); \(X\) counts how many of those trials succeed.

Spread / Dispersion Equations

\[ E(X) = np \quad Var(X) = np(1-p) \quad \sigma_x = \sqrt{np(1-p)} \]

Additional Notes

  • Flipping a coin 10 times and counting the number of heads is an example of a binomial random variable. The probability of any particular count, such as \(P(X = 3)\), can be computed directly from the PMF (see the sketch after this list).
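
A brief sketch of the coin-flip example, assuming standard-library Python; the helper name `binom_pmf` is illustrative, not a library function.

```python
from math import comb

def binom_pmf(x, n, p):
    """PMF of Binom(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.5                 # 10 flips of a fair coin
print(binom_pmf(3, n, p))      # P(X = 3) = 120 / 1024 ≈ 0.1172

# Sanity checks: the PMF sums to 1 over the support, and the mean equals n*p
support = range(n + 1)
print(sum(binom_pmf(x, n, p) for x in support))              # 1.0
print(sum(x * binom_pmf(x, n, p) for x in support), n * p)   # 5.0  5.0
```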

7.3 Geometric Random Variables

\[ X \sim Geom(p) \qquad R_x = \{1,2,3,...\} \]

A geometric random variable counts the number of trials needed to obtain the first success in a Bernoulli process (a sequence of i.i.d. Bernoulli trials is called a Bernoulli process). Equivalently, it is one more than the number of failures that occur before the first success.

PMF

\[ \begin{equation} p(x) = P(X= x) = \begin{cases} p(1-p)^{x-1} & \text{if } x = 1,2,3,...\\ 0 & \text{if } x \not\in R_x \end{cases} \end{equation} \]

Parameter Details

\[ 0 < p < 1 \]

\(p\): each trial has the same fixed probability of success, namely \(p\)

Spread / Dispersion Equations

\[ E(X) = \frac{1}{p} \qquad Var(X) = \frac{1-p}{p^2} \quad \sigma_x = \frac{\sqrt{1-p}}{p} \]

Additional Notes

  • Note that a Bernoulli random variable involves a single trial, a binomial involves a fixed number \(n\) of trials, and a geometric involves a potentially unbounded number of trials.

  • In principle the process could run forever without a success, but this happens with probability zero, since \(\sum_{x=1}^{\infty} p(1-p)^{x-1} = 1\).

  • Often used to model situations where a binary experiment is repeated until the first success.

  • Memoryless property: independent trials have no memory. Conditioning a geometric random variable on having seen no success in the first \(a\) trials leaves the remaining waiting time unchanged: \(P(X > a + b \mid X > a) = P(X > b)\) (checked numerically in the sketch after this list).

  • \(E(X) = 1/p\) is the average number of trials needed for the first success; by the memoryless property, knowing the past does not change this expectation going forward.

  • The distribution can also be defined as the number of failures before the first success, in which case the support begins at 0 instead of 1 and the PMF becomes \(p(x) = p(1-p)^{x}\).
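
A short sketch, assuming standard-library Python, illustrating the geometric PMF, the mean \(1/p\), and the memoryless property via \(P(X > k) = (1-p)^k\); the value of \(p\) and the helper names are arbitrary.

```python
p = 0.25  # arbitrary success probability

def geom_pmf(x, p):
    """PMF of Geom(p), counting trials until the first success: p * (1 - p)^(x - 1)."""
    return p * (1 - p) ** (x - 1)

def geom_sf(k, p):
    """Survival function P(X > k) = (1 - p)^k (no success in the first k trials)."""
    return (1 - p) ** k

# Mean: sum x * p(x) over a long truncated support, compared with 1/p
approx_mean = sum(x * geom_pmf(x, p) for x in range(1, 1000))
print(approx_mean, 1 / p)                                  # both ≈ 4.0

# Memoryless property: P(X > a + b | X > a) equals P(X > b)
a, b = 3, 5
print(geom_sf(a + b, p) / geom_sf(a, p), geom_sf(b, p))    # equal
```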


7.4 Negative Binomial Random Variables

\[ X \sim NegBinom(r,p) \qquad R_x = \{r, r + 1, r + 2, r + 3, ...\} \]

A negative binomial random variable is a generalization of a geometric random variable: \(X\) represents the number of trials needed until the \(r\)th success occurs.

Parameter Details

\(r\): the number of successes to wait for, a positive integer

\(p\): probability of success on each trial, \(0 < p < 1\)

PMF

\[ p(x; r,p) = P(X= x) = \binom{x-1}{r-1}p^r(1-p)^{x-r}, \qquad 0 < p < 1, \qquad x = r, r + 1, r + 2, ... \]

Spread / Dispersion Equations

\[ E(X) = \frac{r}{p} \qquad Var(X) = \frac{r(1-p)}{p^2} \qquad \sigma_x = \frac{\sqrt{r(1-p)}}{p} \]

Additional Notes

  • There are many alternative formulations (e.g., counting the number of failures before the \(r\)th success rather than the total number of trials).
  • Also known as the Pascal distribution.
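
A brief sketch of the PMF above in the trials-until-the-\(r\)th-success parameterization, assuming standard-library Python; `nbinom_pmf` is an illustrative helper and the values of \(r\) and \(p\) are arbitrary.

```python
from math import comb

def nbinom_pmf(x, r, p):
    """PMF of NegBinom(r, p): C(x-1, r-1) * p^r * (1 - p)^(x - r), for x = r, r+1, ..."""
    return comb(x - 1, r - 1) * p ** r * (1 - p) ** (x - r)

r, p = 3, 0.4   # wait for the 3rd success; each trial succeeds with probability 0.4

# Truncate the infinite support for the numerical checks (the remaining tail is negligible)
support = range(r, 500)
print(sum(nbinom_pmf(x, r, p) for x in support))              # ≈ 1.0
print(sum(x * nbinom_pmf(x, r, p) for x in support), r / p)   # both ≈ 7.5
```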

7.5 Hypergeometric Random Variables

\[ X \sim HyperGeometric(K, N, n) \qquad R_x = \{\max(0,\, n+K-N), \ldots, \min(n, K)\} \]

A hypergeometric random variable counts the number of individuals with an attribute of interest in a sample of size \(n\) drawn at random and without replacement from a population of size \(N\) that contains \(K\) individuals with that attribute.

PMF

\[ P_X(k)= P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}} \]

Parameter Details

  • \(N\) is the total population size
  • \(K\) is the number of individuals in the population that have the attribute of interest
  • \(n\) is the sample size
  • \(k\) is not a parameter; it is the number of individuals in the sample that have the attribute of interest

Spread / Dispersion Equations

\[ E(X)=\frac{nK}{N} \qquad Var(X)=\frac{nK}{N} \cdot \frac{N-K}{N} \cdot \frac{N-n}{N-1} \]

These follow from the general definitions of expectation and variance for a discrete random variable:

\[ E(X) = \sum_{k \in R_x} k \, P_X(k) \qquad Var(X) = \sum_{k \in R_x} \big(k - E(X)\big)^2 P_X(k) \]
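
A minimal sketch of the hypergeometric PMF, assuming standard-library Python; `hypergeom_pmf` is an illustrative helper and the population and sample sizes are arbitrary.

```python
from math import comb

def hypergeom_pmf(k, K, N, n):
    """PMF of HyperGeometric(K, N, n): C(K, k) * C(N-K, n-k) / C(N, n)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, K, n = 50, 12, 10   # population of 50, 12 with the attribute, sample of 10 without replacement

support = range(max(0, n + K - N), min(n, K) + 1)
print(sum(hypergeom_pmf(k, K, N, n) for k in support))                  # 1.0
print(sum(k * hypergeom_pmf(k, K, N, n) for k in support), n * K / N)   # both 2.4
```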



7.6 Poisson Random Variables

\[ X \sim Pois(\lambda) \quad R_x = \{0,1,2,...\} \]

A Poisson random variable can be used to approximate a binomial random variable when \(n\) is large and \(p\) is small (e.g., the number of winning tickets among a large number of lottery tickets sold). This works because the binomial PMF converges to the Poisson PMF as \(n \to \infty\) with \(np = \lambda\) held fixed.

Aside from approximating the binomial distribution, the Poisson distribution arises in the study of sequences of random events occurring over time: it models the number of events in a fixed time interval when events occur independently and at a constant average rate.

PMF

\[ \begin{equation} p(x) = P(X=x)=\frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0,1,2,... \end{equation} \]

Parameter Details

\[ \lambda \in (0,\infty) \]

When the Poisson distribution is used to approximate \(Binom(n, p)\), take \(\lambda = np\).

Spread and Dispersion

\[ E(X) = \lambda \qquad Var(X) = E(X^2) - [E(X)]^2 = (\lambda + \lambda^2) - \lambda^2 = \lambda \qquad \sigma_x = \sqrt{\lambda} \]

Additional Notes

  • A common rule of thumb for when the Poisson approximation to the binomial is adequate: \(p < 0.1\) and \(np \le 10\) (illustrated in the sketch at the end of this section).

  • A Poisson process is a stochastic process that models the arrival of events over time; the number of arrivals in a fixed interval is Poisson distributed.

  • When used as an approximation, the approximation can remain reasonable even if the events exhibit weak dependence.
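
A short sketch comparing a \(Binom(n, p)\) PMF to its \(Pois(np)\) approximation, assuming standard-library Python; the values of \(n\) and \(p\) are arbitrary choices that satisfy the rule of thumb above.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """PMF of Binom(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """PMF of Pois(lambda): e^(-lambda) * lambda^x / x!."""
    return exp(-lam) * lam ** x / factorial(x)

n, p = 1000, 0.005   # large n, small p; np = 5 <= 10
lam = n * p

for x in range(9):
    print(f"x={x}:  binomial={binom_pmf(x, n, p):.5f}  poisson={poisson_pmf(x, lam):.5f}")
```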