Chapter 10 Estimation
When we have a random sample, we make an assumption about the true distribution that the population which from which the sample follows. We can use a point estimate, which we formally refer to as a statistic to estimate parameter values of an assumed distribution.
10.1 Estimation from Normal Distribution
If we have \(n\) random variables from a population that follows the form
\[ X_i \stackrel{iid}\sim N(\mu_i, \sigma_i^2) \qquad i = 1,...,n \]
Then a sample which we assume comes from that distribution, can be described by the following statistics, which help us describe the sample/population.
Distribution for sum of any linear combination \(Y\) is also normal, with the same mean and variance:
\(Y \sim N(\Sigma a_i\mu_i, \Sigma a_i^2 \sigma_i^2)\)
Sample distribution mean of this linear combination is also normal, and has the same mean, with a sample variance that is corrected for bias.
\(\bar{X} \sim N(\mu, \frac{\sigma^2}{n})\)
Sampling distribution for the variance is a Chi-squared Distribution
\(\frac{(n-1)s^2}{\sigma^2} \sim \chi^2(n-1)\)
Sampling distribution for estimating the true \(\mu\) and \(\sigma^2\) (when it is not known), we use a Student’s t Distribution. It is a standard normal divided by the square root of a Chi-squared distribution.
\(\frac{\bar{X}-\mu}{\frac{s}{\sqrt{n}}} \sim t(n-1)\)
10.2 Estimation without Normal Association
How do we estimate parameters without the assumption that the data comes from a normal distribution? We use the central limit theorem
When we do not know if a random sample (\(X_1, X_2,..,X_n\)) is normally distributed we can use the CLT to conclude that the distribution of the sample mean is also normal.
For any distribution the sample mean \(\bar{X}\) has a mean \(\mu\), and sample variance \(\frac{\sigma^2}{n})\). We use the CLT to approximate (using many samples) that the distribution of the sample will converge to a normal distribution.
When \(n\) is large enough, we can assume that the distribution of sample is normally distributed. We often hear that the rule is that if \(n > 30\) the distribution of \(\bar{X}\) will be approximately normal, however the rate of convergence depends on the original distribution.