Independent and identically distributed
\[F_{X,Y}(x,y) = F_X(x) \times F_Y(y) \quad \forall x, y\]
\[F_X(x) = F_Y(x) \quad \forall x\]
\[\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i\]
\[V[\bar{X}] = \frac{V[X]}{n}\]
Example of the weak law of large numbers in action:
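As a rough sketch (the Bernoulli(0.5) process, the sample size, and the seed are arbitrary choices, not from the original), we can watch the running sample mean settle toward the expected value of 0.5 as the number of draws grows:
# Weak law of large numbers: the running sample mean converges toward E[X]
set.seed(42)
x = rbinom(10000, 1, 0.5)
running_mean = cumsum(x) / seq_along(x)
# Far from 0.5 for small n, very close to 0.5 for large n
running_mean[c(5, 10, 100, 1000, 10000)]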
Example:
Imagine a population consisting of three units. Each unit has an associated measurement: \(Y_1\), \(Y_2\), and \(Y_3\). You are interested in the average \(Y_{avg} = (Y_1 + Y_2 + Y_3)/3\). You draw a sample of two units without replacement with equal probability and observe their measurements \(\{Y_a, Y_b\}\). You plan to estimate \(Y_{avg}\) with the estimator \(\hat{Y} = \frac{Y_a + Y_b}{2}\). We want (a) the bias \(E[\hat{Y} - Y_{avg}]\), (b) the variance \(V[\hat{Y}]\), and (c) the mean squared error \(E[(\hat{Y} - Y_{avg})^2]\) of this estimator.
Answers:
a.) By linearity of expectation, and because \(Y_{avg}\) is a constant (the population mean), \(E[\hat{Y} - Y_{avg}] = E[\hat{Y}] - E[Y_{avg}] = E[\hat{Y}] - Y_{avg}\).
\[E[\hat{Y} - Y_{avg}] = E[\hat{Y}] - Y_{avg} = \frac{\frac{Y_1 + Y_2}{2} + \frac{Y_1 + Y_3}{2} + \frac{Y_2 + Y_3}{2}}{3} - \frac{Y_1 + Y_2 + Y_3}{3} = 0\]
b.) Note that we now know \(E[\hat{Y}]\), so its square is simple to compute. However, the second moment is not the square of the first: while \(E[\hat{Y}] = Y_{avg}\), in general \(E[\hat{Y}^2] \neq Y_{avg}^2\).
\[V[\hat{Y}] = E[\hat{Y}^2] - E[\hat{Y}]^2 = \]
\[\frac{\left(\frac{Y_1 + Y_2}{2}\right)^2 + \left(\frac{Y_1 + Y_3}{2}\right)^2 + \left(\frac{Y_2 + Y_3}{2}\right)^2}{3} - \left(\frac{Y_1 + Y_2 + Y_3}{3}\right)^2\] Starting with the first term:
\[\frac{\left(\frac{Y_1 + Y_2}{2}\right)^2 + \left(\frac{Y_1 + Y_3}{2}\right)^2 + \left(\frac{Y_2 + Y_3}{2}\right)^2}{3} = \] \[\frac{Y_1^2 + Y_2^2 + 2Y_1Y_2 + Y_1^2 + Y_3^2 + 2Y_1Y_3 + Y_2^2 + Y_3^2 + 2Y_2Y_3}{4 \cdot 3} = \] \[\frac{2(Y_1^2 + Y_2^2 + Y_3^2 + Y_1Y_2 + Y_1Y_3 + Y_2Y_3)}{12} = \] \[\frac{Y_1^2 + Y_2^2 + Y_3^2 + Y_1Y_2 + Y_1Y_3 + Y_2Y_3}{6}\]
For the second term:
\[\big(\frac{Y_1 + Y_2 + Y_3}{3}\big)^2 = \]
\[\frac{Y_1^2 + Y_2^2 + Y_3^2 + 2Y_1Y_2 + 2Y_2Y_3 + 2Y_1Y_3}{9}\] Recombining and subtracting:
\[\frac{3(Y_1^2 + Y_2^2 + Y_3^2 + Y_1Y_2 + Y_1Y_3 + Y_2Y_3)}{18} - \frac{2(Y_1^2 + Y_2^2 + Y_3^2 + 2Y_1Y_2 + 2Y_2Y_3 + 2Y_1Y_3)}{18} = \] \[\frac{Y_1^2 + Y_2^2 + Y_3^2 - Y_1Y_2 - Y_2Y_3 - Y_1Y_3}{18}\]
c.)
\[E[(\hat{Y} - Y_{avg})^2] = \]
\[V[\hat{Y}] + (E[\hat{Y}] - Y_{avg})^2 = \]
\[V[\hat{Y}] + E[\hat{Y}]^2 + Y_{avg}^2 - 2E[\hat{Y}]Y_{avg} = \]
\[V[\hat{Y}] + \] \[\frac{Y_1^2 + Y_2^2 + Y_3^2 + 2Y_1Y_2 + 2Y_2Y_3 + 2Y_1Y_3}{9} + \frac{Y_1^2 + Y_2^2 + Y_3^2 + 2Y_1Y_2 + 2Y_2Y_3 + 2Y_1Y_3}{9} - \] \[2\left(\frac{Y_1 + Y_2 + Y_3}{3}\right)\left(\frac{Y_1 + Y_2 + Y_3}{3}\right)\]
\[= V[\hat{Y}] + 0\]
\[E[(\hat{Y} - Y_{avg})^2] = V[\hat{Y}] = \frac{Y_1^2 + Y_2^2 + Y_3^2 - Y_1Y_2 - Y_2Y_3 - Y_1Y_3}{18}\]
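As a quick numerical check of these formulas, here is a sketch with arbitrary illustrative values \(Y_1 = 1\), \(Y_2 = 2\), \(Y_3 = 6\) (not from the original), enumerating the three equally likely samples:
# Enumerate all three samples of size two and check bias and variance
y = c(1, 2, 6)
pop_mean = mean(y)
est = c(mean(y[c(1, 2)]), mean(y[c(1, 3)]), mean(y[c(2, 3)]))
mean(est) - pop_mean                                  # bias is 0
mean(est^2) - mean(est)^2                             # variance of the estimator
(sum(y^2) - y[1]*y[2] - y[2]*y[3] - y[1]*y[3]) / 18   # matches the derived formula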
Example:
There is an infinite population of units from which you draw a simple random sample of \(n\) units. Each unit in the population has a measurement \(Y_i\). You are interested in the population mean of \(Y_i\), and plan to use the sample mean as your estimator. \(Y_i = 100\) for 5% of the population, and \(Y_i = 0\) for the remaining 95%. Is this estimator consistent?
# Define sample size
n = 10
# Draw from the infinite population 10,000 times and record each sample mean
estimates = vector(mode = "numeric", length = 10000)
for (i in seq_along(estimates)) {
  # Each unit is 100 with probability 0.05 and 0 otherwise
  rs = rbinom(n, 1, .05) * 100
  estimates[i] = mean(rs)
}
# Calculate the estimation error of each estimate
pop_mean = mean(rep(c(0, 100), c(95, 5)))   # population mean is 5
error = pop_mean - estimates
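As a sketch of why this points toward consistency (the grid of sample sizes below is an arbitrary choice), we can repeat the simulation for increasing n and watch the spread of the estimation error shrink toward zero:
# Repeat the simulation for larger sample sizes; the error concentrates near 0
for (n in c(10, 100, 1000, 10000)) {
  estimates = replicate(10000, mean(rbinom(n, 1, .05) * 100))
  cat("n =", n, " sd of error:", sd(pop_mean - estimates), "\n")
}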
See below for an example of the central limit theorem in action. A random generative process has PMF:
\[f(x) = \begin{cases} \frac{1}{6} & : x = 1 \\ \frac{1}{6} & : x = 2 \\ \frac{1}{6} & : x = 3 \\ \frac{1}{6} & : x = 4 \\ \frac{1}{6} & : x = 5 \\ \frac{1}{6} & : x = 6 \\ 0 & : \text{otherwise} \end{cases}\]
What is this process? Now let’s observe 5, 10, 100, 1000, and 10,000 outcomes of this random generative process and record the sample mean. We will repeat this process 5 times for 5 draws, 10 times for 10 draws, 100 times for 100 draws, etc., and examine the distribution of our sample means. A visual depiction is provided below:
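A minimal simulation sketch of that procedure (the sample sizes are those listed above; the seed is an arbitrary choice):
# Central limit theorem: sample means of fair die rolls
set.seed(1)
sizes = c(5, 10, 100, 1000, 10000)
sample_means = lapply(sizes, function(n) {
  # draw n die rolls, take the mean, and repeat the whole process n times
  replicate(n, mean(sample(1:6, n, replace = TRUE)))
})
# As n grows, the sample means cluster tightly and symmetrically around E[X] = 3.5
sapply(sample_means, mean)
sapply(sample_means, sd)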
This is getting ahead of ourselves, but it is the basis of the plug-in principle, which we will discuss next week. The plug-in principle allows us to use a sample analogue to estimate the population feature we are interested in: for example, the sample mean to estimate the expected value, or the sample variance to estimate the population variance. Note also the implications of the central limit theorem for statistical inference and hypothesis testing, which we will get to shortly.
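As a small illustration of the plug-in idea (the normal population and its parameters are arbitrary choices, not from the original):
# Plug-in principle: sample analogues estimate population features
x = rnorm(1000, mean = 10, sd = 2)
mean(x)   # sample mean as a plug-in estimate of E[X] = 10
var(x)    # sample variance as a plug-in estimate of V[X] = 4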
Intuition that may help when we begin to discuss hypothesis testing: If we roll a die 5 times and the average value of its roll is 3.7, how certain are we that it is a fair die? How about if we roll a die 10,000 times and the average value of its roll is 3.7? We will return to this later in hypothesis testing.
Another example: part of the beauty of the central limit theorem is that the distribution we are sampling from does not have to be normal for the distribution of its sample means to be approximately normal.
Let’s look at an exponential distribution with mean 50.
Now let’s draw 1000 values from this exponential distribution and take their mean, repeating the process 50 times.
Let’s try that again, but this time take the mean 100 times.
Now let’s try it 10,000 times.
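A minimal sketch of the procedure just described (the seed is arbitrary; the rate parameter is chosen so the mean is 50):
# Sampling distribution of the mean of an exponential distribution with mean 50
set.seed(2)
draw_mean = function() mean(rexp(1000, rate = 1/50))
means_50 = replicate(50, draw_mean())
means_100 = replicate(100, draw_mean())
means_10000 = replicate(10000, draw_mean())
# Despite the skew of the exponential, the sample means are approximately
# normal around 50, and the shape is clearest with the most replications
hist(means_10000, breaks = 50, main = "10,000 sample means", xlab = "sample mean")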