Quantitative Research Methods: Introduction to Applied Statistics

David Sichinava, Rati Shubladze
December 27, 2017

Eleventh Meeting

Today's plans

Uncertainty
- Estimation
- Hypothesis Testing

Estimation

Parameter and Estimator
Random sample: we try to estimate an unknown parameter of the population.

Estimation: Ideal case

Estimation error = \( \overline{X}_{n} - p \)
However, the true value of the population parameter \( p \) is never known, so the error itself cannot be computed. Instead, we ask how large the error is on average.

Estimation: Ideal case

Suppose we could conduct the same experiment (survey, observation, etc.) repeatedly. The estimation errors from the repetitions would be independent of each other and would differ from sample to sample. Hence, the estimation error is a random variable.

\( \mathbb{E}(\overline{X}_{n}) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}(X_{i}) = \mathbb{E}(X) \)
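
Repeating the same survey many times can be mimicked by simulation. The sketch below assumes a hypothetical true proportion of 0.6 and a sample size of 1000 (both chosen only for illustration; in a real survey the true proportion is unknown) and shows that the estimation error varies from sample to sample but averages out close to zero.

```r
set.seed(1234)        # for reproducibility
p <- 0.6              # assumed true proportion (illustrative value)
n <- 1000             # assumed sample size (illustrative value)
sims <- 5000          # number of hypothetical repetitions of the survey

## each repetition: n Bernoulli(p) responses summarized by their sample mean
x.bar <- rbinom(sims, size = n, prob = p) / n

error <- x.bar - p    # estimation error in each repetition
summary(error)        # the error differs from one repetition to the next
mean(error)           # close to zero, in line with E(X.bar) = E(X)
```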

Estimation: Sample and population average treatment effect

Unbiased and consistent estimation
- An estimator is unbiased if its expectation equals the parameter;
- An estimator is consistent if it converges to the parameter as the sample size increases.
In randomized controlled trials, the average outcome difference between the treatment and control groups is an unbiased estimator of the sample average treatment effect (SATE).
If the experimental sample is itself a random sample from the population, the estimator is also unbiased and consistent for the population average treatment effect (PATE).
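
Consistency can be illustrated with a small simulation. The sketch below assumes a hypothetical true proportion of 0.3 and an arbitrary set of sample sizes, and computes the average absolute estimation error of the sample mean for each size.

```r
set.seed(2017)                       # for reproducibility
p <- 0.3                             # assumed true proportion (illustrative)
sizes <- c(10, 100, 1000, 10000)     # sample sizes to compare (illustrative)
sims <- 2000                         # repetitions per sample size

## average absolute estimation error of the sample mean at each sample size
avg.abs.error <- sapply(sizes, function(n) {
    x.bar <- rbinom(sims, size = n, prob = p) / n
    mean(abs(x.bar - p))
})
names(avg.abs.error) <- sizes
avg.abs.error                        # shrinks as the sample size grows
```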

The standard error of the estimate

An unbiased estimator with a large degree of variability is of little use in practice.
We should therefore also look at the standard error of the estimate, which measures how much the estimator varies from sample to sample.
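
To make the idea concrete, the sketch below compares three quantities for a sample proportion: the spread of the estimator across simulated repetitions, the theoretical value sqrt(p(1 - p) / n), and the standard error estimated from a single sample. The true proportion, sample size, and number of repetitions are assumptions chosen for illustration.

```r
set.seed(99)                          # for reproducibility
p <- 0.6                              # assumed true proportion (illustrative)
n <- 1000                             # assumed sample size (illustrative)
sims <- 5000                          # hypothetical repetitions

x.bar <- rbinom(sims, size = n, prob = p) / n

sd.sampling <- sd(x.bar)              # variability of the estimator across repetitions
se.theory <- sqrt(p * (1 - p) / n)    # theoretical standard deviation of the sample mean
se.one <- sqrt(x.bar[1] * (1 - x.bar[1]) / n)  # standard error from one sample only

c(sd.sampling, se.theory, se.one)     # the three values should be close
```

In practice only the last quantity is available, because the survey is carried out once.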

Confidence intervals

To study the properties of an estimator, we should characterize its entire sampling distribution rather than only its mean and standard deviation.
For this we need a theoretical distribution. In many cases we can use the central limit theorem and assume that the sampling distribution of the sample mean is approximately normal.
Building on this, we can apply yet another measure: the confidence interval.
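
The normal approximation can be checked by simulation. The sketch below assumes a deliberately skewed variable (exponential with rate 1, so mean 1 and standard deviation 1) and an arbitrary sample size of 200, standardizes the simulated sample means, and reports the share that falls within the central 95% range of the standard normal distribution.

```r
set.seed(42)                            # for reproducibility
n <- 200                                # assumed sample size (illustrative)
sims <- 5000                            # hypothetical repetitions

## sample means of a skewed variable: exponential with mean 1 and sd 1
x.bar <- replicate(sims, mean(rexp(n, rate = 1)))

## standardize using the known mean (1) and standard deviation (1 / sqrt(n))
z <- (x.bar - 1) / (1 / sqrt(n))

share.within <- mean(abs(z) <= qnorm(0.975))
share.within                            # should be near 0.95 if the approximation holds
```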

Confidence intervals

A confidence interval is a range that, with a probability chosen in advance (the confidence level \( 1 - \alpha \)), contains the parameter of interest over repeated sampling:
\( CI(\alpha) = [\overline{X}_{n} - z_{\alpha/2} \times se, \; \overline{X}_{n} + z_{\alpha/2} \times se] \)
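
The interval follows from the normal approximation in two steps. The sketch below assumes the central limit theorem applies, writes \( p \) for the parameter as in the estimation error, and takes \( z_{\alpha/2} \) to be the upper \( \alpha/2 \) quantile of the standard normal distribution:

\( P\left(-z_{\alpha/2} \le \frac{\overline{X}_{n} - p}{se} \le z_{\alpha/2}\right) \approx 1 - \alpha \)
\( P\left(\overline{X}_{n} - z_{\alpha/2} \times se \le p \le \overline{X}_{n} + z_{\alpha/2} \times se\right) \approx 1 - \alpha \)

For \( \alpha = 0.05 \), \( z_{\alpha/2} \approx 1.96 \), which gives the familiar 95% interval.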

Confidence intervals

n <- 1000 # sample size
x.bar <- 0.6 # point estimate (sample proportion)
s.e. <- sqrt(x.bar * (1 - x.bar) / n) # standard error of the proportion
## 99% CI
c(x.bar - qnorm(0.995) * s.e., x.bar + qnorm(0.995) * s.e.)

## 95% CI
c(x.bar - qnorm(0.975) * s.e., x.bar + qnorm(0.975) * s.e.)
## 90% CI
c(x.bar - qnorm(0.95) * s.e., x.bar + qnorm(0.95) * s.e.)
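
The meaning of the confidence level can be checked by simulation. The sketch below assumes a hypothetical true proportion of 0.6 (unknown in a real survey), builds a 95% interval in each of many simulated samples, and counts how often the interval contains the true value.

```r
set.seed(7)                              # for reproducibility
p <- 0.6                                 # assumed true proportion (illustrative)
n <- 1000                                # sample size
sims <- 5000                             # hypothetical repetitions
alpha <- 0.05

x.bar <- rbinom(sims, size = n, prob = p) / n   # one point estimate per sample
se <- sqrt(x.bar * (1 - x.bar) / n)             # one standard error per sample

lower <- x.bar - qnorm(1 - alpha / 2) * se
upper <- x.bar + qnorm(1 - alpha / 2) * se

coverage <- mean(lower <= p & p <= upper)       # share of intervals containing p
coverage                                        # should be near 0.95
```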

Experiment analysis

## load the STAR data
STAR <- read.csv("STAR.csv", header = TRUE)
## histogram of reading scores in small classes, with the mean marked in blue
hist(STAR$g4reading[STAR$classtype == 1], freq = FALSE, xlim = c(500, 900),
     ylim = c(0, 0.01), main = "Small class",
     xlab = "Fourth-grade reading test score")
abline(v = mean(STAR$g4reading[STAR$classtype == 1], na.rm = TRUE),
       col = "blue")
## the same plot for regular classes
hist(STAR$g4reading[STAR$classtype == 2], freq = FALSE, xlim = c(500, 900),
     ylim = c(0, 0.01), main = "Regular class",
     xlab = "Fourth-grade reading test score")
abline(v = mean(STAR$g4reading[STAR$classtype == 2], na.rm = TRUE),
       col = "blue")

Experiment analysis

## estimate and standard error for small class
n.small <- sum(STAR$classtype == 1 & !is.na(STAR$g4reading))
est.small <- mean(STAR$g4reading[STAR$classtype == 1], na.rm = TRUE)
se.small <- sd(STAR$g4reading[STAR$classtype == 1], na.rm = TRUE) /
    sqrt(n.small)
est.small
se.small
## estimate and standard error for regular class
n.regular <- sum(STAR$classtype == 2 & !is.na(STAR$classtype) &
                     !is.na(STAR$g4reading))

Experiment analysis

est.regular <- mean(STAR$g4reading[STAR$classtype == 2], na.rm = TRUE)
se.regular <- sd(STAR$g4reading[STAR$classtype == 2], na.rm = TRUE) /
    sqrt(n.regular)
est.regular
se.regular

Experiment analysis

alpha <- 0.05
## 95% confidence interval for small class
ci.small <- c(est.small - qnorm(1 - alpha / 2) * se.small,
              est.small + qnorm(1 - alpha / 2) * se.small)
ci.small
## [1] 719.6417 727.1406
## 95% confidence interval for regular class
ci.regular <- c(est.regular - qnorm(1 - alpha / 2) * se.regular,
                est.regular + qnorm(1 - alpha / 2) * se.regular)
ci.regular

Experiment analysis

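## difference-in-means estimate of the average treatment effect
ate.est <- est.small - est.regular
## standard error of the difference between two independent group means
ate.se <- sqrt(se.small^2 + se.regular^2)
## 95% confidence interval for the average treatment effect
ate.ci <- c(ate.est - qnorm(1 - alpha / 2) * ate.se,
            ate.est + qnorm(1 - alpha / 2) * ate.se)
ate.ci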
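
The formula used for ate.se rests on the fact that the variance of a difference of two independent estimators is the sum of their variances. A short sketch, assuming the small-class and regular-class samples are independent of each other and writing \( \overline{Y}_{\text{small}} \) and \( \overline{Y}_{\text{regular}} \) for the two group means (notation introduced here only for illustration):

\( \mathbb{V}(\overline{Y}_{\text{small}} - \overline{Y}_{\text{regular}}) = \mathbb{V}(\overline{Y}_{\text{small}}) + \mathbb{V}(\overline{Y}_{\text{regular}}) \)
\( se_{\text{ATE}} = \sqrt{se_{\text{small}}^{2} + se_{\text{regular}}^{2}} \)

Because the variances add, the standard error of the difference is larger than the standard error of either group mean.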

Experiment analysis: Student's t-distribution

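## 95% CI for small class based on the t-distribution
c(est.small - qt(0.975, df = n.small - 1) * se.small,
  est.small + qt(0.975, df = n.small - 1) * se.small)
ci.small # normal-based interval, for comparison
## 95% CI for regular class based on the t-distribution
c(est.regular - qt(0.975, df = n.regular - 1) * se.regular,
  est.regular + qt(0.975, df = n.regular - 1) * se.regular)
ci.regular # normal-based interval, for comparison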
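
Intervals based on the t-distribution are slightly wider than the normal-based ones because the t critical value exceeds the normal critical value, and the gap shrinks as the degrees of freedom grow. The sketch below compares the two; the degrees of freedom listed are arbitrary illustrative values, not taken from the STAR data.

```r
dfs <- c(5, 30, 100, 1000)            # illustrative degrees of freedom
t.crit <- qt(0.975, df = dfs)         # t critical values for a 95% interval
z.crit <- qnorm(0.975)                # normal critical value, about 1.96

## t critical values approach the normal one as df increases
rbind(df = dfs, t = t.crit, z = z.crit)
```

With samples as large as those in a typical class-size experiment, the two kinds of interval are therefore nearly identical.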

## two-sample t-test comparing small and regular classes
t.ci <- t.test(STAR$g4reading[STAR$classtype == 1],
               STAR$g4reading[STAR$classtype == 2])
t.ci
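
The object returned by t.test() stores the confidence interval for the difference in means, and by default the function uses the Welch approximation, which does not assume equal variances. The sketch below uses simulated scores (hypothetical values, not the STAR data) to show how to extract the interval and reproduce it by hand.

```r
set.seed(11)                                    # for reproducibility
## hypothetical test scores, NOT the STAR data
small <- rnorm(50, mean = 725, sd = 50)
regular <- rnorm(60, mean = 720, sd = 50)

fit <- t.test(small, regular)    # Welch two-sample t-test by default
fit$estimate                     # the two group means
fit$conf.int                     # 95% CI for the difference in means
fit$parameter                    # Welch-approximated degrees of freedom

## the same interval by hand, using the degrees of freedom reported above
diff.est <- mean(small) - mean(regular)
diff.se <- sqrt(var(small) / length(small) + var(regular) / length(regular))
manual.ci <- diff.est + c(-1, 1) * qt(0.975, df = fit$parameter) * diff.se
manual.ci
```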