Quantitative Research Methods - Introduction to Applied Statistics

David Sichinava, Rati Shubladze
January 10, 2018

Thirteenth Meeting

Today's plan

Uncertainty
- Hypothesis testing;
- One and two samples t-test;
- Statistical power;
Linear regression with uncertainty

Hypothesis testing

Statistical hypothesis testing employs a probabilistic approach to proof whether an event or a phenomenon exists
- Or, how random the event is
Proof by contradiction
- Reductio ad absurdum
Null hypothesis \( H_{0} \)
- Sharp null hypothesis: assesses all potential outcomes
- Nonsharp null hypothesis: assesses the average outcome of an experiment

Hypothesis testing

Null hypothesis ⇒ Test statistic ⇒ Reference statistic ⇒ Calculating the probability of a test statistic occuring in the reference statistic

Hypothesis testing

Result	Rejecting \( H_{0} \)	Retaining \( H_{0} \)
\( H_{0} \) True	Type I error	True
\( H_{0} \) False	True	Type II error

Hypothesis testing

How to decide whether to reject or retain null hypothesis?
- We should quantify the degree to which the observed value of the test statistic is unlikely to occur under the null hypothesis
- p-values

Hypothesis testing

Little p-value, What are you trying to say, Of significance?

Stephen T. Ziliak, Roosvelt University

Hypothesis testing

Significance;
The probability that under the null hypothesis, we observe a value of the test statistic at least as extreme as the one we actually observed;
Example: say, we are testing a new vaccine and did an experiment. After the experiment, the treatment group saw the decrease in temperature, however, the p-value was 0.04. It means that if the vaccine was not effective, we would receive the same difference between treatment and control groups in the 4% of all future experiments.
\( p \)-value DOES NOT MEASURE THE PROBABILITY OF A MISTAKE!!!

Hypothesis testing

As it was in the case of confidence intervals, we choose the level of confidence
- Usually, 95% or 99%

Neyman, J. (1937). Outline of a theory of statistical estimation based on the classical theory of probability. Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences, 236(767), 333-380.

Hypothesis testing

P-value for one sample and two samples tests

Hypothesis testing

One sample test:
- Population mean equals to a particular value;
Two samples test:
- Means of two populations are equal;

Hypothesis testing and confidence intervals

z-scores for confidence intervals:
- \( \frac{\overline{X}_{n}-\mathbb{E}_{X}}{Standard\ Error} \)
z-scores for \( p \)-values:
- \( \frac{\overline{X}_{n}-\mu_{0}}{Standard\ Error} \)

STAR-project

One sample t-test

t.test(STAR$g4reading, mu = 710)

STAR-project

Two samples t-test

t.test(STAR$g4reading[STAR$classtype == 1],
STAR$g4reading[STAR$classtype == 2])

Study of labor discrimination

resume <- read.csv("resume.csv")
x <- table(resume$race, resume$call)
prop.test(x, alternative = "greater")

Hypothesis testing: what has not to be done

“Publication bias”
“Multiple testing”

Linear regression with uncertainty

minwage <- read.csv("minwage.csv")
## compute proportion of full-time employment before minimum wage increase
minwage$fullPropBefore <- minwage$fullBefore /
(minwage$fullBefore + minwage$partBefore)
## same thing after minimum-wage increase
minwage$fullPropAfter <- minwage$fullAfter /
(minwage$fullAfter + minwage$partAfter)
## an indicator for NJ: 1 if it’s located in NJ and 0 if in PA
minwage$NJ <- ifelse(minwage$location == "PA", 0, 1)

Linear regression with uncertainty

## -1 Removes intercept and creates indicator variables for each category
fit.minwage <- lm(fullPropAfter ~ -1 + NJ + fullPropBefore +
wageBefore + chain, data = minwage)
## regression result
fit.minwage

Linear regression with uncertainty

summary(fit.minwage)
confint(fit.minwage)["NJ", ]