QRMIAS: Fourth Meeting

Quantitative Research Methods – Introduction to Applied Statistics

DAVID SICHINAVA, RATI SHUBLADZE
November 1, 2017

Today's meeting

  • Causality
    • Observation
  • Confounding bias
  • Before-and-after design
  • Difference-in-Difference estimate
  • Observational studies:
    • Statistics for one variable
    • Quantiles
    • Root mean square (RMS)
    • Standard deviation

Again about causality...

  • “Cause and effect must be contiguous in space and time…”
  • “The same cause always produces the same effect, and the same effect never arises but from the same cause…”
    (David Hume, A Treatise of Human Nature)

The role of randomization

  • Randomization creates groups that are similar on average, so a difference in outcomes between the two groups can be attributed to the treatment
  • It eliminates selection bias
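
A minimal sketch of why randomization helps, using simulated data only. The variable names (age, treat, y) and the true effect of 2 are assumptions made for illustration and are not part of the minimum-wage study.

```r
## Hypothetical illustration with simulated data
set.seed(2017)
n <- 1000
age <- rnorm(n, mean = 40, sd = 10)   # a pretreatment characteristic

## Randomly assign half of the units to treatment (1) and half to control (0)
treat <- sample(rep(c(0, 1), each = n / 2))

## With random assignment, the pretreatment means should be close
balance <- mean(age[treat == 1]) - mean(age[treat == 0])
balance

## Simulated outcome: true treatment effect of 2, outcome also depends on age
y <- 5 + 2 * treat + 0.3 * age + rnorm(n)

## Difference-in-means estimate of the treatment effect
est <- mean(y[treat == 1]) - mean(y[treat == 0])
est
```

Because age is balanced across the two groups on average, the simple difference in means should land near the assumed effect of 2.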

Observational studies

  • However, in many cases we cannot randomly assign treatment to research participants
  • In these cases, we rely on observational data
    • Pros: high external validity
    • Cons: low internal validity, that is, we may fail to properly explain the underlying causal mechanism.

Card & Krueger (1994)

  • In 1992, New Jersey raised its minimum wage from $4.25 to $5.05 per hour
  • Classical economic theory predicts that such an increase could reduce employment

Card & Krueger (1994)

  • To identify the causal effect, we would need another New Jersey where the minimum wage stayed at its pre-1992 level.
  • However, as this is impossible, Card and Krueger used Pennsylvania as a reference (control).

Card & Krueger (1994)

  • “Treatment group”: fast food restaurants in New Jersey (NJ)
  • “Control group”: fast food restaurants in Pennsylvania (PA)

Card & Krueger (1994)

  • Treatment intervention: increase of minimum wage.

Card & Krueger (1994)

minwage <- read.csv("https://davidsichinava.github.io/introstatsr/pages/m4/data/minwage.csv")

### Or download the file manually from https://goo.gl/rgbAxj,
### save it in the working directory, and read the local copy instead:
### minwage <- read.csv("minwage.csv")

Card & Krueger (1994)

dim(minwage) ### Dimensions of the table

summary(minwage) ### Descriptive statistics

Card & Krueger (1994)

Variable     Description
chain        name of the fast-food restaurant chain
location     location of the restaurants (centralNJ, northNJ, PA, shoreNJ, southNJ)
wageBefore   wage before the minimum-wage increase
wageAfter    wage after the minimum-wage increase
fullBefore   number of full-time employees before the minimum-wage increase
fullAfter    number of full-time employees after the minimum-wage increase
partBefore   number of part-time employees before the minimum-wage increase
partAfter    number of part-time employees after the minimum-wage increase

Card & Krueger (1994)

  • A difference in the proportion of full-time employees between the two states could indicate an effect of the minimum-wage increase.
  • Dependent variable: proportion of full-time employees

Card & Krueger (1994)

## Subset the data for each state
minwageNJ <- subset(minwage, subset = (location != "PA"))
minwagePA <- subset(minwage, subset = (location == "PA"))

Card & Krueger (1994)

## create a variable for proportion of full-time employees in NJ and PA
minwageNJ$fullPropAfter <- minwageNJ$fullAfter / (minwageNJ$fullAfter + minwageNJ$partAfter)
minwagePA$fullPropAfter <- minwagePA$fullAfter / (minwagePA$fullAfter + minwagePA$partAfter)

## compute the difference-in-means
mean(minwageNJ$fullPropAfter) - mean(minwagePA$fullPropAfter)

Confounding variable bias

  • A pretreatment variable that is associated with both the treatment and the outcome variables is called a confounder and is a source of confounding bias in the estimation of the treatment effect.
  • Statistical control
    • Subclassification
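
Subclassification can be sketched as follows: compare treatment and control within each level of the confounder, then average the within-level differences. The toy data frame below is hypothetical (group, chain, and y are made-up names and values), and weighting each subclass by its share of all units is one common choice, not the only one.

```r
## Hypothetical toy data: 'group' marks treatment (T) or control (C),
## 'chain' plays the role of the confounder
toy <- data.frame(
  group = c("T", "T", "T", "T", "C", "C", "C", "C"),
  chain = c("a", "a", "a", "b", "a", "b", "b", "b"),
  y     = c(0.50, 0.60, 0.55, 0.30, 0.45, 0.20, 0.25, 0.30)
)

## Naive difference-in-means: mixes the treatment effect with the chain mix
naive <- mean(toy$y[toy$group == "T"]) - mean(toy$y[toy$group == "C"])

## Mean outcome in each chain-by-group cell
cell.means <- tapply(toy$y, list(toy$chain, toy$group), mean)

## Treatment minus control within each chain
within.diff <- cell.means[, "T"] - cell.means[, "C"]

## Weight each chain by its share of all units and average
w <- prop.table(table(toy$chain))
adjusted <- sum(within.diff * w[names(within.diff)])

naive
within.diff
adjusted
```

In this toy example the treatment group contains more restaurants of the chain with the higher outcome, so the naive difference is larger than the chain-adjusted one.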

Confounding variable bias

prop.table(table(minwageNJ$chain))
prop.table(table(minwagePA$chain))

Confounding variable bias

## Compare only Burger King restaurants
minwageNJ.bk <- subset(minwageNJ, subset = (chain == "burgerking"))
minwagePA.bk <- subset(minwagePA, subset = (chain == "burgerking"))

mean(minwageNJ.bk$fullPropAfter) - mean(minwagePA.bk$fullPropAfter)

Confounding variable bias

## Does location matter? Keep only Burger King restaurants in northNJ and southNJ

minwageNJ.bk.subset <- subset(minwageNJ.bk, subset = ((location != "shoreNJ") & (location != "centralNJ")))
mean(minwageNJ.bk.subset$fullPropAfter) - mean(minwagePA.bk$fullPropAfter)

Before and after design

  • Longitudinal (panel) data allow more credible comparisons between treatment and control groups
  • The before-and-after design examines how the outcome variable changed from the pretreatment period to the posttreatment period for the same set of units. The design is able to adjust for any confounding factor that is specific to each unit but does not change over time. However, the design does not address possible bias due to time-varying confounders.

Before and after design

## Proportion of full-time employees in NJ before the increase
minwageNJ$fullPropBefore <- minwageNJ$fullBefore / (minwageNJ$fullBefore + minwageNJ$partBefore)

## Before-and-after change in the mean proportion for NJ
NJdiff <- mean(minwageNJ$fullPropAfter) - mean(minwageNJ$fullPropBefore)
NJdiff
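
A small sketch of what the before-and-after design can and cannot adjust for. All numbers are hypothetical: unit.effect stands for a fixed characteristic of each unit, 0.05 is an assumed treatment effect, and trend is an assumed time-varying confounder.

```r
## Hypothetical toy panel: 4 units observed before and after a treatment
unit.effect <- c(0.10, 0.30, 0.50, 0.70)   # fixed, unit-specific level
before <- unit.effect                      # outcome before treatment
after  <- unit.effect + 0.05               # assumed treatment effect of 0.05

## Change within each unit: the fixed unit-specific level cancels out
change <- after - before
mean(change)

## With the same units in both periods, this equals the difference
## between the two period means
mean(after) - mean(before)

## A time trend that affects every unit is NOT removed by this design
trend <- 0.02
after.trend <- unit.effect + 0.05 + trend
naive.trend <- mean(after.trend) - mean(before)
naive.trend   # larger than 0.05: the trend is mistaken for a treatment effect
```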

Difference-in-difference design

## PA: proportion of full-time employees before the increase
minwagePA$fullPropBefore <- minwagePA$fullBefore / (minwagePA$fullBefore + minwagePA$partBefore)

## PA: before-and-after change in the mean proportion
PAdiff <- mean(minwagePA$fullPropAfter) - mean(minwagePA$fullPropBefore)

## Diff-in-diff
NJdiff - PAdiff
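
The same logic on a hypothetical toy data set, as a sketch: the difference-in-differences is the treated group's before-and-after change minus the control group's change. The data frame did and its columns (treated, post, y) are made up for illustration. The last step uses the fact that, with two groups and two periods, the interaction coefficient in a linear regression reproduces the same number.

```r
## Hypothetical toy data: two groups, two periods, three units per cell
did <- data.frame(
  treated = rep(c(1, 1, 0, 0), each = 3),
  post    = rep(c(0, 1, 0, 1), each = 3),
  y       = c(0.30, 0.32, 0.28,   # treated, before
              0.33, 0.35, 0.31,   # treated, after
              0.31, 0.29, 0.30,   # control, before
              0.27, 0.28, 0.26)   # control, after
)

## Helper: mean outcome for a given group (g) and period (p)
cell <- function(g, p) mean(did$y[did$treated == g & did$post == p])

## Difference-in-differences from the four cell means
did.means <- (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))
did.means

## Same estimate as the interaction term of a regression
fit <- lm(y ~ treated * post, data = did)
did.lm <- unname(coef(fit)["treated:post"])
did.lm
```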

Descriptive statistics

## Difference in medians between states (after the increase)
median(minwageNJ$fullPropAfter) - median(minwagePA$fullPropAfter)
## NJ: before-and-after change in the median
NJdiff.med <- median(minwageNJ$fullPropAfter) - median(minwageNJ$fullPropBefore)
NJdiff.med
## PA: before-and-after change in the median, then the median difference-in-differences
PAdiff.med <- median(minwagePA$fullPropAfter) - median(minwagePA$fullPropBefore)
NJdiff.med - PAdiff.med

Descriptive statistics

summary(minwageNJ$wageBefore)

summary(minwageNJ$wageAfter)

IQR(minwageNJ$wageBefore)

IQR(minwageNJ$wageAfter)

quantile(minwageNJ$wageBefore, probs = seq(from = 0, to = 1, by = 0.1))
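
How these functions relate can be checked on a short vector. The wage values below are hypothetical and are not taken from minwage.csv; the sketch relies on R's default quantile method.

```r
## Hypothetical wages for illustration
wage <- c(4.25, 4.25, 4.50, 4.75, 5.00, 5.05, 5.25, 5.50, 6.00)

## Quartiles: the 0.25, 0.5 and 0.75 quantiles
q <- quantile(wage, probs = c(0.25, 0.5, 0.75))
q

## The median is the 0.5 quantile
median(wage)

## The interquartile range is the third quartile minus the first quartile
iqr.by.hand <- unname(q["75%"] - q["25%"])
iqr.by.hand
IQR(wage)
```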

Descriptive statistics (RMS)

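## Root mean square of the change: typical magnitude of the change, ignoring sign
sqrt(mean((minwageNJ$fullPropAfter - minwageNJ$fullPropBefore)^2))
## Mean change: positive and negative changes can cancel out
mean(minwageNJ$fullPropAfter - minwageNJ$fullPropBefore)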
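
A sketch of why the two numbers can differ, on hypothetical changes (the vector change is made up). The squared RMS splits into the squared mean plus the variance computed with a divisor of n, so the RMS is never smaller than the absolute mean.

```r
## Hypothetical before-and-after changes for five units
change <- c(-0.2, 0.1, 0.3, -0.1, 0.2)

rms <- sqrt(mean(change^2))   # root mean square
avg <- mean(change)           # mean change
rms
avg

## Variance with divisor n (note: var() in R divides by n - 1)
pop.var <- mean((change - avg)^2)

## Identity: rms^2 = avg^2 + pop.var
all.equal(rms^2, avg^2 + pop.var)
```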

Descriptive statistics (standard deviation, variance)

sd(minwageNJ$fullPropBefore)
sd(minwageNJ$fullPropAfter)
var(minwageNJ$fullPropBefore)
var(minwageNJ$fullPropAfter)
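
What sd() and var() compute can be reproduced by hand. The vector x is hypothetical; the sketch uses the fact that both functions divide by n - 1 in R.

```r
## Hypothetical proportions for five restaurants
x <- c(0.2, 0.4, 0.4, 0.5, 0.7)
n <- length(x)

## Sample variance: sum of squared deviations from the mean, divided by n - 1
var.by.hand <- sum((x - mean(x))^2) / (n - 1)

## Standard deviation: square root of the variance
sd.by.hand <- sqrt(var.by.hand)

var.by.hand
var(x)
sd.by.hand
sd(x)
```

The standard deviation is in the same units as the variable itself, which usually makes it easier to interpret than the variance.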