Research Methodology in Social Sciences

Dr. David Sichinava
October 19, 2018

Third Meeting

Throughout this class, we will use the frequentist understanding of probability, that is:
- “the proportion of times the outcome would occur in a very long sequence of observations”
- Probabilities are usually expressed as numbers between 0 and 1, or as percentages.
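The long-run frequency idea can be illustrated with a quick simulation. This Python sketch is our own illustration, not part of the original material: it assumes a fair coin (`p_heads = 0.5`), and the function name `relative_frequency` is hypothetical. As the number of flips grows, the observed proportion of heads should settle close to 0.5 (that is, 50%).

```python
import random

def relative_frequency(n_flips: int, p_heads: float = 0.5, seed: int = 42) -> float:
    """Proportion of heads in n_flips simulated coin flips."""
    rng = random.Random(seed)  # fixed seed so the results are reproducible
    heads = sum(1 for _ in range(n_flips) if rng.random() < p_heads)
    return heads / n_flips

# The proportion of heads gets closer to the true probability as the sequence gets longer.
for n in (10, 100, 10_000, 1_000_000):
    print(n, relative_frequency(n))
```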

Probability has many properties; the following are the most important for us:

[Figure: the most important properties of probability]

Let's imagine that we are conducting an experiment or drawing a sample from a population.
Each possible outcome of the variable has a certain probability of occurring.
We call such variables random variables.
The set of possible outcomes, together with their probabilities, is called a probability distribution. Using probability theory (and math), we can calculate these probabilities.
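As an illustration (our own example, assuming two fair six-sided dice), the sum of the two dice is a random variable, and its probability distribution can be found by listing all 36 equally likely outcomes. The function name `two_dice_distribution` is hypothetical; exact fractions are used so the probabilities are not rounded.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def two_dice_distribution() -> dict[int, Fraction]:
    """Probability distribution of the sum of two fair six-sided dice."""
    # Enumerate all 36 equally likely (die1, die2) outcomes and count each sum.
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    total = sum(counts.values())  # 36 outcomes in total
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}

dist = two_dice_distribution()
for value, prob in dist.items():
    print(value, prob, round(float(prob), 3))
```

Each possible sum (2 to 12) gets its own probability, and the probabilities add up to 1; a sum of 7 is the most likely outcome.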


Probability distributions are summarized by parameters with their own formulas:
- the mean, measuring the expected value (center) of the distribution;
- the standard deviation, measuring the spread of the distribution.
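For a discrete random variable, the mean is \( \mu = \sum y P(y) \) and the standard deviation is \( \sigma = \sqrt{\sum (y-\mu)^2 P(y)} \). Below is a minimal Python sketch of these formulas, assuming the distribution is given as a dictionary that maps each value \( y \) to its probability \( P(y) \); the helper name `mean_and_sd` is hypothetical, and the fair die is our own example.

```python
import math

def mean_and_sd(dist: dict[float, float]) -> tuple[float, float]:
    """Mean and standard deviation of a discrete probability distribution.

    dist maps each possible value y to its probability P(y).
    """
    if not math.isclose(sum(dist.values()), 1.0):
        raise ValueError("probabilities must sum to 1")
    mu = sum(y * p for y, p in dist.items())               # expected value
    var = sum((y - mu) ** 2 * p for y, p in dist.items())  # expected squared deviation
    return mu, math.sqrt(var)

fair_die = {y: 1 / 6 for y in range(1, 7)}
mu, sigma = mean_and_sd(fair_die)
print(round(mu, 3), round(sigma, 3))  # 3.5 1.708
```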


For any normal distribution, the probability of falling within a given number of standard deviations \( \sigma \) of the mean \( \mu \) is the same: about 0.68 within one \( \sigma \), 0.95 within two \( \sigma \) and 0.997 within three \( \sigma \).
More generally, for each value of \( z \), the probability of falling within \( z \) standard deviations of the mean depends only on \( z \). For instance, a value two standard deviations above the mean has a \( z \)-value of 2.00.
Put simply, the \( z \)-score is the number of standard deviations a value lies from the mean.
It is a useful way of calculating probabilities for a particular value in the distribution (for example, the probability of observing a larger value) and of comparing values across groups.
\( z=(y-\mu)/\sigma \)
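The normal probabilities above can be checked numerically. This sketch uses only Python's standard `math` module and the fact that, for a standard normal variable \( Z \), \( P(|Z| \le z) = \mathrm{erf}(z/\sqrt{2}) \). The helper names and the example values (a score of 130 with \( \mu = 100 \), \( \sigma = 15 \)) are hypothetical, chosen for illustration.

```python
import math

def z_score(y: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that y lies from the mean mu."""
    return (y - mu) / sigma

def prob_within(z: float) -> float:
    """P(-z <= Z <= z) for a standard normal Z, via the error function."""
    return math.erf(abs(z) / math.sqrt(2))

def prob_above(z: float) -> float:
    """Right-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# About 0.68, 0.95 and 0.997 within 1, 2 and 3 standard deviations.
for k in (1, 2, 3):
    print(k, round(prob_within(k), 3))

# Hypothetical example: a score of 130 when mu = 100 and sigma = 15.
z = z_score(130, mu=100, sigma=15)
print(z, round(prob_above(z), 4))  # 2.0 0.0228
```

Because the \( z \)-score removes the units of the original variable, the same two functions work for any normal distribution once a value has been standardized.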

We rarely have data for the whole population, so we rely on samples.
The sampling distribution of a statistic is the probability distribution that specifies probabilities for the possible values the statistic can take.
That is, imagine taking many samples from the population and calculating the value of the statistic for each one; the distribution of those values is the sampling distribution.
The sampling distribution of the sample mean \( \bar{y} \) is centered at the population mean \( \mu \). Its standard deviation, called the standard error, is related to the population standard deviation \( \sigma \) by \( \sigma_{\bar{y}}=\sigma/\sqrt{n} \).
- What happens if we increase \( n \)?
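The idea of a sampling distribution can be explored by simulation: draw many samples of size \( n \) from a population, compute each sample mean, and compare the spread of those means with \( \sigma/\sqrt{n} \). In this sketch the population (10,000 normally distributed values with mean 50 and standard deviation 10) and the function name `simulate_sample_means` are assumptions made for illustration. Samples are drawn with replacement, which is what makes \( \sigma/\sqrt{n} \) the exact standard error here.

```python
import random
import statistics

def simulate_sample_means(population, n, n_samples=2_000, seed=1):
    """Draw n_samples random samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(n_samples)]

# Hypothetical population: 10,000 values from a normal distribution (mu = 50, sigma = 10).
pop_rng = random.Random(0)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]
sigma = statistics.pstdev(population)  # population standard deviation

for n in (4, 25, 100):
    means = simulate_sample_means(population, n)
    print(
        n,
        round(statistics.fmean(means), 2),  # center of the sampling distribution
        round(statistics.stdev(means), 2),  # simulated standard error
        round(sigma / n ** 0.5, 2),         # theoretical standard error sigma / sqrt(n)
    )
```

The simulated standard error (third column) should be close to the theoretical value (fourth column), so quadrupling \( n \) should roughly halve the standard error.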

For large random samples, the sampling distribution of the sample mean is approximately normal, regardless of the shape of the population distribution (provided its standard deviation is finite).
This result, the Central Limit Theorem, is one of the foundational principles of frequentist statistical inference.
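The claim that this holds regardless of the population's shape can also be checked with a small Monte Carlo simulation. This sketch is our own illustration: it assumes a strongly right-skewed exponential population with mean 1 and uses skewness (the third standardized moment) as a rough measure of how far the sampling distribution is from a symmetric, normal-like shape. The function names are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Third standardized moment: 0 for a symmetric distribution, > 0 for right skew."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_means_exponential(n, n_samples=5_000, seed=7):
    """Means of n_samples random samples of size n from an exponential population with mean 1."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(n_samples)
    ]

# The population is strongly right-skewed; the sample means become more symmetric
# (closer to normal) as the sample size n grows.
for n in (1, 5, 30, 100):
    means = sample_means_exponential(n)
    print(n, round(statistics.fmean(means), 3), round(skewness(means), 2))
```

For an exponential population, the skewness of the sample mean is \( 2/\sqrt{n} \), so the printed skewness should fall from about 2 at \( n = 1 \) toward about 0.2 at \( n = 100 \), while the center stays near the population mean of 1.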
