In cases where the primary variable of interest is binary, the coefficient from a regression of any outcome variable on this binary variable is just a difference in means, i.e., the naive difference. But don't just take my word for it; let's derive it.
Let’s say we have the following regression:
$$Y_i = \alpha + \delta D_i + \varepsilon_i,$$ where:
- $Y_i$ is the dependent variable,
- $D_i$ is a binary (indicator) variable,
- $\alpha$ is the intercept,
- $\delta$ is the coefficient of interest, and
- $\varepsilon_i$ is the error term.
Step 1: Taking Expectations
Since $D$ is binary, we compute the expected value of $Y$ conditional on $D$:
- When $D = 1$: $E[Y \mid D = 1] = \alpha + \delta + E[\varepsilon \mid D = 1]$
- When $D = 0$: $E[Y \mid D = 0] = \alpha + E[\varepsilon \mid D = 0]$
Assuming $E[\varepsilon \mid D = 1] = 0$, the first line simplifies to $E[Y \mid D = 1] = \alpha + \delta$.
Assuming $E[\varepsilon \mid D = 0] = 0$, the second line simplifies to $E[Y \mid D = 0] = \alpha$.
These assumptions (mean independence of the error term from $D$) are essentially a version of unbiasedness.
Step 2: Difference in Means Interpretation
The coefficient $\delta$ represents the difference in the expected values of $Y$ for the two groups:

$$E[Y \mid D = 1] - E[Y \mid D = 0] = (\alpha + \delta) - \alpha = \delta.$$

Thus, $\delta$ captures the difference in the mean outcome between the treatment group and the control group.
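To check this numerically, here is a minimal simulation sketch (Python with numpy and statsmodels, assumed tools rather than anything from the derivation itself). It draws a binary $D$, generates $Y$ with a true effect of 2, and compares the OLS slope on $D$ to the raw difference in group means.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000

# Binary treatment D and an outcome with true effect delta = 2.0
D = rng.integers(0, 2, size=n)
alpha, delta = 1.0, 2.0
eps = rng.normal(0, 1, size=n)   # error unrelated to D, so E[eps | D] = 0 holds
Y = alpha + delta * D + eps

# Naive difference in means between the D = 1 and D = 0 groups
diff_in_means = Y[D == 1].mean() - Y[D == 0].mean()

# OLS regression of Y on D (with an intercept)
ols = sm.OLS(Y, sm.add_constant(D)).fit()

print(diff_in_means)   # close to 2.0
print(ols.params[1])   # the same number as the difference in means
```

In the sample, the two numbers agree to floating-point precision, not just approximately, because the OLS slope on a single binary regressor is algebraically the difference in group means.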
Now notice that without the assumptions, the subtraction above would be

$$E[Y \mid D = 1] - E[Y \mid D = 0] = \delta + \big(E[\varepsilon \mid D = 1] - E[\varepsilon \mid D = 0]\big).$$

If we don't assume unbiasedness, then the difference in means gives us the causal effect plus bias; the extra term $E[\varepsilon \mid D = 1] - E[\varepsilon \mid D = 0]$ is just that bias written in a different way. This would also be true for regression.
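For completeness, here is the same kind of sketch for the biased case, under an assumed selection story where an unobserved variable pushes people into treatment and also raises the outcome, so $E[\varepsilon \mid D = 1] \neq E[\varepsilon \mid D = 0]$. Both the naive difference in means and the regression coefficient recover the causal effect plus the bias term, exactly as in the expression above.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100_000

# Unobserved "ability" both pushes people into treatment and raises Y,
# so E[eps | D = 1] != E[eps | D = 0] and the unbiasedness assumption fails.
ability = rng.normal(0, 1, size=n)
D = (ability + rng.normal(0, 1, size=n) > 0).astype(int)
alpha, delta = 1.0, 2.0
eps = ability + rng.normal(0, 1, size=n)
Y = alpha + delta * D + eps

bias = eps[D == 1].mean() - eps[D == 0].mean()   # E[eps|D=1] - E[eps|D=0]
diff_in_means = Y[D == 1].mean() - Y[D == 0].mean()
ols_slope = sm.OLS(Y, sm.add_constant(D)).fit().params[1]

print(delta + bias)      # causal effect plus bias
print(diff_in_means)     # equals delta + bias
print(ols_slope)         # regression returns the same biased number
```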