We will keep using the data from that example, which can be found here: https://github.com/dstellotri/rmda
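Previously, we were trying to understand how to obtain beta 1 from this model:
$$ \text{income}_i = \beta_0 + \beta_1\,\text{batten}_i + \varepsilon_i $$
Now, we want to understand how beta 1 changes once we add a covariate. In this case the full model will be:
$$ \text{income}_i = \beta_0 + \beta_1\,\text{batten}_i + \beta_2\,\text{parentsincome}_i + \varepsilon_i $$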
Let’s go through several methods. Each of these offers a different way of understanding what a covariate is really doing. I recommend working through all of them and trying to build your own intuition for what “controlling for a variable” does.
Regression
- First run the regression using the data and report the value of beta 1 (the coefficient on batten).
* The value of beta 1 is 14.84
reg income batten parents
Source | SS df MS Number of obs = 18
-------------+---------------------------------- F(2, 15) = 246.69
Model | 15477.94 2 7738.97 Prob > F = 0.0000
Residual | 470.56 15 31.3706667 R-squared = 0.9705
-------------+---------------------------------- Adj R-squared = 0.9666
Total | 15948.5 17 938.147059 Root MSE = 5.601
-------------------------------------------------------------------------------
income | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------+----------------------------------------------------------------
batten | 14.84 2.743895 5.41 0.000 8.991526 20.68847
parentsincome | 1.2944 .0672114 19.26 0.000 1.151142 1.437658
_cons | 65.33333 5.027009 13.00 0.000 54.61852 76.04815
-------------------------------------------------------------------------------
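To see how the coefficient on batten moves once the covariate is added, one option is to store both regressions and print them side by side. This is a minimal sketch, assuming the dataset from the link above is already in memory with the variables income, batten and parentsincome. The names m_short and m_long are arbitrary labels chosen here, not part of the original walkthrough.

```stata
* Short model: income on batten only
quietly regress income batten
estimates store m_short

* Long model: add parentsincome as a covariate
quietly regress income batten parentsincome
estimates store m_long

* Coefficients and standard errors of both models in one table
estimates table m_short m_long, b(%9.4f) se(%9.4f)
```

The row for batten in the resulting table shows the coefficient before and after controlling for parentsincome.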
Mean Comparison
- How would we obtain the value of beta 1 if we were to use averages?
Parents income | Mean income (Batten=1) | Mean income (Batten=0) | Difference | N
50 | 142.5 | 129.25 | 13.25 | 6
75 | 180 | 165 | 15 | 6
100 | 208.75 | 192.5 | 16.25 | 6
The differences in mean income within each bin of parents' income are 13.25, 15 and 16.25. Each bin contains 6 of the 18 observations, so weighting each difference by its share of the sample gives: 13.25(6/18) + 15(6/18) + 16.25(6/18) = 14.83. Using this method we obtain a beta of about 14.83, very close to the regression estimate of 14.84.
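The small gap between 14.83 and the regression's 14.84 does not appear to be rounding. In this dataset, the regression value is reproduced exactly when each tier is weighted by n_k p_k (1 - p_k) instead of by its sample share. The notation is introduced here for illustration and is not from the original: \Delta_k is the difference in mean income in tier k, n_k the tier size, and p_k the share of observations with batten = 1 in that tier (2/6, 3/6 and 4/6, read off the observation counts in the summaries for each tier).

```latex
% Sample-share weights, as in the text:
\sum_k \frac{n_k}{N}\,\Delta_k
  = \tfrac{6}{18}(13.25) + \tfrac{6}{18}(15) + \tfrac{6}{18}(16.25)
  \approx 14.83

% Weights proportional to n_k p_k (1 - p_k); here p_k (1 - p_k) = 2/9, 1/4, 2/9:
\frac{\sum_k n_k\,p_k(1-p_k)\,\Delta_k}{\sum_k n_k\,p_k(1-p_k)}
  = \frac{\tfrac{2}{9}(13.25) + \tfrac{1}{4}(15) + \tfrac{2}{9}(16.25)}
         {\tfrac{2}{9} + \tfrac{1}{4} + \tfrac{2}{9}}
  = \frac{371}{25} = 14.84
```

Tiers where batten is split more evenly between 0 and 1 carry slightly more weight, which is why the middle tier (3 versus 3) counts a little more than the other two.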
. tab parentsincome
parentsinco |
me | Freq. Percent Cum.
------------+-----------------------------------
50 | 6 33.33 33.33
75 | 6 33.33 66.67
100 | 6 33.33 100.00
------------+-----------------------------------
Total | 18 100.00
* There are 3 tiers of parents' income, so we make
* the mean comparison within each of these tiers
sum income if batten==1 & parentsincome==50
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
income | 2 142.5 3.535534 140 145
sum income if batten==0 & parentsincome==50
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
income | 4 129.25 4.349329 125 135
display 142.5 - 129.25
13.25
sum income if batten==1 & parentsincome==75
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
income | 3 180 5 175 185
sum income if batten==0 & parentsincome==75
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
income | 3 165 5 160 170
display 180 - 165
15
sum income if batten==1 & parentsincome==100
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
income | 4 208.75 8.539126 200 220
sum income if batten==0 & parentsincome==100
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
income | 2 192.5 3.535534 190 195
display 208.75 - 192.5
16.25
display 13.25*(6/18)+15*(6/18)+16.25*(6/18)
14.833333
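The six summarize commands above can also be written as a loop. This is a sketch under the assumption that income, batten and parentsincome are in memory and that parentsincome takes only the values 50, 75 and 100, as the tabulation shows. The local macro names are hypothetical.

```stata
* Total number of observations, used for the sample-share weights
quietly count
local N = r(N)
local beta_avg = 0

foreach p in 50 75 100 {
    * Mean income for batten == 1 in this tier
    quietly summarize income if batten == 1 & parentsincome == `p'
    local m1 = r(mean)

    * Mean income for batten == 0 in this tier
    quietly summarize income if batten == 0 & parentsincome == `p'
    local m0 = r(mean)

    * Tier size, used as the weight
    quietly count if parentsincome == `p'
    local n = r(N)

    display "parentsincome = `p': difference = " `m1' - `m0' "  (n = `n')"
    local beta_avg = `beta_avg' + (`m1' - `m0') * `n' / `N'
}

display "Share-weighted average of differences = " `beta_avg'
```

Reading the means from r(mean) rather than retyping them avoids copying errors when the tiers or the data change.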
Using the formula
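- Now obtain the value of beta 1 using the following formula, where batten_tilda is the residual from regressing batten on parentsincome:
$$ \beta_1 = \frac{\operatorname{Cov}(\text{income},\ \widetilde{\text{batten}})}{\operatorname{Var}(\widetilde{\text{batten}})} $$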
. reg batten parentsincome
Source | SS df MS Number of obs = 18
-------------+---------------------------------- F(1, 16) = 1.28
Model | .333333333 1 .333333333 Prob > F = 0.2746
Residual | 4.16666667 16 .260416667 R-squared = 0.0741
-------------+---------------------------------- Adj R-squared = 0.0162
Total | 4.5 17 .264705882 Root MSE = .51031
-------------------------------------------------------------------------------
batten | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------+----------------------------------------------------------------
parentsincome | .0066667 .0058926 1.13 0.275 -.005825 .0191583
_cons | 5.55e-17 .4580176 0.00 1.000 -.9709539 .9709539
-------------------------------------------------------------------------------
* Then we obtain the predicted value of batten given this model
predict battenhat, xb
* Now we subtract the predicted value from the actual value of batten,
* and name the resulting variable batten_tilda:
gen batten_tilda=batten-battenhat
* Notice that this is the same as obtaining the residuals directly with:
*   reg batten parentsincome
*   predict res_batten, res
correlate batten_til income, covariance
(obs=18)
| batten~a income
-------------+------------------
batten_tilda | .245098
income | 3.63725 938.147
sum batten_til, det
batten_tilda
-------------------------------------------------------------
Percentiles Smallest
1% -.6666667 -.6666667
5% -.6666667 -.6666667
10% -.6666667 -.5 Obs 18
25% -.5 -.5 Sum of Wgt. 18
50% -1.49e-08 Mean -1.32e-08
Largest Std. Dev. .4950738
75% .5 .5
90% .6666666 .5 Variance .245098
95% .6666666 .6666666 Skewness -1.65e-08
99% .6666666 .6666666 Kurtosis 1.3104
display 3.63725/.245098
14.839982
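The division above uses numbers copied by hand from the two outputs, which is why it returns 14.839982 rather than exactly 14.84. One way around the rounding, assuming batten_tilda already exists as created above, is to read the covariance and variance from the results that correlate stores:

```stata
* Covariance matrix of batten_tilda and income
quietly correlate batten_tilda income, covariance

* r(cov_12) is Cov(batten_tilda, income); r(Var_1) is Var(batten_tilda)
display r(cov_12) / r(Var_1)
```

As a cross-check on the inputs, the variance of batten_tilda times 17 (N - 1) is about 4.1667, which is the residual sum of squares reported by reg batten parentsincome.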
FWL Way
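- Here is another method (similar to the previous one) that yields the same beta and provides similar intuition. It relies on the Frisch–Waugh–Lovell (FWL) theorem: regress both income and batten on parentsincome, keep the residuals, and then regress the income residuals on the batten residuals.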
reg batten parentsincome
Source | SS df MS Number of obs = 18
-------------+---------------------------------- F(1, 16) = 1.28
Model | .333333333 1 .333333333 Prob > F = 0.2746
Residual | 4.16666667 16 .260416667 R-squared = 0.0741
-------------+---------------------------------- Adj R-squared = 0.0162
Total | 4.5 17 .264705882 Root MSE = .51031
-------------------------------------------------------------------------------
batten | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------+----------------------------------------------------------------
parentsincome | .0066667 .0058926 1.13 0.275 -.005825 .0191583
_cons | 5.55e-17 .4580176 0.00 1.000 -.9709539 .9709539
-------------------------------------------------------------------------------
predict res_batten, res
reg income parentsincome
Source | SS df MS Number of obs = 18
-------------+---------------------------------- F(1, 16) = 167.82
Model | 14560.3333 1 14560.3333 Prob > F = 0.0000
Residual | 1388.16667 16 86.7604167 R-squared = 0.9130
-------------+---------------------------------- Adj R-squared = 0.9075
Total | 15948.5 17 938.147059 Root MSE = 9.3145
-------------------------------------------------------------------------------
income | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------+----------------------------------------------------------------
parentsincome | 1.393333 .1075549 12.95 0.000 1.165327 1.62134
_cons | 65.33333 8.360044 7.81 0.000 47.61083 83.05583
-------------------------------------------------------------------------------
predict res_y, res
reg res_y res_batten
Source | SS df MS Number of obs = 18
-------------+---------------------------------- F(1, 16) = 31.20
Model | 917.606684 1 917.606684 Prob > F = 0.0000
Residual | 470.559999 16 29.4099999 R-squared = 0.6610
-------------+---------------------------------- Adj R-squared = 0.6398
Total | 1388.16668 17 81.6568637 Root MSE = 5.4231
------------------------------------------------------------------------------
res_y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
res_batte | 14.84 2.656765 5.59 0.000 9.20791 20.47209
_cons | 8.28e-10 1.278237 0.00 1.000 -2.709741 2.709741
------------------------------------------------------------------------------
* This provides the beta of 14.84
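The coefficient matches the full regression, but the standard error reported here (2.656765) is slightly smaller than the one from the regression of income on batten and parentsincome (2.743895). A likely explanation, consistent with the outputs above, is degrees of freedom. Both regressions have the same residual sum of squares (470.56), but the residual-on-residual regression divides it by 16, while the full model divides it by 15 because it also estimates the coefficient on parentsincome. A quick check of that reasoning, using only numbers from the outputs:

```stata
* Rescale the FWL standard error from 16 to 15 residual degrees of freedom
display 2.656765 * sqrt(16/15)
```

The result is about 2.7439, in line with the standard error from the full regression. So the point estimate from the FWL steps can be used as is, while the standard error needs this adjustment.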