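In the worksheet on “calculating beta by hand” we learned how to obtain beta in OLS. In short, one uses the following equation:
\[ \hat{\beta} = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} \]
Now, notice what the expressions on the top and the bottom represent. Dividing each by n - 1, the top is the sample covariance between X and Y and the bottom is the sample variance of X:
\[ \hat{\beta} = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)} \]
where X is the variable “attached” to beta and Y is the outcome.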
Getting to work
With these concepts in hand, along with the interpretation of a regression whose main variable is discrete, let’s work through an example using the following data:
You can use this link to download it
The data is called:
Mean Comparison
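Imagine that we are trying to find the effect of Batten on income. We can run the following regression:
\[ \text{Income}_i = \beta_0 + \beta_1 \text{Batten}_i + \varepsilon_i \]
where Batten is a 0/1 indicator. Let’s say, for the sake of the example, that Income is average earnings over the 4 years after Batten. Find:
- The value of \(\beta_1\). Obtain this value using averages.
- The value of \(\beta_1\). Obtain this value using the following formula: \(\hat{\beta}_1 = \operatorname{Cov}(\text{Batten}, \text{Income}) / \operatorname{Var}(\text{Batten})\)
- The value of \(\beta_1\). Obtain this value using a regression.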
. sum income if batten==1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      income |          9    184.4444     27.88867       140        220

. sum income if batten==0

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      income |          9    155.2222     27.11908       125        195

. display 184.4444 - 155.2222
29.2222
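Retyping the two means by hand invites rounding and copying mistakes. Below is a minimal sketch of the same calculation using the value that summarize leaves behind in r(mean). The scalar names mean_batten and mean_other are made up for illustration; income and batten are the variable names from the output above.

```stata
* Mean income for the batten==1 group, stored in a scalar
quietly summarize income if batten == 1
scalar mean_batten = r(mean)

* Mean income for the batten==0 group
quietly summarize income if batten == 0
scalar mean_other = r(mean)

* Difference in means, computed at full precision
display mean_batten - mean_other
```

Because the scalars keep full precision, the result may differ from 29.2222 in the last displayed digits.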
. * First we obtain the covariance between Batten and Income
. correlate batten income, covariance
(obs=18)

             |   batten   income
-------------+------------------
      batten |  .264706
      income |  7.73529  938.147

. * Then we obtain the variance of Batten
. sum batten, det

                           batten
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                  18
25%            0              0       Sum of Wgt.          18

50%           .5                      Mean                 .5
                        Largest       Std. Dev.      .5144958
75%            1              1
90%            1              1       Variance       .2647059
95%            1              1       Skewness              0
99%            1              1       Kurtosis              1

. * Now we divide:
. display 7.73529/ .2647059
29.222205
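The same ratio can be computed without copying numbers off the screen. This is a sketch that assumes the stored results of correlate with the covariance option: r(cov_12) for the covariance of the two listed variables and r(Var_1) for the variance of the first one (here batten, because it is listed first).

```stata
* Covariance matrix of batten and income, output suppressed
quietly correlate batten income, covariance

* Cov(batten, income) / Var(batten)
display r(cov_12) / r(Var_1)
```

If the variables were listed in the other order, the variance of batten would be in r(Var_2) instead.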
. reg income batten

      Source |       SS           df       MS      Number of obs   =        18
-------------+----------------------------------   F(1, 16)        =      5.08
       Model |  3842.72222         1  3842.72222   Prob > F        =    0.0386
    Residual |  12105.7778        16  756.611111   R-squared       =    0.2409
-------------+----------------------------------   Adj R-squared   =    0.1935
       Total |     15948.5        17  938.147059   Root MSE        =    27.507

------------------------------------------------------------------------------
      income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      batten |   29.22222   12.96672     2.25   0.039     1.734006    56.71044
       _cons |   155.2222   9.168855    16.93   0.000     135.7851    174.6593
------------------------------------------------------------------------------
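The regression table also connects back to the averages: _cons (155.2222) matches the mean of income for batten==0, and the coefficient on batten is the gap between the two groups. A short sketch that pulls these values from _b[] instead of reading them off the table, assuming it is run right after the regression above:

```stata
* Coefficient on batten: difference in mean income between the groups
display _b[batten]

* Constant plus coefficient: mean income of the batten==1 group
display _b[_cons] + _b[batten]
```

The second line should land on the batten==1 average from the summarize output, up to rounding.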
What these three exercises should show you is that they are three ways of obtaining the same number. They all carry the same interpretation, but seeing how each one is obtained is important. The formula and the regression work for a continuous variable as well. Notice that for a continuous variable, using averages could get really complex, because you would have to compare averages at each value the variable takes. The covariance-over-variance formula (the second method) is therefore the most sensible of the by-hand methods for a continuous variable.
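For a 0/1 variable, the reason the formula collapses to a difference in means can be sketched in a few lines. The notation is introduced here for illustration only: n is the number of observations, p is the share of observations with X = 1 (so the mean of X is p), and \bar{Y}_1 and \bar{Y}_0 are the means of Y in the X = 1 and X = 0 groups.

```latex
\begin{align*}
\operatorname{Var}(X) &= \frac{1}{n-1}\sum_i (X_i - p)^2
  = \frac{n}{n-1}\,p(1-p) \\
\operatorname{Cov}(X, Y) &= \frac{1}{n-1}\sum_i (X_i - p)\,Y_i
  = \frac{n}{n-1}\,p(1-p)\,(\bar{Y}_1 - \bar{Y}_0) \\
\hat{\beta} &= \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}
  = \bar{Y}_1 - \bar{Y}_0
\end{align*}
```

With the data above, n = 18 and p = .5, so Var(X) = (18/17)(.25) = .2647059, which is the variance Stata reports for batten.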