Sebastian Tello

Enter the Black Box: How to Obtain Beta (Part 1 - No Covariates)

How to obtain beta part 1.pdf (115.3 KB)

Using the worksheet on “calculating beta by hand,” we learned how to obtain beta in OLS. In short, one uses the following equation:

$$\hat{\beta}=\frac{\sum^{N}_{i=1}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum^{N}_{i=1}(X_i-\bar{X})^2}$$
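As a minimal sketch of the formula above, the summations can be written out step by step in plain Python. The five data points are invented purely for illustration and are not taken from the worksheet.

```python
# Toy data, invented purely for illustration (not from the worksheet).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n  # sample mean of X
y_bar = sum(y) / n  # sample mean of Y

# Numerator: sum of cross-products of the deviations from the means.
numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Denominator: sum of squared deviations of X from its mean.
denominator = sum((xi - x_bar) ** 2 for xi in x)

beta_hat = numerator / denominator
print(beta_hat)
```

Writing the numerator and denominator as separate variables mirrors the top and bottom of the fraction, which makes it easier to inspect each piece on its own.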

Now, notice what the expressions in the numerator and the denominator represent.

❓ What do those expressions represent?

Getting to work

With these concepts in hand, along with the interpretation of a regression whose main variable is discrete, let’s work through the following example.

You can use this link to download the data:

https://www.dropbox.com/scl/fo/vzm55313fdzl1ned2s56i/h?rlkey=774jljwee9r6kg3wsx49ifsw8&dl=0

The data file is called:

battenincome.dta (2.5 KB)

Mean Comparison

Imagine that we are trying to estimate the effect of Batten on income. We can run the following regression:

$$Income_i=\beta_0+\beta_1 Batten_i+\epsilon_i$$

For the sake of the example, let’s say that Income is the average of earnings over the 4 years after Batten.
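One way to read this regression, sketched under the usual assumption that the error term has mean zero within each group (an assumption not stated in these notes): because $Batten_i$ only takes the values 0 and 1, the two conditional means of income are

$$E[Income_i \mid Batten_i=0]=\beta_0, \qquad E[Income_i \mid Batten_i=1]=\beta_0+\beta_1$$

so that

$$\beta_1=E[Income_i \mid Batten_i=1]-E[Income_i \mid Batten_i=0]$$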

image
  1. The value of $\beta_1=29.22$. Obtain this value using averages.
  2. The value of $\beta_1=29.22$. Obtain this value using the following formula: $\beta_1=\frac{Cov(Batten,Income)}{Var(Batten)}$
  3. The value of $\beta_1=29.22$. Obtain this value using a regression.
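The three approaches in the list above can be sketched side by side in Python with NumPy. This is only an illustration on a hypothetical six-row dataset, not on battenincome.dta, so the slope it produces is not the 29.22 from the exercise. The variable names `batten` and `income` are assumptions chosen to match the regression equation.

```python
import numpy as np

# Hypothetical toy data (NOT battenincome.dta): batten is a 0/1 indicator,
# income is in arbitrary units.
batten = np.array([0, 0, 0, 1, 1, 1], dtype=float)
income = np.array([50.0, 60.0, 70.0, 80.0, 90.0, 100.0])

# Method 1: difference between the two group averages.
beta_means = income[batten == 1].mean() - income[batten == 0].mean()

# Method 2: covariance over variance. The same ddof is used in both,
# so the (N - 1) terms cancel in the ratio.
beta_cov = np.cov(batten, income, ddof=1)[0, 1] / np.var(batten, ddof=1)

# Method 3: least-squares regression of income on a constant and batten.
X = np.column_stack([np.ones_like(batten), batten])
coef, *_ = np.linalg.lstsq(X, income, rcond=None)
beta_ols = coef[1]  # coef[0] is the intercept

print(beta_means, beta_cov, beta_ols)
```

On the real data, the same steps could be applied after loading the file (for example with `pandas.read_stata`), using whatever the actual column names turn out to be.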

These three exercises show different ways of obtaining the same number. They all carry the same interpretation, but it is important to see how each one is obtained. You can do this for a continuous variable and it should work as well. Notice, however, that using averages with a continuous variable could get really complex, because you would have to compute them for each change in value. The covariance-over-variance formula (method 2) is therefore the most sensible one to use for a continuous variable.
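As a rough check of the continuous case, the sketch below compares the covariance-over-variance ratio with the slope from a fitted line. The data are simulated with an arbitrary seed and arbitrary coefficients, all of which are assumptions made only for this illustration.

```python
import numpy as np

# Simulated continuous regressor; seed and coefficients are arbitrary.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 + 3.0 * x + rng.normal(size=200)

# Covariance over variance, with matching ddof in both terms.
beta_cov = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Slope from a degree-1 least-squares fit (polyfit returns slope first).
slope, intercept = np.polyfit(x, y, deg=1)

print(beta_cov, slope)
```

The two numbers should agree up to floating-point error, with no need to compute a separate average for each value of `x`.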