This worksheet uses data to build some intuition for what fixed effects do.
* First, we'll open a dataset
webuse nlswork, clear
* Get familiar with the data. These data contain information on individuals' wages and education.
* Let's collapse some variables, so the data just looks like averages per year
collapse wks_work hours tenure ttl_exp, by(year)
* Now let's look at the data, especially hours and year
list year hours
* This tells us the average hours per year. Now let's explore fixed effects
* First, we'll run a regression of hours on year fixed effects
reg hours i.year
* Look at the coefficient on each year, and let's try to interpret them.
* Now let's figure out which base year Stata picked
reg hours i.year, base
* In my Stata the base year seems to be 68. If it's not in yours, then let's set it
fvset base 68 year
reg hours i.year, base
* So now that 68 is the base, you'll notice the coefficient on that year is 0. That means the constant in this regression should be the average hours for the year 68. Check that that's the case!
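* A minimal sketch of that check, assuming the collapsed data and the last regression are still in memory:
quietly summarize hours if year==68
display "Mean hours in 68: " r(mean)
display "Constant:         " _b[_cons]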
* Now that you've done that, let's move on to the coefficient on 69: .597126
* This means that the year 69 is .597126 "away" from the base year 68.
* Let's test that by adding the constant to that coefficient
fvset base 68 year
reg hours i.year, base
display _b[_cons]+_b[69.year]
* Output: 37.947197
sum hours if year==69
*     Variable |        Obs        Mean    Std. dev.       Min        Max
* -------------+---------------------------------------------------------
*        hours |          1     37.9472           .    37.9472    37.9472
* This works! This means that the coefficient on 69 is capturing the deviation from year 68, in year 69.
* Do the same for the rest of the coefficients.
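* One way to do this for every year at once is a loop. This is only a sketch: it assumes
* 68 is the base year and that the last regression is still in memory. The local macro
* names (years, y) are arbitrary.
levelsof year if year != 68, local(years)
foreach y of local years {
    quietly summarize hours if year == `y'
    display "year `y': constant + coefficient = " %9.4f (_b[_cons] + _b[`y'.year]) "   mean = " %9.4f r(mean)
}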
* Now let's plot these coefficients to see if we can detect a pattern.
* Install coefplot if you don't have it (ssc install coefplot).
* You can take out scheme(plotplainblind) if you want; that scheme comes with the blindschemes package.
reg hours i.year, base
coefplot, keep(*.year) vertical recast(connected) xlabel(,angle(45)) ylabel(,angle(horizontal)) scheme(plotplainblind)
* So this tells us how hours worked have evolved over the years. There seems to be a big "bump" in year 69 (relative to 68), followed by a decrease. Let's now plot the raw data and see how it looks.
twoway (connected hours year, sort), scheme(plotplainblind)
Notice how the fixed effects capture the average changes from a baseline.
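* To see this directly, we can build the same "deviation from 68" series by hand and plot it.
* A sketch, assuming the collapsed data are still in memory; dev68 is a made-up variable name.
* Its shape should match the coefficient plot, up to the spacing of the x-axis.
quietly summarize hours if year==68
generate dev68 = hours - r(mean)
twoway (connected dev68 year, sort), scheme(plotplainblind)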
Now let’s practice a different way of understanding FE, using “individual” fixed effects.
webuse nlswork, clear
* Let's declare the panel
xtset id year
estimates clear
eststo: reg hours i.year
coefplot est1 , keep(*.year) vertical recast(connected) xlabel(,angle(45)) ylabel(,angle(horizontal)) scheme(plotplainblind)
This tells us that hours worked have decreased since 69, after a bump from 68 to 69. Now we want to compare this trend to how it would look if we included individual-level FE. The idea is that individual-level FE capture time-invariant characteristics of the individual (such as race, birth year, etc.), and could therefore absorb bias or selection in the patterns we see over time. So if the individual FE matter, we would see something different from the line above. If they don’t, the two lines would be the same. Let’s see which it is.
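* Before comparing, a quick sketch of what "individual FE" means in practice. xtreg with the fe
* option gives the same year coefficients as a regression with one dummy per individual, which
* areg estimates by absorbing idcode (the person identifier in nlswork).
webuse nlswork, clear
xtset idcode year
xtreg hours i.year, fe
areg hours i.year, absorb(idcode)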
webuse nlswork, clear
xtset id year
estimates clear
eststo: reg hours i.year
eststo: xtreg hours i.year, fe
coefplot est1 est2, keep(*.year) vertical recast(connected) xlabel(,angle(45)) ylabel(,angle(horizontal)) scheme(plotplainblind) legend(order(1 "Year FE" 2 "Individual FE"))
Not surprisingly, they matter; we can interpret the line from “est2” as the changes over time in hours worked once we’ve accounted for individual time-invariant characteristics. Why is it different from the line with just year FE (est1)? This could be for many reasons. For example, it could imply that the composition of who is reporting hours worked changes in a way that correlates with individual-level characteristics. Look at the drop in hours worked from 77 to 83. It is more pronounced once we account for individual-level FE than with year FE alone. This means that within individuals we see larger changes in hours than the “Year FE” line suggests. Why is this?
Imagine that “who” is in our data changes over time. Say, for example, that over time there are fewer white individuals, and that white individuals work fewer hours on average than other races. This could confound the pattern over time: as white individuals “leave” the sample, average hours worked go up, simply because those who remain work more hours on average. But if we look at changes “within” individuals, we may actually find that individuals are working fewer hours. So the compositional effect may be pushing things “upward”, as we see in est1 vs. est2 for years 77, 78, etc.
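* A first, rough look at whether the sample really changes over time. This is only a sketch: it
* shows how many individuals are observed each year and in which patterns, not why.
webuse nlswork, clear
xtset idcode year
xtdescribe
tabulate year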
Let’s see if this is right. First, let’s see how hours worked varies by race:
webuse nlswork, clear
collapse hours, by(year race)
twoway (connected hours year if race==1, sort) (connected hours year if race==2, sort) (connected hours year if race==3, sort), scheme(plotplainblind) legend(order(1 "White" 2 "Black" 3 "Other"))
OK, this does seem to suggest that white individuals work, on average, fewer hours than black or “other race” individuals. Now let’s see how many of them are in our sample over time.
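* The same comparison as a table rather than a plot (a sketch using the full, uncollapsed data):
webuse nlswork, clear
tabulate race, summarize(hours)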
webuse nlswork, clear
tab race, gen(racedummy)
collapse racedummy*, by(year)
twoway (connected racedummy1 year, sort) (connected racedummy2 year, sort) (connected racedummy3 year, sort), scheme(plotplainblind) legend(order(1 "White" 2 "Black" 3 "Other"))
OK, maybe there are some different patterns, but they are hard to see. We can do better by using year FE to see how composition changes over time. We are interested in knowing whether there are fewer white individuals over time, so we can do the following:
webuse nlswork, clear
tab race, gen(racedummy)
reg racedummy1 i.year
coefplot, keep(*.year) vertical recast(connected) xlabel(,angle(45)) ylabel(,angle(horizontal)) scheme(plotplainblind)
This graph shows our point much more clearly. Over the years 75, 77, etc. we see the composition changing, with fewer whites. This means that hours worked should go up in the year-FE model. You can follow the rest of the patterns.
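* The same information as a table: the share of each race within each year (a sketch).
webuse nlswork, clear
tabulate year race, row nofreq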
If this theory were true, then once we control for “race,” the line with “year FE + race dummies” should be “closer” to the “individual FE” line than the “year FE” line alone. Let’s test that.
webuse nlswork, clear
xtset id year
estimates clear
eststo: reg hours i.year
eststo: xtreg hours i.year, fe
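* est3: year FE plus race dummies, estimated by OLS (xtreg without the fe option would fit a random-effects model instead)
eststo: reg hours i.year i.race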
coefplot est1 est2 est3, keep(*.year) vertical recast(connected) xlabel(,angle(45)) ylabel(,angle(horizontal)) scheme(plotplainblind)
This seems to pan out. The line with individual FE (“est2”) is at the bottom. Adding race (est3) moves us closer to the individual FE line, but not all the way there, which means individual-level FE are still capturing other potential confounders.
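* To put numbers on "closer", the three sets of year coefficients can be listed side by side.
* A sketch, assuming est1, est2 and est3 from above are still stored; esttab ships with the same
* estout package as eststo, and the column titles are just labels chosen here.
esttab est1 est2 est3, keep(*.year) b(3) se(3) mtitles("Year FE" "Individual FE" "Year FE + race")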