Before you begin this worksheet, please read through “STATA: Using regsave or coefplot” under the Worksheets tab.
The Georgia HOPE program is a scholarship program created in 1993. HOPE scholarships provide Georgian students who meet a minimum GPA requirement with full-tuition scholarships at in-state institutions. We want to know the causal effect of this program on attendance at post-secondary institutions in Georgia.
1) Before looking at the data, think about our research question: how does the HOPE scholarship affect in-state college attendance? What should the unit and time dimensions of our panel data be? What comparison do we want to make? Who is in our “treated” group? Who is in our “control” group?
2) Download the HOPE_HW data from the Worksheets folder below:
https://www.dropbox.com/scl/fo/93hx3utwgeuasbzd46u5y/h?rlkey=nfqareiu5gpmxvcu9qsfh9ij4&dl=0
Browse the data. What is the current unique level of observation (i.e. what does each row represent)? What years are in our sample? How many states are in our sample?
3) Run a standard difference-in-differences regression with “InCollege” as the outcome. Interpret each coefficient in this context. What is the effect of HOPE scholarships on attendance at Georgia colleges?
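A minimal sketch of the 2x2 difference-in-differences regression for question 3. It assumes the download is saved as HOPE_HW.dta in your working directory and contains InCollege, Treatment (1 = Georgia, 0 = other states) and a year variable named Year; the file name, the capitalization of Year, and the Post indicator are assumptions, so adjust them to match what you see when you browse the data.

```stata
* Assumed file name and location: change to wherever you saved the download
use "HOPE_HW.dta", clear

* Post is a hypothetical indicator for the years from 1993 (HOPE's start) on.
* The if-guard keeps missing years missing, because Stata treats . as
* larger than any number (so . >= 1993 would otherwise be true).
generate Post = (Year >= 1993) if !missing(Year)

* i.Treatment##i.Post expands to Treatment, Post, and their interaction
regress InCollege i.Treatment##i.Post, vce(robust)

* The interaction coefficient is the difference-in-differences estimate
display "DiD estimate: " _b[1.Treatment#1.Post]
display "Std. error:   " _se[1.Treatment#1.Post]
```

In this saturated 2x2 model, _cons is the mean outcome for non-Georgia states before 1993, 1.Treatment is Georgia's pre-1993 gap relative to other states, 1.Post is the change over time for non-Georgia states, and 1.Treatment#1.Post is the additional change for Georgia after 1993.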
Graphing Trends Over Time
We are going to recreate the following graph:
Collapsing the Data
First, we want to aggregate the individual observations. Collapse the data so that we have average college attendance for Georgia and for all other states combined, for every year.
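One way to do the collapse, sketched under the same assumptions as before (file HOPE_HW.dta, variables InCollege, Treatment and Year). Collapsing by Year and Treatment gives one row for Georgia and one row for the other states per year, which is the structure the reshape step below expects (Year as the id, Treatment as j); collapsing by a state identifier instead would leave several non-Georgia rows per year.

```stata
use "HOPE_HW.dta", clear

* Mean of InCollege within each Year x Treatment cell.
* If InCollege is a 0/1 indicator (an assumption), this mean is the
* share of people attending college.
collapse (mean) InCollege, by(Year Treatment)

* Quick look at the first few rows
list if _n <= 6
```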
Reshaping the Data
Browse the data. We now have data on the year, whether a state is Georgia or not (“Treatment”), and the college attendance outcome. To graph our treatment and control groups separately, we may want a separate variable for each group's outcome, so we will have to reshape the data.
Note that you can also graph the data while it is in long format. We cover both approaches so that you know how to do each.
Type “help reshape” into Stata. You can reshape data wide or long. Here we want to reshape wide: right now “Treatment” is stored long (one row per group per year), but we want separate outcome variables for Treatment == 0 and Treatment == 1.
In order to reshape wide, we need to know 1) what our “stub” is, 2) what our “id” variable is (i), and 3) which variable's values will define the new columns (j).
Since our data is long, we will reshape college attendance rates (stub) so that we have attendance rates for Georgians vs. non-Georgians (j) for each year (i).
Reshape the data so that, for each year, we have “InCollege” for non-Georgia states and “InCollege” for Georgia.
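A sketch of the collapse-then-reshape sequence, again assuming HOPE_HW.dta with InCollege, Treatment and Year. With stub InCollege, i(Year) and j(Treatment), reshape wide appends the value of Treatment to the stub, so you should end up with InCollege0 (non-Georgia) and InCollege1 (Georgia), one row per year.

```stata
use "HOPE_HW.dta", clear
collapse (mean) InCollege, by(Year Treatment)

* stub = InCollege, i = Year, j = Treatment
reshape wide InCollege, i(Year) j(Treatment)

* Variable labels are picked up automatically by graph legends
label variable InCollege0 "Non-Georgia states"
label variable InCollege1 "Georgia"
```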
Graphing the Data
With our reshaped data, we are going to use line graphs to plot the changes in college attendance for our treatment and control groups over time. Line graphs are also a twoway graph type, so we can follow the same syntax: twoway (line y x). With the reshaped data, we can put both of our y-outcomes (one for each group) in a single line command.
Graph the simplest version.
Note: You could also get the same graph without reshaping, using the same syntax as in HW4, where we put multiple lines within twoway, i.e.:
twoway (line Y X if characteristic == 0) (line Y X if characteristic == 1)
instead of:
twoway (line Y1 Y2 X)
Starting from this graph, we can make aesthetic changes: label the legend, change the axis labels, and add a title and axis titles. We can also add a vertical line to mark when the intervention happened. Try adding a vertical line at 1993.
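The simplest version, sketched with the assumed names from above (HOPE_HW.dta; InCollege0 and InCollege1 are the names reshape wide creates when j is Treatment). Listing two y-variables before the x-variable draws one line for each.

```stata
use "HOPE_HW.dta", clear
collapse (mean) InCollege, by(Year Treatment)
reshape wide InCollege, i(Year) j(Treatment)

* line connects points in data order, so sort by the x-variable first
sort Year
twoway (line InCollege0 InCollege1 Year)
```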
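For comparison, a sketch of the long-format version, which skips the reshape and draws each group with its own line command (same assumed file and variable names). Note the double equals sign: Stata uses == for comparisons in if conditions. The /// line continuations only work inside a do-file, not in the Command window.

```stata
use "HOPE_HW.dta", clear
collapse (mean) InCollege, by(Year Treatment)

* Within each group, points must be in Year order for the line to draw cleanly
sort Treatment Year
twoway (line InCollege Year if Treatment == 0) ///
       (line InCollege Year if Treatment == 1), ///
       legend(order(1 "Non-Georgia states" 2 "Georgia"))
```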
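A possible polished version, run from a do-file and assuming the same file and variable names as above. xline() adds the vertical line at 1993, legend(order()) relabels the two lines, and the title options name the graph and axes; the y-axis title assumes InCollege is a 0/1 indicator, so its mean is a share. Everything after /// on a line is treated as a comment.

```stata
use "HOPE_HW.dta", clear
collapse (mean) InCollege, by(Year Treatment)
reshape wide InCollege, i(Year) j(Treatment)
sort Year

twoway (line InCollege0 InCollege1 Year),                   ///
    xline(1993, lpattern(dash))                             /// HOPE starts in 1993
    legend(order(1 "Non-Georgia states" 2 "Georgia"))       ///
    title("College attendance, Georgia vs. other states")   ///
    xtitle("Year") ytitle("Share attending college")
```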
Event Study
Finally, we are going to create an event study. You can create an event study with either coefplot or regsave. We are going to start with coefplot.
Coefplot will plot the coefficients from the most recent regression you have run.
1) Start by running our DiD regression, then type coefplot on its own. What does it show you? How can we clean this up, and what options do we need to add?
a) Do we want to plot coefficients for just “Georgia,” just “Year,” or our constant? Which coefficients do we care about? Use the option “keep” (as you would with esttab) to keep only the coefficients we want to plot:
coefplot, keep(names of the coefficients you want to plot)
b) In class and in papers, we have mostly seen event studies oriented the other way: the coefficient labels are on the x-axis and the confidence intervals are vertical. Add the option “vertical”.
c) Right now, our points are not connected across time. The default plot type for coefplot is a scatterplot; use the “recast” option to change the plot type to connected. Use “help coefplot” if you are unsure of the syntax.
d) We may want to clean up our labels and titles. Add a title for the graph. For the x-axis labels, we want to angle them so that they do not overlap. Use “help coefplot” to see your options for xlabel: can we change the angle of the labels?
2) Does this event study support the parallel trends assumption?
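Steps a) through d) can be combined into one command. This is a sketch under several assumptions: the file and variable names used earlier, coefplot already installed from SSC, and 1992 (the last pre-HOPE year) as the omitted reference year, which is a choice made here rather than one given in the worksheet. Each plotted coefficient is then Georgia's gap relative to other states in that year, measured against the 1992 gap.

```stata
* ssc install coefplot    // run once if coefplot is not yet installed
use "HOPE_HW.dta", clear

* Georgia and year effects plus Georgia x year interactions;
* ib1992 makes 1992 the omitted (reference) year
regress InCollege i.Treatment##ib1992.Year, vce(robust)

* keep() retains only the Georgia x year interactions (a);
* vertical (b), recast(connected) (c), title and angled labels (d)
coefplot, keep(1.Treatment#*)                          ///
    vertical                                           ///
    recast(connected)                                  ///
    yline(0)                                           ///
    title("Event study: HOPE and college attendance")  ///
    xlabel(, angle(45))
```

Because coefplot plots the most recent estimates, run it immediately after the regression, or store the results with estimates store and name them in coefplot.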