
Homework 7

Purpose

The objective of this homework is for you to practice concepts learned in class and apply them to a real-world scenario. The concepts we will practice in this homework relate to difference-in-differences.

This homework feels the most like “real” data analysis: thinking about a policy, downloading and cleaning data, analyzing it, and drawing conclusions! I hope you feel accomplished at the end of this homework because of all the progress you’ve made! Think about your data-analysis abilities before you started the program, compare them with your abilities now, and I hope you see your own progress!

Guidelines

Work must be done independently.
Submit your answers to Gradescope (within Canvas).
We encourage you to use the answer boxes; PDFs, JPGs, and PNGs are preferable to Word documents or CSVs. Recall that you can always save something as a PDF. You can also take a screenshot of anything: on Windows, use the Snipping Tool or press Windows+Shift+S; on a Mac, press Command+Shift+4.
Submit your do-file to Gradescope (within Canvas).
You will get points for correct answers. Points will be deducted if an answer contains unnecessary information or mixes incorrect statements with correct ones. In short, we want to incentivize you to use the fewest characters while maximizing the accuracy of your responses.
Your responses should be professionally formatted and written.
The due date is May 1st at 10:00 pm EDT.

Preamble

One of the first topics we covered in class was health insurance. Specifically, we did an in-class worksheet on Medicaid. Medicaid is a program funded jointly by the federal government and the states, and administered by the states, that provides health coverage to low-income individuals, including low-income children and, in some states, low-income childless adults. As we’ve learned, one of the main purposes of a program like Medicaid is to mitigate financial risk, so we should evaluate its merits based on financial outcomes. However, as we noted on that worksheet, much of the conversation about the “merit” of Medicaid focuses on whether it affects healthcare access and health. In this homework, we’ll get to do a real-world evaluation of that.

To evaluate Medicaid, we need a reform that either expanded or contracted Medicaid, so that we can see what happens to overall measures of health and healthcare access. For example, if a state reduces its Medicaid eligibility for some reason, we can compare health outcomes in that state before and after the change with those in a control group of states. That’s exactly what we’ll do. In 2005, as part of budget cuts, Tennessee cut Medicaid eligibility for about 170,000 individuals. You can read more about the context in the paper attached to the homework files.

The gist is that in 1994 Tennessee became the first state to extend Medicaid eligibility to childless adults. Before 1994, and in some states even today, childless adults were not eligible regardless of their income. Fast-forward a few years: the government of Tennessee asked the consulting firm McKinsey to produce a budget analysis. The report concluded that rising Medicaid spending was a problem and that something needed to be done to tame it: either raise more revenue (i.e., more taxes) or spend less on the program (fewer benefits or fewer beneficiaries). Note: I’m not actually sure why the focus was on contracting Medicaid rather than other programs or services, but for some reason it was. After this report, there was a gubernatorial election in Tennessee, and the main talking point was “managing healthcare in the state.” Philip Norman Bredesen Jr. (D) won that election, partly because of his business experience with a health care company. His main idea for reforming Medicaid in Tennessee was to terminate eligibility for childless adults (the expansion that started in 1994), along with cuts to some benefits. This reform led to the disenrollment of many people from Medicaid in 2005.

This reform allows us to compare the outcomes of people in Tennessee before and after the disenrollment with those of a control group (other southern states) before and after 2005. That’s what we’ll do with real data.

We will try to replicate the main results from a paper of mine, “Effects of losing public health insurance on preventative care, health, and emergency department use: Evidence from the TennCare disenrollment,” which you can find here. You can find the data for the homework there as well.

Getting acquainted with the data

The data for this homework come from the Behavioral Risk Factor Surveillance System (BRFSS). You could download the raw data from this link and prepare it for analysis yourself, but we’ve already done some of the cleaning and appended the years together; you can find the result as brfss_final.dta in the Dropbox folder.

The Behavioral Risk Factor Surveillance System (BRFSS) is a nationwide health-related telephone survey conducted by the Centers for Disease Control and Prevention (CDC). It gathers data from U.S. adults on health-related behaviors, chronic health conditions, and use of preventive services. Started in 1984, it's the largest continuously conducted health survey in the world, collecting data annually in all 50 states, D.C., and U.S. territories. For this analysis, we’ve downloaded the data and appended several years together. Although the data are at the individual level, for the comparisons we make we’ll treat them as if they were a state-year panel.

Graphing the Big Picture

We’ll start by graphing some big-picture trends in health insurance and health outcomes for this sample. The first thing to check with this data is whether we actually see the effects of the reform. This amounts to showing graphical evidence on the main lever of the reform: reducing the share of people on Medicaid. If we see this, we know the reform had “bite”; if not, maybe something went wrong in the implementation or the reform didn’t do what was intended. It’s always the first step to check. Here are ☝🏽Some Tips for replicating the graphs.

☝🏽Some Tips

Replicate Figure 1 from the paper. For this replication, we will use the dataset “tenncare_enrollment”, which has the actual count of people enrolled in TennCare over the years. Before 2005 the data are reported only annually; starting in 2005 they are reported monthly. We obtained these data from PDF reports from Tennessee. Nowadays this task is easier because AI models can convert PDF tables into Excel files. For your convenience, here is the figure we are trying to replicate. Submit code & Figure.
Using administrative data, we’ve confirmed that people were disenrolled from TennCare (the Medicaid program in Tennessee). Now let’s create a similar graph with BRFSS. Notice that BRFSS only has a variable indicating whether people were insured (any insurance, not just Medicaid). By graphing the health insurance rate over time and comparing Tennessee with other southern states, we take our first step toward a DD. Notice that even if people lost Medicaid, they could get health insurance in many other ways, so overall health insurance rates may not have decreased even though Medicaid enrollment did. Create a graph that shows how the percentage of people with health coverage varies over time in Tennessee vs. other southern states. The years for the graph should be 2000 through 2009. The data have more years, but let’s stop in 2009 to avoid confounding from the effects of the recession. In addition, people aged 65 and over have access to Medicare, and we don’t want to confound that effect, so keep only people aged 21-64. The y-axis should represent the % of people with health coverage. One line should be the percentage for Tennessee and the other should be the average percentage across other southern states. Which states should we use? As always, it’s good to go with an “external” definition rather than our own. The U.S. Census Bureau has a formal definition of the states in the South, which you can find here. We’ve created a sample graph using the variable age for you to replicate first; you can then use the same steps to create a graph of the % of people with health insurance. Make sure to account for weights, which we discuss on the “Some Tips” page. Make the y-axis of your final graph range from 70% to 100%. Submit code & Figure.


Sample graph for age
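A minimal Python sketch of the sample restrictions and weighted percentages behind this graph, on made-up records (in the homework you do this in Stata). It assumes the U.S. Census Bureau's South region (16 states plus D.C.; check whether the paper's sample includes D.C.) and hypothetical field names state, year, age, and weight, with weight standing in for _finalwt. It pools all non-Tennessee southern respondents into one weighted percentage, which is one reasonable reading of “the average percentage across other southern states”; averaging state-level rates instead would generally give a different number.

```python
from collections import defaultdict

# U.S. Census Bureau South region (postal codes), including D.C.
CENSUS_SOUTH = {
    "DE", "MD", "DC", "VA", "WV", "NC", "SC", "GA", "FL",  # South Atlantic
    "KY", "TN", "AL", "MS",                                # East South Central
    "AR", "LA", "OK", "TX",                                # West South Central
}


def coverage_by_group(records):
    """Weighted % with coverage by year, Tennessee vs. other southern states.

    Field names are hypothetical; `weight` stands in for _finalwt.
    """
    num = defaultdict(float)  # sum of weight * cover per (group, year)
    den = defaultdict(float)  # sum of weights per (group, year)
    for r in records:
        # Sample restrictions: southern states, 2000-2009, ages 21-64.
        if r["state"] not in CENSUS_SOUTH:
            continue
        if not (2000 <= r["year"] <= 2009 and 21 <= r["age"] <= 64):
            continue
        group = "Tennessee" if r["state"] == "TN" else "Other South"
        key = (group, r["year"])
        num[key] += r["weight"] * r["cover"]
        den[key] += r["weight"]
    return {key: 100 * num[key] / den[key] for key in num}


# Made-up records, for illustration only.
sample = [
    {"state": "TN", "year": 2006, "age": 40, "cover": 1, "weight": 1.0},
    {"state": "TN", "year": 2006, "age": 70, "cover": 1, "weight": 1.0},  # dropped: age
    {"state": "GA", "year": 2006, "age": 30, "cover": 0, "weight": 1.0},
    {"state": "TX", "year": 2006, "age": 50, "cover": 1, "weight": 3.0},
    {"state": "CA", "year": 2006, "age": 30, "cover": 1, "weight": 1.0},  # dropped: not South
]
print(coverage_by_group(sample))
# {('Tennessee', 2006): 100.0, ('Other South', 2006): 75.0}
```

The Texas respondent counts three times as much as the Georgia one, which is why the pooled “Other South” rate is 75% rather than 50%.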

Just by looking at the graph of health insurance, what is your assessment of the parallel trends assumption?

Doing Difference-in-Differences: Health Coverage

We’ll now use DD regressions to estimate the causal effects of the TennCare Disenrollment. Recall the sample restrictions we are using: southern states, ages 21-64 and 2000-2009.

The simplest DD specification has four parameters (three regressors plus a constant):

$$Y_{it}=\alpha+\beta TN_{it}+\gamma Post_{t}+\delta\,(TN_{it}\times Post_t)+\epsilon_{it} \tag{1}$$

where $TN$ identifies people $i$ living in Tennessee in year $t$, and $Post$ identifies the post-reform period: it takes the value 1 if the survey response is from August 2005 onward and 0 otherwise. For questions 1-7, you will run a total of six regressions and, at the end, combine them in one table, so we recommend reading the whole set of instructions before writing your code. Create all the variables that are needed, and make sure you add the weighting option to all your regressions except Model 2, which is explicitly unweighted: reg y x [aw=_finalwt]
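A minimal Python sketch of the logic behind equation (1), on made-up records. It builds $TN$ and $Post$ (comparing (year, month) tuples makes the August 2005 cutoff explicit) and shows that, because the model has one parameter per Tennessee-by-period cell, the four coefficients are simple functions of the four weighted cell means. The homework itself is done in Stata with reg and [aw=_finalwt]; the helper functions and the field names other than cover (TN, Post, weight) are hypothetical, and the sketch gives point estimates only, not standard errors.

```python
def post_dummy(year, month):
    """1 if the interview is from August 2005 onward, else 0."""
    return int((year, month) >= (2005, 8))


def weighted_mean(rows, outcome="cover", weight="weight"):
    """Weighted mean of `outcome` over `rows` (a list of dicts)."""
    total_weight = sum(r[weight] for r in rows)
    return sum(r[weight] * r[outcome] for r in rows) / total_weight


def dd_from_cell_means(rows, outcome="cover", weight="weight"):
    """alpha, beta, gamma and delta of equation (1) from the four cell means.

    Equation (1) has one parameter per TN x Post cell, so its (weighted)
    OLS fit reproduces the four (weighted) cell means exactly.
    """
    def cell(tn, post):
        subset = [r for r in rows if r["TN"] == tn and r["Post"] == post]
        return weighted_mean(subset, outcome, weight)

    control_pre, control_post = cell(0, 0), cell(0, 1)
    tn_pre, tn_post = cell(1, 0), cell(1, 1)
    return {
        "alpha": control_pre,
        "beta": tn_pre - control_pre,
        "gamma": control_post - control_pre,
        "delta": (tn_post - tn_pre) - (control_post - control_pre),
    }


# Made-up respondents (not BRFSS data): state, year, month, cover, weight.
raw = [
    ("GA", 2004, 3, 1, 1), ("GA", 2004, 6, 0, 1),
    ("GA", 2006, 2, 1, 3), ("GA", 2006, 5, 0, 1),
    ("TN", 2005, 7, 1, 2),
    ("TN", 2005, 8, 1, 1), ("TN", 2007, 1, 0, 1),
]
rows = [
    {"TN": int(state == "TN"), "Post": post_dummy(year, month),
     "cover": cover, "weight": weight}
    for state, year, month, cover, weight in raw
]
print(dd_from_cell_means(rows))
# {'alpha': 0.5, 'beta': 0.5, 'gamma': 0.25, 'delta': -0.75}

# Setting every weight to 1 gives the unweighted version (as in Model 2).
print(dd_from_cell_means([{**r, "weight": 1} for r in rows])["delta"])  # -0.5
```

Comparing the two printed results is a small-scale version of the Model 1 vs. Model 2 comparison: weights change how much each respondent counts in each cell mean, and therefore the estimated $\delta$.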

Present a table with all the models, showing the main results from the regressions below. Each column should be a different regression, and the rows should show the main coefficients (not the fixed effects) with their standard errors and asterisks indicating statistical significance. Make sure to add footnotes explaining the asterisks and any other important notes. (Hint: use the esttab command.) Present the code and the table.

Model 1: Simple DD as in equation (1)
Model 2: Simple DD as in equation (1), without weights
Model 3: Model 1 + Controls
Model 4: Model 1 + Controls + Year FE
Model 5: Model 1 + Controls + Year FE + Month FE
Model 6: Model 1 + Controls + Year FE + Month FE + State FE
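The table's footnote should state the significance cutoffs behind the asterisks. esttab's default thresholds are * p<0.05, ** p<0.01, *** p<0.001, and its star() option changes them, for example star(* 0.10 ** 0.05 *** 0.01). The mapping itself is simple; here it is as a Python sketch (the function name is hypothetical):

```python
def stars(p, levels=((0.001, "***"), (0.01, "**"), (0.05, "*"))):
    """Return significance asterisks for p-value `p`.

    `levels` is ordered from the strictest cutoff to the loosest; the
    default mirrors esttab's default thresholds.
    """
    for cutoff, symbol in levels:
        if p < cutoff:
            return symbol
    return ""


# 10/5/1 percent cutoffs, as used by many papers.
TEN_FIVE_ONE = ((0.01, "***"), (0.05, "**"), (0.10, "*"))

print(stars(0.0005), stars(0.02), repr(stars(0.2)))  # *** * ''
print(stars(0.07, TEN_FIVE_ONE))                     # *
```

The same p-value can earn different asterisks under different conventions, which is why the footnote matters.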

Model 1. Run the DD regression in (1) using the cover dummy as your outcome variable. Present the results in the table above. For this question, interpret each of the parameter estimates ($\alpha$, $\beta$, $\gamma$, and $\delta$) in words that anyone can understand, without jargon from class. Which parameter captures the causal effect of the TennCare Disenrollment on coverage? Report the main result of this regression in a technical way.
Model 2. Run the same DD regression without weights. What are the differences?
Model 3. Adding controls: Now add controls to the weighted regression (Model 1) to absorb some potential bias. Add controls for race, gender, education level, marital status, and age, namely the following variables: black hispanic_me other female education1 education2 education3 education4 education5 age maritaltype2 maritaltype3 maritaltype4 maritaltype5 maritaltype6. Discuss the differences between Model 1 and the model with controls. What is the sign of the bias in Model 1 that these covariates remove?
Model 4. Add a full set of year fixed effects, i.e., a separate dummy variable for each year in the sample. Note that we’ll still keep our post variable, mainly because post is defined at the year-month level, not at the year level. If it were defined at the year level, it would be perfectly collinear with the year fixed effects, so we would replace post with the year fixed effects.

$$Y_{it}=\alpha+\beta TN_{i}+\gamma Post_{t}+\vec{\gamma_t}+\delta\,(TN_{i}\times Post_t)+\epsilon_{it} \tag{2}$$

How does this model differ conceptually from the regression in (1)? That is, what do the year effects $\vec{\gamma_t}$ capture that the single $Post$ dummy did not? Does this change alter your estimated effect of TennCare Disenrollment on coverage?
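For the Model 3 question about the sign of the bias, the omitted-variable-bias identity may help. Let $\tilde{\delta}$ be the coefficient on $TN\times Post$ in Model 1 and $\delta$ the coefficient in Model 3. For each control $X_k$, let $\phi_k$ be its coefficient in Model 3, and let $\pi_k$ be the coefficient on $TN\times Post$ from a regression of $X_k$ on the regressors of equation (1), using the same weights. As long as both models are estimated on the same observations (missing values in the controls can change the sample), weighted least squares gives the exact in-sample identity

$$\tilde{\delta}=\delta+\sum_{k}\phi_k\,\pi_k$$

So the bias in Model 1 is $\sum_k \phi_k\pi_k$: a control contributes to it only if it both predicts coverage ($\phi_k\neq 0$) and shifts differentially in Tennessee after the reform ($\pi_k\neq 0$), and each contribution has the sign of $\phi_k\pi_k$.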

Model 5. Now add month fixed effects to the previous model (Model 4). Does this change alter your estimated effect of TennCare Disenrollment on coverage?
Model 6. As seen in our graph above, coverage rates vary widely across states. Add state fixed effects to your model with year and month FE (Model 5). Answer the following questions: How do these controls alter the estimated effect of TennCare Disenrollment on Health Coverage? Do these results strengthen or weaken your confidence in the DD strategy?
Notice that Model 6 should exactly replicate the result from Table 2, column “Has health insurance (BRFSS)”. The coefficient and sample size should be exactly the same if you did it right. The standard errors and p-values may not be exactly the same. Given the information in the paper, and having replicated the first panel of this column, replicate the findings from the panels “DD Model Adults with Children” and “DD Model Adults without children”. You’ll know you have the right answer once you replicate both numbers. Report here a table with these two models from Table 2, column “Has health insurance (BRFSS)”.
Creating an event study. Graphing an event study is important for assessing the parallel trends assumption. For the coverage outcome, run Model 6 only for childless adults (the main group treated by the reform). However, replace the interaction of TN and Post with interactions of TN with every year dummy. Instead of a single $\delta$, you will now have several, one $\delta_t$ for each year. This specification is the event study. Pick 2000 as the base year, which in Stata you set by typing fvset base 2000 year before running the regression. In the equation below, $\vec{\gamma_t}$ represents year fixed effects, $\vec{S_s}$ represents state fixed effects, and $\vec{M_m}$ represents month fixed effects.

$$Y_{it}=\alpha+\beta TN_{i}+\vec{\gamma_t}+\vec{\delta_t}\,(TN_{i}\times\vec{\gamma_t})+\vec{S_s}+\vec{M_m}+\phi\,Controls_{it}+\epsilon_{it} \tag{4}$$
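To see what the $\delta_t$ capture, consider a simplified version of equation (4) without controls, state fixed effects, or month fixed effects. With only a constant, $TN$, the year dummies, and the $TN\times$year interactions (base year 2000), the model has one parameter per Tennessee/control-by-year cell, so each $\delta_t$ equals the Tennessee-minus-control gap in year $t$ minus the same gap in 2000. The Python sketch below computes that from group-year means; the group labels and numbers are made up, and it will not reproduce the coefficients of the full specification with controls and fixed effects.

```python
def event_study_from_means(means, base_year=2000):
    """Event-study coefficients of a simplified, saturated model.

    `means` maps (group, year) -> (weighted) mean outcome, where group is
    "TN" or "control" (hypothetical labels). Returns {year: delta_t} with
    delta_t = (TN_t - control_t) - (TN_base - control_base).
    """
    years = sorted({year for _, year in means})
    base_gap = means[("TN", base_year)] - means[("control", base_year)]
    return {
        t: (means[("TN", t)] - means[("control", t)]) - base_gap
        for t in years
    }


# Made-up group-year means, for illustration only.
means = {
    ("TN", 2000): 0.875, ("control", 2000): 0.75,
    ("TN", 2004): 0.875, ("control", 2004): 0.75,
    ("TN", 2006): 0.75, ("control", 2006): 0.75,
}
print(event_study_from_means(means))  # {2000: 0.0, 2004: 0.0, 2006: -0.125}
```

By construction the base-year coefficient is zero, which is why the base year is drawn at zero in an event-study plot.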

Present the results of the event study for the outcome “cover” in a graph, with year on the x-axis and the estimated $\delta_t$ from the regression on the y-axis. Make sure you also plot their confidence intervals. Hint: we are essentially replicating Figure 2a from the paper, so you’ll know you’ve done this correctly once you have a similar graph. Check out the commands regsave or coefplot, or see this worksheet on how to use them.
How are the $\delta_t$ coefficients related to the graph of health insurance that you made above?
Given the results from the graph, what is your assessment of parallel trends? Compare your answer here with your answer based on the graph of raw trends.
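coefplot and regsave build the plot from the stored estimates for you. For intuition, this Python sketch shows the arithmetic behind each point and bar: the coefficient and an approximate 95% interval, coefficient ± 1.96 × SE, with the base year drawn at zero. The 1.96 is a normal approximation; Stata's regression output uses t critical values whose degrees of freedom depend on the variance estimator (with clustered standard errors, on the number of clusters), so the intervals you plot should come from the regression output rather than this shortcut. The function names and estimates are hypothetical.

```python
def normal_ci(coef, se, z=1.96):
    """Approximate 95% confidence interval using a normal critical value."""
    return coef - z * se, coef + z * se


def plot_rows(estimates, base_year=2000):
    """Rows (year, coef, low, high) for an event-study coefficient plot.

    `estimates` maps year -> (coef, se). The base year has no estimate of
    its own: its coefficient is normalized to 0, so it is drawn at 0
    with no interval.
    """
    rows = []
    for year in sorted(set(estimates) | {base_year}):
        if year == base_year:
            rows.append((year, 0.0, 0.0, 0.0))
        else:
            coef, se = estimates[year]
            low, high = normal_ci(coef, se)
            rows.append((year, coef, low, high))
    return rows


# Made-up estimates, for illustration only.
for row in plot_rows({2001: (0.0, 0.25), 2006: (-1.0, 0.5)}):
    print(row)
```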


For when you are done

Extensions

Using the clean data as a benchmark, see if you can replicate the same sample with the same variables by downloading the data directly from BRFSS.
As we’ve discussed before, understanding that losing public health insurance has detrimental effects on health is important, but we are also interested in the financial effects. This dataset doesn’t offer good financial data, so we partnered with the Federal Reserve Bank of Philadelphia to access their credit score data, and we write about the financial effects of losing public health insurance in this paper.
Notice that the main motivation for this reform was to save the state money, so if we wanted to talk about “overall” welfare, we would have to weigh the government’s “savings” against the implied costs of worse health and financial outcomes for the people affected. This is where you can use your economics concepts to try to answer “is this worth it?” or “does it actually save money in the long run?” Answering such a question for a given policy takes many steps, and hopefully you’ve gained some appreciation that careful policy analysis may not be as straightforward as some people think!