The following are practice questions for the upcoming exam; we’ve written a hefty 60 or so. A reminder: some of the answers are didactic rather than exactly what we would want in an exam answer. Beyond the questions here, you can always retake the quizzes or go over the worksheets and turn them into questions; the worksheets have several questions that help you think about regression in different ways. If you need more, go over the homeworks, the exercises we did in class, and the exercises on the lecture slides. You can also create new exercises by working problems “backward” or in different directions. Finally, you can find even more exercises in the book Real Stats at the end of each chapter. And if that’s not quite enough, the internet is your oyster!
- A warm-up is a great way to start: redo the practice for midterm 1, and then do midterm 1 again!
- Recall that these exercises are not exhaustive of all the concepts we’ve seen in class!
Some broad tips on using these tools
- Doing a bunch of questions can be useful, but it won’t help much if you are not training your brain to think carefully. A better way to use tools like this is to first write answers to all the questions without looking at the right answers. Second, discuss the answers with a peer. Finding someone who disagrees with your answer is particularly helpful. Discussing how to approach a question (without looking at the right answer) is a useful exercise that helps things “click.” Finally, have someone grade you and give you a score without telling you what you got right or wrong. Then redo the exercise and have them grade it again until you get 100%. In short, the ideal case is one where you never look at the right answer.
- Notice that when doing any work assignment, especially in a non-school setting, one doesn’t know the “right answer.” You only know how you think you would approach it, and that’s exactly what we are after. So overall, the key is to see the “right answer” as the last thing you do.
- When a question has multiple-choice options (like in the quizzes), go through each option and think about why it’s right or wrong, or how you could change it to make it right.
- “Trivial mistakes.” Sometimes we look at an answer, realize we made a mistake, and label it a “silly mistake.” Sometimes this makes sense, but we must be careful not to use the label as a reason to do nothing about it. Ask yourself, “How could I change my process to guarantee that this doesn’t happen again?” For example, if pilots were comfortable with “silly mistakes,” we’d be in a pickle. Their approach is to use “checklists” to push the likelihood of a silly mistake as close to zero as possible. What does that mean for RMDA? Let’s say your silly mistake is “wrong units.” You could then add a step to your process, maybe at the end: “check what units the final answer should be in.” The takeaway from this tip should be: “How could I change my process to guarantee that this doesn’t happen again?”
- Change the scenarios: You can create more questions out of these questions. For example, change some numbers and re-do problems. Maybe change the Y to other units; how do the results change? Ask yourself: What other questions could we ask given this setting? etc. The practice of making your brain think about other potential questions is the “studying” itself.
- You can use these questions either as “assessment” or as “evaluative” practice. Note that some answers are meant to be didactic (teaching moments) rather than answers that get straight to the point. Some questions will say, “Show your work,” but the answers only show numbers. On the exam you would want to show the process; the numbers are there so you can check whether you are using the right method.
- If you plan to use these questions as an “assessment,” I recommend that you take them before studying, and then go back and study the topics in which you feel weaker.
- If you plan to use these questions as “evaluative,” I recommend timing yourself. Since the exam is a time-constrained exercise, it’s also good to practice questions with a time constraint.
Practice questions
Let’s make sure your regression interpretation is not rusty. Let’s work through these questions (some that you may have seen before) and try to get 100% before moving to the more conceptual questions.
- For the following questions, refer to the following equation and its respective graph
- What’s the value of ?
- What’s the value of ?
- When do countries tax wealth? Taxes are a big deal. They affect how people allocate their time, how much money the government has, etc. Inheritance taxes are a particularly interesting tax policy because of the clear potential for conflict between rich and poor. Scheve and Stasavage (2012) investigated the sources of inheritance taxes by looking at tax policy and other characteristics of 19 countries for which data is available from 1816 to 2000. The data is measured every five years. Specifically, the researchers looked at the relationship between inheritance taxes and who was allowed to vote. To assess whether expanded suffrage led to increases or decreases in inheritance taxes, we can begin with the following model:
The dependent variable is the top inheritance tax rate, which is measured as a percent (0-100), and the independent variable is a dummy variable for whether all men were eligible to vote in at least half of the previous years.
- What does represent?
- What does represent?
- What’s the average difference in inheritance tax between expanded and not expanded suffrage?
- Let’s say you get:
- What’s the average inheritance tax for countries without expanded suffrage? Recall that the inheritance tax is measured in percent (0-100).
- What’s the average inheritance tax for countries with expanded suffrage?
- What’s the average difference in inheritance tax between expanded and not expanded suffrage?
- For the following questions, refer to the following table. The outcome is the inheritance tax rate.
- What’s the marginal effect of having universal suffrage on inheritance tax rate in column (d)?
- Write column (d) as an equation with numbers.
- In an effort to better understand the effects of “Get-out-the-vote” messages on voter turnout, Gerber and Green (2005) conducted an RCT involving approximately 30,000 individuals in New Haven, CT, in 1998. One of the randomly assigned treatments was an in-person visit in which a volunteer visited the person's home and encouraged him or her to vote. Table 3 reflects the findings from the RCT.
- What’s the marginal effect of being assigned to in-person contact on voting?
- Fill in the values of the following tables using the values from the table above
- Use the figure below to answer: What’s the sign of in the following equation?
- Energy efficiency promises a double whammy of benefits: if we reduce the amount of energy we use, we can both save the world and save money. What’s not to love? In this exercise we’ll dig into how to explore this relationship. The technology innovation is a programmable thermostat, a device that allows the user to preset temperatures at energy-efficient levels. Another important variable is HDD, or “heating degree-days”, which measures how cold it was in the month (it is the number of degrees that a day’s average temperature is below 65 degrees Fahrenheit). Usually the relationship between HDD and energy use (measured in therms) is positive: the colder it gets, the more energy people use for heating. We have data on houses that use the thermostat and houses that don’t. The results from an OLS analysis are below; the outcome variable for all of these regressions is “Therms”. A therm costs $1.59, and the thermostat costs $60.
- Using Model (a), what’s the main conclusion?
- Does your main conclusion change when accounting for HDD?
- Using the results from Model (b), how much money are houses that use the thermostat saving? According to this model, is the thermostat worth it?
- Does it make sense that the programmable thermostat should save $30 in the middle of the summer? This indicates that the cost savings depend on the weather outside. It makes more sense to think about the effect of the thermostat with respect to the temperature outside. Therefore, we focus on Model (c). What’s the interpretation of the number -0.48?
- Using Model (c), what is the effect of the thermostat when HDD is 500?
- Using Model (c), what’s the average therm use for houses that don’t have a thermostat in particularly hot months?
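When working with an interaction model like Model (c), it can help to write the arithmetic out explicitly. The sketch below assumes Model (c) has the form Therms = b0 + b1·Thermostat + b2·HDD + b3·(Thermostat × HDD), with therms measured per month. The coefficient values are hypothetical placeholders, not the estimates from the results table; only the $1.59 price per therm and the $60 thermostat cost come from the exercise.

```python
# Hypothetical coefficients for illustration only; replace them with the
# estimates from the regression table before drawing any conclusion.
B_THERMOSTAT = -10.0      # assumed coefficient on the thermostat dummy
B_INTERACTION = -0.05     # assumed coefficient on Thermostat x HDD

PRICE_PER_THERM = 1.59    # dollars per therm (from the exercise)
THERMOSTAT_COST = 60.0    # dollars (from the exercise)


def thermostat_effect(hdd, b_thermostat=B_THERMOSTAT, b_interaction=B_INTERACTION):
    """Change in therms from having a thermostat at a given HDD."""
    return b_thermostat + b_interaction * hdd


def monthly_savings(hdd):
    """Dollar savings per month: a negative effect on therms means money saved."""
    return -thermostat_effect(hdd) * PRICE_PER_THERM


def months_to_break_even(hdd):
    """Months at this HDD needed to recover the thermostat cost (None if no savings)."""
    savings = monthly_savings(hdd)
    return THERMOSTAT_COST / savings if savings > 0 else None


for hdd in (0, 500, 1000):
    print(hdd, thermostat_effect(hdd), round(monthly_savings(hdd), 2))
```

The point of the sketch is that the effect of the thermostat is no longer a single number: it is a line in HDD, so the answer to “is it worth it?” depends on how cold the month is.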
Conceptual or topical questions
- Suppose infants with birthweights below 1500 grams are classified as “very low birthweight” and are therefore automatically eligible for a stay in the neonatal intensive care unit (NICU) under most insurance plans.
- Explain intuitively how you would use this fact to estimate the effect of NICU visits on infant health outcomes.
- We want to know the effect of being sent to the NICU on 1-year infant mortality. Do you think this is a sharp or a fuzzy regression discontinuity? What is the “running variable”?
- What do we need to assume about the 1500-gram cutoff to get credible identification of the effect of NICU stays?
- How would you explore the discontinuity in a regression? Write the equation.
- Mention a particular robustness check you would suggest performing.
- Let’s say we are concerned that hospital staff are coding some babies that are above the cutoff (e.g., 1550, 1600) as under the cutoff (e.g., 1450, 1490). How would this bias the coefficients? Let’s say we do the analysis and ignore these potential sources of bias. Would the effect we estimate be an upper or lower bound?
- Students who graduate from Georgia high schools with GPAs of 3.0 or higher are eligible for the state’s HOPE scholarship. HOPE scholarships provide tuition support for students to enroll at public or private colleges in Georgia. The program aims to increase college enrollment overall and encourage strong students to stay in their home state.
- Describe how you would evaluate the effects of HOPE eligibility on enrollment at Georgia colleges using a regression discontinuity (RD) strategy. Specify the treatment group, the control group, and any assumptions required for this strategy to capture the causal effect of HOPE on college-going. Specify what type of relationship the running variable has with the outcome.
- Create a hypothesis of how the HOPE scholarship affects enrollment at Georgia colleges. What about how it would impact “college enrollment” overall?
- Draw some graphs that are consistent with your story and that would represent the RDs. Be sure to label any important features of your picture (axes, legend, etc.). How closely your picture matches your story is more important than which story you believe to be true.
- Write out a regression that is consistent with this story. Write down how you would code each variable. Practice seeing how these coefficients map to your graph.
- Do you think the HOPE scholarship is well suited for an RD study? Would you offer any caveats about using the HOPE eligibility threshold for an RD analysis?
- Imagine you run an RD regression with the student's age as the outcome variable. You find a jump around the threshold. Would this finding make you more or less confident about your results?
- Someone is concerned that you haven’t controlled for the students' race in the proposed model, so your estimate is biased. What conditions would need to be true for a student’s race to be an issue of concern?
- In 1996, Texas adopted a new school accountability program to help with student performance, while the states bordering Texas did not adopt such a program. With standardized test scores (Score) in 1995 and 1997 for a large sample of 4th graders in Texas and the bordering states, we could run the regression:
- Interpret what each coefficient is capturing.
- Someone would like to know what would have happened in Texas had we not implemented the policy. How would you obtain this from the regression?
- Which of the following provides the best estimate of the causal effect of the policy? ?
- A state implemented a reform (at year 0) to increase health coverage. You are in charge of estimating the effect of this policy. Use the following graphs to answer the following questions:
- You have decided to use a DD strategy to analyze the effects of this policy. Which panel provides the most appropriate control group to implement a DD strategy? Explain.
- Using the graph you picked in 4a, calculate the effect of the policy using a DD strategy. Show your work in a table:
- Imagine if we had run the following regression: . Indicate what would be the value of each coefficient.
- For each of the following examples, explain how to create a difference-in-difference design to estimate the effects of the policy:
- The state of Virginia (VA) passed a bill in February 2015 to increase funding for mental health in schools, and the bill was enacted that fall. The government plans to use this money to increase the counselor-per-student ratio in each school. You have a school-by-year dataset for all the southern states (using the census definition). This dataset includes average student mental health outcomes and other school characteristics. Write down the model you would use and explain what each variable (or set of variables) means and how it would be coded (not the Stata code). Try to write both a standard model and a generalized model.
- You have the following dataset representing the share of teens reporting anxiety in school. Using these data, create an event study figure with 2014 as your base year.
- Use the event study to evaluate the assumptions of the research design. Is this supporting the assumptions or not?
- Discuss how these events could affect the causal interpretation of the generalized model from the example above. For each event, under what conditions would these events be a concern, and what conditions would it not be a concern?
- A pandemic occurred in 2020.
- The rollout of a social media app (say Instagram) in 2015.
- The long-term closure of schools in Tennessee in 2013 due to unprecedented snow.
- Increases in teacher pay in all southern states but VA in 2017
- Increases in teacher pay in all southern states but VA in 2011
Here, Texas=1 if the observation was drawn from a Texas school (=0 otherwise), and D97=1 if the observation was from 1997 (=0 otherwise).
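The regression itself is not reproduced here. Assuming it is the standard two-group, two-period difference-in-differences specification implied by these variable definitions (the coefficient labels are our own and may differ from the ones used in the original equation), it would read:

```latex
\begin{align*}
\text{Score}_i &= \beta_0 + \beta_1\,\text{Texas}_i + \beta_2\,\text{D97}_i
  + \beta_3\,(\text{Texas}_i \times \text{D97}_i) + \varepsilon_i \\[4pt]
E[\text{Score} \mid \text{Texas}=0,\ \text{D97}=0] &= \beta_0 \\
E[\text{Score} \mid \text{Texas}=1,\ \text{D97}=0] &= \beta_0 + \beta_1 \\
E[\text{Score} \mid \text{Texas}=0,\ \text{D97}=1] &= \beta_0 + \beta_2 \\
E[\text{Score} \mid \text{Texas}=1,\ \text{D97}=1] &= \beta_0 + \beta_1 + \beta_2 + \beta_3
\end{align*}
```

Under that assumed form, each cell of a Texas/border-states by 1995/1997 table is a sum of coefficients, which is a useful way to check your interpretation of each one.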
d. Imagine that . Fill in the following table:
Year | Texas | Border States |
1995 | | |
1997 | | |
e. What would have happened to the treatment group had they not received treatment?
f. Someone is concerned that in the model above, you have not controlled for the difference in “culture” between TX and other bordering states. What would have to be true for this concern to be valid?
g. Someone is concerned that you have not included year FE in the model. Explain whether or not this is a concern.
Group | Before | After | Diff |
Treatment | | | |
Control | | | |
Diff | | | |
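Filling in a table like this is just two subtractions followed by a third. Here is a minimal sketch in Python, using made-up numbers (the real values must be read off the panel you chose; `did_table` is a hypothetical helper name):

```python
def did_table(treat_before, treat_after, control_before, control_after):
    """Return the row differences and the difference-in-differences estimate."""
    treat_diff = treat_after - treat_before        # change in the treated group
    control_diff = control_after - control_before  # change in the control group
    return {
        "treatment_diff": treat_diff,
        "control_diff": control_diff,
        "did": treat_diff - control_diff,          # bottom-right cell of the table
    }


# Hypothetical numbers for illustration only, not values from the graphs.
example = did_table(treat_before=60, treat_after=75,
                    control_before=50, control_after=55)
print(example)  # {'treatment_diff': 15, 'control_diff': 5, 'did': 10}
```

You get the same bottom-right cell if you difference the columns first: (75 - 55) - (60 - 50) = 10. Checking both directions is a cheap way to catch arithmetic slips.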
Year | Southern States | Virginia |
2011 | 14 | 9 |
2012 | 15 | 9 |
2013 | 16 | 7 |
2014 | 17 | 6 |
2015 | 18 | 6 |
2016 | 19 | 8 |
2017 | 20 | 9 |
2018 | 21 | 10 |
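One way to get the points for an event study figure from this table is to compute the Virginia-minus-southern-states gap in each year and subtract the gap in the base year. The sketch below does that with plain Python. It assumes the “Southern States” column is the comparison group and Virginia is the treated group; it gives raw point estimates only (no standard errors), so treat it as a check on your hand calculations rather than a full analysis.

```python
# Share of teens reporting anxiety, copied from the table above.
southern = {2011: 14, 2012: 15, 2013: 16, 2014: 17,
            2015: 18, 2016: 19, 2017: 20, 2018: 21}
virginia = {2011: 9, 2012: 9, 2013: 7, 2014: 6,
            2015: 6, 2016: 8, 2017: 9, 2018: 10}

BASE_YEAR = 2014


def event_study(treated, control, base_year):
    """Treated-minus-control gap in each year, relative to the gap in base_year."""
    base_gap = treated[base_year] - control[base_year]
    return {year: (treated[year] - control[year]) - base_gap
            for year in sorted(treated)}


coefficients = event_study(virginia, southern, BASE_YEAR)
for year, value in coefficients.items():
    print(year, value)
```

Plot these values against the year, with a vertical line at the policy date. The base year is zero by construction, so what you are looking for is whether the years before the policy also sit near zero.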
True or False
- A dataset is a city-year panel. In a DD design, we include city fixed effects and year fixed effects. We cannot include a variable such as “area in square miles” for each city.
- True
- False, explain
- In a regression that includes school fixed effects, these capture all time-invariant characteristics across each child.
- True
- False, explain
- A new policy in Costa Rica has expanded the number of cafeterias in some public schools. To understand the effects of cafeterias on children's nutrition, we should control for enrollment in the schools to account for new students coming because of the new cafeterias.
- True
- False, explain
- If we are interested in the effect of attending a “selective college” (think an Ivy League school) on lifetime earnings, we should not include college fixed effects.
- True
- False, explain