Purpose
The objective of this homework is to practice reading, interpreting, and critiquing a published paper that uses instrumental variables (IV). It trains you to be the kind of analyst who can pick up a paper, understand its design, identify its limits, and credibly communicate its findings to a decision-maker who has never heard of a LATE. The homework was inspired by work on the APP, so think of it as something you may run into while doing your own APP.
Guidelines
- You can work by yourself or in a group of two.
- Submit your group answers to Gradescope (within Canvas). One submission per group, please.
- You will get points for correct answers. Points will be deducted if an answer contains more information than necessary, or if it contains incorrect statements alongside correct ones. We reward precision: use the fewest characters needed to maximize accuracy. Grading on this will get stricter over time.
- Your responses should be professionally formatted and written.
- The due date is March 27th, at 9pm EDT.
- You can answer quantitative questions to the nearest 0.01.
Preamble
You work as a research analyst at a policy organization advising a state legislator in Central Appalachia — a region where community college (CC) enrollment rates are well below national averages and limited access to credentials constrains economic mobility. Your boss has been asked a direct question:
"We have $2 million to expand access to community college. Should we spend it on Federal Work-Study offers? And if so, how many students would we reach?"
Ideally, you could read through the paper and find the answer directly. This homework will walk you through the steps to get there, but it is good practice to see if you can do it without the guardrails: I won't always be next to you!
To answer the question, your organization has identified a recent paper: Minaya, Scott-Clayton & Soliz (2026), "The Causal Effects of Federal Work-Study Offers on College Enrollment and Program Participation" (EdWorkingPaper No. 26-1400). The paper studies 66,360 FAFSA applicants at a large multi-campus public college system, exploiting quasi-experimental variation in who receives FWS offers based on the timing of FAFSA submission relative to campus-specific budget cutoffs. Note that the paper's main strategy is difference-in-differences, which we haven't studied yet. However, the authors also present an IV version of their strategy; the questions about IV refer to that part.
Your task: evaluate the evidence, interpret the design, and write rigorous answers that could form the basis of a policy memo to the legislator.
Paper reference: Minaya, Scott-Clayton & Soliz (2026), "The Causal Effects of Federal Work-Study Offers on College Enrollment and Program Participation," EdWorkingPaper No. 26-1400. Retrieve it from EdWorkingPapers.com. Read — at minimum — the introduction, design section, and Tables 2–4 and 6 before you begin. You can also find it here
Section 1: Understanding the Research Design
- The natural experiment. In two to three sentences, describe the source of variation the authors exploit. What makes some students eligible for FWS offers, and why does the timing of their FAFSA filing matter? Do not use the word "instrument" yet; just explain the setting plainly.
- The IV framework. Let Z be the instrument, D the treatment, and Y the outcome (pick one outcome of interest to your boss).
- Define each variable precisely as used in this paper, and also explain each one in simple terms so that your boss can understand.
- Write the first-stage, reduced-form, and structural (second-stage) equations. Some may be written in the paper and some may not. Then use these equations to express the IV estimator as a ratio. Note the two important “covariates” this IV needs.
- Understand the results part 1
- What does Figure 2 visually demonstrate?
- Which table provides evidence of the “first stage”?
- Interpret the coefficient of 0.178 from Table 3. (Note that the layout of the table is a little different from what we are used to.)
- Is the instrument strong? Cite the relevant statistic from the paper.
- Table 2 question
- What is the purpose of Table 2 in this IV design?
- Table 2 shows a 2 percentage point imbalance in dependency status at the cutoff. Is this a serious concern? Explain.
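As a notation reminder, a generic textbook IV system is sketched below. It is not the paper's specification: $X_i$ is only a placeholder for whatever covariates a given design requires, and the coefficient names ($\pi$, $\gamma$, $\beta$) are assumptions introduced here for illustration. Your answer should use the variables and controls that the paper actually uses.

```latex
\begin{align*}
\text{First stage:}   \quad & D_i = \pi_0 + \pi_1 Z_i + X_i'\pi_2 + \nu_i \\
\text{Reduced form:}  \quad & Y_i = \gamma_0 + \gamma_1 Z_i + X_i'\gamma_2 + \eta_i \\
\text{Structural:}    \quad & Y_i = \beta_0 + \beta_1 D_i + X_i'\beta_2 + \varepsilon_i \\
\text{IV as a ratio:} \quad & \hat{\beta}_1^{IV} = \frac{\hat{\gamma}_1}{\hat{\pi}_1}
\end{align*}
```

Substituting the first stage into the structural equation gives $\gamma_1 = \beta_1 \pi_1$, which is where the ratio comes from.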
Section 2: Interpreting the Results
- What is the reduced-form effect on Fall 2017 enrollment for the full sample? (Find the right table.) Is it statistically significant? Economically significant?
- For the CC applicant subsample (find the right table), report: (a) the reduced-form effect on enrollment and its statistical significance, and (b) the first-stage coefficient.
- Using only the two numbers from the previous questions, compute the IV (LATE) estimate. Show your work clearly.
- Interpret your estimate in: (a) precise technical language and (b) plain language for the legislator. No more than three sentences each.
- Using the paper's specific design: (a) describe who the compliers are in concrete terms, and (b) describe two groups of non-compliers — always-takers and never-takers — in this specific context.
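The ratio computation asked for in this section can be sketched in a few lines of Python. The function name and the two input values below are hypothetical placeholders chosen for illustration; they are not the paper's estimates, which you should look up yourself.

```python
def wald_estimate(reduced_form: float, first_stage: float) -> float:
    """Return the IV (Wald) estimate: reduced-form effect / first-stage effect."""
    if abs(first_stage) < 1e-8:
        # A first stage of (near) zero makes the ratio meaningless.
        raise ValueError("first stage is too close to zero")
    return reduced_form / first_stage


if __name__ == "__main__":
    # Hypothetical inputs, NOT taken from the paper.
    hypothetical_reduced_form = 0.03  # effect of the instrument on the outcome
    hypothetical_first_stage = 0.12   # effect of the instrument on the treatment
    late = wald_estimate(hypothetical_reduced_form, hypothetical_first_stage)
    print(f"IV (LATE) estimate: {late:.2f}")
```

Because the first stage is a fraction below one, the IV estimate is always larger in magnitude than the reduced-form effect.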
Section 3: The Policy Application — "The APP Problem" (Translating estimates into actionable information)
- As your client's aide, you have two numbers to report: the reduced-form effect and the IV estimate. She asks which one to use to evaluate her proposed FWS expansion. Which do you recommend, and why? Be precise about what each quantity measures.
- The legislator wants 400 additional community college enrollees via FWS offers. Using the IV estimate, how many FWS offers must she make? Show your work.
- The study population differs from the Central Appalachian one, so you need to judge whether the effects would be larger or smaller in your context. Identify two specific ways Central Appalachian CC students may differ from the study population. For each, state whether the true LATE in that context would be larger or smaller than 36 percentage points and explain your reasoning. (Hint: which concept we've seen in class can help you talk about this?)
- Now let's go back to the original question. We think you are ready for it: "We have $2 million to expand access to community college. Should we spend it on Federal Work-Study offers? And if so, how many students would end up enrolled?"
- A cost-effectiveness analysis provides a ratio. In this setting, you want to know the policy's cost per student enrolled in community college. Given the numbers you just obtained, what is that estimate? Note that once we have this number, we can compare it with other policy options to see which offers the best “bang for your buck”. This is a prime example of how research informs policymaking.
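The arithmetic in this section can be sketched as follows. The 400-enrollee target, the $2 million budget, and the 36-percentage-point LATE come from the questions above. The cost per offer is a purely hypothetical placeholder (the assignment does not state one), and the sketch assumes the LATE applies to every newly offered student, which is exactly the assumption the external-validity question asks you to scrutinize.

```python
def offers_needed(target_enrollees: float, late: float) -> float:
    """Offers required if each offer raises enrollment probability by `late`."""
    if late <= 0:
        raise ValueError("late must be positive")
    return target_enrollees / late


def enrollees_from_budget(budget: float, cost_per_offer: float, late: float) -> float:
    """Additional enrollees bought by `budget`: (number of offers) x (effect per offer)."""
    if cost_per_offer <= 0:
        raise ValueError("cost_per_offer must be positive")
    return budget / cost_per_offer * late


def cost_per_enrollee(cost_per_offer: float, late: float) -> float:
    """Cost-effectiveness ratio: dollars spent per additional enrollee."""
    if late <= 0:
        raise ValueError("late must be positive")
    return cost_per_offer / late


if __name__ == "__main__":
    LATE = 0.36                            # from the assignment text
    BUDGET = 2_000_000.0                   # from the assignment text
    HYPOTHETICAL_COST_PER_OFFER = 2_000.0  # assumption, for illustration only

    print(f"Offers for 400 enrollees: {offers_needed(400, LATE):.0f}")
    print(f"Enrollees from budget: {enrollees_from_budget(BUDGET, HYPOTHETICAL_COST_PER_OFFER, LATE):.0f}")
    print(f"Cost per enrollee: ${cost_per_enrollee(HYPOTHETICAL_COST_PER_OFFER, LATE):,.2f}")
```

Note that the cost per enrollee depends only on the cost per offer and the LATE, not on the size of the budget, which is what makes it comparable across policy options.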
Section 4: IV Assumptions
- (a) State the exclusion restriction for this paper's instrument in plain English. (b) What did the authors test in Appendix Figure A2, and what did they find? Why does this test matter?
- The paper shows that receiving a FWS offer increases the probability of holding a FWS job by about 27 percentage points, far less than 100%. Why don't the authors use the instrument to estimate the causal effect of actually working a FWS job on enrollment? What assumption would this violate?
- The legislator says: "If only about 27% of offer recipients actually get a FWS job, doesn't that mean we need to scale up our 1,111 estimate even further — to account for all the offers that won't be converted into jobs?" How do you respond?
Section 5: Doing it with data
- The results above were obtained by reading the paper. When you don't have access to the data, you have to rely on a paper's estimates to answer questions as best you can. At other times you will have the data readily available and can run the regressions yourself. The data used in this paper are restricted-access, so we are providing simulated data designed to replicate the results. The estimates won't match the paper exactly; the point is whether you can replicate the empirical exercise. Using the data and the model descriptions in the paper, create a do-file that runs the regressions listed below. For these questions, the main outcome of interest is enrollment in Fall 2017, for the full sample and for the subsample of community college applicants. Generate a variable called z (or instrument) for the instrument, and present a single table with the following regressions:
- The first-stage regression from the model with covariates, for the full sample.
- The ITT regression from the model with covariates, for the full sample.
- The first-stage regression from the model with covariates, for the sample of community college applicants.
- The ITT regression from the model with covariates, for the sample of community college applicants.
- The regression that represents the causal effect of receiving a FWS offer on enrollment in Fall 2017, for the sample of community college applicants. (Hint: this should be very similar to the Wald estimator, if you want to check.)
Data can be found here (fws_simulated.dta)
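To build intuition for the Wald check in the hint, here is a minimal sketch in Python (not Stata, and not a substitute for your do-file). It uses synthetic data generated inside the script rather than fws_simulated.dta, omits covariates, and every name and parameter in it (the complier share, the baseline enrollment rate, the true effect of 0.30) is an assumption made up for illustration.

```python
import random


def simulate(n: int = 50_000, seed: int = 0) -> list:
    """Generate synthetic (z, d, y) rows with a known treatment effect of 0.30."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0  # instrument, assigned at random
        u = rng.random()
        if u < 0.10:
            d = 1   # always-taker: treated regardless of z
        elif u < 0.30:
            d = z   # complier: treated only when z = 1
        else:
            d = 0   # never-taker: untreated regardless of z
        # Binary outcome; treatment raises its probability from 0.30 to 0.60.
        y = 1 if rng.random() < 0.30 + 0.30 * d else 0
        rows.append((z, d, y))
    return rows


def mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)


def wald(rows: list) -> tuple:
    """Return (first stage, reduced form, Wald estimate) from differences in means."""
    first_stage = mean(d for z, d, _ in rows if z == 1) - mean(d for z, d, _ in rows if z == 0)
    reduced_form = mean(y for z, _, y in rows if z == 1) - mean(y for z, _, y in rows if z == 0)
    return first_stage, reduced_form, reduced_form / first_stage


if __name__ == "__main__":
    fs, rf, late = wald(simulate())
    print(f"first stage = {fs:.3f}, reduced form = {rf:.3f}, Wald = {late:.3f}")
```

With a binary instrument and no covariates, this ratio of mean differences coincides with the 2SLS coefficient; once covariates are added, the two should be close but need not be identical.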
Extensions (not graded, but good practice)
- Table 4 shows large enrollment effects for CC applicants (+4.1pp) and for independent students, but null effects for four-year applicants and dependent students. Provide two distinct economic explanations for this pattern.
- The paper concludes that "expanding funding alone is unlikely to deliver full benefits without infrastructure to match students to jobs." Using take-up rates and IV logic, explain what the authors mean and propose one concrete policy complement to a FWS funding expansion.