Purpose
The objective of this homework is for you to practice concepts learned in class and apply them to a real-world scenario. The focus is on interpreting the different components of a regression. In the first class survey, some of the main topics students were interested in were social policy, education, gender differences, and international policy. The following case touches on several of these topics, so hopefully you will find it engaging. The data and tables come from a real-world intervention carried out in villages in Afghanistan to improve schooling outcomes.
Guidelines
- Your responses should be professionally formatted and written. You can type the answers in a Word, PDF, or Google Doc file.
- Submit your do-file and answers on this submission form.
- You will get points for correct answers. Points will be deducted if an answer contains unnecessary information or mixes incorrect statements with correct ones. In short, we want responses that use the fewest characters while maximizing accuracy.
- The due date is June 30th, 2025.
Preamble
You are working for a non-profit that is helping the local and federal governments of Afghanistan understand the impact of investing in education infrastructure on educational outcomes. The current problem is that children’s school attendance and performance are below standard. A significant funder is interested in this problem but is also worried about who receives the resources; they are particularly concerned about differences in outcomes between boys and girls. Your non-profit is helping to diagnose the problem and to analyze the effects of an intervention from the funder. You plan to evaluate the intervention and talk with the local government about scalability. You can obtain the data for this homework from here (schools_experiment.dta).
Section 1. Diagnosing the problem
The first step is to diagnose and understand the problem. You learn a couple of things as you read the literature and talk with several stakeholders. First, there seems to be low attendance in schools because there aren’t many schools available, and the distance to school for the average child is 3.16 miles. Second, people have mentioned that the interest in putting kids in school changes as they age. One story you heard is that as kids age, they are more likely to go to school because they see that other kids go to school and start asking for it. The other story you’ve heard is that as kids age, families are less likely to put their kids in school because now they are more useful for household work. Third, the funder is worried about gender disparities in attending school and how these disparities are exacerbated by distance.
Given all of this information, you use the data collected by your organization to run some regressions and summary statistics. The following tables provide some results from that analysis. Your first job is to interpret the results in the tables. At this point, you would probably be asked to “write a report” or meet to “de-brief” about what you learned. We won’t ask you to write a report, but we will ask you some questions about interpreting the tables. Note that if we took the “training wheels” off, we would not tell you what to write in the report and would expect you to notice the important patterns yourself. For now, we will guide you on how to notice patterns.

Respond to the following questions by looking at the results of Table 1.1. These are results from six regressions in which the outcome variable is whether a child is enrolled in school. The data are at the child level and contain limited information about the children and their families.
- Interpret the coefficient on distance from column (1). (Hint: be mindful of units).
- Assess the magnitude of the coefficient (i.e., provide a conclusion: is it large or small?). In your answer, show how you reached your conclusion. Apply concepts we learned in class rather than an ad hoc process.
- Using column (2), what would be your conclusion on the relationship between age and school enrollment? Which of the stories you heard about the relationship between a child’s age and schooling (see above) most closely matches the results from column (2)?
- Using column (6) results, what would be your conclusion on the relationship between a child’s age and school enrollment? Provide a complete interpretation.
- Using column (3) results: Are there gender disparities in attending school? If so, provide a full interpretation. Make sure you assess the magnitude.
- Using column (5) results: Are there gender disparities in attending school? Provide a full interpretation and mention some key takeaways. (Note: this question is as general as the one before, but notice that one could provide a lot more information than before. What is the relevant information? What are the key takeaways? That is part of the challenge. Use frameworks we learned in class to guide you.)
- During your visits to households, you met a 5-year-old boy named Abdul-Alim who lives about 5 miles away from school. You are wondering how much his predicted likelihood of being enrolled in school would change if he were a girl. Use model (5) to answer this question. Show your process.
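For the last question, it can help to write the comparison symbolically before plugging in numbers. The sketch below assumes, purely for illustration, that model (5) is a linear probability model containing distance, age, a girl indicator, and the girl indicator interacted with both distance and age. The actual specification is whatever Table 1.1 reports, so drop any term that does not appear there. The symbols $\hat\beta_0,\dots,\hat\beta_5$ are placeholders, not values from the table.

```latex
% Sketch only: the set of regressors in model (5) is an ASSUMPTION;
% check Table 1.1 and remove any term that is not in the reported model.
\begin{align*}
\widehat{\text{Enrolled}} &= \hat\beta_0 + \hat\beta_1\,\text{Distance} + \hat\beta_2\,\text{Age}
  + \hat\beta_3\,\text{Girl} + \hat\beta_4\,(\text{Girl}\times\text{Distance})
  + \hat\beta_5\,(\text{Girl}\times\text{Age}) \\[4pt]
% Girl-minus-boy difference for a child at distance d and age a:
\widehat{\text{Enrolled}}_{\text{girl}} - \widehat{\text{Enrolled}}_{\text{boy}}
  &= \hat\beta_3 + \hat\beta_4\,d + \hat\beta_5\,a
\end{align*}
```

The terms that do not involve the girl indicator cancel in the difference, so only the girl coefficient and its interactions matter. Under this assumed specification, the comparison for a 5-year-old living 5 miles from school is $\hat\beta_3 + 5\hat\beta_4 + 5\hat\beta_5$. Because the outcome is binary, multiplying the result by 100 expresses it in percentage points.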
Section 2. Evaluating Treatment
The diagnostic suggests that distance to school matters for enrollment and is a factor that a government or NGO can change. Investing in building more schools may therefore significantly improve education outcomes, especially if schools are placed so as to reduce the distance to school. The government, in partnership with the large funder and the non-profit, implements a program of building schools in villages. Now you are in charge of understanding the program's effects on educational outcomes. You receive the table below (Table 1.2) and are asked to interpret the findings. The following questions will help you with that. Note: the covariates in the model were measured before treatment. At this point, we don't know how the treatment was assigned (i.e., randomly or not).

- First, we’ll want to know if the treatment was implemented correctly. Which of these outcomes best demonstrates whether the policy was implemented correctly? Explain why.
- Using the results from the models without covariates, interpret the effect of adding a school to a village on each outcome, and then assess the magnitude.
- Imagine the outcome “Distance to School” was in logarithmic form, Ln(Distance to School), and we obtained the same coefficients. Provide the interpretation of the coefficient on treatment. Also, provide the interpretation of the coefficient on the head of household’s years of education.
- Not knowing details about the intervention, but using this context, build a story of why you would be concerned about interpreting the results from the models without covariates as causal.
- Comparing the results from the models with and without covariates for all outcomes: What can we suspect or say about the treatment?
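For the hypothetical log-outcome question, the standard log-level interpretation can be sketched as follows. Here $T$ denotes the treatment indicator and $E$ the head of household's years of education. Both symbols, and the omission of the other covariates, are notational assumptions made for illustration only.

```latex
% Sketch only: T, E, and the coefficient symbols are illustrative notation,
% not names or values taken from Table 1.2.
\begin{align*}
\ln(\text{Distance}) &= \beta_0 + \beta_T\,T + \beta_E\,E + u \\[4pt]
% Binary regressor switching from 0 to 1: exact percent change
\%\Delta\,\text{Distance}\;(T: 0 \to 1) &= 100\,\bigl(e^{\beta_T} - 1\bigr) \\[4pt]
% One more year of education: exact change and small-coefficient approximation
\%\Delta\,\text{Distance}\;(E \to E + 1) &= 100\,\bigl(e^{\beta_E} - 1\bigr) \approx 100\,\beta_E
  \quad \text{for small } |\beta_E|
\end{align*}
```

The shortcut $100\,\beta$ is accurate only when the coefficient is small in absolute value. For a binary regressor with a large coefficient, the exact expression is the safer one to report.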
Section 3. How did we do on gender disparities?
This intervention seems to work, especially for getting kids enrolled in schools. We noticed in the first part of the work that there were gender disparities, and although we know the intervention wasn’t designed to address them, your non-profit is still interested in whether it helped (if it did at all). The non-profit is curious because they want to use this intervention to address gender disparities in other places. For this section, you’ll interpret the results from Table 1.3 (below).

- Does the treatment help reduce gender disparities in distance to school? Explain your answer.
- Does the treatment help reduce gender disparities in school enrollment? Explain your answer.
- Does the treatment help reduce gender disparities in standardized test scores? Explain your answer.
Section 4. Doing the work
Great job on interpreting the tables and the results. The task for this part of the homework is simple: access the data for the homework and create a do-file that replicates each of the tables above. You may have to create variables, clean data, etc. You will know you have the correct code once you can replicate the tables exactly. We will grade this by running your code. Make sure you start your do-file with the following code. You can put an * before each of these lines if you want to comment it out.