Purpose
The objective of this homework is for you to practice concepts learned in class and apply them to data. The research design we will practice here is “mean comparison” with and without covariates. The tool you will use is OLS, and we will also practice interactions and non-linearities. In addition, this HW is born out of a paper, so we’ll start diving into “reading & understanding papers.” Finally, you will practice how to speak in technical and “colloquial” ways and use our framework to understand what they are trying to say. It’s a jam-packed homework of fun and excitement. Clean your desk, get a bottle of water, pick your favorite beverage, turn on “do not disturb,” set a timer for 45 min (then take breaks), put on some work tunes, and dive into the fun of learning.
Guidelines
- You can work by yourself or in groups of up to three.
- Submit your group answers to Gradescope (within Canvas). One submission per group, please.
- You will get points for correct answers. Points will be deducted if an answer contains unnecessary information or mixes incorrect statements with accurate ones. In short, we are trying to incentivize students to use the fewest characters while maximizing the accuracy of responses. Grading on this will get stricter over time.
- Your responses should be professionally formatted and written.
- The due date is Friday, February 21st, at 11:59pm ET.
- Round all of your answers to two decimal places (0.0x).
Preamble
As we have said in class, papers can entail evaluating a policy or helping us understand a social concept. This paper is about understanding the social ideas that drive people to change their perspectives on women’s rights: Does parenting daughters influence parental political behavior? This is the question we will try to answer with data. You can imagine why this question is interesting and its repercussions on policy. Sociologists have long argued that parenting daughters increases feminist sympathies. We could debate all day why it could or couldn’t and create many theories; that’s all fun, but in practice, what happens? Can we evaluate this empirically? We will assess this claim using data from a paper by Ebonya Washington that explores this question.
Let’s get things ready before we start digging into the data:
Getting to work
- Report summary statistics for the following variables (or their proxies) in the dataset: political party, age, race (only white or non-white), gender, AAUW score, the number of children, and the number of daughters. Compare the mean of each of these variables between legislators who have girls and legislators who do not. Your table should include the means for both groups, the sample size of each group, a column for the difference between means, and a column with the p-value testing whether the difference is statistically different from zero. (Note: there are many ways to construct this table. Try to use STATA as much as possible, but it is fine if some of the process happens outside STATA, for example in Word or Excel.) Check out this worksheet on how to do tables in STATA. We recommend up to two decimal places for means or differences and up to four decimal places for p-values. You don’t need a column showing “total values” (i.e. the full sample).
- To disentangle the causal effect of parenting daughters on feminist sympathies, a peer suggested comparing the difference in AAUW scores before a congressperson had a girl and after they had a girl and averaging the difference. Let’s say your peer makes the above-mentioned comparison and concludes that having a girl increases one’s AAUW score, on average. Assuming that the causal effect of having a girl on AAUW scores is 0, what’s one reason that could explain your peer’s finding? (Recall that you have the ability to elevate your answer given the “signs” implied here.)
- Another peer looked into the data and told you the following: “I compared the share of legislators who report having a girl across different levels of the score and noticed that as the score increases, the likelihood of reporting having a girl does not increase; therefore, having a girl doesn’t really have an effect on the voting preferences of legislators. I did this by comparing increases of 10 percentage points in AAUW score and observing changes in the share of people having a girl.”
- [Bonus] Write in conditional expectation language the exercise that this peer performed. Use variable names if it is easier.
- Another peer suggests comparing the AAUW score between legislators with girls and legislators with no girls.
- Write down this comparison in conditional expectation language. Use variable names if it is easier.
- Compute this difference in STATA. (Hint: use the sum and display commands.)
- Write down the regression that represents the main causal question of this exercise.
- Use STATA to run the regression to perform the comparison that your peer suggested. Upload a screenshot of the STATA output.
- Read the main finding from the regression result in a technical way. Make sure you are aware of units. Express the relationship as the percent change in AAUW score that results from having a girl.
- Use a max of two sentences & non-technical language to answer the following question: What’s the conclusion from the empirical exercises above? (i.e. do not use numbers, just state directions)
- Assuming that the causal effect of having a girl on AAUW is positive, what’s one reason that could explain the findings in 4e?
- We will continue with the exercise above and compare AAUW scores between legislators who have girls and legislators who don’t.
- The simple mean comparison may not provide the causal effect because of potential confounders. Therefore, we will add some covariates to mitigate bias and see how our result changes. Run the following regressions and report the results in a formatted table within STATA/R: (Hint: use the esttab command; there is a worksheet on it.)
- Could controlling for Republican be considered an issue here? Explain your answer.
- You should have found a difference in the coefficient on having a girl between equations (1) and (2). In technical language, or the language we use in class, how do you explain the change in the result from equation (1) to equation (2)?
- Now explain the difference in non-technical language, or in a way everyone can understand. Write the number of characters at the end of your answer. It should be fewer than 1,000 for the answer to be counted as correct.
- Would your qualitative conclusion change going from the results of equation (2) to those of equation (3)? Which control variable is particularly important and why? (Hint: feel free to run another regression to help support your claim.)
- Consider the third specification (with three controls in addition to the indicator for having a girl). Conditional on the number of children and other variables, do you think having a girl is plausibly exogenous? What identifying assumption is necessary for its coefficient to be interpreted as a causal estimate? What evidence does Washington give to support this assumption?
- Run the following regressions and show them in a nicely formatted table in STATA.
- Using results from equation (1), what is the full relationship between age and scores? Show process.
- Using results from equation (2), what is the marginal effect of being white for non-females on scores? Show process.
- Using results from equation (2), what is the marginal effect of being female for a white person on scores? Show process.
- Using results from equation (3), what is the predicted score for a male who has a daughter and has the average number of children that male legislators have? (We’ll assume for the purposes of this question that non-female=male.) Show process.
- Using results from equation (3), how does the effect of having a girl change by the gender of the legislator? Show process and formulate a technical and non-technical answer.
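As a warm-up for the commands used in the questions above, here is a minimal sketch run on Stata's built-in auto dataset rather than the homework data. The variables price, foreign, weight, and mpg belong to that example dataset and merely stand in for an outcome, a 0/1 group indicator, and controls; they are assumptions for illustration, not the homework variables. The sketch also assumes esttab is available (it ships with the estout package, installed with ssc install estout).

```stata
* Illustration on Stata's built-in auto data (NOT the homework data).
* price = stand-in outcome, foreign = stand-in 0/1 group indicator.
sysuse auto, clear

* Difference in means using summarize + display
quietly summarize price if foreign == 1
scalar mean1 = r(mean)
quietly summarize price if foreign == 0
scalar mean0 = r(mean)
display "Difference in means (group 1 - group 0): " mean1 - mean0

* Same comparison as a t-test: group means, sample sizes, and p-value.
* Note: ttest reports diff = mean(group 0) - mean(group 1).
ttest price, by(foreign)
display "N (group 0) = " r(N_1) ", N (group 1) = " r(N_2) ", p-value = " r(p)

* The same difference shows up as the OLS slope on the indicator
regress price foreign
estimates store m1

* Add a control and see how the coefficient on the indicator moves
regress price foreign weight
estimates store m2

* Side-by-side table of both models (requires estout)
esttab m1 m2, se b(2) star(* 0.10 ** 0.05 *** 0.01)

* Quadratic term and an interaction via factor-variable notation
regress price c.mpg##c.mpg i.foreign##c.weight

* Marginal effect of mpg, which varies with mpg because of the squared term
margins, dydx(mpg) at(mpg = (15 25))

* Slope of weight for the foreign == 1 group: main effect + interaction
lincom weight + 1.foreign#c.weight
```

The same pattern should carry over to the homework once you swap in the actual outcome, group indicator, and controls from the dataset. Working through margins and lincom by hand first is a good way to check that you understand where each number comes from.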
Extensions
This section is not graded and you don’t have to submit it, but it will help you push your thinking further. Think of the “extensions” as questions that we could have asked in this homework but decided not to grade. You should therefore know how to answer them, or at least treat them as practice questions.
- Let’s say you want to test even more hypotheses with these data and model:
- If you wanted to explore not just the margin of having a girl but the number of girls, what regressions would you run?
- If you wanted to explore whether the effect on female legislators differs from the effect on male legislators, what regressions would you run? Why doesn’t just controlling for “female” count as exploring the difference in effect between men and women?
- Using the residual regression approach, obtain the coefficient from equation (3).
- Obtain an approximate coefficient from equation using the “by hand” approach as seen in the worksheet.
- Checking the relationship between variables and our Y is important. Install the command binscatter2. This command produces a graph where you can observe the relationship between two variables, and it works great for visualizing non-linear relationships. Try the following commands:
binscatter2 aauw age
binscatter2 aauw age, linetype(qfit)
Then compare this to the regression with age and age squared. Now run the following command:
binscatter2 aauw age, linetype(qfit) by(female)
- Write out a single regression equation that represents the last graph produced.
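For the residual regression approach mentioned in the extensions, the following is a minimal sketch of the general idea, again on Stata's built-in auto dataset rather than the homework data. The variables price, foreign, weight, and mpg are stand-ins chosen for illustration, and foreign_res is a hypothetical name for the residualized regressor.

```stata
* Illustration on Stata's built-in auto data (NOT the homework data).
sysuse auto, clear

* Full regression: the coefficient of interest is the one on foreign
regress price foreign weight mpg
scalar b_full = _b[foreign]

* Step 1: regress the variable of interest on the controls, keep residuals
quietly regress foreign weight mpg
predict double foreign_res, residuals

* Step 2: regress the outcome on the residualized variable
quietly regress price foreign_res

* The two coefficients should coincide
display "Full regression:     " b_full
display "Residual regression: " _b[foreign_res]
```

The point estimates match, but the standard errors from step 2 do not account for the first-stage estimation, so use the full regression for inference.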