Sebastian Tello

Homework 4: Who runs the world? (AK)

Purpose

The objective of this homework is for you to practice concepts learned in class and apply them to data. The research design we will practice here is “mean comparison” with and without covariates. The tool you will use is OLS, and we will also practice interactions and non-linearities. In addition, this HW is born out of a paper, so we’ll start diving into “reading & understanding papers.” Finally, you will practice how to speak in technical and “colloquial” ways and use our framework to understand what papers are trying to say. It’s a jam-packed homework of fun and excitement. Clean your desk, get a bottle of water, pick your favorite beverage, turn on “do not disturb,” set a timer for 45 min (then take breaks), put on some work tunes, and dive into the fun of learning.

☝🏽
Colloquial - (of language) used in ordinary or familiar conversation, not formal or literary. Example: “colloquial and everyday language.”
☕
Designing a relaxing and cozy environment is not just for vibes; it helps the brain reduce anxiety-related hormones, putting it in a better learning and absorbing mode. The main things are water and no distractions; the rest are also good but not as necessary.

Guidelines

  • You can work by yourself or with groups of up to three.
  • Submit your group answers to Gradescope (within Canvas). One submission per group, please.
  • You will get points for correct answers. You will get points deducted if the answer contains unnecessary information or mixes incorrect statements with accurate ones. In short, we want you to use the fewest characters possible while maximizing the accuracy of your responses. Grading on this will get stricter over time.
  • Your responses should be professionally formatted and written.
  • The due date is Monday, February 23rd, at 9pm EDT.
  • You can round all of your answers to two decimal places (0.0x).

Preamble

As we have said in class, papers can entail evaluating a policy or helping us understand a social concept. This paper is about understanding the social ideas that drive people to change their perspectives on women’s rights: Does parenting daughters influence parental political behavior? This is the question we will try to answer with data. You can imagine why this question is interesting and what its repercussions for policy are. Sociologists have long argued that parenting daughters increases feminist sympathies. We could debate all day why it could or couldn’t and create many theories; that’s all fun, but in practice, what happens? Can we evaluate this empirically? We will assess this claim using data from a paper by Ebonya Washington that explores this question.

Let’s get things ready before we start digging into the data:

‣
Open up the basic.dta attached here.
basic.dta (411.3 KB)

You can also obtain it from this link. If you want to practice the skill of obtaining data from its main source, here is the link to where you can find all the data. See if you can download basic.dta from there. This could be useful practice for future homework.

‣
Get yourself familiar with the data.
‣
Skim the paper to get familiar with the exercise. You can also find the PDF at this link.
aer.98.1.311.pdf (446.1 KB)
‣
Figure out the primary outcomes and explanatory variables from skimming the paper.

Practice the above skill before reading the following text: You can find a detailed description of each variable in the original paper. The main variable in this analysis is AAUW, a score created by the American Association of University Women (AAUW). For each congress, AAUW selects pieces of legislation in the areas of education, equality, and reproductive rights. The AAUW keeps track of how each legislator voted on these pieces of legislation and whether their vote aligned with the AAUW’s position. The legislator’s score is equal to the proportion of these votes made in agreement with the AAUW.

Getting to work

  1. Report summary statistics in a table for the following variables (or their proxies in the dataset): political party, age, race (only white or non-white), gender, AAUW score, the number of children, and the number of daughters. Compare the mean of each of these variables between legislators who have girls and legislators who do not. Your table should include the means for both groups, the sample size of each group, a column for the difference between means, and a column with the p-value testing whether the difference is statistically different from zero. (Note: there are many ways to construct this table. Try to use STATA as much as possible, but it is fine if some of the process happens outside STATA, for example in Word or Excel.) Check out this worksheet on how to do tables in STATA. We recommend up to two decimal places for means and differences and up to four decimal places for p-values. You don’t need a column showing “total values” (i.e., the full sample).
  2. ‣
    Answer
  3. To disentangle the causal effect of parenting daughters on feminist sympathies, a peer suggested comparing each Congressperson’s AAUW score before and after they had a girl and averaging the differences. Let’s say your peer makes this comparison and concludes that having a girl increases one’s AAUW score, on average. Assuming that the causal effect of having a girl on AAUW scores is 0, what’s one reason that could explain your peer’s finding? (Recall that you have the ability to elevate your answer given the “signs” implied here.)
  4. ‣
    Answer
    ✅
      This comparison may be biased if legislators become more progressive over time due to trends in cultural norms. This means that anyone, regardless of whether they have a girl, would look more progressive over time. To improve this comparison, it would have been appropriate to compare that difference to the change over the same period for legislators whose “having a girl” status did not change; this would net out the “over time” effect. Nevertheless, even with this improvement, one would still worry about differences between legislators whose “having a girl” status changed and legislators whose status did not.

      Notice that we can apply the sign of the bias to this question: we know the true parameter is 0, and the biased estimate is positive, so the bias is positive. Therefore, any reason we give should imply that $Corr(OV, AAUW)>0$ and that $Corr(AnyGirl, OV)>0$. In this case, the OV is “time” or the “passage of time.” When we say “if legislators become more progressive over time,” we are indicating the first correlation, $Corr(OV, AAUW)>0$, and we gave a reason for it (trends in cultural norms). The second correlation is positive almost by construction: since we are comparing people before and after they have a girl, $AnyGirl$ is positively correlated with time. Therefore, this explanation checks out: the bias is positive, which would explain why our comparison might show a positive effect even though the causal effect is 0.

      Notice that “number of kids” or “total number of kids” is not a great OV in this comparison. The key is that this is a different comparison from the one below, and when the comparison changes, the set of variables that are OVs changes. To understand this, imagine we had said that an OV is a person’s race. This would not have been an appropriate OV. Why not? Because race is netted out by the comparison itself: comparing the same person before and after “eliminates” race effects, since a person cannot change their race over time.

      Now to number of kids. Suppose number of kids is constant, that is, a variable that doesn’t change over time. As with race, it would be netted out, so it is not an OV. Suppose instead that number of kids changes, and specifically because that person had a girl, so it changes by construction. In that case, number of kids becomes an outcome in this comparison, because having a girl affects the number of kids directly. In the case below, since there is no time component, STATA treats the data as a cross section, so number of kids can’t change. Finally, suppose number of kids changes not because of having a girl but because of having a boy. Then the OV is really having a boy. Even if we accept that having a boy is correlated with having a girl (which is a bit odd), this would be saying that having a boy affects AAUW scores and is correlated with having a girl, but it wouldn’t explain why this comparison finds a positive effect of having a girl on AAUW scores. In fact, with this explanation we should have found that the effect of having a girl on AAUW is 0, but instead we find a positive number, so this explanation doesn’t really answer the question.
  5. Another peer looked into the data and told you the following: “I compared the share of legislators that report having a girl across different levels of the $aauw$ score and noticed that as the $aauw$ score increases, the likelihood of reporting having a girl does not increase; therefore, having a girl doesn’t really have an effect on the voting preferences of legislators. I did this by comparing increases of 10 percentage points in AAUW score and observing changes in the share of people having a girl.”
    1. [Bonus] Write in conditional expectation language the exercise that this peer performed. Use variable names if it is easier.
    2. ‣
      Answer
      $E(AnyGirl_i \mid AAUW_i=c+10)-E(AnyGirl_i \mid AAUW_i=c)$
  6. Another peer suggests comparing the AAUW score between legislators with girls and legislators with no girls.
    1. Write down this comparison in conditional expectation language. Use variable names if it is easier.
    2. ‣
      Answer
      $E(AAUW_i \mid AnyGirl_i=1)-E(AAUW_j \mid AnyGirl_j=0)$
    3. Compute this difference in STATA. (Hint: use the sum and display commands.)
    4. ‣
      Answer
    5. Write down the theoretical regression that represents the main causal question of this exercise.
    6. ‣
      Answer
      $AAUW_{it}=\alpha_0+\beta_1 AnyGirl_{it}+\epsilon_{it}$
    7. Use STATA to run the regression that performs the comparison your peer suggested. Upload a screenshot of the STATA output.
    8. ‣
      Answer
    9. Read the main finding from the regression result in a technical way. Make sure you are aware of units. Express the relationship as both the percent change and the percentage-point change in AAUW score that results from having a girl.
    10. ‣
      Answer
      ✅
      Having a girl decreases the AAUW score by 4.79 percentage points. This represents a decrease of almost 10% relative to the average score.
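      One way to check the percent figure, as a sketch: the sample sizes and group means come from the `summ` output at the end of this page, and reading “the average score” as the full-sample mean is an assumption.

      ```latex
      \begin{aligned}
      \overline{aauw} &= \frac{494\times 51.72267 + 1241\times 46.93473}{1735} \approx 48.30 \\
      \text{percent change} &= \frac{-4.788}{48.30}\times 100 \approx -9.9\%
      \end{aligned}
      ```

      If the benchmark were instead the mean for legislators without girls (51.72), the same calculation would give about -9.3%.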
    11. Use a max of two sentences & non-technical language to answer the following question: What’s the conclusion from the empirical exercises above? (i.e. do not use numbers just state directions)
    12. ‣
      Answer
      ✅
      Having a girl makes a legislator less likely to have voted in line with AAUW. In broad terms, parenting a girl makes a legislator less of a feminist.
    13. Assuming that the causal effect of having a girl on AAUW is positive, what’s one reason that could explain the findings in 4e?
    14. ‣
      Answers
      ✅
      Many reasons. One is that when we compare legislators with girls vs. legislators without girls, we are also comparing legislators with more kids vs. legislators with fewer kids. This is because the more kids one has, the more likely one is to have a girl. In addition, legislators with more kids are less likely to vote in favor of AAUW-supported legislation because they tend to be more conservative, religious, etc. This means that we can improve our comparison by comparing legislators with vs. without girls among legislators with the same number of children.

      To answer this question, one needs to apply the “sign of the bias” framework. For an omitted variable story to work, we need to state two things: the relationship between the OV and our primary explanatory variable (having any girls is positively related to having more kids) AND the relationship between the OV and the outcome (more kids means a lower AAUW score because of more conservative or religious values). For each of these relationships, we have to establish a sign and give a reason (more kids, more likely to have a girl; that’s the reason).

      The second component is that the signs of these relationships have to make sense. Since the finding is a negative relationship (having a girl, lower AAUW) and the causal effect is positive, one has to find a story in which the bias is negative. Why? Causal effect + (negative bias) = a more negative number. To have a negative bias, the two relationships need to have opposite signs: more kids, more girls (+), and more kids, lower AAUW (-). Your story may be correct but have the “wrong” signs.

      The third component is giving a reason for why one believes the relationship between two variables may be positive or negative. It does not suffice to say “it’s positive.” One would have to say “it’s positive because...” Hopefully, you will start noticing how powerful the concept of signing the bias can be in explaining empirical findings.
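      The sign logic in this answer can be summarized with the standard omitted variable bias identity. This is a sketch assuming a single omitted variable (total number of children); the auxiliary regression and its coefficient $\delta_1$ are introduced here for illustration and are not part of the homework's notation.

      ```latex
      \begin{aligned}
      \text{short: } & aauw = \alpha_0 + \alpha_1\, anygirl + \epsilon \\
      \text{long: } & aauw = \beta_0 + \beta_1\, anygirl + \beta_2\, totchi + \eta \\
      \text{auxiliary: } & totchi = \delta_0 + \delta_1\, anygirl + u \\
      & \alpha_1 = \beta_1 + \underbrace{\beta_2\,\delta_1}_{\text{bias}}, \qquad
      \operatorname{sign}(\text{bias}) = \underbrace{\operatorname{sign}(\beta_2)}_{(-)} \times \underbrace{\operatorname{sign}(\delta_1)}_{(+)} = (-)
      \end{aligned}
      ```

      A negative bias term is what allows a positive causal effect $\beta_1$ to show up as a negative simple difference $\alpha_1$.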
  7. We will continue with the exercise above and compare AAUW scores between legislators who have girls and legislators who don’t.
    1. The simple mean comparison may not provide the causal effect because of potential confounders. Therefore, we will add some covariates to mitigate bias and see how our result changes. Run the following regressions and report the results on a formatted table within STATA/R: (Hint: use the esttab command, there is a worksheet on it.)
    2. $$\begin{aligned}
       (1)\ \ aauw_{it}=&\ \alpha_0+\alpha_1 anygirl_{it}+\epsilon_{it}\\
       (2)\ \ aauw_{it}=&\ \beta_0+\beta_1 anygirl_{it}+\beta_2 totchi_{it}+\eta_{it} \\
       (3)\ \ aauw_{it}=&\ \gamma_0+\gamma_1 anygirl_{it}+\gamma_2 totchi_{it}+\gamma_3 female_{it}+\gamma_4 repub_{it}+\iota_{it}
       \end{aligned}$$
      ‣
      Answer
    3. Could controlling for Republican be considered an issue here? Explain your answer.
    4. ‣
      Answer
      ✅
      Short answer: yes, because party affiliation could itself be considered an outcome. That is, as people have girls they may change their political party, and this could be one channel through which having girls affects aauw scores. Granted, this may not be a serious issue if changing political parties doesn’t occur very often, which is the case here. Likewise, if you think that being a Republican and anygirls are not very correlated, then controlling for it is not inherently an issue: adding a covariate that arguably brings little value is not a problem in itself.
    5. You should have found a difference in the coefficient on $anygirls$ between equations (1) and (2). In technical language, or the language we use in class, how do you explain the change from the result in equation (1) to the result in equation (2)?
    6. ‣
      Answer
      ✅
      Once we control for the total number of children, we see that legislators who have girls have higher AAUW scores than legislators who do not have girls. This means that the number of children was an important confounder in the relationship between having a girl and aauw scores. In fact, using the sign of the bias framework, we can give a better answer. The coefficient on $anygirls$ rises once we add $totchi$, so the sign of the bias is negative. Since we observe that $corr(Y,totchi)<0$, we infer that $corr(anygirls,totchi)>0$, which is what our intuition expected as well. Therefore, in technical language, we can say that omitting the total number of children was biasing our estimate downwards.
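      As a rough numerical check of this sign argument, here is a sketch using the OLS identity $\alpha_1=\beta_1+\beta_2\delta_1$, where $\delta_1$ (a symbol introduced here, not in the homework) is the coefficient from regressing $totchi$ on $anygirls$. The value $\delta_1\approx 1.89$ is taken from the difference in the mean number of children in the summary table at the end of this page; this is an approximation, since that table uses one more observation than the regressions.

      ```latex
      \beta_1 + \beta_2\,\delta_1 \approx 7.404 + (-6.430)(1.89) = 7.404 - 12.153 \approx -4.75
      ```

      This is close to the estimate of $-4.788$ from equation (1); the small gap comes from rounding $\delta_1$ and the slightly different sample.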
    7. Now explain the difference in non-technical language, or in a way everyone can understand. Write the number of characters at the end of your answer. It should be less than 1,000 for the answer to be counted correctly.
    8. ‣
      Answer
      ✅
      Before, we were comparing legislators that have girls to legislators that do not have girls, by doing that, we were also comparing legislators who had lots of kids to legislators with fewer kids. Since it seems that legislators with bigger families vote less in accordance with AAUW, our original comparison made it seem that having a girl meant lower aauw scores. Still, in reality we were comparing legislators with large families vs. legislators with smaller ones. Once we compare the legislators with the same number of kids, we see that having a girl makes a legislator vote more in accordance with AAUW. (639 characters).
    9. Would your qualitative conclusion change given the results of equation (2) to equation (3)? Which control variable is particularly important and why? (Hint: feel free to run another regression to help support your claim.)
    10. ‣
      Answer
      ✅
      Not really. Overall, our main conclusion would be that having a girl makes a legislator vote in a more “feminist” way, or if we are trying to be precise, more in line with AAUW. The coefficient changes by about four points (from 7.40 to 3.51), but this is not as meaningful a change as going from equation 1 to equation 2. The total number of children seems very important, given how much it helps to reduce the bias. Equation 3 adds additional controls that further help reduce bias. However, these controls are not as crucial since once we control for the total number of children, the number of girls a congressperson has is only weakly correlated with being female or republican, as you can see in the table below.
    11. Consider the third specification (with three controls in addition to $anygirls$). Conditional on the number of children and other variables, do you think $anygirls$ is plausibly exogenous? What identifying assumption is necessary for $\gamma_1$ to be interpreted as a causal estimate? What evidence does Washington give to support this assumption?
    12. ‣
      Answer
      ✅
      $anygirls$ will be plausibly exogenous if the Conditional Independence Assumption holds. This will be the case if, once we control for $totchi_i$, $female_i$, and $repub_i$, the number of girls is as good as randomly assigned. As discussed in the article, this assumption could be violated if couples follow fertility-stopping rules (e.g., keep having kids until they get both a girl and a boy). This assumption could also be violated if voters select their representatives based on the gender composition of their children. In the article, Washington presents evidence that these concerns are not driving her results. She looks at the gender of the firstborn and finds that it is predictive of the gender mix but not the total number of children in the sample. She also looks at numerous district characteristics and does not find a concerning relationship between these and the number of daughters of the congressperson the district elects.
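      In symbols, one common way to write this assumption for equation (3) is the conditional mean independence condition below. This is a sketch of the standard textbook form, not notation taken from Washington's paper.

      ```latex
      E\left[\iota_{i} \mid anygirls_{i},\, totchi_{i},\, female_{i},\, repub_{i}\right]
      = E\left[\iota_{i} \mid totchi_{i},\, female_{i},\, repub_{i}\right]
      ```

      In words: once the three controls are held fixed, knowing whether a legislator has a girl tells us nothing more about the unobserved determinants of the AAUW score.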
  8. Run the following regressions and show them in a nicely formatted table in STATA.
    1. $$\begin{aligned}
       (1)\ \ aauw_{it}=&\ \gamma_0+\gamma_1 anygirl_{it}+\gamma_2 totchi_{it}+\gamma_3 female_{it}+\gamma_4 Age_{it}+\gamma_5 Age^2_{it}+\iota_{it} \\
       (2)\ \ aauw_{it}=&\ \gamma_0+\gamma_1 anygirl_{it}+\gamma_2 totchi_{it}+\gamma_3 female_{it}+\gamma_4 white_{it}+\gamma_5 white\times female_{it}+\iota_{it} \\
       (3)\ \ aauw_{it}=&\ \gamma_0+\gamma_1 anygirl_{it}+\gamma_2 totchi_{it}+\gamma_3 female_{it}+\gamma_4 anygirl\times female_{it}+\iota_{it}
       \end{aligned}$$
    2. Show them in a nicely formatted table. (Again, use STATA as much as you can, but it is fine if you need to resort to Word/Excel.) Provide the table and code.
    3. ‣
      Answer
      image
    4. Using results from equation (1), what is the full relationship between age and $aauw$ scores? Show process.
    5. ‣
      Answer
      ✅
      As age increases, legislators tend to have higher $aauw$ scores; however, after about 67 years of age, scores fall as legislators get older. To get this answer, we first find the marginal effect by taking the derivative:

      $$\frac{\partial aauw}{\partial Age}=\gamma_4+2\gamma_5 Age$$

      Now we set it equal to 0 to find the min or max:

      $$\begin{aligned} \gamma_4+2\gamma_5 Age&=0 \\ 2\gamma_5 Age&=-\gamma_4 \\ Age&=\frac{-\gamma_4}{2\gamma_5} \end{aligned}$$

      Now we plug in the values from the regression:

      $$Age=\frac{-2.769}{2\times (-0.0207)}=\frac{-2.769}{-0.0414}=66.88$$

      Now we need to find out whether this is a min or a max, so we take the second derivative:

      $$\frac{\partial^2 aauw}{\partial Age^2}=2\gamma_5<0$$

      Since $\gamma_5<0$, the second derivative is negative, which indicates that we are dealing with a max.

      Now, it could be the case that no legislator is 67 or older, in which case the max would never be reached, so let's check the range of age in our data:
       
      sum age
      
          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
               age |      1,740    52.92759    9.504845         26         87
    6. Using results from equation (2), what is the marginal effect of being white for non-females on $aauw$ scores? Show process.
    7. ‣
      Answer
      ✅
      The marginal effect of being white for non-females is -44.48. We obtain this from the marginal effect of being white, setting $female=0$: $$\frac{\partial aauw}{\partial white}=\gamma_4+\gamma_5 female=\gamma_4=-44.48$$
    8. Using results from equation (2), what is the marginal effect of being female for a white person on $aauw$ scores? Show process.
    9. ‣
      Answer
      ✅
      The marginal effect of being female for a white person is 25.6. We obtain this from the marginal effect of being female, setting $white=1$: $$\frac{\partial aauw}{\partial female}=\gamma_3+\gamma_5 white=5.259+20.33\times 1=25.59\approx 25.6$$
    10. Using results from equation (3), what is the predicted $aauw$ score for a male who has a daughter and has the average number of children that male legislators have? (We’ll assume for the purposes of this question that non-female = male.) Show process.
    11. ‣
      Answer
      ✅
      The predicted aauw score is 46.42. We obtain this by computing $\gamma_0+\gamma_1+\gamma_2\times 2.466842 = 55.36+6.240+(-6.155)\times 2.466842=46.42$, where 2.466842 is the average number of children among male legislators.
    12. Using results from equation (3), how does the effect of having a girl change by the gender of the legislator? Show process and formulate a technical and non-technical answer.
    13. ‣
      Answer
      ✅
      We start by taking the derivative with respect to $anygirl$: $$\frac{\partial aauw}{\partial anygirl}=\gamma_1+\gamma_4 female$$

      From here we see that the effect depends on whether $\gamma_4$ enters, and $\gamma_4=5.451$. Now we have the information for the whole answer. Technical: the effect of having a girl varies by the gender of the legislator; it is $\gamma_1$ for non-female legislators and $\gamma_1+\gamma_4$ for female legislators, so the effect on AAUW score is 5.451 points larger for female legislators than for male legislators. In colloquial terms, having a daughter has a larger effect on female legislators than on male legislators.

Extensions

This section is not graded and you don’t have to submit it, but it will help you push your thinking further. Think of the “extensions” questions as questions that we could have asked in this homework but decided not to grade. Therefore, you should know how to answer them; treat them as practice questions.

  1. Let’s say you want to test even more hypotheses with these data and model:
  • If you wanted to explore not just the margin of having a girl but the number of girls, what regressions would you run?
  • ‣
    Answer

    We could just change the main explanatory variable to “ngirls” which represents the number of girls. We would then interpret the coefficient on ngirls as the effect of having one more girl.

    $$\begin{aligned}
    (3)\ \ aauw_i=&\ \gamma_0+\gamma_1 anygirls_i+\gamma_2 totchi_i+\gamma_3 female_i+\gamma_4 repub_i+\iota_i \\
    (4)\ \ aauw_i=&\ \gamma_0+\gamma_1 ngirls_i+\gamma_2 totchi_i+\gamma_3 female_i+\gamma_4 repub_i+\iota_i
    \end{aligned}$$
  • If you wanted to explore whether the effect on female legislators differs from men, what regressions would you run? Why doesn’t just controlling for “female” count as exploring the difference in effect between men and women?
  • ‣
    Answer
    ✅
    With the tools you have right now, you could run equation (2) + $repub$ using only the sample of female legislators, then using only the sample of male legislators, and compare the coefficients on $anygirl$. Notice that the “female” covariate would be dropped in these new regressions because it would be a constant. We ran these regressions below. We find that the effect is larger for female legislators than for male legislators. Just controlling for female does not answer the question at hand; it just gives us the effect of being female on AAUW scores, rather than how the effect of having girls differs between female and male legislators.
  1. Using the residual regression approach, obtain the coefficient $\gamma_1$ from equation (3).
  2. ‣
    Answer
  3. Obtain an approximate coefficient $\beta_1$ from the equation $aauw_{it}=\beta_0+\beta_1 anygirl_{it}+\beta_2 totchi_{it}+\eta_{it}$ using the “by hand” approach as seen in the worksheet.
  4. ‣
    Answer

    Using the “by-hand” method, the weighted mean is 8.61706968, or about 8.62, which is near the regression coefficient of 7.40. We used Excel for this.

    Number of Children  E(AAUW|AnyGirl==1)  E(AAUW|AnyGirl==0)  Sample Size  Difference
    -------------------------------------------------------------------------------------
    0                   .                   56.41333            225          -
    1                   57.35714            68.78082            185          -11.42368
    2                   55.34515            44.37143            563          10.97372
    3                   44.45198            29.46667            399          14.98531
    4                   37.47716            26.71429            204          10.76287
    5                   42.00952            4                   108          38.00952
    6                   18                  100                 17           -82
    7                   39.28571            -                   14           -
    8                   1.083333            -                   12           -
    9                   6                   -                   3            -
    10                  3                   -                   4            -
    11                  -                   -                   0            -
    12                  0                   -                   1            -
    -------------------------------------------------------------------------------------
    Total               494                 1,241               1,735
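    The weighted mean can be reproduced from the table as follows. This is a sketch: the weights $n_k/N$ with $N=1{,}735$ are inferred from the fact that they reproduce the reported 8.617, and only the rows where both group means are defined (1 to 6 children) contribute a difference $\Delta_k$.

    ```latex
    \begin{aligned}
    \hat\beta_1^{\text{by hand}} &= \sum_{k=1}^{6} \frac{n_k}{N}\,\Delta_k, \qquad N = 1735 \\
    &= \frac{185(-11.42368) + 563(10.97372) + 399(14.98531) + 204(10.76287) + 108(38.00952) + 17(-82)}{1735} \\
    &\approx \frac{14950.62}{1735} \approx 8.617
    \end{aligned}
    ```

    Rows with very few observations (for example, 6 children, with a difference of -82 from 17 legislators) carry little weight, which is why the extreme differences do not dominate the result.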
  5. Checking the relationship between variables and our Y is important. Install the command binscatter2. This command produces graphs where you can observe the relationship between two variables, and it works great for visualizing non-linear relationships. Try the following commands:
    1. binscatter2 aauw age
      binscatter2 aauw age, linetype(qfit)

      Then compare this to the regression with age and age squared. Now run the following command:

      binscatter2 aauw age, linetype(qfit) by(female)
    2. Write out a single regression equation that represents the last graph produced.
    3. ‣
      Answer
      ✅

      $aauw_i=\gamma_0+\gamma_1 Age_i+\gamma_2 Age_i^2+\gamma_3 Female_i+\gamma_4 Female_i\times Age_i+\gamma_5 Female_i\times Age_i^2$
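      To see why this single equation represents the graph, evaluate it at each value of $Female$. This is a sketch that assumes the `by(female)` option fits a separate quadratic for each group, which is the usual behavior but worth confirming in the command's help file.

      ```latex
      \begin{aligned}
      Female = 0: \quad & E[aauw \mid Age] = \gamma_0 + \gamma_1 Age + \gamma_2 Age^2 \\
      Female = 1: \quad & E[aauw \mid Age] = (\gamma_0+\gamma_3) + (\gamma_1+\gamma_4)\, Age + (\gamma_2+\gamma_5)\, Age^2
      \end{aligned}
      ```

      The interaction terms let the intercept, slope, and curvature all differ by gender, so each group gets its own fitted curve from one regression.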

use "$hw/homework 1/basic.dta", clear

* Creating the Table of Summary Stats with different means. Works in STATA 17 Only
local myresults "NoGirls= r(mu_1) N1=r(N_1) AnyGirl = r(mu_2) N2=r(N_2) Diff = (r(mu_2)-r(mu_1)) pvalue = r(p)"
table (command) (result), ///
command(`myresults' : ttest repub,      by(anygirls)) ///
command(`myresults' : ttest age,      by(anygirls)) ///
command(`myresults' : ttest white,      by(anygirls)) ///
command(`myresults' : ttest female,      by(anygirls)) ///
command(`myresults' : ttest aauw,      by(anygirls)) ///
command(`myresults' : ttest totchi,      by(anygirls)) ///
command(`myresults' : ttest ngirls,      by(anygirls)) ///
nformat(%6.2f  NoGirls AnyGirl Diff)  ///
nformat(%6.3f  pvalue)

* Changing Labels
collect label levels command 1  "Share Republican" 2 "Age" 3 "Share White" 4 "Share Female" 5 "AAUW Score" 6 "Total Number of Children" 7 "Number of Girls", modify
* Changing Style of Cells
collect style cell result[NoGirls AnyGirl Diff], nformat(%8.2f)
collect style cell result[pvalue], nformat(%6.4f)
collect style cell border_block, border(right, pattern(nil))
collect preview
-------------------------------------------------------------------------
                          NoGirls    N1   AnyGirl     N2    Diff   pvalue
-------------------------------------------------------------------------
Share Republican             0.47   495      0.54   1241    0.06   0.0190
Age                         50.67   495     53.84   1241    3.17   0.0000
Share White                  0.85   495      0.87   1241    0.02   0.3763
Share Female                 0.12   495      0.12   1241   -0.00   0.9657
AAUW Score                  51.72   494     46.93   1241   -4.79   0.0337
Total Number of Children     1.10   495      2.98   1241    1.89   0.0000
Number of Girls              0.00   495      1.73   1241    1.73   0.0000
-------------------------------------------------------------------------
. summ aauw if anygirls==1

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        aauw |      1,241    46.93473    42.27097          0        100

. summ aauw if anygirls==0

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        aauw |        494    51.72267    42.53554          0        100

. display 46.93473  - 51.72267 
-4.78794
reg aauw anygirl

      Source |       SS           df       MS      Number of obs   =     1,735
-------------+----------------------------------   F(1, 1733)      =      4.52
       Model |  8100.22373         1  8100.22373   Prob > F        =    0.0337
    Residual |  3107646.72     1,733  1793.21796   R-squared       =    0.0026
-------------+----------------------------------   Adj R-squared   =    0.0020
       Total |  3115746.94     1,734  1796.85522   Root MSE        =    42.346

------------------------------------------------------------------------------
        aauw | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    anygirls |  -4.787942    2.25277    -2.13   0.034    -9.206377   -.3695074
       _cons |   51.72267   1.905255    27.15   0.000     47.98583    55.45951
------------------------------------------------------------------------------
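
The regression above reproduces the difference in means computed by hand just before it. As a worked identity, here is a sketch in standard OLS notation, where $Y_i$ stands for `aauw` and $D_i$ for `anygirls` (these symbols are illustrative and do not appear in the original log):

```latex
\begin{aligned}
Y_i &= \beta_0 + \beta_1 D_i + \varepsilon_i, \qquad D_i \in \{0, 1\} \\
\hat\beta_0 &= \bar{Y}_{D=0} = 51.72267 \\
\hat\beta_1 &= \bar{Y}_{D=1} - \bar{Y}_{D=0} = 46.93473 - 51.72267 = -4.78794
\end{aligned}
```

The constant is the mean of the `anygirls==0` group and the slope is the gap between the two group means. The p-value on `anygirls` (0.034) also lines up with the 0.0337 reported for AAUW Score in the balance table and with `Prob > F` in the regression header.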

estimates clear
eststo: reg aauw anygirl
eststo: reg aauw anygirl totchi
eststo: reg aauw anygirl totchi female repub
esttab, se 
------------------------------------------------------------
                      (1)             (2)             (3)   
                     aauw            aauw            aauw   
------------------------------------------------------------
anygirls           -4.788*          7.404**         3.508** 
                  (2.253)         (2.604)         (1.207)   

totchi                             -6.430***       -2.010***
                                  (0.731)         (0.343)   

female                                              12.05***
                                                  (1.421)   

repub                                              -72.91***
                                                  (0.947)   

_cons               51.72***        58.71***        86.95***
                  (1.905)         (2.027)         (1.044)   
------------------------------------------------------------
N                    1735            1735            1735   
------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
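
One way to read the sign flip on `anygirls` between columns (1) and (2) is through the omitted variable bias formula. The block below is a back-of-the-envelope sketch: $\hat\delta$ is the slope from regressing `totchi` on `anygirls`, taken here from the rounded `Diff` entry (1.89) in the balance table. That entry is based on 1,736 observations while the regressions use 1,735, so the match is only approximate. The superscripts "short" and "long" are labels introduced for this illustration.

```latex
\begin{aligned}
\hat\beta^{\text{short}}_{\text{anygirls}}
  &= \hat\beta^{\text{long}}_{\text{anygirls}}
   + \hat\beta^{\text{long}}_{\text{totchi}} \times \hat\delta \\
  &\approx 7.404 + (-6.430)(1.89) \\
  &\approx 7.404 - 12.153 = -4.749 \approx -4.788
\end{aligned}
```

Under these assumptions, the negative coefficient in column (1) is what results when the positive column (2) coefficient is combined with the negative `totchi` coefficient and the fact that the `anygirls==1` group has more children on average.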
------------------------------------------------------------
                      (1)             (2)             (3)   
                 anygirls        anygirls        anygirls   
------------------------------------------------------------
totchi              0.148***        0.150***        0.150***
                (0.00572)       (0.00579)       (0.00579)   

female             0.0247                          0.0187   
                 (0.0281)                        (0.0284)   

repub                             -0.0283         -0.0264   
                                 (0.0187)        (0.0189)   

_cons               0.349***        0.363***        0.360***
                 (0.0172)        (0.0183)        (0.0190)   
------------------------------------------------------------
N                    1736            1736            1736   
------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
--------------------------------------------
                (Female Sample) (Male Sample)   
                     aauw            aauw   
--------------------------------------------
anygirls            9.058**         2.639*  
                  (3.150)         (1.299)   

totchi             -0.808          -2.147***
                  (0.977)         (0.365)   

repub              -71.26***       -73.18***
                  (2.526)         (1.016)   

_cons               91.78***        88.06***
                  (2.309)         (1.116)   
--------------------------------------------
N                     213            1522   
--------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001

reg aauw anygirls totchi female anygirlXfemale 

      Source |       SS           df       MS      Number of obs   =     1,735
-------------+----------------------------------   F(4, 1730)      =     45.21
       Model |   294850.66         4  73712.6651   Prob > F        =    0.0000
    Residual |  2820896.28     1,730  1630.57589   R-squared       =    0.0946
-------------+----------------------------------   Adj R-squared   =    0.0925
       Total |  3115746.94     1,734  1796.85522   Root MSE        =     40.38

--------------------------------------------------------------------------------
          aauw | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
      anygirls |   6.239591   2.657354     2.35   0.019     1.027626    11.45156
        totchi |  -6.155213   .7124918    -8.64   0.000    -7.552649   -4.757777
        female |   24.70641   5.525213     4.47   0.000     13.86961    35.54321
anygirlXfemale |    5.45111    6.53663     0.83   0.404    -7.369419    18.27164
         _cons |   55.36287   2.097626    26.39   0.000     51.24872    59.47702
--------------------------------------------------------------------------------
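
With an interaction term, the `anygirls` coefficient applies to observations with `female==0`, and the coefficient for `female==1` is the sum of the main effect and the interaction: 6.239591 + 5.45111 = 11.690701. A minimal sketch of getting that sum with a standard error, assuming the homework data are in memory and that `anygirlXfemale` equals `anygirls*female` (its construction is not shown in the log):

```stata
* Sketch only: assumes the homework dataset is loaded and that
* anygirlXfemale was built as anygirls*female (an assumption, not shown above).
reg aauw anygirls totchi female anygirlXfemale

* Association of anygirls with aauw when female == 0 (main effect only)
lincom anygirls

* Association of anygirls with aauw when female == 1 (main effect + interaction)
lincom anygirls + anygirlXfemale
```

`lincom` reports the point estimate along with a standard error and confidence interval for the linear combination, which cannot be read directly off the two separate rows of the regression table.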

. 
. * Regressing female (variable of interest) on other covariates and obtaining the residual
. reg female  anygirls totchi anygirlXfemale 

      Source |       SS           df       MS      Number of obs   =     1,736
-------------+----------------------------------   F(3, 1732)      =   1442.09
       Model |  133.442728         3  44.4809094   Prob > F        =    0.0000
    Residual |  53.4230552     1,732   .03084472   R-squared       =    0.7141
-------------+----------------------------------   Adj R-squared   =    0.7136
       Total |  186.865783     1,735  .107703622   Root MSE        =    .17563

--------------------------------------------------------------------------------
        female | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
      anygirls |  -.1150578   .0111857   -10.29   0.000    -.1369966   -.0931189
        totchi |  -.0042955   .0030857    -1.39   0.164    -.0103476    .0017566
anygirlXfemale |   .9994065   .0152129    65.69   0.000      .969569    1.029244
         _cons |   .1279444    .008589    14.90   0.000     .1110986    .1447902
--------------------------------------------------------------------------------

. 
. predict res_female, res
(4 missing values generated)

. 
. 
. * Regressing the outcome variable on the other covariates and obtaining the residual
. reg aauw anygirls totchi  anygirlXfemale 

      Source |       SS           df       MS      Number of obs   =     1,735
-------------+----------------------------------   F(3, 1731)      =     53.03
       Model |  262247.296         3  87415.7653   Prob > F        =    0.0000
    Residual |  2853499.65     1,731  1648.46889   R-squared       =    0.0842
-------------+----------------------------------   Adj R-squared   =    0.0826
       Total |  3115746.94     1,734  1796.85522   Root MSE        =    40.601

--------------------------------------------------------------------------------
          aauw | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
      anygirls |   3.384438   2.593617     1.30   0.192    -1.702516    8.471391
        totchi |  -6.257483   .7160212    -8.74   0.000    -7.661841   -4.853126
anygirlXfemale |   30.14339   3.516919     8.57   0.000     23.24553    37.04125
         _cons |   58.52484    1.98565    29.47   0.000     54.63031    62.41936
--------------------------------------------------------------------------------

. predict res_aauw, res
(5 missing values generated)

. 
. reg res_aauw res_female

      Source |       SS           df       MS      Number of obs   =     1,735
-------------+----------------------------------   F(1, 1733)      =     20.03
       Model |  32603.3059         1  32603.3059   Prob > F        =    0.0000
    Residual |  2820896.33     1,733  1627.75322   R-squared       =    0.0114
-------------+----------------------------------   Adj R-squared   =    0.0109
       Total |  2853499.64     1,734  1645.61686   Root MSE        =    40.345

------------------------------------------------------------------------------
    res_aauw | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  res_female |   24.70637   5.520423     4.48   0.000     13.87898    35.53376
       _cons |  -.0014549   .9686002    -0.00   0.999    -1.901203    1.898293
------------------------------------------------------------------------------
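
The three residual steps above follow the partialling-out logic usually called the Frisch-Waugh-Lovell theorem. A sketch in illustrative notation (not from the original log): $Y_i$ is `aauw`, $F_i$ is `female`, and $X_i$ collects `anygirls`, `totchi`, `anygirlXfemale`, and the constant.

```latex
\begin{aligned}
\text{Full model:}\quad & Y_i = \beta_F F_i + X_i'\gamma + \varepsilon_i \\
\text{Step 1:}\quad & \tilde{F}_i = F_i - X_i'\hat{\pi}_F
   \quad \text{(residual from regressing } F \text{ on } X\text{; \texttt{res\_female})} \\
\text{Step 2:}\quad & \tilde{Y}_i = Y_i - X_i'\hat{\pi}_Y
   \quad \text{(residual from regressing } Y \text{ on } X\text{; \texttt{res\_aauw})} \\
\text{Step 3:}\quad & \hat\beta_F
   = \frac{\sum_i \tilde{F}_i \tilde{Y}_i}{\sum_i \tilde{F}_i^{2}}
   \approx 24.706
\end{aligned}
```

The residual regression gives 24.70637 and the full regression gives 24.70641. The small gap is likely because the `female` regression uses 1,736 observations while the others use 1,735. The standard errors differ slightly (5.520 versus 5.525) because the residual regression has 1,733 residual degrees of freedom instead of 1,730: $5.525213 \times \sqrt{1730/1733} \approx 5.5204$.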