🏡

Homework 4: Who runs the world? (AK)

Purpose

The objective of this homework is for you to practice concepts learned in class and apply them to data. The research design we will practice here is “mean comparison” with and without covariates. The tool you will use is OLS, and we will also practice interactions and non-linearities. In addition, this HW is born out of a paper, so we’ll start diving into “reading & understanding papers.” Finally, you will practice speaking in both technical and “colloquial” ways and use our framework to understand what they are trying to say. It’s a jam-packed homework of fun and excitement. Clean your desk, get a bottle of water, pick your favorite beverage, turn on “do not disturb,” set a timer for 45 min (then take breaks), put on some work tunes, and dive into the fun of learning.

☝🏽

Colloquial - (of language) used in ordinary or familiar conversation, not formal or literary. Example: “colloquial and everyday language”

☕

Designing a relaxing and cozy environment is not just for vibes; it helps the brain reduce anxiety-related hormones, putting it in a better learning and absorbing mode. The main things are water and no distractions; the rest are also good but not as necessary.

Guidelines

You can work by yourself or with groups of up to three.
Submit your group answers to Gradescope (within Canvas). One submission per group, please.
You will get points for correct answers. You will lose points if an answer contains unnecessary information or mixes incorrect statements with accurate ones. In short, we want you to use the fewest characters possible while maximizing the accuracy of your responses. This will get stricter over time.
Your responses should be professionally formatted and written.
The due date is Monday, February 23rd, at 9pm ET.
You can round all of your answers to the nearest hundredth (0.0x).

Preamble

As we have said in class, papers can evaluate a policy or help us understand a social concept. This paper is about understanding the social ideas that drive people to change their perspectives on women’s rights: Does parenting daughters influence parental political behavior? This is the question we will try to answer with data. You can imagine why this question is interesting and what its repercussions for policy might be. Sociologists have long argued that parenting daughters increases feminist sympathies. We could debate all day why it could or couldn’t and create many theories; that’s all fun, but in practice, what happens? Can we evaluate this empirically? We will assess this claim using data from a paper by Ebonya Washington that explores this question.

Let’s get things ready before we start digging into the data:

‣

Open up the basic.dta attached here.

‣

Get yourself familiar with the data.

‣

Skim the paper to get familiar with the exercise. You can also find the PDF at this link.

‣

Figure out the primary outcomes and explanatory variables from skimming the paper.

Getting to work

Report summary statistics for the following variables (or their proxies in the dataset) in a table: political party, age, race (white or non-white only), gender, AAUW score, number of children, and number of daughters. Compare the mean of each variable between legislators who have girls and legislators who do not. Your table should include the mean for each group, the sample size of each group, a column for the difference in means, and a column with the p-value from testing whether that difference is statistically different from zero. (Note: there are many ways to construct this table. Try to do as much as possible in STATA, but it is fine if part of the process happens outside STATA, for example in Word or Excel.) Check out this worksheet on how to make tables in STATA. We recommend up to two decimal places for means and differences and up to four decimal places for p-values. You don’t need a column showing “total values” (i.e., the full sample).

‣

Answer
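
One possible way to collect the pieces of this table is to loop over the variables and run a two-sample t-test for each. This is only a sketch: the variable names `repub`, `white`, and `ngirls` are assumed from the equations and text of this homework and may differ in `basic.dta`, so check them with `describe` first.

```stata
* Sketch: group means, difference, p-value, and group sizes for each variable.
* Variable names are assumptions; confirm them with describe before running.
use "basic.dta", clear

foreach v of varlist repub age white female aauw totchi ngirls {
    quietly ttest `v', by(anygirls)
    * ttest stores group 1 as anygirls == 0 and group 2 as anygirls == 1
    display "`v'" _col(10) %9.2f r(mu_2) "  " %9.2f r(mu_1) "  " ///
        %9.2f (r(mu_2) - r(mu_1)) "  " %9.4f r(p) "  " ///
        %8.0f r(N_2) "  " %8.0f r(N_1)
}
```

Each printed row lists the mean with girls, the mean without girls, the difference, the p-value, and the two sample sizes, which can then be formatted in STATA, Word, or Excel.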

To disentangle the causal effect of parenting daughters on feminist sympathies, a peer suggested comparing each Congressperson’s AAUW score before and after they had a girl and averaging the differences. Let’s say your peer makes this comparison and concludes that having a girl increases one’s AAUW score, on average. Assuming that the causal effect of having a girl on AAUW scores is 0, what’s one reason that could explain your peer’s finding? (Recall that you have the ability to elevate your answer given the “signs” implied here.)

‣

Answer

✅

This comparison may be biased if legislators become more progressive over time because of trends in cultural norms. In that case anyone, whether or not they have a girl, would look more progressive over time. To improve the comparison, we could compare that difference to the change over the same period for legislators whose “having a girl” status did not change; this would net out the “over time” effect. Even with this improvement, one would still worry about differences between legislators whose “having a girl” status changed and legislators whose status did not. Notice that we can apply the sign of the bias framework to this question: we know the true parameter is 0 and the biased estimate is positive, so the bias must be positive. Therefore any reason we give should imply that $Corr(OVB,AAUW)>0$ and that $Corr(AnyGirl,OVB)>0$. In this case, the OVB is “time” or the “passage of time.” When we say “if legislators become more progressive over time,” we are stating the first correlation, $Corr(OVB,AAUW)>0$, and we gave a reason for it (trends in cultural norms). The second correlation is positive almost by construction: since we are comparing people before and after they have a girl, $AnyGirl$ will be positively correlated with time. This explanation therefore implies a positive bias and explains why our comparison might show a positive effect even though the causal effect is 0.

Notice that “number of kids” or “total number of kids” is not a great OV in this comparison. The key is that this is a different comparison from the one below, and when the comparison changes, the set of variables that qualify as OVs changes too. To understand this, imagine we had said that an OV is a person’s race. This would not have been an appropriate OV. Why not? Because this comparison nets out race: comparing the same person before and after “eliminates” race effects, since a person cannot change their race over time.

Now to number of kids. Suppose first that number of kids is a constant, a variable that doesn’t change over time. As with race, it would be netted out, so it is not an OV. Suppose instead that number of kids changes, and specifically because that person had a girl, so it changes by construction. In that case number of kids becomes an outcome in this particular comparison, because having a girl affects the number of kids directly. In the case below there is no time component, so STATA just treats the data as a cross section and number of kids can’t change. Finally, suppose number of kids changes not because of having a girl but because of having a boy. Then the OV is really having a boy. Even if we accept that having a boy is correlated with having a girl (which is a bit odd), this story says that having a boy affects AAUW scores and is correlated with having a girl, but it doesn’t explain why this comparison finds a positive effect of having a girl on AAUW scores. In fact, with this explanation we should have found that the effect of having a girl on AAUW is 0, but instead we find a positive number, so this explanation doesn’t really answer the question.
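
The before-and-after logic can be written compactly. As a sketch, let $\tau$ denote the causal effect of having a girl and let $\Delta_{time}$ denote the change in AAUW scores that would have happened over the same period anyway. Both symbols are introduced here for illustration only, and the decomposition assumes the two pieces simply add up.

```latex
\underbrace{E(AAUW_{i,after}-AAUW_{i,before})}_{\text{peer's estimate}>0}
=\underbrace{\tau}_{\text{causal effect}=0}
+\underbrace{\Delta_{time}}_{\text{bias}>0}
```

With $\tau=0$, the entire estimate is bias, which is why a valid story has to deliver a positive $\Delta_{time}$.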

Another peer looked into the data and told you the following: “I compared the share of legislators that report having a girl across different levels of the $aauw$ score and noticed that as the $aauw$ score increases, the likelihood of reporting having a girl does not increase; therefore having a girl doesn’t really have an effect on the voting preferences of legislators. I did this by comparing increases of 10 percentage points in AAUW score and observing changes in the share of people having a girl.”

[Bonus] Write in conditional expectation language the exercise that this peer performed. Use variable names if it is easier.

‣

Answer

E(AnyGirl_i|AAUW_i=c+10)-E(AnyGirl_i|AAUW_i=c)

Another peer suggests comparing the AAUW score between legislators with girls and legislators with no girls.

Write down this comparison in conditional expectation language. Use variable names if it is easier.

‣

Answer

E(AAUW_i|AnyGirl_i=1)-E(AAUW_j|AnyGirl_j=0)

Compute this difference in STATA. (Hint: use the sum and display commands.)

‣

Answer
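
A minimal sketch of the hinted approach, assuming the data are in `basic.dta` in the working directory and using the variable names `aauw` and `anygirls` that appear in the Stata output later in this homework. The scalar names are arbitrary.

```stata
* Sketch: mean AAUW score with girls minus mean AAUW score without girls
use "basic.dta", clear

quietly summarize aauw if anygirls == 1
scalar mean_girl = r(mean)

quietly summarize aauw if anygirls == 0
scalar mean_nogirl = r(mean)

display "Difference in means: " %6.2f (mean_girl - mean_nogirl)
```

Storing each mean in a scalar matters because every new `summarize` call overwrites `r(mean)`.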

Write down the theoretical regression that represents the main causal question of this exercise.

‣

Answer

AAUW_{it}=\alpha_0+\beta_1AnyGirl_{it}+\epsilon_{it}

Use STATA to run the regression that performs the comparison your peer suggested. Upload a screenshot of the STATA output.

‣

Answer

Read the main finding from the regression result in a technical way. Make sure you are aware of units. Express the relationship as both the percent change and the percentage point change in AAUW score that results from having a girl.

‣

Answer
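
As a reminder of the distinction, here is a sketch that treats the AAUW score as a 0 to 100 measure and takes legislators without girls as the baseline. Both are assumptions about the intended setup, and $\hat{\beta}_1$ stands for the estimated coefficient on $AnyGirl$.

```latex
\text{percentage point change}=\hat{\beta}_1 \\
\text{percent change}=100\times\frac{\hat{\beta}_1}{E(AAUW_i|AnyGirl_i=0)}
```

The first expression is read directly off the regression table; the second rescales it by the average score of the comparison group.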

Use a max of two sentences & non-technical language to answer the following question: What’s the conclusion from the empirical exercises above? (i.e. do not use numbers just state directions)

‣

Answer

Assume that the causal effect of having a girl on AAUW is positive. What’s one reason that could explain the findings in 4e?

‣

Answers

✅

Many reasons. One is that when we compare legislators with girls vs. legislators without girls, we are also comparing legislators with more kids vs. legislators with fewer kids, because the more kids one has, the more likely one is to have a girl. In addition, legislators with more kids are less likely to vote in favor of AAUW-supported legislation because they tend to be more conservative, religious, etc. This means that we can improve our comparison by comparing legislators with vs. without girls among legislators with the same number of children.

To answer this question, one needs to apply the “sign of the bias” framework. For an omitted variable bias story to work, we need three components. First, we need to state two relationships: the one between the omitted variable and our primary explanatory variable (having any girls is positively related to having more kids) AND the one between the omitted variable and the outcome (more kids means a lower AAUW score because of more conservative or religious values). Second, the signs of these relationships have to make sense. Since the finding is a negative relationship (more girls, less AAUW) and the causal effect is positive, one has to find a story in which the bias is negative. Why? Causal effect + (negative bias) = a smaller, possibly negative, estimate. To have a negative bias, the two relationships need opposite signs: more kids, more girls (+), and more kids, less AAUW (-). Your story may be correct but have the “wrong” signs. Third, one has to give a reason why each relationship is positive or negative (for example: more kids, more likely to have a girl). It does not suffice to say “it’s positive.” One would have to say “it’s positive because...” Hopefully, you will start noticing how powerful the sign of the bias concept can be in explaining empirical findings.
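
The sign argument above lines up with the usual omitted variable bias formula. As a sketch, write the short regression as $aauw_i=\alpha_0+\alpha_1anygirl_i+\epsilon_i$ and the long one as $aauw_i=\beta_0+\beta_1anygirl_i+\beta_2totchi_i+\eta_i$, and let $\pi_1$ (a symbol introduced here for illustration) be the slope from regressing $totchi$ on $anygirl$.

```latex
\alpha_1=\beta_1+\underbrace{\beta_2}_{(-)}\times\underbrace{\pi_1}_{(+)}
\quad\Rightarrow\quad \text{bias}=\beta_2\pi_1<0
```

The two signs under the braces are exactly the two relationships the story has to justify: more kids, lower AAUW score, and more kids, more likely to have a girl.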

We will continue with the exercise above and compare AAUW scores between legislators who have girls and legislators who don’t.

The simple mean comparison may not provide the causal effect because of potential confounders. Therefore, we will add some covariates to mitigate bias and see how our result changes. Run the following regressions and report the results in a formatted table within STATA/R. (Hint: use the esttab command; there is a worksheet on it.)

\begin{aligned} (1)\ \ aauw_{it}=&\alpha_0+\alpha_1anygirl_{it}+\epsilon_{it}\\ (2)\ \ aauw_{it}=&\beta_0+\beta_1anygirl_{it}+\beta_2totchi_{it}+\eta_{it} \\ (3)\ \ aauw_{it}=&\gamma_0+\gamma_1anygirl_{it}+\gamma_2totchi_{it}+\gamma_3female_{it}+\gamma_4repub_{it}+\iota_{it} \\ \end{aligned}

‣

Answer
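
A minimal sketch of the hinted workflow, assuming the variable names `aauw`, `anygirls`, `totchi`, `female`, and `repub` that appear in the Stata output elsewhere in this homework. `eststo` and `esttab` come from the user-written estout package, so they need to be installed once with `ssc install estout`.

```stata
* Sketch: the three specifications side by side in one table
use "basic.dta", clear
eststo clear

eststo: regress aauw anygirls
eststo: regress aauw anygirls totchi
eststo: regress aauw anygirls totchi female repub

esttab, b(2) se(2) star(* 0.05 ** 0.01 *** 0.001) ///
    mtitles("(1)" "(2)" "(3)") nonumbers
```

Reading across the `anygirls` row shows how the coefficient moves as controls are added.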

Could controlling for Republican be considered an issue here? Explain your answer.

‣

Answer

You should have found a difference in the coefficient on $anygirls$ between equations (1) and (2). In technical language, or language we use in class, how do you explain the change in the result from equation (1) to equation (2)?

‣

Answer

✅

Once we control for the total number of children, we see that legislators who have girls have higher AAUW scores than legislators who do not. This means that number of children was an important confounder in the relationship between having a girl and aauw scores. In fact, using the sign of the bias framework, we can give a better answer. The change in the coefficient tells us that the sign of the bias is negative. Since we observe that $corr(Y,totchi)<0$, we infer that $corr(anygirls,totchi)>0$, which is what our intuition expected as well. Therefore, in technical language, we can say that omitting the total number of children was biasing our estimate downwards.

Now explain the difference in non-technical language, or in a way everyone can understand. Write the number of characters at the end of your answer. It should be less than 1,000 for the answer to be counted correctly.

‣

Answer

Would your qualitative conclusion change given the results of equation (2) to equation (3)? Which control variable is particularly important and why? (Hint: feel free to run another regression to help support your claim.)

‣

Answer

Consider the third specification (with three controls in addition to $anygirls$ ). Conditional on the number of children and other variables, do you think $anygirls$ is plausibly exogenous? What identifying assumption is necessary for $\gamma_1$ to be interpreted as a causal estimate? What evidence does Washington give to support this assumption?

‣

Answer

✅

$anygirls$ will be plausibly exogenous if the Conditional Independence Assumption holds. This will be the case if, once we control for $totchi_i$, $female_i$, and $repub_i$, the number of girls is as good as randomly assigned. As discussed in the article, this assumption could be violated if couples follow fertility-stopping rules (e.g., keep having kids until they get both a girl and a boy). This assumption could also be violated if voters select their representatives based on the gender composition of their children. In the article, Washington presents evidence that these concerns are not driving her results. She looks at the gender of the firstborn and finds that it is predictive of the gender mix but not of the total number of children in the sample. She also looks at numerous district characteristics and does not find a concerning relationship between these and the number of daughters of the congressperson the district elects.

Run the following regressions and show them in a nicely formatted table in STATA.

\begin{aligned} (1) \ \ aauw_{it}=&\gamma_0+\gamma_1anygirl_{it}+\gamma_2totchi_{it}+\gamma_3female_{it}+\gamma_4Age_{it}+\gamma_5Age^2_{it}+\iota_{it} \\ (2) \ \ aauw_{it}=&\gamma_0+\gamma_1anygirl_{it}+\gamma_2totchi_{it}+\gamma_3female_{it}+\gamma_4white_{it}+\gamma_5white\times female_{it}+ \iota_{it} \\ (3) \ \ aauw_{it}=&\gamma_0+\gamma_1anygirl_{it}+\gamma_2totchi_{it}+\gamma_3female_{it}+\gamma_4anygirl\times female_{it}+ \iota_{it} \\ \end{aligned}

(Again, use STATA as much as you can, but it is fine to resort to Word/Excel if you need to.) Provide the table and the code.

‣

Answer
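
One way to build the squared and interaction terms without creating new variables is factor-variable notation. The sketch below assumes the race indicator is named `white` (taken from the equation, not verified against the dataset) and that `female`, `white`, and `anygirls` are 0/1 variables. `esttab` requires `ssc install estout`.

```stata
* Sketch: ## adds both main effects and their interaction
use "basic.dta", clear
eststo clear

eststo: regress aauw anygirls totchi female c.age##c.age
eststo: regress aauw anygirls totchi i.white##i.female
eststo: regress aauw totchi i.anygirls##i.female

esttab, b(3) se(3) star(* 0.05 ** 0.01 *** 0.001) nobaselevels
```

In the output, `c.age#c.age` is the squared term and rows such as `1.white#1.female` are the interactions.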

Using results from equation (1), what is the full relationship between age and $aauw$ scores? Show process.

‣

Answer

✅

As age increases, legislators tend to have higher $aauw$ scores; however, after about 67 years of age, scores fall as legislators get older. To get this answer, we first find the marginal effect by taking the derivative:

\frac{\delta aauw}{\delta Age}=\gamma_4+2\gamma_5Age

Now we set it equal to 0 to find the min or max:

\gamma_4+2\gamma_5Age=0 \\ 2\gamma_5Age=-\gamma_4 \\ Age=\frac{-\gamma_4}{2\gamma_5}

Now we plug in those values from the regression:

Age=\frac{-2.769}{2\times -0.0207}=\frac{-2.769}{-.0414}=66.88

Now, we need to find out if this is a min or a max. For that we take the second derivative:

\frac{\delta^2 aauw}{\delta Age^2}=2\gamma_5<0

Since $\gamma_5<0$, the second derivative is negative, which indicates that we are dealing with a max.
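
The turning point can also be computed directly from the stored coefficients, which avoids rounding by hand and gives a standard error. This is a sketch that assumes the regression is run with factor-variable notation, so the squared term is stored under the name `c.age#c.age`.

```stata
* Sketch: age at which the fitted quadratic in age reaches its peak
regress aauw anygirls totchi female c.age##c.age
nlcom (turning_point: -_b[age] / (2 * _b[c.age#c.age]))
```

The label `turning_point` is arbitrary; `nlcom` uses the delta method for the standard error.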

Now, it could be the case that no legislator is 67 or older, in which case the max would never be reached, so let's check the range of age in our data. The output below shows ages from 26 to 87, so the turning point falls within the data.
 
sum age

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         age |      1,740    52.92759    9.504845         26         87

Using results from equation (2), what is the marginal effect of being white for non-females on $aauw$ scores? Show process.

‣

Answer

✅

The marginal effect of being white for non-females is -44.48. We obtain this by taking the marginal effect of being white and setting female=0:

\frac{\delta aauw}{\delta white}=\gamma_4+\gamma_5female \\ \frac{\delta aauw}{\delta white}=\gamma_4+\gamma_5\times 0=\gamma_4=-44.48

Using results from equation (2), what is the marginal effect of being female for a white person on $aauw$ scores? Show process.

‣

Answer

✅

The marginal effect of being female for a white person is 25.59. We obtain this by taking the marginal effect of being female and setting white=1:

\frac{\delta aauw}{\delta female}=\gamma_3+\gamma_5white \\ \frac{\delta aauw}{\delta female}=5.259+20.33\times 1=25.59
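
Both marginal effects from equation (2) can be checked with `lincom`, which adds up coefficients and reports a standard error. The sketch assumes the race indicator is named `white` and that `white` and `female` are coded 0/1.

```stata
* Sketch: marginal effects from the white x female specification
regress aauw anygirls totchi i.white##i.female

* Effect of being white when female == 0 (gamma_4)
lincom 1.white

* Effect of being female when white == 1 (gamma_3 + gamma_5)
lincom 1.female + 1.white#1.female
```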

Using results from equation (3), what is the predicted $aauw$ scores for a male who has a daughter, and has the average number of children that male legislators have? (We’ll assume for the purposes of this question that non-female=male). Show process.

‣

Answer

✅

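The predicted aauw score is 46.42. We obtain this by setting anygirl=1 and female=0 and plugging in 2.466842, the average number of children among male legislators:

\gamma_0+\gamma_1+\gamma_2\times (2.466842)=55.36+6.240-6.155\times 2.466842=46.42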
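
The same prediction can be assembled from the stored coefficients. This sketch uses the variable names from the regression log in the Extensions section and computes the average number of children among male legislators in the estimation sample, which may differ slightly from a mean taken over all observations.

```stata
* Sketch: predicted score for a male legislator who has a daughter
regress aauw anygirls totchi female anygirlXfemale

* Average number of children among male legislators used in the regression
quietly summarize totchi if female == 0 & e(sample)

display %6.2f (_b[_cons] + _b[anygirls] + _b[totchi] * r(mean))
```

The `female` and `anygirlXfemale` terms drop out because both equal 0 for a male legislator.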

Using results from equation (3), how does the effect of having a girl change by the gender of the legislator? Show process and formulate a technical and non-technical answer.

‣

Answer

✅

For this we start by taking the derivative with respect to anygirl:

\frac{\delta aauw}{\delta anygirl}=\gamma_1+\gamma_4female

From here we see that the effect depends on whether $\gamma_4$ enters or not, and $\gamma_4$ = 5.451. Now we have the information for the whole answer. In technical terms: the effect of having a girl varies by the gender of the legislator; it is $\gamma_1$ = 6.240 points for male legislators and $\gamma_1+\gamma_4$ = 11.691 points for female legislators, so it is 5.451 points larger for female legislators. In colloquial terms: having a daughter has a larger effect for female legislators than for male legislators.
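
A short sketch of how the two gender-specific effects could be read off in Stata, using the variable names from the regression log in the Extensions section:

```stata
* Sketch: effect of having a girl, separately by gender of the legislator
regress aauw anygirls totchi female anygirlXfemale

* Male legislators (gamma_1)
lincom anygirls

* Female legislators (gamma_1 + gamma_4)
lincom anygirls + anygirlXfemale
```

The t-test on `anygirlXfemale` in the regression output is the test of whether the two effects differ.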

Extensions

This section is not graded and you don’t have to submit it, but it will help you push your thinking further. Think of the “extensions” questions as ones we could have asked in this homework but decided not to grade. You should therefore know how to answer them, or you can treat them as practice questions.

Let’s say you want to test even more hypotheses with these data and model:

If you wanted to explore not just the margin of having a girl but the number of girls, what regressions would you run?

‣

Answer

We could just change the main explanatory variable to “ngirls” which represents the number of girls. We would then interpret the coefficient on ngirls as the effect of having one more girl.

(3)\ \ aauw_i=\gamma_0+\gamma_1anygirls+\gamma_2totchi+\gamma_3female_i+\gamma_4repub+\iota_i \\ (4)\ \ aauw_i=\gamma_0+\gamma_1ngirls+\gamma_2totchi+\gamma_3female_i+\gamma_4repub+\iota_i \\

If you wanted to explore whether the effect on female legislators differs from men, what regressions would you run? Why doesn’t just controlling for “female” count as exploring the difference in effect between men and women?

‣

Answer

✅

With the tools you have right now, you could run equation (2) + $repub$ separately on the sample of female legislators and on the sample of male legislators, and then compare the coefficients on $anygirl$. Notice that the “female” covariate would be dropped in these new regressions because it would be a constant. We ran these regressions below. We find that the effect is larger for female legislators than for male legislators. Just controlling for female does not answer the question at hand; it only gives us the effect of being female on AAUW scores, rather than the effect of being female AND having girls relative to being male AND having girls.

--------------------------------------------
                (Female Sample) (Male Sample)   
                     aauw            aauw   
--------------------------------------------
anygirls            9.058**         2.639*  
                  (3.150)         (1.299)   

totchi             -0.808          -2.147***
                  (0.977)         (0.365)   

repub              -71.26***       -73.18***
                  (2.526)         (1.016)   

_cons               91.78***        88.06***
                  (2.309)         (1.116)   
--------------------------------------------
N                     213            1522   
--------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
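
A table like the one above could be produced along these lines. This is a sketch, not necessarily the exact code used, and `esttab` requires `ssc install estout`.

```stata
* Sketch: the same specification estimated separately by gender
eststo clear
eststo: regress aauw anygirls totchi repub if female == 1
eststo: regress aauw anygirls totchi repub if female == 0

esttab, se star(* 0.05 ** 0.01 *** 0.001) ///
    mtitles("Female Sample" "Male Sample")
```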

Using the residual regression approach, obtain the coefficient $\gamma_1$ from equation (3).

‣

Answer


reg aauw anygirls totchi female anygirlXfemale 

      Source |       SS           df       MS      Number of obs   =     1,735
-------------+----------------------------------   F(4, 1730)      =     45.21
       Model |   294850.66         4  73712.6651   Prob > F        =    0.0000
    Residual |  2820896.28     1,730  1630.57589   R-squared       =    0.0946
-------------+----------------------------------   Adj R-squared   =    0.0925
       Total |  3115746.94     1,734  1796.85522   Root MSE        =     40.38

--------------------------------------------------------------------------------
          aauw | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
      anygirls |   6.239591   2.657354     2.35   0.019     1.027626    11.45156
        totchi |  -6.155213   .7124918    -8.64   0.000    -7.552649   -4.757777
        female |   24.70641   5.525213     4.47   0.000     13.86961    35.54321
anygirlXfemale |    5.45111    6.53663     0.83   0.404    -7.369419    18.27164
         _cons |   55.36287   2.097626    26.39   0.000     51.24872    59.47702
--------------------------------------------------------------------------------

. 
. * Regressing female (variable of interest) on other covariates and obtaining the residual
. reg female  anygirls totchi anygirlXfemale 

      Source |       SS           df       MS      Number of obs   =     1,736
-------------+----------------------------------   F(3, 1732)      =   1442.09
       Model |  133.442728         3  44.4809094   Prob > F        =    0.0000
    Residual |  53.4230552     1,732   .03084472   R-squared       =    0.7141
-------------+----------------------------------   Adj R-squared   =    0.7136
       Total |  186.865783     1,735  .107703622   Root MSE        =    .17563

--------------------------------------------------------------------------------
        female | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
      anygirls |  -.1150578   .0111857   -10.29   0.000    -.1369966   -.0931189
        totchi |  -.0042955   .0030857    -1.39   0.164    -.0103476    .0017566
anygirlXfemale |   .9994065   .0152129    65.69   0.000      .969569    1.029244
         _cons |   .1279444    .008589    14.90   0.000     .1110986    .1447902
--------------------------------------------------------------------------------

. 
. predict res_female, res
(4 missing values generated)

. 
. 
. * regressing outcome variable on other covariates and obtaining the residual
. reg aauw anygirls totchi  anygirlXfemale 

      Source |       SS           df       MS      Number of obs   =     1,735
-------------+----------------------------------   F(3, 1731)      =     53.03
       Model |  262247.296         3  87415.7653   Prob > F        =    0.0000
    Residual |  2853499.65     1,731  1648.46889   R-squared       =    0.0842
-------------+----------------------------------   Adj R-squared   =    0.0826
       Total |  3115746.94     1,734  1796.85522   Root MSE        =    40.601

--------------------------------------------------------------------------------
          aauw | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
      anygirls |   3.384438   2.593617     1.30   0.192    -1.702516    8.471391
        totchi |  -6.257483   .7160212    -8.74   0.000    -7.661841   -4.853126
anygirlXfemale |   30.14339   3.516919     8.57   0.000     23.24553    37.04125
         _cons |   58.52484    1.98565    29.47   0.000     54.63031    62.41936
--------------------------------------------------------------------------------

. predict res_aauw, res
(5 missing values generated)

. 
. reg res_aauw res_female

      Source |       SS           df       MS      Number of obs   =     1,735
-------------+----------------------------------   F(1, 1733)      =     20.03
       Model |  32603.3059         1  32603.3059   Prob > F        =    0.0000
    Residual |  2820896.33     1,733  1627.75322   R-squared       =    0.0114
-------------+----------------------------------   Adj R-squared   =    0.0109
       Total |  2853499.64     1,734  1645.61686   Root MSE        =    40.345

------------------------------------------------------------------------------
    res_aauw | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  res_female |   24.70637   5.520423     4.48   0.000     13.87898    35.53376
       _cons |  -.0014549   .9686002    -0.00   0.999    -1.901203    1.898293
------------------------------------------------------------------------------
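
The log above partials out `female`, so its final slope (24.706) matches the coefficient on `female` in the full regression. To recover the coefficient on `anygirls` (6.2396 in the full regression), the same steps apply with the roles swapped. A sketch follows; the names `insample`, `res_anygirls`, and `res_aauw2` are hypothetical.

```stata
* Sketch: residual regression for the coefficient on anygirls
regress aauw anygirls totchi female anygirlXfemale
generate byte insample = e(sample)

* Step 1: part of anygirls not explained by the other covariates
regress anygirls totchi female anygirlXfemale if insample
predict res_anygirls if insample, residuals

* Step 2: part of aauw not explained by the other covariates
regress aauw totchi female anygirlXfemale if insample
predict res_aauw2 if insample, residuals

* Step 3: this slope should equal the coefficient on anygirls in the full regression
regress res_aauw2 res_anygirls
```

Restricting every step to `insample` keeps all three regressions on the same observations, which is what makes the match exact.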

Obtain an approximate coefficient $\beta_1$ from equation $aauw_{it}=\beta_0+\beta_1anygirl_{it}+\beta_2totchi_{it}+\eta_{it}$ using the “by hand” approach as seen in the worksheet.

‣

Answer

Using the “by-hand” method, the weighted mean is 8.61706968, or about 8.62, which is near 7.04. We used Excel for this.

Number of Children	$E(AAUW\|AnyGirl==1)$	$E(AAUW\|AnyGirl==0)$	Sample Size	Difference
0	.	56.41333	225	-
1	57.35714	68.78082	185	-11.42368
2	55.34515	44.37143	563	10.97372
3	44.45198	29.46667	399	14.98531
4	37.47716	26.71429	204	10.76287
5	42.00952	4	108	38.00952
6	18	100	17	-82
7	39.28571	-	14	-
8	1.083333	-	12	-
9	6	-	3	-
10	3	-	4	-
11	-	-	0	-
12	0	-	1	-
Total	494	1,241	1,735
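
For reference, the reported figure is reproduced by weighting each within-cell difference by its cell size and dividing by the full sample of 1,735, so cells without both groups add nothing to the numerator but still count in the denominator. The symbols $\Delta_k$ (difference in cell $k$), $n_k$ (cell size), and $N$ are introduced here for illustration.

```latex
\hat{\beta}_1\approx\frac{\sum_k n_k\Delta_k}{N} \\
=\frac{185(-11.42368)+563(10.97372)+399(14.98531)+204(10.76287)+108(38.00952)+17(-82)}{1735} \\
=\frac{14950.62}{1735}\approx 8.62
```

Dividing instead by the 1,476 observations in cells where both groups are present would give about 10.13, so the choice of denominator matters for how close the approximation is.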

Checking the relationship between variables and our Y is important. Install the command binscatter2. This command produces graphs in which you can observe the relationship between two variables, and it works great for visualizing non-linear relationships. Try the following commands:

binscatter2 aauw age
binscatter2 aauw age, linetype(qfit)

Then compare this to the regression with age and age squared. Now run the following command:

binscatter2 aauw age, linetype(qfit) by(female)

Write out a single regression equation that represents the last graph produced.

‣

Answer

✅

$aauw_i=\gamma_0+\gamma_1 Age_i+\gamma_2 Age_i^2+\gamma_3 Female_i+\gamma_4 Female_i\times Age_i+\gamma_5 Female_i\times Age_i^2+\epsilon_i$
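
This equation can be estimated in one line with factor-variable notation. The sketch assumes `female` is coded 0/1 and `age` is the variable summarized earlier.

```stata
* Sketch: quadratic in age with every term allowed to differ by gender
regress aauw i.female##c.age##c.age
```

The `##` operator expands to the main effects of `female` and `age`, the squared term `c.age#c.age`, and the interactions of `female` with both.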