Practice Problems for Exam 1

You can use these questions as a guide to practice for the exam. We have about 35 questions here, plus all the questions from your quizzes, homework, and worksheets. If you need more than that, you also have questions in your books and questions from the discussion section.

A couple of guidelines:

You can use these questions either as an “assessment” or as an “evaluative” exercise.

If you plan to use these questions as an “assessment,” I recommend that you answer them without studying first and then go back to the topics in which you feel weaker.
If you plan to use these questions as "evaluative," I also recommend timing yourself. Since the exam is a time-constrained exercise, it is good to practice questions under a time constraint.

Note that some answers are meant to be didactic (teaching moments) rather than answers that get straight to the point.
Some questions will say, "Show your work," but the answers here show only numbers. On the exam, you would want to show the process; the numbers are there so you can check whether you used the proper method.
Try not to memorize every type of question, as this trains you to answer a particular question rather than "any question."

‣

Job Training Program

Job-training programs are common, so evaluating them has been an important policy question. The following example follows Scott Cunningham's Mixtape book. The National Supported Work Demonstration (NSW) job-training program was operated by the Manpower Demonstration Research Corp (MDRC) in the mid-1970s. The NSW was a temporary employment program designed to help disadvantaged workers lacking basic job skills move into the labor market by giving them work experience and counseling in a sheltered environment. It was also unique in randomly assigning qualified applicants to training positions. The treatment group received all the benefits of the NSW program. The controls were left to fend for themselves. The program admitted women receiving Aid to Families with Dependent Children (AFDC), individuals recovering from addiction, individuals released after being found guilty of an offense, and men and women who had not completed high school. The MDRC collected earnings and demographic information from the treatment and the control group at baseline and every nine months after that. MDRC also conducted up to four post-baseline interviews. Sample sizes differ from study to study, which can be confusing.

Write the earnings comparison between people in the program vs. those not in the program in a regression format. Explain how you would code each variable.

‣

Answer

✅

Earnings_i= \alpha_0+ \beta_1NSW_i+ \epsilon_i

Where $NSW$ is a binary variable with a value of 1 if you were in the program and 0 otherwise; earnings are measured as total monthly earnings. Notice that I should specify how the variables are measured and/or the units of the variables (unless this is already specified in the question). It's good practice to ask, “How does the data look?” and “How are these variables coded?”; the answers help you determine how the variables should be specified.

Write out the same comparison in conditional expectation language.

‣

Answer

✅

E(Earnings|NSW=1)-E(Earnings|NSW=0)

Conditional expectation language and regressions complement each other. If you are comfortable with regressions, this can inform you how to write conditional expectations. Similarly, if you are comfortable with conditional expectations, this can help you write the regression.
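To see the link between the two formats concretely, here is a minimal Stata sketch on simulated data. The variable names (nsw, earnings), the sample size, and the built-in effect of 150 are assumptions made up for illustration; this is not the NSW data. With a single binary regressor, the OLS coefficient should match the difference in group means.

```stata
* Simulated data for illustration only; this is NOT the NSW data
clear
set seed 12345
set obs 1000
* nsw = 1 for (hypothetical) program participants, 0 otherwise
generate nsw = runiform() < 0.5
* Hypothetical monthly earnings with a built-in program effect of 150
generate earnings = 500 + 150*nsw + rnormal(0, 100)

* Regression format: the coefficient on nsw is the earnings comparison
regress earnings nsw
scalar b_nsw = _b[nsw]

* Conditional-expectation format: E(earnings|nsw=1) - E(earnings|nsw=0)
summarize earnings if nsw == 1, meanonly
scalar mean_treated = r(mean)
summarize earnings if nsw == 0, meanonly
scalar mean_control = r(mean)
scalar diff_means = mean_treated - mean_control

display "Regression coefficient on nsw: " b_nsw
display "Difference in group means:     " diff_means
```

The two displayed numbers should agree up to rounding, which is the sense in which the regression and the conditional expectations are two ways of writing the same comparison.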

The good news for MDRC and the treatment group was that the treatment benefited the workers. Treatment group participants’ actual earnings post-treatment in 1978 were higher than the earnings of the control group by approximately $1,600, as Dehejia and Wahba (2002) estimated. Dehejia and Wahba (2002) wanted to explore whether the results from an experimental setting (an RCT) could be replicated using non-experimental data and covariates. The authors used non-experimental data from the Current Population Survey (CPS) and the Panel Study of Income Dynamics (PSID), two publicly available surveys, to construct a comparison group for the people in the treated group from the RCT. The adjusted models are the models that include covariates. The list of covariates varies by sample.

Practice reading the results from the table and interpreting each coefficient.

‣

Answer

✅

Here are some: Going through the program increases earnings by almost $1,800. This coefficient is statistically significant. Using PSID, people who did the NSW program had about $15,000 less in earnings than those who did not. This difference is statistically different from 0.

Practice knowing which estimates are statistically significant and which are not

‣

Answer

✅

The ones that are not statistically significant are the second column of the multivariate regressions.

What are the results of the naive comparison?

‣

Answer

✅

What’s the naive comparison here? It’s the simplest comparison, so it is not the adjusted model; it should be one of the unadjusted models. Now, between the RCT and the plain regression, the naive comparison is not the RCT, since the RCT is randomized. Hence, the naive comparison says that people who do this program earn, on average, $8,500 to $15,000 less than people not in the program. Not very encouraging for the program.

How does the naive comparison change when adding controls? What does this tell you about the signs of bias when adding controls?

‣

Answer

✅

Once you add controls, the results shift: the job training program is now associated with an increase in earnings of about $731 and $972, although neither of these estimates is statistically significant. The bias in the naive estimates must therefore be negative: leaving out the controls pushes the naive estimates in a more negative direction. What could this be? That’s the next question! (Not directly about this, but notice that the multivariate model with controls and the RCT point in the same direction: both imply that the naive estimates are biased downward.)
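One way to organize the sign-of-bias argument is to write the naive estimate as the adjusted estimate plus a bias term. The sketch below plugs in the approximate PSID numbers quoted in these answers (a naive gap of about -$15,000 and an adjusted estimate of $731); the decomposition notation is an assumption added for illustration, and the numbers are rounded.

```latex
\hat{\beta}_{naive} = \hat{\beta}_{adjusted} + bias \\
-15{,}000 \approx 731 + bias \\
bias \approx -15{,}731 < 0
```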

The results from the multivariate regressions are different from those with an RCT. What could be explaining this difference?

‣

Answer

✅

The covariates in the model do not do a good job of accounting for selection into the program. In this case, we compare the earnings of people who need a job training program because of a difficult employment history with the earnings of people who did not attend the program. This makes it seem that the program doesn't work (and lowers earnings) when in fact it works. In the reading about this program, you would learn that the program admitted women receiving Aid to Families with Dependent Children, people recovering from addiction, released offenders, and men and women who had not completed high school. The regressions with controls try to capture this selection but do not capture it well enough.

Start with the model with covariates that does not come from the RCT. Write down one regression that will allow us to explore how the effect differs between females and males.

‣

Answer

✅

Earnings=\alpha_0+\beta_1 Program+\beta_2 Female+\beta_3Program\times Female + other\ covariates + \epsilon

The main effect in the model with covariates and using the PSID is $731. If we ran the same regression but only using males, the program's effect would be $800, and the program's effect when only using females would be $200. We also know that males' average earnings before the program are $600, and females earn 85 cents on the male dollar. Use these numbers to provide numerical values for the coefficients in the regression you proposed. Then, tell us whether this program reduces or increases the earnings gap, and if so, by how much.

‣

Answer

✅

This question has everything: reverse magnitude, marginal effects, interactions, and regression interpretation. Here are the answers; find out which process works best for you to get to them:

$\alpha_0=600\ and\ \beta_2=-90$ $\beta_1 =800\ and\ \beta_3=-600$ For the second part: without the program, females make 85% of what males make. After the program, males make $1,400 while females make $710. This means that females now make about 51% of what males make, so the female-to-male earnings ratio falls by about 34 percentage points (a decline of about 40% in the ratio), and the gap widens.
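One possible route to these numbers, written out step by step. This is a sketch that sets the other covariates aside and reads each group's mean off the interaction regression proposed above.

```latex
Male,\ no\ program: \alpha_0 = 600 \\
Female,\ no\ program: \alpha_0+\beta_2 = 0.85\times 600 = 510 \Rightarrow \beta_2 = -90 \\
Male\ program\ effect: \beta_1 = 800 \\
Female\ program\ effect: \beta_1+\beta_3 = 200 \Rightarrow \beta_3 = -600 \\
Male,\ program: \alpha_0+\beta_1 = 1{,}400 \\
Female,\ program: \alpha_0+\beta_1+\beta_2+\beta_3 = 600+800-90-600 = 710 \\
\frac{710}{1{,}400} \approx 0.507
```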

For this exercise, let's use the results from the second column. Let's assume the RCT was run using a representative sample of Virginia. The earnings for the control group were $2,000. If we were to use these results for the rest of the U.S., what would be the average salary of people in the program if the average salary for the target group of the rest of the U.S. is around $1,300?

‣

Answer

✅

The program increased income by 83.6%. We can see this by dividing the treatment effect by the control mean. Hence, assuming a similar proportional increase for the rest of the U.S., the average salary would increase to about $2,387 ($1,300 × 1.836).
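The arithmetic behind this answer can be laid out as follows. The results table is not reproduced here, so the dollar treatment effect below is simply the value implied by the 83.6% figure and the $2,000 control mean, and the extrapolation assumes the same proportional effect holds outside Virginia.

```latex
\frac{treatment\ effect}{control\ mean} = 0.836 \Rightarrow treatment\ effect = 0.836\times 2{,}000 = 1{,}672 \\
1{,}300\times(1+0.836) = 2{,}386.8
```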

‣

CEOs and Salary

In some economic frameworks, wages represent productivity. CEOs of medium and big businesses have their wages tied to the profits in a given year. We want to explore this relationship. Suppose you estimate a simple regression of CEO salary (in $1,000s) on firm profits (in $1,000,000s) and obtain the following result:

CEOsalary_i= 476.92+0.572Profits_i

The average profits in this sample are $200 million. Given this information, can you determine the average annual salary for a CEO in this sample? Show your work.

‣

Answer

✅

Since the average profits are $200 million, we insert this value into the regression to determine the average annual salary (an OLS regression line passes through the sample means): 476.92+0.572*200=591.32. Notice that since profits are already measured in millions, I only need to plug in 200 instead of 200 million. Since salary is measured in thousands, the average salary is $591,320.

Given the estimated model, what do you predict would be the salary for a CEO whose company breaks even? How much is the salary expected to increase for every additional million in profits? What would happen if the company lost a million dollars? Show your work.

‣

Answer

✅

If a company breaks even, its profits are 0 (costs equal revenue). This means the predicted CEO salary is 476.92+0.572*0=476.92 thousand dollars, or $476,920. We expect the CEO's salary to increase by $572 (0.572 thousand dollars) for every additional million dollars in profits. If the company lost a million dollars, we would expect the CEO's salary to be $572 lower than at break-even, that is, 476.92-0.572=476.348 thousand dollars, or $476,348.
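The predictions in this problem all come from plugging a profit value (in millions) into the estimated line and reading the result in thousands of dollars. Collected in one place:

```latex
\widehat{CEOsalary}(Profits=200) = 476.92+0.572\times 200 = 591.32 \\
\widehat{CEOsalary}(Profits=0) = 476.92 \\
\widehat{CEOsalary}(Profits=1)-\widehat{CEOsalary}(Profits=0) = 0.572 \\
\widehat{CEOsalary}(Profits=-1) = 476.92-0.572 = 476.348
```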

Does this simple regression necessarily capture the causal relationship between a firm's profits and its CEO's salary? Explain your answer.

‣

Answer

✅

No, this regression captures a correlation (or association) between profits and CEO salaries. Other potential determinants of a CEO's salary also affect profits. For example, the money raised by the CEO could affect profits and also be part of what determines the CEO's salary.

Suppose the "true model" includes firm size. Since we have omitted company size in the model above, how is the coefficient on profits likely to be biased? Explain your answer.

‣

Answer

✅

Notice that we have not given you the signs of the needed correlations in this question, so you can come up with any signs that make sense to you and practice answering the question. Here is one version: if firm size is an omitted variable, then to determine whether the coefficient on profits is biased upward or downward, we have to determine how firm size is correlated with Y (CEO salary) and D (profits). We hypothesize that a larger firm (higher number of employees) has a CEO who earns more money (i.e., a positive relationship). We would also expect larger firms to have higher profits (a positive correlation). Given these two relationships, the coefficient on profits (0.572) is expected to be larger (more positive) than the true beta. In other words, in the regression above, we compare firms with higher and lower profits and conclude that firms with higher profits have higher CEO salaries, but this could reflect the fact that we are comparing big and small firms; in reality, there may be no relationship between profits and CEO salary.
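The sign argument can be summarized with the textbook omitted-variable-bias expression. This is a sketch: the long-model notation ($\beta_2$ on $FirmSize$) and the label "short" for the regression without firm size are assumptions introduced here, and the signs are the ones hypothesized in this answer.

```latex
CEOsalary_i = \beta_0+\beta_1 Profits_i+\beta_2 FirmSize_i+\epsilon_i \\
\hat{\beta}_1^{short} \approx \beta_1+\beta_2\cdot\frac{Cov(Profits, FirmSize)}{Var(Profits)} \\
\beta_2>0\ and\ Cov(Profits, FirmSize)>0 \Rightarrow bias>0
```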

Unfortunately, you have lots of missing data from the CEO salary side, and we are wondering how the missing data will bias our estimates. Formulate a situation in which this missing data would imply that our current regression understates the relationship between profits and salary (i.e., our estimates are more negative because we don't include the missing data).

‣

Answer

✅

Again, this is a sign-of-the-bias exercise. Let’s say that the missing data comes from CEOs whose salaries are really high, so they would like to conceal them. Let’s also say that these are CEOs whose firms have low profits but whose salaries are high. That’s the missing data. How would this affect our results? The sign of the bias here is not driven by an OVB per se but by missing data; still, the key is how this component is correlated with Y and D. In our example, this component is negatively correlated with profits and positively correlated with CEO salary, so the bias will be negative, which means that our estimates without the missing data will be more negative, understating the true relationship. How did I come up with the story? I did it backward: I noticed what sign I needed and then created a story that gave me those signs. This also shows another powerful use of the sign of the bias: we can infer (with small assumptions) whether we are under- or over-estimating our effects even when we have missing data. The key is to ask: what bias does not having this data create?

Continuing with the missing-data example: imagine there is no pattern, or we have no reason to believe there is a pattern, in the missing data. Will this still bias our results?

‣

Answer

✅

If we think the missing data is random, then no. We don’t think this will bias our results. The only way it would is if we think there is a pattern with respect to Y and our main variable of interest.

‣

An example from an APP

Last year, a Batten student had an Applied Policy Project (APP) about malnutrition in Guatemala. Her client has told her that malnutrition is prevalent in poor rural communities and indigenous communities. The student thinks that there may be discrimination against indigenous communities from the government, but her client thinks she is correlating poverty with being indigenous. She is given data that has information at the “county” level. It contains information on whether or not a given county has received government aid, the percentage of people living in poverty in that county, and whether that county is considered an Indigenous community or not (which takes the value of 1 if at least 60% of the population comes from an indigenous background and 0 if not). To disentangle the effect of both and test her hypothesis, the student first runs this regression:

Pr(Receiving\ Government\ Aid)=\beta_{0}+\beta_{1}Poverty\ Level_{c}+\beta_{2}IndigenousCommunity_{c}+\beta_{3}(IndigenousCommunity\times Poverty\ Level)_{c}+\epsilon_{c}

Practice what is the interpretation of each coefficient.

‣

Answer

✅

1) For non-indigenous communities, a one percentage point increase in the poverty level is associated with a 0.75 percentage point increase in the probability of receiving aid. Another interpretation: 0.75 percentage points is poverty's marginal effect on receiving aid for non-indigenous communities. Now, the confusing part here is the 0.75 interpretation. Shouldn't it be 75 percentage points? Since Y is a binary variable, the beta does have a percentage-point interpretation. Notice, however, what a one-unit change in the poverty level is: the variable goes from 0 to 1, not in a binary way but continuously, so going from 0 to 1 is like going from 0 percent poverty to 100 percent poverty. Therefore, a one percentage point change is not a change from 0 to 1 but a change from 0 to 0.01, so 0.01*0.75=0.0075; to express this in percentage points, we multiply by 100, so 0.0075*100=0.75, and we end up with the interpretation above. This exercise highlights the importance of thinking clearly about what a one-unit change in X is and what a one-unit change in Y is.

2) Indigenous communities with zero poverty are 35 percentage points more likely to receive aid than non-indigenous communities.

3) Compared to non-indigenous communities, indigenous communities are 1.2 percentage points less likely to receive aid per percentage point increase in poverty.

Using the estimates above, what is the marginal effect of being indigenous for a county with a 10% poverty level?

‣

Answer

✅

We take the derivative with respect to “indigenous” and then plug in the value: marginal effect of being indigenous: 0.35 - 1.2*0.1 = 0.23. Notice that 10% for the poverty-level variable should be expressed as 0.10. Since the indigenous variable is binary, we say the marginal effect of being indigenous for a county with a 10% poverty level is 23 percentage points. If you are having trouble with this, play with the exercise yourself in Stata:

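clear
sysuse auto
set seed 12345
* scale_1 is a uniform draw on (0,1); scale_100 is the same variable on a 0-100 scale
generate scale_1 = runiform()
summarize scale_1
generate scale_100 = scale_1*100

* Same regression, two scalings of the regressor
regress price scale_1
regress price scale_100
* Notice that a one-unit change in scale_1 is a 100-unit change in scale_100,
* so the coefficient on scale_1 is 100 times the coefficient on scale_100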

Using the results from this model, she estimates the predicted probability for two counties with the same poverty level (10%), one county considered indigenous and the other not. She finds that the model predicts that indigenous counties have a higher likelihood of receiving government aid than non-indigenous counties. Does this finding contradict her hypothesis that indigenous communities are discriminated against?

‣

Answer

✅

No, because the marginal effect of poverty on indigenous communities is negative. At higher levels of poverty, indigenous communities will have a lower likelihood of receiving government aid. For example, at 50% poverty, indigenous communities are less likely to receive government aid than non-indigenous communities. In short, discrimination can change across poverty thresholds.
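To make the reversal concrete, the marginal effect of being indigenous can be evaluated at several poverty levels using the coefficient values from the interpretations above (0.35 on the indigenous dummy and -1.2 on the interaction, with poverty on a 0 to 1 scale). The break-even level in the last line is just where that expression crosses zero.

```latex
\frac{\Delta Pr(Aid)}{\Delta Indigenous} = \beta_2+\beta_3\times Poverty\ Level = 0.35-1.2\times Poverty\ Level \\
Poverty\ Level=0.10:\ 0.35-0.12 = 0.23 \\
Poverty\ Level=0.50:\ 0.35-0.60 = -0.25 \\
0.35-1.2\times Poverty\ Level = 0 \Rightarrow Poverty\ Level \approx 0.29
```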

She now realizes that there are non-linear returns to poverty, and she estimates the following equation:

Pr(Receiving\ Government\ Aid)=\gamma_{0}+\gamma_{1}Poverty\ Level_{c}+\gamma_{2}Poverty\ Level_{c}^{2}+\gamma_{3}IndigenousCommunity_{c}+\epsilon_{c}

Where $\gamma_0=3.99,\ \gamma_1=0.12,\ \gamma_2=-0.01,\ \gamma_3=-0.3$. At what poverty level do the predicted probabilities reach a maximum? Explain how the effects of poverty on the likelihood of receiving aid change across poverty levels. For this problem, consider the poverty level ranging from 0 to 100 instead of 0-1. Another way of asking the same question is to “provide a full interpretation of the relationship between the poverty level and government aid.” Notice that the second version of the question is more general and does not hint at precisely what you should do. Still, hopefully, you recognize that seeing a squared term should mean a more careful interpretation.

‣

Answer

✅

Since the poverty level is squared, depending on the signs of the coefficient, there could be a min or a max. We find the min or max by taking the derivative and setting it equal to 0.

\frac{\partial Pr(RGA)}{\partial Poverty\ Level}=\gamma_{1}+2*\gamma_{2}PL=0.12+2*(-0.01)*PL \\ 0.12+2*(-0.01)*PL=0 \\ PL=\frac{-0.12}{-0.02} \\ PL=6

Since $\gamma_1>0\ and\ \gamma_2<0$, the relationship is concave with a maximum: as poverty increases, the likelihood of receiving government aid increases as well, up to a poverty level of 6%. Beyond that point, each additional percentage point of poverty decreases the likelihood of receiving aid.
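If you want to check the turning point and see the marginal effect change sign, the following Stata sketch does the arithmetic with scalars. The scalar names (g1, g2, pl_star, me) are made up for illustration; the coefficient values are the ones given in the question, with poverty on a 0-100 scale.

```stata
* Coefficients from the quadratic model in the question (poverty on a 0-100 scale)
scalar g1 = 0.12
scalar g2 = -0.01

* Turning point: set g1 + 2*g2*PL = 0 and solve for PL
scalar pl_star = -g1/(2*g2)
display "Turning point (poverty level): " %4.1f pl_star

* Marginal effect of poverty below, at, and above the turning point
foreach pl of numlist 2 6 10 {
    scalar me = g1 + 2*g2*`pl'
    display "Poverty level `pl': marginal effect = " %6.3f me
}
```

The marginal effect should be positive below the turning point, essentially zero at it, and negative above it.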

‣

Three times the charm

A researcher is interested in the effect of having a third child on a woman’s wages (where the data set contains women with at least two children). She wants to estimate the following model:

log(wage)=\beta_{0}+\beta_{1}ThirdKid+\beta_{2}Educ+\beta_{3}Exper+\beta_{4}Exper^{2}+\epsilon

Where wages are log hourly wages, $thirdkid$ is a dummy=1 if the woman has a third child, and the education and experience variables are defined in years.

The researcher decides to use “$SameSex$” as an instrument for “$thirdkid$,” where “$SameSex$” is a dummy=1 if the first two children are of the same sex and is equal to zero if they are of the opposite sex. First, why might the researcher want to use an instrument for “$thirdkid$”?

‣

Answer

✅

If we don't instrument for the third kid, we are comparing women who decided to have a third kid with women who did not. The decision to have a third kid could be driven by education level, family size, marital status, and a series of observable and unobservable characteristics that also affect wages. In addition, even if we could control for those observable characteristics, we would still have the problem of reverse causality, that is, wages affecting the decision to have more kids. IV can help address both of these problems.

Do you think the variable “ $SameSex$ ” meets the requirements for an instrument? Be sure to address each of the requirements for instrumental variables.

‣

Answer

✅

We would like to ask four major questions:

Does the instrument affect X (first stage)? This is plausible. We have seen evidence that having two same-sex kids makes certain people more likely to have a third kid. More importantly, this is testable.
Is the instrument randomly assigned? Conditional on having two kids, whether a family has two boys or two girls versus one of each could be considered random. One could argue that in certain places families could “choose” this based on sex preferences before birth (i.e., making birthing decisions based on gender) and that families with more resources could act on these preferences at a higher rate than families with fewer resources.
Can the instrument affect Y through a mechanism other than X? It is hard to come up with another path through which the sex mix could affect wages other than the number of children. One argument could be that the gender composition of the siblings directly affects educational investment because of the parents' gender preferences.
Monotonicity: does the instrument push people in only one direction? This is credible; it's hard to think that having two same-sex children makes you less likely to have a third child (relative to having a mixed-sex pair). It's possible but less likely.

Write down the equation the researcher will estimate as the first stage using 2SLS.

‣

Answer

✅

ThirdKid=\alpha_{0}+\alpha_{1}SameSex+\alpha_{2}Educ+\alpha_{3}Exper+\alpha_{4}Exper^{2}+\nu

Note that it is important to include the controls.

Write down the equation the researcher will estimate as the second step. Which parameter tells you the effect of a third child on wages?

‣

Answer

log(wage)=\beta_{0}+\beta_{1}\widehat{ThirdKid}+\beta_{2}Educ+\beta_{3}Exper+\beta_{4}Exper^{2}+\epsilon

Where $\widehat{ThirdKid}$ is the predicted value from the first stage (which is important to note). $\beta_1$ recovers the parameter of interest. Note that it is important to include the controls and to specify what $\widehat{ThirdKid}$ is; it is not enough to say it is just a “hat.”

Write down the equation to estimate if they were to use the reduced form.

‣

Answer

✅

log(wage)=\beta_{0}+\beta_{1}SameSex+\beta_{2}Educ+\beta_{3}Exper+\beta_{4}Exper^{2}+\epsilon
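The first stage, reduced form, and 2SLS estimate can be run side by side in Stata. The sketch below uses simulated data: every variable name, sample size, and coefficient is made up for illustration and is not from the researcher's data. It shows that, with one instrument and the same controls, the 2SLS coefficient equals the reduced form divided by the first stage.

```stata
* Simulated data for illustration only; names and magnitudes are hypothetical
clear
set seed 2024
set obs 5000
generate samesex = runiform() < 0.5
generate educ    = 8 + floor(9*runiform())
generate exper   = floor(30*runiform())
generate expersq = exper^2
* u is an unobserved factor that drives both fertility and wages
generate u        = rnormal()
generate thirdkid = (0.3 + 0.1*samesex - 0.2*u + rnormal(0, 0.3)) > 0.5
generate lwage    = 1 + 0.08*educ + 0.03*exper - 0.0005*expersq - 0.15*thirdkid + 0.3*u + rnormal(0, 0.3)

* First stage: instrument and controls on the endogenous regressor
regress thirdkid samesex educ exper expersq
scalar first_stage = _b[samesex]

* Reduced form: instrument and controls on the outcome
regress lwage samesex educ exper expersq
scalar reduced_form = _b[samesex]

* 2SLS in one step; the coefficient on thirdkid is the IV estimate
ivregress 2sls lwage educ exper expersq (thirdkid = samesex)
scalar iv = _b[thirdkid]

display "Reduced form / first stage: " reduced_form/first_stage
display "ivregress 2sls estimate:    " iv
```

Using ivregress rather than running the two stages by hand also gives the correct standard errors for the second stage.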

Who are the never-takers in this example?

‣

Answer

✅

The never-takers are women who would not have a third kid regardless of the sex mix of their first two children: having two same-sex kids would not lead them to have a third, and having kids of different sexes would not either. They would stop at two.

Imagine you find a table with the following results.

Which of these estimates is the IV estimate? What is the interpretation?

‣

Answer

✅

It’s the -0.15. This means that having a third kid decreases wages by about 15% for women whose third kid was induced by having two same-sex first children.

What are columns 1,2 and 3 representing, respectively?

‣

Answer

✅

Column 1: first stage. Column 2: second stage. Column 3: ITT or reduced form.

What is the value of the coefficient on same-sex in the third column? Provide an interpretation

‣

Answer

✅

It’s -0.01185. The interpretation is that having the first two kids be of the same sex decreases a woman’s wages by approximately 1.2 percent.
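The three columns are linked by the usual ratio: with one instrument, the IV estimate is the reduced form divided by the first stage. The table is not reproduced here, so the first-stage value below is only what the two quoted coefficients imply under that relationship; it should be checked against column 1.

```latex
\hat{\beta}_{IV} = \frac{reduced\ form}{first\ stage} \Rightarrow first\ stage = \frac{reduced\ form}{\hat{\beta}_{IV}} = \frac{-0.01185}{-0.15} = 0.079
```

Read this way, having two same-sex children would raise the probability of a third child by roughly 7.9 percentage points.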

‣

Get out the vote

To better understand the effects of “Get-out-the-vote” messages on voter turnout, Gerber and Green (2005) conducted an RCT involving approximately 30,000 individuals in New Haven, CT, in 1998. One of the treatments was randomly assigned in-person visits in which a volunteer visited the person's home and encouraged him or her to vote. Table 3 reflects the findings from the RCT. Before answering the questions, think about what the instrument (Z), the main explanatory variable (D), and the main outcome (Y) would be.

What is the estimate of the first stage (Effect of Z on D)? Show your calculation

‣

Answer

✅

Z = assigned to in-person contact or not, D = actually contacted, and Y = voted in 1998. The effect of Z on D is the effect of being assigned to the in-person contact group on actually being contacted. This is just the difference in the first row: 0.28-0.03=0.25. This says that being assigned to in-person contact increases the likelihood of being contacted by 25 percentage points.

What is the estimate of the reduced form (Effect of Z on Y)? Show your calculation

‣

Answer

✅

This is the effect of being assigned to in-person contact on voting. This is just the difference in the second row: 0.47-0.45=0.02. This says that being assigned to in-person contact increases the likelihood of voting by 2 percentage points.

What is the IV estimate of the effect of in-person contact on voting?

‣

Answer

✅

To obtain the IV estimate, we need the ratio $\frac{ITT}{FS}$. In this case, the ITT is 0.02 and the first stage is 0.25, so the IV estimate is $\frac{0.02}{0.25}=0.08$.

Provide an interpretation of the IV estimate.

‣

Answer

✅

This says that in-person contact increases the likelihood of voting by 8 percentage points. Notice that this is not the effect of “having been assigned to in-person contact”; it is the effect of actually being contacted. One could add “for people who were assigned to in-person contact and were contacted,” but in this context that is not very different from just saying “people who had an in-person contact.”

‣

Linearities

There is a hypothesis that people with high BMI (Body Mass Index), and potentially those with low BMI as well, face discrimination. Hence, it is important to understand the relationship between BMI and wages. However, it’s unclear whether the relationship is causal: if we see lower wages for people with high BMI, is it because having a high BMI is correlated with characteristics that would lower their wages, or because they are discriminated against for having a high BMI? This graph tries to answer that question by separating the effects of BMI on wages for jobs requiring high and low levels of social interaction. The idea is that if discrimination doesn’t contribute to the decrease in wages, then we should not see any difference between these two lines, but if it does, then we would see a difference. The following graph presents the relationship between BMI and wages across high- and low-social jobs. Wages are measured as Ln(wages), which we have not seen yet, so for this exercise, you can treat them as just hourly wages.

Write out one regression that would allow you to draw the graph for the line of low-social jobs

‣

Answer

✅

The regression should look like this:

Ln(Wages)=\alpha_0+\beta_1BMI+\beta_2BMI^2\ if\ BlackFemale==1 \ and\ Low\ Social\ Job==1

Write out one regression that would allow you to draw the graph for the line of high-social jobs

‣

Answer

✅

Ln(Wages)=\alpha_0+\beta_1BMI+\beta_2BMI^2\ if\ BlackFemale==1 \ and\ Low\ Social\ Job==0

What would be the sign of the coefficient on the square term in both regressions proposed?

‣

Answer

✅

Since both curves are concave (each has a local maximum), the coefficient on the squared term would have to be negative in both regressions.

Write out one regression that would allow you to draw both lines from the graph

‣

Answer

✅

Ln(Wages)=\alpha_0+\beta_1BMI+\beta_2BMI^2+\beta_3 Low\ Social\ Job+\beta_4 BMI\times Low\ Social\ Job+\beta_5 BMI^2 \times Low\ Social\ Job\ \ if\ BlackFemale==1
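To see why this single regression reproduces both lines, plug in each value of the low-social-job dummy. The sketch below uses the coefficient labels from the pooled regression above; each line gets its own intercept, slope, and curvature.

```latex
Low\ Social\ Job=0:\ Ln(Wages)=\alpha_0+\beta_1BMI+\beta_2BMI^2 \\
Low\ Social\ Job=1:\ Ln(Wages)=(\alpha_0+\beta_3)+(\beta_1+\beta_4)BMI+(\beta_2+\beta_5)BMI^2
```

The interaction coefficients $\beta_3$, $\beta_4$, and $\beta_5$ are therefore the differences between the low-social and high-social curves.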

‣

Practice multiple choice in quizzes

Recall that you can re-take the quizzes for more practice or work on examples from the worksheets.