Purpose

The objective of this homework is for you to practice concepts learned in class and apply them to a real-world scenario. The concepts we will practice in this homework are related to understanding instrumental variables (IV). You are building your skills: reading papers, interpreting tables, interpreting coefficients from complex designs, and using research to think about policy implications. Look at you!

Clean your desk, get a bottle of water, pick your favorite beverage, turn on “do not disturb,” set a timer for 45 min (then take breaks), put on some work tunes, and dive into the fun of learning.

Guidelines

You can work by yourself or in groups of up to two students.
Submit your answers to Gradescope (within Canvas). If you are working in a group, we encourage you to submit one assignment per group, though this is not required.
We encourage you to use the boxes; PDFs, JPGs, and PNGs are preferable over Word documents or CSV. Remember, you can always save something as a PDF. You can also “Screenshot” anything. In Windows, you can do this by using the snipping tool or Windows+Shift+S. In Mac, you can do this by command+shift+4.
Submit your do-file to Gradescope (within Canvas).
You will get points for correct answers. You will lose points if an answer contains unnecessary information or mixes incorrect statements with correct ones. In short, we want to incentivize you to use the fewest characters while maximizing the accuracy of your responses.
Your responses should be professionally formatted and written.
The due date is Monday, March 18th, at 10pm EDT.

Private Prisons & Length of Service

For this homework, we will ask questions based on the study by Anita Mukherjee titled “Impacts of private prison contracting on inmate time served and recidivism”. This paper examines the relationship between private prisons and length of service. New prisons are usually built because there is a “demand” for putting people into prisons. The main argument for a private prison system is that it saves money. The arguments against it focus on the dangers of having a private agent managing a “public” service. For example, their incentives may be to maximize profits (like any other economic agent) rather than focus on reducing recidivism, a welfare-improving outcome. In this homework, we will go over a paper that tries to tackle part of this issue. In theory, having a prison be private or public should not affect the crimes committed nor the sentencing, and so we should see no difference in length of service between public and private prisons, especially for two individuals with the same characteristics. This paper analyzes that question.

A lot of context is needed to understand the importance of this question and the complexity of the issue. First, read the introduction of the paper to get yourself acquainted with the context.

Understand the Paper

If we wanted to understand the effect of going to a private prison on length of service, what regression would we want to run? Write it out and describe how you would code each variable and your unit of observation.

‣

Answer

✅

I would want to estimate something along the lines of

LengthofService_i= \alpha_0+ \delta_1 Private_i+ \epsilon_i

The length of service is measured in days and represents the time served by a given individual. Private is a binary variable that takes the value of 1 if the individual is in a private prison and 0 if the individual is in a public prison. The unit of observation is the individual.

Read the paper and answer the following questions:

What are the main outcome variables? Explain how each of them is coded.

‣

Answer

✅

The main outcome variables are “Days Served” and “Fraction Served”. “Days Served” represents the number of days a particular individual has served in prison. “Fraction Served” is the share of the sentence that was served. For example, if the sentence was 100 days and the individual served 50 days, then the fraction would be 1/2.

What is the main explanatory variable? Explain how it is coded.

‣

Answer

✅

The main explanatory variable is “Private”, which takes the value of 1 if the individual went to a private prison and 0 if the individual went to a government-run prison.

What is the instrument in this setting (be as explicit as possible)? Who are the "compliers" in this case? (We are not looking for a textbook definition of what compliers are). How is this instrument coded?

‣

Answer

✅

The instrument is called Capacity Shock. During a prisoner's sentence, a number of prisons could open or close, so the number of available “beds” may increase or decrease. The instrument captures the change across time and space in the availability of beds. The idea is that the more private beds become available during the sentence spell, the more likely the prisoner is to be in a private prison. The variable $CapacityShock_{ij}$ takes the value of $C_j$, the number of private prison beds opening or closing, with the restriction that the change must occur at least 90 days after the prisoner's admission and at least 90 days before the sentence end date. If there is no change, it takes the value of 0. The compliers are the individuals who were placed in a private prison because more private beds became available during their sentence spell.

What is the LATE in this context? (I am not looking for the definition of LATE, but the interpretation within this context).

‣

Answer

✅

LATE is the causal effect of being in a private prison on length of service among the individuals who got put into a private prison because more beds were available during sentence time.

Write down an equation representing the “First Stage” in this setting.

‣

Answer

✅

Private_i = \alpha + \delta CapacityShock_i + \gamma X_i + \nu_i

Write down an equation representing the “Reduced Form” in this setting.

‣

Answer

✅

LengthofService_i = \alpha + \pi CapacityShock_i + \theta X_i + \epsilon_i
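
As a brief sketch of how the first stage and the reduced form fit together: with a single instrument and a single endogenous variable, the IV estimate is the ratio of the reduced-form coefficient on CapacityShock to the first-stage coefficient on CapacityShock. The labels \hat{\pi}_{RF} and \hat{\delta}_{FS} are introduced here for illustration only and are not notation from the paper.

```latex
\hat{\beta}_{IV}
  = \frac{\hat{\pi}_{RF}}{\hat{\delta}_{FS}}
  = \frac{\text{effect of } CapacityShock \text{ on } LengthofService}
         {\text{effect of } CapacityShock \text{ on } Private}
```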

Which table (column?) or figure provides evidence of the relevance condition?

‣

Answer

✅

Figure 3 and Table 2 (columns 5 and 6) provide evidence of the first stage.

Does the instrument satisfy the exclusion restriction? Can you provide a compelling argument against it?

‣

Answer

✅

This is an ungraded question. The challenge here is to think of a sensible story for why the instrument would fail the exclusion restriction. At the end of the day, it is hard to argue that the timing of openings and closings of private prisons is correlated with other characteristics that would affect length of service other than through the likelihood of a person going to a private prison.

What are the main findings?

‣

Answer

✅

The main finding is that being in a private prison increases the length of service by 90 days.

What does this research imply about the cost-saving argument for private prisons?

‣

Answer

✅

This implies that the cost-savings are less than we thought. Even though we may be saving money, about 48% of these savings are eroded by the fact that people are having longer stays in prison.

Replicating the results

In this part of the homework, we will try to replicate the results from the paper. Open the dataset “hw3_clean.dta” and get familiar with the data. This data was offered by Professor Mukherjee but stripped of some details. The actual data is available to everyone through a FOIA request. Identify the outcome (Y) and the treatment (D) in the dataset; the instrument is something we will create. Notice that $SentencedDays$ captures the length of the sentence in days, while $TimeServed$ captures the days served by the individual.

Data can be found here

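One important component of the instrument is knowing when a prison opens or closes. The information can be found in Figure 1. For your convenience, we have summarized the changes in the following table.

Table 1. Changes in private prison bed capacity

Date of change     Change in beds
June 1996          +1,000
September 1996     +1,000
January 1998       +500
August 1998        +500
April 1999         +500
October 2002       -1,000
November 2004      +1,000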

First, we need to create the instrument. Recall the information from pages 420 and 421 of the paper. As Mukherjee states in the paper, the instrument is the accumulation of changes in the availability of beds across the individual's sentence spell. For example, say an individual's sentence start date is December 6, 1996, and the sentence end date is December 5, 2001. Looking at Table 1, their sentence spell covers the openings in January 1998, August 1998, and April 1999. Since each of those openings was 500 beds, the accumulated change during their sentence spell is 1,500 beds. Since the instrument is expressed per 1,000 beds, we divide that number by 1,000, ending up with a final instrument value of 1.5.

Let's do another example. Imagine a sentence that starts on May 25, 2000, with an assigned end date of May 24, 2005. This sentence spell covers the closing in 2002 and the opening of a prison in 2004. The total accumulated change is -1000+1000=0. Therefore, the final value of the instrument is 0.

There is an additional component: changes should occur at least 90 days after the sentence start date and at least 90 days before the sentence end date. Remember this when creating your variable.
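
The rule described above can be summarized in one formula. This is a sketch in notation chosen here for illustration (it is an assumption, not the paper's notation): t_j is the date of capacity change j, C_j is the number of beds opened (positive) or closed (negative), and start_i and end_i are the sentence start and end dates of individual i.

```latex
Z_i = \frac{1}{1000} \sum_{j} C_j \cdot
      \mathbf{1}\left[\, start_i + 90 \le t_j \le end_i - 90 \,\right]
```

For the first example above, the three 500-bed openings fall inside the window, so Z_i = (500 + 500 + 500)/1000 = 1.5.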

Before you create the instrument, you may need to know how to use the d(date) function in Stata, which helps with the restriction of 90 days after and 90 days before. The d(date) function tells Stata that you are referring to a date. For example, suppose you want to create a variable that takes the value of 1 if the sentence start date is at least a year (365 days) after a particular date, say June 1st, 1996, and 0 otherwise. You would use the d(date) function in the following way:

gen dummy =0 

replace dummy=1 if sentence_start_date>=d(01jun1996)+365

The date must be written in the following format: ddmmmyyyy
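
To see what d() does in practice, here is a minimal sketch on toy data (the two example sentence spells from the text, not the homework dataset). It assumes a single capacity change dated 01jan1998; the day of the month and the names start_str, end_str, and in_window are hypothetical choices made for this illustration.

```stata
* Toy data: the two example sentence spells described in the text
clear
input str9 start_str str9 end_str
"06dec1996" "05dec2001"
"25may2000" "24may2005"
end

* Convert the strings to Stata daily dates and make them readable
gen sentence_start_date = date(start_str, "DMY")
gen sentence_end = date(end_str, "DMY")
format sentence_start_date sentence_end %td

* d() returns a daily date, stored as the number of days since 01jan1960
display d(01jun1996)
display %td d(01jun1996)+365

* Window rule for ONE change (assumed date: 01jan1998):
* at least 90 days after the start and at least 90 days before the end
gen in_window = sentence_start_date + 90 <= d(01jan1998) & d(01jan1998) <= sentence_end - 90
list sentence_start_date sentence_end in_window
```

The first spell contains the assumed change date inside its window, while the second spell starts after it, so in_window is 1 for the first row and 0 for the second.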

Now that your background is ready, think about how you would create this variable, and then complete the following steps:

Create a variable called “sentence_end” which represents the end of the sentence. This can be done by adding SentencedDays to sentence_start_date. Browse SentencedDays and sentence_start_date to understand the variables and then create the variable. Show a screenshot of your browse for your answer.

‣

Answer

* #1 Creating Sentence End 
gen sentence_end= SentencedDays+sentence_start_date

Notice that when you create this variable, it may have numbers that are hard to interpret. This is because the sentence_start_date variable is formatted as a date variable. To make sentence_end readable, let's format it as a date variable too. This is done by typing format sentence_end %td. Browse the start and end date variables to see that they are readable. Take a screenshot and submit it as your answer, and write the command in your do-file.

‣

Answer

* #2 Formatting the variable
format sentence_end %td

br sentence_start_date sentence_end

Now you are ready to create the instrument. Although there are many ways to create the instrument in Stata, the generate and replace commands together with the d(date) function are enough. The challenge is to think through the logic of the instrument. Provide a screenshot of your code for this variable and then show the output from the following code: sum (instrument variable name), det

‣

Some hints:

If you are having trouble understanding the logic of the coding steps, draw a timeline where you have the dates of the changes from Table 1. Then put a given sentence spell in the timeline. Use that diagram to obtain the value of the instrument for a particular sentence date. Do this several times until you discover the pattern.
The following are summary statistics of the instrument you would have created. If you matched these statistics this means you are on the right path. If not, try again!

sum Z
Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        Z |     26,593    .2409281    .7588198         -1        2.5

‣

Answer

* #4 Creating the instrument

* One exposure variable per capacity change in Table 1; 0 means not exposed
gen ramp1 = 0
gen ramp2 = 0
gen ramp3 = 0
gen ramp4 = 0
gen ramp5 = 0
gen ramp6 = 0
gen ramp7 = 0

* Size of each change in beds (a negative value is a closing)
gen z1 = 1000
gen z2 = 1000
gen z3 = 500
gen z4 = 500
gen z5 = 500
gen z6 = -1000
gen z7 = 1000

* A change counts only if it occurs at least 90 days after the sentence start
* and at least 90 days before the sentence end
replace ramp1 = z1 if sentence_start_date +90 <= d(01jun1996) & d(01jun1996)<=(sentence_end)-90
replace ramp2 = z2 if sentence_start_date +90 <= d(01sep1996) & d(01sep1996)<=(sentence_end)-90
replace ramp3 = z3 if sentence_start_date +90 <= d(01jan1998) & d(01jan1998)<=(sentence_end)-90
replace ramp4 = z4 if sentence_start_date +90 <= d(01aug1998) & d(01aug1998)<=(sentence_end)-90
replace ramp5 = z5 if sentence_start_date +90 <= d(01apr1999) & d(01apr1999)<=(sentence_end)-90
replace ramp6 = z6 if sentence_start_date +90 <= d(01oct2002) & d(01oct2002)<=(sentence_end)-90
replace ramp7 = z7 if sentence_start_date +90 <= d(01nov2004) & d(01nov2004)<=(sentence_end)-90

* Add up the changes over the sentence spell and rescale to units of 1,000 beds
* (rowtotal() is the documented name of the older rsum() egen function)
egen ramp = rowtotal(ramp1 ramp2 ramp3 ramp4 ramp5 ramp6 ramp7)
replace ramp = ramp/1000
label var ramp "Ramp ($\div$ 1000)"

/*
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        ramp |     26,593    .2409281    .7588198         -1        2.5

*/

gen z=ramp 
sum z, det

                              z
-------------------------------------------------------------
      Percentiles      Smallest
 1%           -1             -1
 5%           -1             -1
10%           -1             -1       Obs              26,593
25%            0             -1       Sum of wgt.      26,593

50%            0                      Mean           .2409281
                        Largest       Std. dev.      .7588198
75%            1            2.5
90%            1            2.5       Variance       .5758075
95%          1.5            2.5       Skewness      -.0900772
99%          1.5            2.5       Kurtosis       2.252839

Reduced form IV & 2SLS

For the following regressions use the outcome: TimeServed. Present the screenshots of your analysis as part of the answers. Stata output is fine.

Run the first stage regression and report the estimates.
Run the reduced form regression and report the estimates.
Use the display command to obtain the IV estimate using your estimates from steps 1 and 2.
Now use the ivregress command and report the results.
Interpret the obtained IV estimate in plain words.

‣

Answer

✅

Going to a private prison increases the length of stay by about 86.63 days for the individuals who were assigned to a private prison because more private prison beds became available during their sentence.

‣

Answer

* ===============
. * Section B 
. * ===============
. estimates clear 

. * #1 First Stage regression 
. reg ever_private ramp 

      Source |       SS           df       MS      Number of obs   =    26,593
-------------+----------------------------------   F(1, 26591)     =      4.65
       Model |  .725371978         1  .725371978   Prob > F        =    0.0310
    Residual |  4146.40854    26,591  .155932779   R-squared       =    0.0002
-------------+----------------------------------   Adj R-squared   =    0.0001
       Total |  4147.13391    26,592  .155954193   Root MSE        =    .39488

------------------------------------------------------------------------------
ever_private | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        ramp |   .0068828   .0031912     2.16   0.031     .0006279    .0131377
       _cons |   .1916633   .0025406    75.44   0.000     .1866835    .1966431
------------------------------------------------------------------------------

. local fs =_b[ramp]

. 
. * #2 Reduced Form regression 
. reg TimeServedSIM ramp 

      Source |       SS           df       MS      Number of obs   =    26,593
-------------+----------------------------------   F(1, 26591)     =      4.80
       Model |  5443.45279         1  5443.45279   Prob > F        =    0.0285
    Residual |  30152590.1    26,591  1133.93968   R-squared       =    0.0002
-------------+----------------------------------   Adj R-squared   =    0.0001
       Total |  30158033.6    26,592  1134.10174   Root MSE        =    33.674

------------------------------------------------------------------------------
TimeServed~M | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        ramp |   .5962427   .2721327     2.19   0.028      .062848    1.129637
       _cons |   741.4693   .2166546  3422.36   0.000     741.0447     741.894
------------------------------------------------------------------------------

. local itt =_b[ramp]

. 
. * #3 IV estimate 
. display "iv="`itt'/`fs'
iv=86.627712

. 
. * #4 IV estimate using IV regress
. ivregress 2sls TimeServedSIM (ever_private = ramp)

Instrumental variables 2SLS regression            Number of obs   =     26,593
                                                  Wald chi2(1)    =    4170.93
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.9988
                                                  Root MSE        =     1.1424

------------------------------------------------------------------------------
TimeServed~M | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
ever_private |   86.62771   1.341345    64.58   0.000     83.99872     89.2567
       _cons |    724.866   .2594055  2794.34   0.000     724.3575    725.3744
------------------------------------------------------------------------------
Instrumented: ever_private
 Instruments: ramp

. 
end of do-file
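
The ratio computed in step 3 can be written out as a worked check, using the displayed (rounded) coefficients on ramp from the reduced form and first stage regressions in the log above:

```latex
\hat{\beta}_{IV}
  = \frac{\text{reduced form}}{\text{first stage}}
  = \frac{0.5962427}{0.0068828}
  \approx 86.63
```

This agrees with the ever_private coefficient reported by ivregress 2sls, as expected when there is one instrument and one endogenous variable.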

Getting Closer to Mukherjee’s results

You may notice that the estimates from the previous section are not the same as the ones from the paper. We need to add controls to the exercise above. The controls we want to include are: age, race, education level, a dummy if the inmate is single or not, a dummy for each county of conviction, a dummy for each level of care, a dummy for each classification, and a dummy for each medical classification. You can read the paper to understand what these controls represent.

Run the following regression in Stata: $TimeServed_i = \alpha_0 + \beta Private_i + (Controls) + \varepsilon_i$
Repeat the exercises 1-4 from section “Reduced form IV & 2SLS”, now using controls for the outcome time served.
Use the estimates to complete a table like this. Feel free to use esttab or to do it in Word. The results you get may not be exactly the same as the author's results because we are not using exactly the same data.

‣

Answer

--------------------------------------------
                          TimeServed
                      (1)             (2)   
                      OLS              IV   
--------------------------------------------
ever_private        85.24***        86.52***
                 (0.0156)         (1.309)   
--------------------------------------------
N                   26593           26593   
--------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Homework 3 with AK
