Purpose
The objective of this homework is for you to practice concepts learned in class and apply them to a real-case scenario. The concepts we will practice in this homework relate to instrumental variables (IV). You are building your skills: reading papers, interpreting tables, interpreting coefficients from complex designs, and using research to think about policy implications. Look at you!
Clean your desk, get a bottle of water, pick your favorite beverage, turn on “do not disturb,” set a timer for 45 min (then take breaks), put on some work tunes, and dive into the fun of learning.
Guidelines
- You can work by yourself or in groups of up to two students.
- Submit your answers to Gradescope (within Canvas). If you work in a group, we encourage you to submit one assignment per group, though this is not required.
- We encourage you to use the boxes; PDFs, JPGs, and PNGs are preferable to Word documents or CSV files. Remember, you can always save something as a PDF, and you can take a screenshot of anything: on Windows, use the Snipping Tool or Windows+Shift+S; on a Mac, use Command+Shift+4.
- Submit your do-file to Gradescope (within Canvas).
- You will get points for correct answers. Points will be deducted if an answer contains unnecessary information or if it mixes incorrect statements with correct ones. In short, we want to incentivize you to use as few characters as possible while maximizing the accuracy of your responses.
- Your responses should be professionally formatted and written.
- The due date is Monday, March 18th, at 10pm EDT.
Private Prisons & Length of Service
For this homework, we will ask questions based on the study by Anita Mukherjee titled “Impacts of private prison contracting on inmate time served and recidivism”. This paper examines the relationship between private prisons and length of service. New prisons are usually built because there is a “demand” for putting people into prisons. The main argument for a private prison system is that it saves money. The arguments against it focus on the dangers of having a private agent manage a “public” service. For example, a private operator's incentive may be to maximize profits (like any other economic agent) rather than to reduce recidivism, a welfare-improving outcome. In this homework, we will go over a paper that tackles part of this issue. In theory, whether a prison is private or public should affect neither the crimes committed nor the sentences imposed, so we should see no difference in length of service between public and private prisons, especially for two individuals with the same characteristics. This paper analyzes that question.
A lot of context is needed to understand the importance of this question and the complexity of the issue. First, read the introduction of the paper to get yourself acquainted with the context.
Understand the Paper
- If we wanted to understand the effect of going to a private prison on length of service, what regression would we want to run? Write it out and describe how you would code each variable and your unit of observation.
- Read the paper and answer the following questions:
- What are the main outcome variables? Explain how each of them is coded.
- What is the main explanatory variable? Explain how it is coded.
- What is the instrument in this setting (be as explicit as possible)? Who are the "compliers" in this case? (We are not looking for a textbook definition of what compliers are). How is this instrument coded?
- What is the LATE in this context? (I am not looking for the definition of LATE, but the interpretation within this context).
- Write down an equation representing the “First Stage” in this setting.
- Write down an equation representing the “Reduced Form” in this setting.
- Which table (column?) or figure provides evidence of the relevance condition?
- Does the instrument satisfy the exclusion restriction? Can you provide a compelling argument against it?
- What are the main findings?
- What does this research imply about the cost-saving argument for private prisons?
- Create a variable called “sentence_end” that represents the end of the sentence. You can do this by adding SentencedDays to sentence_start_date. Browse SentencedDays and sentence_start_date to understand the variables, and then create the variable. Submit a screenshot of your browse window as your answer.
- Notice that when you create this variable, it may show numbers that are hard to interpret. This is because Stata stores dates as the number of days since January 1, 1960: sentence_start_date displays as a readable date only because it has a date display format, and the new variable does not inherit that format. To make sentence_end readable, format it as a date variable by typing
format sentence_end %td
Browse the start and end date variables to check that they are readable. Take a screenshot, submit it as your answer, and include the command in your do-file.
- Now you are ready to create the instrument. Although there are many ways to create the instrument in Stata, the generate and replace commands together with the d(date) function are enough. The challenge is to think about how to build it. Provide a screenshot of your code for this variable, and then show the output from the following code:
sum (instrument variable name), det
- If you are having trouble understanding the logic of the coding steps, draw a timeline where you have the dates of the changes from Table 1. Then put a given sentence spell in the timeline. Use that diagram to obtain the value of the instrument for a particular sentence date. Do this several times until you discover the pattern.
- The following are summary statistics of the instrument you should have created. If your statistics match these, you are on the right path. If not, try again!
- Run the first stage regression and report the estimates.
- Run the reduced form regression and report the estimates
- Use the display command to obtain the IV estimate from your estimates in steps 1 and 2.
- Now use the ivregress command and report the results.
- Interpret the obtained IV estimate in plain words.
- Run the following regression in Stata:
- Repeat the exercises 1-4 from section “Reduced form IV & 2SLS”, now using controls for the outcome time served.
- Use the estimates to complete a table like this one. Feel free to use esttab or to build it in Word. Your results may not exactly match the author's results because we are not using exactly the same data.
Replicating the results
In this part of the homework, we will try to replicate the results from the paper. Open the dataset “hw3_clean.dta” and get familiar with the data. These data were provided by Professor Mukherjee but stripped of some details; the original data are available to anyone through a FOIA request. Look for the outcome (Y) and the treatment (D) in the data; the instrument is something we will create. Notice that captures the days that the sentence was for, while captures the days served by the individual.
Data can be found here
An important component of the instrument is knowing when a prison opens or closes. This information can be found in Figure 1. For your convenience, we have summarized the changes in the following table.
First, we need to create the instrument. Recall the information from pages 420 and 421 of the paper. As Mukherjee states, the instrument is the accumulated change in the availability of beds across the individual's sentence spell. For example, suppose an individual's sentence starts on December 6, 1996, and ends on December 5, 2001. Looking at Table 1, their sentence spell covers the openings in January 1998, August 1998, and April 1999. Since each of those openings added 500 beds, the accumulated change during their sentence spell is 1,500 beds. Because the instrument is scaled to a per-1,000-bed rate, we divide this number by 1,000, so the final value of the instrument is 1.5.
Let's do another example. Imagine a sentence that starts on May 25, 2000, with an assigned end date of May 24, 2005. This sentence spell covers the closing in 2002 and the opening of a prison in 2004. The total accumulated change is -1,000 + 1,000 = 0, so the final value of the instrument is 0.
There is an additional component: a change counts only if it occurs at least 90 days after the sentence start date and at least 90 days before the sentence end date. Remember this when creating your variable.
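As a sanity check, the two worked examples can be reproduced on a two-row toy dataset. This is a sketch, not the required solution. The event dates and bed changes are taken from the do-file later in this document:

- openings on 01jun1996, 01sep1996, 01jan1998, 01aug1998, 01apr1999, and 01nov2004;
- a closing on 01oct2002.

The loop and the name Z_toy are illustrative choices. The toy data set sentence_end directly instead of adding SentencedDays.

```stata
* Toy data: the two example sentence spells from the text
clear
input str9 start str9 stop
"06dec1996" "05dec2001"
"25may2000" "24may2005"
end
gen sentence_start_date = date(start, "DMY")
gen sentence_end = date(stop, "DMY")
format sentence_start_date sentence_end %td

* Event dates and bed changes (openings positive, closings negative)
local dates "01jun1996 01sep1996 01jan1998 01aug1998 01apr1999 01oct2002 01nov2004"
local beds  "1000 1000 500 500 500 -1000 1000"

* Add each change that falls at least 90 days inside the spell, per 1,000 beds
gen Z_toy = 0
forvalues k = 1/7 {
    local day : word `k' of `dates'
    local bed : word `k' of `beds'
    replace Z_toy = Z_toy + `bed'/1000 if sentence_start_date + 90 <= d(`day') & d(`day') <= sentence_end - 90
}
list start stop Z_toy
```

The loop applies the same condition as the seven replace lines in the do-file below. It only avoids writing that condition out seven times.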
Before you create the instrument, you may need to know how to use the d(date) function in Stata, which helps with the "90 days after" and "90 days before" conditions. The d(date) function tells Stata that you are referring to a date. For example, suppose you want to create a variable that equals 1 if the sentence start date is at least one year (365 days) after a particular date, say June 1, 1996, and 0 otherwise. You would use the d(date) function as follows:
gen dummy =0
replace dummy=1 if sentence_start_date>=d(01jun1996)+365
The date must be written in the format ddmmmyyyy (for example, 01jun1996).
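d() returns a plain number: the days since January 1, 1960. That is why date arithmetic such as "plus 90 days" is ordinary addition. The short check below uses the start date from the first worked example:

```stata
* Stata dates are day counts relative to 01jan1960
display d(01jan1960)                  // 0
display d(06dec1996)                  // 13489
* Adding 90 days gives another date; %td shows it in readable form
display %td d(06dec1996) + 90         // 06mar1997
```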
Now that your background is ready, think about how you would create this variable, and then complete the following steps:
* #1 Creating Sentence End
gen sentence_end= SentencedDays+sentence_start_date
* #2 Formatting the variable
format sentence_end %td
br sentence_start_date sentence_end
sum Z
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           Z |     26,593    .2409281    .7588198         -1        2.5
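* #4 Creating the instrument
* ramp1-ramp7 start at 0 and will hold each event's bed change
* only if the event falls inside the sentence spell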
gen ramp1 = 0
gen ramp2 = 0
gen ramp3 = 0
gen ramp4 = 0
gen ramp5 = 0
gen ramp6 = 0
gen ramp7 = 0
* Bed change at each event (openings positive, closings negative)
gen z1=1000
gen z2=1000
gen z3=500
gen z4=500
gen z5=500
gen z6= -1000
gen z7 = 1000
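* Count an event only if it falls at least 90 days after the sentence start and at least 90 days before the sentence end
replace ramp1 = z1 if sentence_start_date +90 <= d(01jun1996) & d(01jun1996)<=(sentence_end)-90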
replace ramp2 = z2 if sentence_start_date +90 <= d(01sep1996) & d(01sep1996)<=(sentence_end)-90
replace ramp3 = z3 if sentence_start_date +90 <= d(01jan1998) & d(01jan1998)<=(sentence_end)-90
replace ramp4 = z4 if sentence_start_date +90 <= d(01aug1998) & d(01aug1998)<=(sentence_end)-90
replace ramp5 = z5 if sentence_start_date +90 <= d(01apr1999) & d(01apr1999)<=(sentence_end)-90
replace ramp6 = z6 if sentence_start_date +90 <= d(01oct2002) & d(01oct2002)<=(sentence_end)-90
replace ramp7 = z7 if sentence_start_date +90 <= d(01nov2004) & d(01nov2004)<=(sentence_end)-90
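* Sum the qualifying changes (rowtotal replaces the obsolete rsum) and rescale to a per-1,000-bed rate
egen ramp = rowtotal(ramp1 ramp2 ramp3 ramp4 ramp5 ramp6 ramp7)
replace ramp = ramp/1000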
label var ramp "Ramp ($\div$ 1000)"
/*
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        ramp |     26,593    .2409281    .7588198         -1        2.5
*/
gen z=ramp
sum z, det
                              z
-------------------------------------------------------------
      Percentiles      Smallest
 1%           -1             -1
 5%           -1             -1
10%           -1             -1       Obs              26,593
25%            0             -1       Sum of wgt.      26,593

50%            0                      Mean           .2409281
                        Largest       Std. dev.      .7588198
75%            1            2.5
90%            1            2.5       Variance       .5758075
95%          1.5            2.5       Skewness      -.0900772
99%          1.5            2.5       Kurtosis       2.252839
Reduced form IV & 2SLS
For the following regressions use the outcome: TimeServed. Present the screenshots of your analysis as part of the answers. Stata output is fine.
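For reference, here is one way to write the models estimated in this section (no controls). The notation is ours, not the paper's:

- Y_i is time served (TimeServedSIM in the do-file);
- D_i is ever_private;
- Z_i is the ramp instrument for observation i.

With one instrument and one endogenous regressor, the IV estimate equals the ratio of the reduced-form coefficient to the first-stage coefficient.

```latex
\begin{aligned}
\text{Structural:}\quad   & Y_i = \alpha + \beta D_i + \varepsilon_i \\
\text{First stage:}\quad  & D_i = \pi_0 + \pi_1 Z_i + v_i \\
\text{Reduced form:}\quad & Y_i = \gamma_0 + \gamma_1 Z_i + u_i \\
\text{IV estimate:}\quad  & \hat{\beta}_{IV} = \hat{\gamma}_1 / \hat{\pi}_1
\end{aligned}
```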
* ===============
. * Section B
. * ===============
. estimates clear
. * #1 First Stage regression
. reg ever_private ramp
      Source |       SS           df       MS      Number of obs   =    26,593
-------------+----------------------------------   F(1, 26591)     =      4.65
       Model |  .725371978         1  .725371978   Prob > F        =    0.0310
    Residual |  4146.40854    26,591  .155932779   R-squared       =    0.0002
-------------+----------------------------------   Adj R-squared   =    0.0001
       Total |  4147.13391    26,592  .155954193   Root MSE        =    .39488
------------------------------------------------------------------------------
ever_private | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        ramp |   .0068828   .0031912     2.16   0.031     .0006279    .0131377
       _cons |   .1916633   .0025406    75.44   0.000     .1866835    .1966431
------------------------------------------------------------------------------
. local fs =_b[ramp]
.
. * #2 Reduced Form regression
. reg TimeServedSIM ramp
      Source |       SS           df       MS      Number of obs   =    26,593
-------------+----------------------------------   F(1, 26591)     =      4.80
       Model |  5443.45279         1  5443.45279   Prob > F        =    0.0285
    Residual |  30152590.1    26,591  1133.93968   R-squared       =    0.0002
-------------+----------------------------------   Adj R-squared   =    0.0001
       Total |  30158033.6    26,592  1134.10174   Root MSE        =    33.674
------------------------------------------------------------------------------
TimeServed~M | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        ramp |   .5962427   .2721327     2.19   0.028      .062848    1.129637
       _cons |   741.4693   .2166546  3422.36   0.000     741.0447     741.894
------------------------------------------------------------------------------
. local itt =_b[ramp]
.
. * #3 IV estimate
. display "iv="`itt'/`fs'
iv=86.627712
.
. * #4 IV estimate using IV regress
. ivregress 2sls TimeServedSIM (ever_private = ramp)
Instrumental variables 2SLS regression            Number of obs   =     26,593
                                                  Wald chi2(1)    =    4170.93
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.9988
                                                  Root MSE        =     1.1424
------------------------------------------------------------------------------
TimeServed~M | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
ever_private |   86.62771   1.341345    64.58   0.000     83.99872     89.2567
       _cons |    724.866   .2594055  2794.34   0.000     724.3575    725.3744
------------------------------------------------------------------------------
Instrumented: ever_private
 Instruments: ramp
.
end of do-file
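Plugging the rounded ramp coefficients from the reduced form and the first stage into the ratio reproduces the display result up to rounding. It also matches the ivregress point estimate:

```latex
\hat{\beta}_{IV}
  = \frac{\text{reduced-form coefficient on ramp}}{\text{first-stage coefficient on ramp}}
  = \frac{0.5962427}{0.0068828} \approx 86.63
```

Note that display only computes the point estimate. The standard error (1.341345 here) comes from ivregress.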
Getting Closer to Mukherjee’s results
You may notice that the estimates from the previous section are not the same as the ones from the paper. We need to add controls to the exercise above. The controls we want to include are: age, race, education level, a dummy if the inmate is single or not, a dummy for each county of conviction, a dummy for each level of care, a dummy for each classification, and a dummy for each medical classification. You can read the paper to understand what these controls represent.
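Below is a minimal sketch of how controls enter each regression. It uses simulated data because the names of the paper's controls are not listed here. Everything in the block is hypothetical: the names Y, D, Z, age, and county are placeholders, and the data-generating process is made up.

How the controls enter:

- Categorical controls such as county of conviction enter as sets of dummies through Stata's factor-variable prefix i.
- The same controls should appear in the first stage and the reduced form, so that their ratio still matches the 2SLS estimate.
- ivregress automatically uses the included controls in both stages.

```stata
* Simulated data; all variable names are hypothetical placeholders
clear
set seed 12345
set obs 5000
gen county = floor(10*runiform()) + 1       // county codes 1-10
gen age = 18 + floor(40*runiform())         // ages 18-57
gen Z = floor(4*runiform())/2 - 0.5         // instrument: -0.5, 0, 0.5, or 1
gen u = rnormal()                           // unobserved confounder
gen D = (Z + 0.5*u + rnormal() > 0.5)       // endogenous 0/1 treatment
gen Y = 700 + 80*D + 0.5*age + 2*county + 20*u + rnormal(0, 5)

* First stage and reduced form with the same controls
reg D Z age i.county
local fs = _b[Z]
reg Y Z age i.county
local rf = _b[Z]
display "ratio = " `rf'/`fs'

* 2SLS with controls: age and i.county are used in both stages
ivregress 2sls Y age i.county (D = Z)
```

With the homework data, you would replace age and i.county with the controls listed above, adding an i. prefix for each categorical one.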
--------------------------------------------
                           TimeServed
                      (1)             (2)
                      OLS              IV
--------------------------------------------
ever_private      85.24***        86.52***
                 (0.0156)         (1.309)
--------------------------------------------
N                   26593           26593
--------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
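One way to build a table like the one above is with esttab. This sketch assumes the user-written estout package is installed (ssc install estout). It reuses simulated data with hypothetical variable names. With the homework data, you would substitute TimeServedSIM, ever_private, ramp, and your controls.

```stata
* Requires the user-written estout package: ssc install estout
clear
set seed 2024
set obs 2000
gen Z = floor(4*runiform())/2 - 0.5         // hypothetical instrument
gen u = rnormal()
gen D = (Z + 0.5*u + rnormal() > 0.5)       // hypothetical treatment
gen Y = 700 + 80*D + 20*u + rnormal(0, 5)   // hypothetical outcome

* Store each model under a name that esttab can reference
eststo clear
eststo OLS: reg Y D
eststo IV: ivregress 2sls Y (D = Z)

* se puts standard errors in parentheses; star() sets the significance levels
esttab OLS IV, se mtitles("OLS" "IV") star(* 0.05 ** 0.01 *** 0.001) keep(D)
```

The se and star() options should give a footer like the one in the table above. Adding esttab's using option writes the table to a file you can paste into Word.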