Purpose
The objective of this homework is for you to practice concepts learned in class and apply them to a real-world scenario. The concepts we will practice in this homework relate to understanding instrumental variables (IV).
Guidelines
- You can work by yourself or in groups of up to two students.
- Submit your answers to Gradescope (within Canvas). If you work in a group, we encourage you to submit one assignment per group, though this is not required.
- We encourage you to use the boxes. PDFs, JPGs, and PNGs are preferable to Word documents or CSVs. Recall that you can always save something as a PDF. You can also take a screenshot of anything. On Windows you can do this with the Snipping Tool or Windows+Shift+S. On a Mac you can do this with Command+Shift+4.
- Submit your do-file to Gradescope (within Canvas).
- You will get points for correct answers. You will lose points if an answer contains unnecessary information or mixes incorrect statements with correct ones. In short, we are trying to incentivize students to use the fewest characters while maximizing the accuracy of their responses.
- Your responses should be professionally formatted and written.
- The due date is Friday, February 17th, at 11:59pm EST.
Private Prisons & Length of Service
For this homework we will ask questions based on the study by Anita Mukherjee titled “Impacts of private prison contracting on inmate time served and recidivism”. This paper examines the relationship between private prisons and length of service. New prisons are usually built because there is “demand” for putting people into prisons. The main argument for a private prison system is that it saves money. The arguments against it focus on the dangers of having a private agent manage a “public” service. For example, a private operator's incentive may be to maximize profits (like any other economic agent) rather than to reduce recidivism, a welfare-improving outcome. In this homework we will go over a paper that tries to tackle part of this issue. In theory, whether a prison is private or public should not affect the crimes committed or the sentencing, and so we should see no difference in length of service between public and private prisons, especially for two individuals with the same characteristics. This paper analyzes that question.
A lot of context is needed to understand the importance of this question and the complexity of the issue. First, read the introduction of the paper to acquaint yourself with the context.
Understand the Paper
- If we wanted to understand the effect of going to a private prison on length of service, what regression would we want to run? Write it out, describe how you would code each variable, and state what your unit of observation would be.
- Read the paper and answer the following questions:
- What are the main outcome variables? Explain how each of them are coded. (R
- What is the main explanatory variable? Explain how it is coded.
- What is the instrument in this setting (be as explicit as you can)? Who are the "compliers" in this case? (We are not looking for a textbook definition of what compliers are). How is this instrument coded?
- What is the LATE in this context? (We are not looking for the definition of LATE, but for its interpretation within this context.)
- Write down an equation that represents the “First Stage” in this setting.
- Write down an equation that represents the “Reduced Form” in this setting.
- Which table (column?) or figure provides evidence of the relevance condition?
- Does the instrument satisfy the exclusion restriction? Can you provide a compelling argument against it?
- What are the main findings?
- What does this research imply about the cost-saving argument for private prisons?
- Create a variable called “sentence_end” which represents the end of the sentence. This can be done by adding SentencedDays to sentence_start_date. Browse SentencedDays and sentence_start_date to understand the variables and then create the variable. Show a screenshot of your browse for your answer.
- Notice that when you create this variable, it may contain numbers that are hard to interpret. This is because sentence_start_date is formatted as a date variable, while the new variable is not. To make sentence_end readable, format it as a date variable by typing format sentence_end %td. Browse the start and end date variables to check that they are readable. Take a screenshot and submit it as your answer, and include the command in your do-file.
- Now you are ready to create the instrument. Although there are many ways to create the instrument in Stata, the gen and replace commands together with the d(date) function are enough. The challenge is to think about how to create the instrument. Provide a screenshot of your code for this variable and then show the output from the following code: “sum (instrument variable name), det”.
- Run the first stage regression and report the estimates.
- Run the reduced form regression and report the estimates.
- Use the display command to obtain the IV estimate using your estimates from 1 and 2.
- Now use the ivregress command and report the results.
- Interpret the obtained IV estimate in plain words.
- Run the following regression in Stata:
- Repeat the exercises 1-4 from section 2, now using controls for the outcome time served.
- Use the estimates to complete a table like this. Feel free to use esttab or to build the table in Word.
Replicating the results
In this part of the homework we will try to replicate the results from the paper. Open the dataset “hw4_clean.dta” and get familiar with the data. These data were provided by Professor Mukherjee, but stripped of some details. The actual data are available to everyone through a FOIA request. Search for the Y and D; the instrument is something we will create. Notice that SentencedDays captures the number of days of the sentence, while TimeServed captures the days actually served by the individual.
One important component of the instrument is knowing when a prison opens or closes. The information can be found in Figure 1. For your convenience we have summarized the changes in the following table.

First, we need to create the instrument. Recall the information from pages 420 and 421 of the paper. As Mukherjee states in the paper, the instrument is the accumulation of changes in the availability of beds across the sentence spell of the individual. For example, say an individual's sentence start date is December 6, 1996 and the sentence end date is December 5, 2001. Their sentence spell covers several openings of beds. In fact, looking at Table 1, their sentence spell covers the openings of January 1998, August 1998, and April 1999. Since each of those openings added 500 beds, the accumulated change during their sentence spell is 1,500 beds. Since the instrument is expressed per 1,000 beds, we divide that number by 1,000, ending up with a final value of the instrument of 1.5.
Let's do another example. Imagine a sentence that started on May 25, 2000 with an assigned end date of May 24, 2005. This sentence spell covers the closing in 2002 and the opening of a prison in 2004. The total accumulated change is -1000+1000=0. Therefore the final value of the instrument is 0.
There is an additional component: a change counts only if it occurs at least 90 days after the sentence start date and at least 90 days before the sentence end date. Remember this when creating your variable.
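The construction described above can be summarized in one formula. The notation is ours, not the paper's: $s_i$ and $e_i$ are the sentence start and end dates of individual $i$, $t_k$ is the date of capacity change $k$, and $\Delta B_k$ is the number of beds added (positive) or removed (negative). Treating the 90-day bounds as inclusive is an assumption.

```latex
Z_i = \frac{1}{1000} \sum_{k} \Delta B_k \cdot
      \mathbf{1}\{\, s_i + 90 \le t_k \le e_i - 90 \,\}

% First example above (all three openings fall inside the window):
Z_i = \frac{500 + 500 + 500}{1000} = 1.5

% Second example above (one closing and one opening):
Z_i = \frac{-1000 + 1000}{1000} = 0
```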
Before you create the instrument you may need to know how to use the d(date) function in Stata. This can help with the requirement of 90 days after and 90 days before. The d(date) function tells Stata that you are referring to a date. For example, suppose you want to create a variable that takes the value of 1 if the sentence start date is at least one year (365 days) after a particular date, say June 1st, 1996, and 0 otherwise. You would use the d(date) function in the following way:
gen dummy = 0
replace dummy = 1 if sentence_start_date >= d(01jun1996) + 365
The date must be written in the following format: ddmmmyyyy
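The following is a minimal sketch of how date arithmetic with d(date) behaves, using one made-up observation rather than the homework data. The event date 15jan1998 is purely hypothetical (the exact opening dates come from the table, not from this sketch), and treating the 90-day bounds as inclusive is an assumption.

```stata
* Toy data: one made-up inmate, not taken from hw4_clean.dta
clear
set obs 1
gen sentence_start_date = d(06dec1996)
gen sentence_end = sentence_start_date + 1825
format sentence_start_date sentence_end %td

* A date literal is stored as a count of days since 01jan1960
display d(01jun1996)
* The same kind of number shown in readable form after adding 365 days
display %td d(01jun1996) + 365

* Hypothetical event date: is it inside the 90-day-trimmed sentence spell?
gen byte in_window = d(15jan1998) >= sentence_start_date + 90 & d(15jan1998) <= sentence_end - 90
list
```

One caveat when writing conditions like these: Stata treats missing values as larger than any number, so a comparison such as sentence_start_date >= d(01jun1996) is true when sentence_start_date is missing. Adding & !missing(sentence_start_date) guards against that.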
Now that your background is ready, think about how you would create this variable, and then complete the following steps:
Reduced form IV & 2SLS
For the following regressions use the outcome: TimeServed. Present the screenshots of your analysis as part of the answers. Stata output is fine.
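With a single instrument, the IV estimate equals the reduced-form coefficient divided by the first-stage coefficient. The sketch below illustrates that mechanical relationship on Stata's built-in auto dataset, not on the homework data. The roles assigned to price, weight, and length are arbitrary and carry no causal meaning; the scalar names fs and rf are our own.

```stata
sysuse auto, clear

* "First stage": regressor of interest on the instrument
regress weight length
scalar fs = _b[length]

* "Reduced form": outcome on the instrument
regress price length
scalar rf = _b[length]

* Ratio of the two coefficients
display "reduced form / first stage = " scalar(rf)/scalar(fs)

* 2SLS with the same single instrument; compare the coefficient on weight
ivregress 2sls price (weight = length)
```

The point estimates agree, but only ivregress reports a valid standard error for the IV coefficient.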
Getting Closer to Mukherjee’s results
You may notice that the estimates from the previous section are not the same as the ones in the paper. We need to add controls to the exercise above. The controls we want to include are: age, race, education level, a dummy for whether the inmate is single, a dummy for each county of conviction, a dummy for each level of care, a dummy for each classification, and a dummy for each medical classification. You can read the paper to understand what these controls represent.
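As a hedged sketch of the syntax only, the example below adds controls to an IV regression on Stata's built-in auto dataset. The variables are stand-ins, not the homework's: mpg plays the role of a continuous control and rep78 that of a categorical control. The i. prefix asks Stata to create one dummy per category, so no dummies need to be generated by hand.

```stata
sysuse auto, clear

* First stage with controls: i.rep78 expands into one dummy per category
regress weight length mpg i.rep78

* 2SLS with the same controls; they go outside the parentheses
ivregress 2sls price mpg i.rep78 (weight = length)
```

The same set of controls should appear in the first stage, the reduced form, and the 2SLS command, so that the estimates remain comparable.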
