Purpose
The objective of this homework is for you to practice concepts learned in class and apply them. The ideas we will practice in this homework relate to understanding fixed effects. As you may notice, the questions are becoming more colloquial. There may be many paths toward the correct answer in some cases, and in others, there is only one path.
col·lo·qui·al
/kəˈlōkwēəl/
adjective
- (of language) used in ordinary or familiar conversation; not formal or literary.
"colloquial and everyday language"
Guidelines
- Work is to be done independently.
- Submit your answers to Gradescope (within Canvas).
- We encourage you to use the answer boxes, PDFs, JPGs, or PNGs, rather than Word documents or CSVs. Recall that you can always save something as a PDF, and you can screenshot anything: on Windows, use the Snipping Tool or Windows+Shift+S; on a Mac, use Command+Shift+4.
- Submit your do-file to Gradescope (within Canvas).
- You will get points for correct answers. Points will be deducted if an answer includes unnecessary information or mixes incorrect statements in with correct ones. In short, we want to incentivize answers that use the fewest characters while maximizing accuracy.
- Your responses should be professionally formatted and written.
- The due date is Friday, April 11th, at 11:59pm EDT.
- For statistical significance, we’ll count a result if it is significant at the 95% confidence level or higher.
Preamble
For this homework, we will work with the dataset called “Ch8_Exercise3_Teaching_evals.dta”, which you can find at this link.
With these data, we want to understand the relationship between teaching evaluations and class grades. Faculty have always hypothesized that classes with higher overall grades obtain better teaching evaluations. Many reasons could explain this relationship (higher class grade, higher evals), and we will evaluate this claim using data in this homework. The questions asked here are similar to questions I was asked when evaluating data for a school. You will notice that the questions are broad, but I was able to use tools and skills from RMDA to provide concrete answers.
Understanding the panel
Open the dataset and notice its structure. Answer the following questions (for yourself): What type of panel is this? What could its two dimensions be? Before you keep reading, see if you can work out what each variable means. This is good practice for noticing what data contain without a data dictionary.
Some Tips
The variable Apct indicates the percentage of As in the class. Eval indicates the average teaching evaluation score the instructor received for that particular class: one (1) indicates the lowest score (not great evals) and five (5) the highest (great evals). The rest of the variables should be self-explanatory. Notice that the year variable indicates the academic year rather than the calendar year, so the value 200304 indicates the academic year 2003-2004.
The race variable indicates the instructor's race. Unfortunately, we don’t know the meaning of the values 1 or 2 for this variable; we only know that the value 0 indicates “white”, so convert it into a binary variable where white takes the value of 1 and 0 otherwise. Math indicates whether the course is quantitative or not.
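The recoding step described above can be sketched in Stata as follows (variable names are taken from the do-file at the end of this document; the value labels are illustrative):

```stata
* Create a binary indicator: 1 if the instructor is white (race == 0), 0 otherwise
gen white = (race == 0)
label define whitelbl 0 "Non-white" 1 "White"
label values white whitelbl
```

The single-line `gen white = (race == 0)` is equivalent to the two-step `gen`/`replace` approach, as long as `race` has no missing values.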
You’ve been hired as an intern to analyze Batten data on classes and to evaluate, from the data, the claim that a higher average grade in a class correlates with better faculty evaluations. Think about what other confounders there may be (for example, whether the class is quantitative or not).
- Amanda Crombie is eager to hear what you find in the data. She emailed you a set of questions you could answer with the data. For each of the following questions, use whatever statistical method you prefer to determine which class type has higher evals. In addition to reporting the actual difference for each comparison, report the difference in evals as a percentage difference and the p-value indicating whether the difference is statistically significant. In Gradescope, submit a table with the three numbers needed, and then, for each question, write sentences that explain your findings from the table. These questions aim to describe patterns rather than untangle causal effects. We will, however, test the hypotheses and see what the data say.
Sample Table
- Submit the table with the results
- Who gets higher evaluations, male or female instructors? Does this difference matter?
- Who gets higher evaluations, white or non-white instructors? Does this difference matter?
- Which classes get higher evaluations, spring or fall semester classes? Does this difference matter?
- Which classes get higher evaluations, required or non-required classes? Does this difference matter?
- Which classes get higher evaluations, classes that use math or don’t? Does this difference matter?
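One way to produce the three numbers requested for each comparison (difference, percentage difference, and p-value) is a t-test followed by `display` of the stored results; this sketch uses the `female` comparison, with variable names taken from the do-file at the end:

```stata
* Compare mean evals by gender and report the three requested numbers
ttest eval, by(female)
display "Difference:      " %6.3f (r(mu_1) - r(mu_2))
display "Pct. difference: " %6.1f 100*(r(mu_1) - r(mu_2))/r(mu_1) " %"
display "p-value:         " %6.3f r(p)
```

The same pattern applies to `white`, `spring`, `required`, and `math` (see the loop in the do-file below).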
- Amanda also wants to know which classes get higher evaluations: Classes with a large number of students or a small number of students. Use a tool learned in class to respond to this question. Again, the goal is to describe, not to detect causal effects.
- Present a screenshot of your analysis.
- Provide an answer based on what you find in the analysis.
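Since enrollment is continuous rather than binary, a t-test by group is not available; one descriptive approach (a sketch, not the only valid tool) is a simple regression plus a scatter plot with a fitted line:

```stata
* Describe the eval-enrollment relationship: slope, significance, and a visual check
reg eval enrollment
twoway (scatter eval enrollment) (lfit eval enrollment)
```

The sign and p-value of the `enrollment` coefficient answer whether larger or smaller classes tend to get higher evals, and whether that pattern is statistically distinguishable from zero.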
- Now let’s move on to understanding how students’ performance in a class relates to the instructor’s evaluations. Please run the following models and present them in a table; report only the coefficient on Apct:
- Model 1: regression of evaluations on the percent of students with As
- Model 2: Model 1 + all variables related to observable characteristics about the course (year, required, enrollment, spring, math)
- Model 3: Model 2 + all variables related to observable characteristics of the instructor (race and gender)
- Model 4: Model 1 + course FE
- Model 5: Model 1 + instructor FE
- Model 6: Model 1 + course FE + instructor FE
- Use the results from the table you created in question 3 to answer the following questions:
- Using the results from Model 1, what would be your overall conclusion about the relationship between students’ performance and evals?
- Using the results from Models 1, 2, and 3, how does adding either instructor or course characteristics change this conclusion?
- What are some observable characteristics that course FE accounts for?
- What are some unobservable characteristics that course FE accounts for?
- Which characteristics about the course that you are controlling for in model 2 are also controlled in model 4?
- Amanda sees the result from Model 5 and worries that you haven’t accounted for the instructor’s gender. You mentioned this seems important, but she hasn’t seen it in the regression. What would be your response?
- Amanda sees the result from Model 5 and worries that you haven’t accounted for enrollment in the course. You mentioned to her before that this seems like an important variable. What would be your response?
- Building Model 7: Model 6 includes course and instructor fixed effects; now let’s add to that model the variables that change across those two margins: enrollment and the semester when the class is taught. Finally, add year fixed effects to the model. Use the results from this model to report a final conclusion about how student performance affects instructors’ evaluations. Report a screenshot of your result with just the main explanatory variable. Present your results in a technical and a non-technical way, and remember to assess significance!
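Model 7 as described above can be sketched as follows (variable names follow the do-file at the end; `i.year` treats each academic year as its own fixed effect):

```stata
* Model 7: Model 6 plus time-varying controls (enrollment, spring) and year FE
eststo m7: reg eval apct enrollment spring i.year i.courseid i.instrid
esttab m7, se keep(apct)
```

The `keep(apct)` option restricts the output to the main explanatory variable, which is what the screenshot should show.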
Extensions
As always, these are ungraded questions for you to practice.
Provide examples of what a fixed effect for each of the following would control for: instructor, course, and year. Then, we invite you to think about whether each characteristic you came up with is relevant or not.
Provide a graph that shows the evolution of the percentage of As over time, separately for quantitative vs. non-quantitative classes.
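One way to build this graph (a sketch; variable names follow the do-file below) is to collapse the data to yearly means by course type and then plot two lines:

```stata
* Average percent As by academic year, quantitative vs. non-quantitative courses
preserve
collapse (mean) apct, by(year math)
twoway (line apct year if math == 1) (line apct year if math == 0), ///
    legend(order(1 "Quantitative" 2 "Non-quantitative")) ///
    ytitle("Average % As") xtitle("Academic year")
restore
```

`preserve`/`restore` keeps the original class-level data intact after the collapse.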
Code
* Start of Do File For Students!
global hw5 "$dropbox/1_Classes/Research Methods/Spring 2023/homework/homework 5"
use "$hw5/hw5.dta", clear
gen white =1 if race==0
replace white=0 if race!=0
foreach var of varlist female white spring required math {
    ttest eval, by(`var')
    display (`r(mu_1)'-`r(mu_2)')/`r(mu_1)'
}
reg eval enrollment
estimates clear
eststo: reg eval apct
* Add in Characteristics about the course
eststo: reg eval apct year required enrollment spring math
* Add in Characteristics about the instructor
eststo: reg eval apct year required enrollment spring math white female
* Add in FE:
eststo: reg eval apct i.courseid
eststo: reg eval apct i.instrid
eststo: reg eval apct i.courseid i.instrid
* Model 7: add time-varying controls (enrollment, spring) and year FE
eststo: reg eval apct enrollment spring i.year i.courseid i.instrid
esttab, se keep(apct) mtitle("Baseline" "+Course Char" "+Instr Char" "Course FE" "Instr FE" "Course+Instr FE" "Model 7")