Purpose
The objective of this homework is for you to practice concepts learned in class and apply them to a real-world scenario. The ideas we will practice in this homework relate to understanding fixed effects. As you may notice, the questions are becoming more “colloquial”: in some cases there are many paths to the correct answer, and in others there is only one.
Guidelines
- Work will be independent.
- Submit your answers to Gradescope (within Canvas).
- We encourage you to use the answer boxes, PDFs, JPGs, and PNGs, preferably over Word documents. Recall that you can always save something as a PDF, and you can screenshot anything: on Windows, use the Snipping Tool or Windows+Shift+S; on a Mac, use Command+Shift+4.
- Submit your do-file to Gradescope (within Canvas).
- You will get points for correct answers. Points will be deducted if an answer includes unnecessary information or mixes incorrect statements in with correct ones. In short, we want to incentivize you to use the fewest characters while maximizing the accuracy of your responses.
- Your responses should be professionally formatted and written.
- The due date is Monday, April 15th, at 10 pm EDT.
Preamble
For this homework, we will work with the dataset called “Ch8_Exercise3_Teaching_evals.dta”, which you can find at this link.
With these data, we want to understand the relationship between teaching evaluations and class grades. Faculty have long hypothesized that classes with higher overall grades obtain better teaching evaluations. Many reasons could explain this relationship (higher class grade, higher evals), and we will evaluate this claim using the data in this homework. The questions asked here are similar to questions I was asked when evaluating data for a school. You will notice that the questions are broad, but I was able to use tools and skills from RMDA to provide concrete answers.
Understanding the panel
Open the dataset and notice its structure. Answer the following questions (for yourself): What type of panel is this? What could the two dimensions be? Before you keep reading, see if you can work out what each variable means. This is good practice for learning to infer what a dataset contains without a data dictionary.
The variable Apct indicates the percentage of As in the class. Eval indicates the average teaching evaluation score the instructor received for that particular class: 1 indicates the lowest score (not great evals) and 5 the highest (great evals). The rest of the variables should be self-explanatory. Notice that the year variable indicates the academic year rather than the calendar year.
The race variable indicates the instructor's race. Unfortunately, we don’t know the meaning of the values 1 and 2 for the race variable; we only know that the value 0 indicates “white.”
- Amanda is eager to hear what you find in the data. She emailed you a set of questions you can answer with the data. For each question, use any statistical method to determine which class type has higher evals, and report the difference in evals as a percentage difference along with the p-value that indicates whether the difference is statistically significant. For Gradescope, submit a table with these two numbers for each comparison. These questions aim to describe patterns rather than untangle causal effects; we will simply test each hypothesis and see what the data say.
- Who gets higher evaluations, male or female instructors?
- Who gets higher evaluations, white or non-white instructors?
- Which classes get higher evaluations, in the spring or fall semester?
- Which classes get higher evaluations, required classes or non-required classes?
- Which classes get higher evaluations, classes that use math or don’t?
- Amanda also wants to know which classes get higher evaluations: classes with a large number of students or a small number of students? Use a tool learned in class to answer this question. Again, the goal is to describe, not to detect causal effects.
- Based on your exercise in (1), Amanda would like to know which characteristics of instructors or courses seem to matter for evaluations. (Hint: What does “matter” mean here? What concepts could one use to answer this?) Your answer should identify the characteristics and explain why you picked them.
- Now let’s move on to understanding how students’ performance in the class is related to the instructor’s evaluations. Please run the following models and present them in a table; report only the coefficient on Apct.
- Model 1: regression of evaluations on the percent of students with As
- Model 2: Model 1 + all variables related to observable characteristics of the course (year, required, enrollment, spring, math)
- Model 3: Model 2 + all variables related to observable characteristics of the instructor.
- Model 4: Model 1 + course FE
- Model 5: Model 1 + Instructor FE
- Model 6: Model 1 + course FE + instructor FE.
- Use the results from the table to answer the following questions:
- Using the results from Model 1, what would be your overall conclusion about the relationship between students’ performance and evals?
- How does adding either instructor or course characteristics change this conclusion? (Model 2 and Model 3)
- What are some observable characteristics that course FE accounts for?
- What are some unobservable characteristics that course FE accounts for?
- Amanda sees the result from Model 5 and worries that you haven’t accounted for the instructor’s gender. You mentioned this seems important, but she hasn’t seen it in the regression. What would be your response?
- Amanda sees the result from Model 6 and worries that you haven’t accounted for enrollment in the course. You mentioned to her before that this seems like an important variable. What would be your response?
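The two Model 5/6 concerns above hinge on collinearity: any variable that is constant within instructor (like gender) is an exact linear combination of the instructor dummies, so instructor fixed effects already absorb it. A minimal Python sketch on simulated data (the ids and 0/1 coding are invented for illustration):

```python
# Hypothetical illustration: a characteristic that never varies within
# instructor (e.g., gender) lies in the column space of the instructor
# dummies, so adding it to an instructor-FE regression is redundant.
import numpy as np

n_instr, n_classes = 5, 4
instr = np.repeat(np.arange(n_instr), n_classes)     # instructor id per class
gender = np.repeat([0, 1, 1, 0, 1], n_classes)       # fixed within instructor

D = (instr[:, None] == np.arange(n_instr)).astype(float)  # instructor dummies
# Project gender onto the dummies: the fit is exact (perfect collinearity).
coef = np.linalg.lstsq(D, gender.astype(float), rcond=None)[0]
print(np.allclose(D @ coef, gender))                 # True
```

Because the fit is exact, Stata would simply drop the gender dummy if you added it alongside instructor fixed effects.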
Notice that these questions do not ask for causal effects, just associations. A simple regression would suffice.
If you run a t-test or regress eval on each of the variables below, you’ll get approximately the results below.
Category | % difference | p-value | Interpretation |
Female | 2.5% | 0.0000 | Female instructors score 2.5% lower than male instructors |
White | 0.4% | 0.5230 | White instructors score 0.4% higher than non-white instructors |
Spring | 0.2% | 0.6912 | Spring classes score 0.2% lower than fall classes |
Required | 4.9% | 0.0000 | Required courses score 4.9% lower than non-required courses |
Math | 2.9% | 0.0000 | Math courses score 2.9% lower than non-math courses |
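For reference, the t-test-plus-percent-difference recipe behind these numbers can be sketched in Python on simulated data (the variable names and 0/1 codings mirror the dataset but are assumed here; actual magnitudes come from the .dta file):

```python
# Hypothetical sketch: compare mean evaluations across the two groups of a
# binary characteristic with a two-sample t-test, and express the gap as a
# percentage of the first group's mean, mirroring (mu_1 - mu_2)/mu_1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
female = rng.integers(0, 2, 2000)                    # 0 = male, 1 = female (assumed)
eval_score = 4.0 - 0.10 * female + rng.normal(0, 0.5, 2000)  # simulated evals

g0, g1 = eval_score[female == 0], eval_score[female == 1]
t, p = stats.ttest_ind(g0, g1)                       # two-sample t-test
pct_diff = (g0.mean() - g1.mean()) / g0.mean()       # percent difference
print(f"percent difference: {pct_diff:.1%}, p-value: {p:.4f}")
```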
If you use regression to obtain the coefficients in percent, you’ll get approximately the results below.
Category | Reg eval on | Reg log(eval), approx. | Reg log(eval), actual |
Female | 2.5%*** | 2.46%*** | 2.434%*** |
White | 0.4% | 0.179% | 0.179% |
Spring | 0.2% | -0.128% | -0.128% |
Required | 4.9%*** | 5.21%*** | 5.08%*** |
Math | 2.9%*** | 2.74%*** | 2.67%*** |
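The gap between the “approx.” and “actual” log columns reflects the usual log-point approximation: a coefficient b from a log(eval) regression is only approximately a percent change, and the exact proportional change is exp(b) − 1. A quick check in Python, using the female coefficient as an example (the negative sign is assumed for illustration):

```python
# The log-point estimate b approximates a percent change; exp(b) - 1 is exact.
import math

b = -0.0246                      # log-point estimate (the "approx." column)
exact = math.exp(b) - 1          # exact proportional change, about -2.43%
print(f"approx: {b:.2%}, exact: {exact:.3%}")
```

The approximation error is small here because b is close to zero; it grows with the magnitude of b.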
We could also use a graphical representation of this through binscatter:
[Binscatter, without logs: figure omitted]
[Binscatter, with logs: figure omitted]
6. Run a final model, Model 6 + enrollment. Use the results from this model to report a conclusion about how student performance affects instructors’ evaluations. Report a screenshot of your result with just the main explanatory variable.
Extensions
As always, these are ungraded questions for you to practice.
Provide examples of what a fixed effect for each of the following would control for: instructor, course, and year. Then, we invite you to think about whether each characteristic you identified is actually relevant.
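As a companion to that exercise, it may help to see mechanically what a fixed effect does: the within (FE) estimator demeans every variable inside each group, and its slope matches OLS with a full set of group dummies. A sketch in Python on simulated data (group sizes and coefficients are invented for illustration):

```python
# Hypothetical sketch: the within transformation (demeaning by group) yields
# the same slope as regression with a full set of group dummies.
import numpy as np

rng = np.random.default_rng(1)
g = np.repeat(np.arange(30), 10)                 # 30 instructors, 10 classes each
alpha = rng.normal(0, 1, 30)[g]                  # instructor fixed effects
x = alpha + rng.normal(0, 1, 300)                # x correlated with the FE
y = 2.0 * x + alpha + rng.normal(0, 1, 300)      # true slope is 2.0

def demean(v):
    """Subtract each group's mean from its observations."""
    means = np.bincount(g, v) / np.bincount(g)
    return v - means[g]

xd, yd = demean(x), demean(y)
slope_within = (xd @ yd) / (xd @ xd)             # within (FE) estimator

D = (g[:, None] == np.arange(30)).astype(float)  # dummy-variable version
beta = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)[0]
print(np.isclose(slope_within, beta[0]))          # True: identical slope
```

Naive OLS of y on x alone would be biased upward here, because x is correlated with the instructor effect; both FE versions recover a slope near 2.0.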
Code
* Start of do-file for students
global hw5 "$dropbox/1_Classes/Research Methods/Spring 2023/homework/homework 5"
use "$hw5/hw5.dta", clear

* Create a white indicator (race == 0 means "white"; 1 and 2 are unknown codes)
gen white = 1 if race == 0
replace white = 0 if race != 0

* Question 1: t-test and percent difference for each binary characteristic
foreach var of varlist female white spring required math {
    ttest eval, by(`var')
    display (`r(mu_1)' - `r(mu_2)') / `r(mu_1)'
}

* Question 2: evaluations and enrollment
reg eval enrollment

* Questions 4-6: Models 1-6, plus Model 6 + enrollment
estimates clear
eststo: reg eval apct
* Add observable characteristics of the course
eststo: reg eval apct year required enrollment spring math
* Add observable characteristics of the instructor
eststo: reg eval apct year required enrollment spring math white female
* Add fixed effects
eststo: reg eval apct i.courseid
eststo: reg eval apct i.instrid
eststo: reg eval apct i.courseid i.instrid
eststo: reg eval apct i.courseid i.instrid enrollment
esttab, se keep(apct) mtitle("Baseline" "+Course Char" "+Instr Char" "Course FE" "Instructor FE" "Course+Instr FE" "+Enrollment")