This worksheet walks through the first-stage F-statistic, the central diagnostic for whether an instrumental variable is strong enough to use. We start from scratch: what the F-stat is, what it tests, how to compute it by hand and in Stata, and how the presence of covariates changes which F-statistic you should report.
Dataset: nlsw88 (National Longitudinal Survey of Women, 1988), built into Stata.
Running example throughout:
- Outcome Y: wage (hourly wage)
- Endogenous variable D: union (union membership; workers self-select into unions, so OLS is biased)
- Instrument Z: south (lives in the South; Southern states have historically lower unionization rates due to right-to-work laws)
- Covariates X: age, ttl_exp, collgrad

Note on the instrument: We use south as a teaching instrument. It is probably a bad instrument, but we wanted a dataset that y'all can touch!

Section 1: What Is the F-Statistic?
The F-statistic is a test statistic for a joint hypothesis: it tests whether a group of coefficients is simultaneously equal to zero. It is named after the statistician Ronald Fisher.
Technical definition: Given an unrestricted model and a restricted model (where some coefficients are forced to zero), the F-statistic measures how much the fit deteriorates when we impose the restrictions:

F = [(RSS_r - RSS_u) / q] / [RSS_u / (n - k - 1)]

Where:
- RSS_r = Residual Sum of Squares from the restricted model (without the tested variables)
- RSS_u = Residual Sum of Squares from the unrestricted model (with the tested variables)
- q = number of restrictions (number of variables being tested jointly)
- n = sample size
- k = number of slope parameters in the unrestricted model (not counting the constant)
- n - k - 1 = denominator degrees of freedom
Less-technical definition: The F-statistic is a signal-to-noise ratio. The "signal" is the share of variation in D that Z accounts for; the "noise" is the residual variation in D that Z cannot explain. A large F means Z's signal stands out clearly above the background noise. A small F means Z is essentially indistinguishable from noise.
In plain language, an F of, say, 20 means: the variation in D explained by Z is about 20 times larger than the unexplained variation per degree of freedom, so the instrument's signal is clearly not just noise.
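Stata aside, the arithmetic is easy to check directly. Here is a minimal sketch in Python (not Stata) that computes F from the two residual sums of squares; the data and the names D and Z are synthetic, purely for illustration:

```python
import numpy as np

# Illustration of F = [(RSS_r - RSS_u)/q] / [RSS_u/(n - k - 1)] on fake data.
rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=n)
D = 0.5 * Z + rng.normal(size=n)          # Z genuinely explains part of D

# Unrestricted model: D on constant + Z
X_u = np.column_stack([np.ones(n), Z])
beta_u, _, _, _ = np.linalg.lstsq(X_u, D, rcond=None)
rss_u = np.sum((D - X_u @ beta_u) ** 2)

# Restricted model: D on constant only (coefficient on Z forced to zero)
rss_r = np.sum((D - D.mean()) ** 2)

q = 1                                     # one restriction: coef on Z = 0
k = 1                                     # one slope in the unrestricted model
F = ((rss_r - rss_u) / q) / (rss_u / (n - k - 1))
print(round(F, 1))                        # well above the F > 10 rule of thumb
```

Dropping Z can only increase the RSS, so the numerator is always non-negative; the question the F-stat answers is whether that increase is large relative to the leftover noise.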
Section 2: What Is the F-Stat Testing in IV?
Before running IV, we must run the first stage: regress the endogenous variable D on the instrument Z (and any covariates X). The first-stage F-statistic tests:
- H0: The instrument(s) have no effect on the endogenous variable (coefficient on Z = 0)
- H1: The instrument(s) do affect the endogenous variable
Technical interpretation: A large first-stage F means the Z-driven variation in D is large relative to the residual variation in D that Z cannot explain. This is a signal-to-noise comparison, not a statement about the size of the coefficient on Z. A small coefficient can produce a high F if Z varies widely across observations; a large coefficient can produce a low F if Z barely varies. If we fail to reject H0, Z's signal in D is swamped by noise, and the IV strategy collapses.
Non-technical interpretation: Imagine trying to pick up a radio station while there is a lot of static. The F-statistic asks: how much of the variation we observe in D comes from the clean signal of Z, compared to the random noise that Z cannot explain? A high F means Z's signal is loud and clear: it accounts for a noticeable share of D's variation relative to the leftover noise. A low F means Z's signal is barely distinguishable from static; almost all the variation in D looks like noise from Z's perspective.
Notice this is very different from asking "how big is the coefficient on Z": the F-stat is about explained variation relative to noise, not about the size of a marginal effect.
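This point is easy to demonstrate with a quick simulation. The Python sketch below (synthetic data, illustrative names) uses the same small coefficient on Z in both scenarios and varies only Z's spread:

```python
import numpy as np

# Same slope on Z in both runs; only the spread of Z differs.
rng = np.random.default_rng(1)
n = 1000

def first_stage_F(z_sd):
    Z = rng.normal(scale=z_sd, size=n)
    D = 0.1 * Z + rng.normal(size=n)      # small coefficient in both cases
    X = np.column_stack([np.ones(n), Z])
    beta, _, _, _ = np.linalg.lstsq(X, D, rcond=None)
    rss_u = np.sum((D - X @ beta) ** 2)
    rss_r = np.sum((D - D.mean()) ** 2)
    return (rss_r - rss_u) / (rss_u / (n - 2))

F_low = first_stage_F(z_sd=0.5)           # Z barely varies: low F
F_high = first_stage_F(z_sd=20.0)         # Z varies widely: high F
print(round(F_low, 1), round(F_high, 1))  # same coefficient, very different F
```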
If the instrument is weak (low F), two bad things happen:
- IV estimates are biased: they drift back toward the biased OLS estimates
- Standard errors explode: the estimates become very imprecise
Section 3: The Formula in Detail
| Symbol | Name | Meaning |
| --- | --- | --- |
| RSS_r | Restricted RSS | Misfit when we exclude the instrument |
| RSS_u | Unrestricted RSS | Misfit when we include the instrument |
| RSS_r - RSS_u | Improvement in fit | How much better the model fits when Z is added |
| q | Number of restrictions | How many instruments are being tested (usually 1) |
| n | Sample size | Total observations |
| k | Slope parameters | Regressors in unrestricted model (not counting the constant) |
| n - k - 1 | Denominator df | Observations "left over" after estimating all parameters |
A Brief Detour: What Are Degrees of Freedom?
Every time you estimate a parameter, you "use up" one observation. If you have n observations and estimate 3 parameters (intercept + two slopes), you have n - 3 degrees of freedom remaining.
Technical: Dividing the RSS by its degrees of freedom gives an unbiased estimate of the error variance σ². If we divided by n instead of n - k - 1, we would systematically underestimate σ².
Non-technical: Think of degrees of freedom as "free choices." If you have 100 exam scores and are told the mean is 75, you can freely choose 99 of them; the 100th is determined, because once the other 99 are fixed, the last score must be whatever makes the mean come out to 75. You've used up one degree of freedom to estimate the mean. Each additional parameter costs one more degree of freedom.
In the F-statistic:
- Numerator df = q (how many restrictions we're testing; 1 for a single instrument)
- Denominator df = n - k - 1 (observations minus parameters in the unrestricted first stage)
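The unbiasedness claim above can be checked with a small simulation. This Python sketch (made-up numbers, purely illustrative) estimates the error variance both ways over many replications:

```python
import numpy as np

# Dividing RSS by n - k - 1 recovers the true error variance on average;
# dividing by n is systematically too small.
rng = np.random.default_rng(2)
n, k, sigma2 = 30, 2, 4.0                 # 2 slopes, true error variance 4
est_df, est_n = [], []
for _ in range(2000):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=np.sqrt(sigma2), size=n)
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    est_df.append(rss / (n - k - 1))      # divide by degrees of freedom
    est_n.append(rss / n)                 # divide by raw sample size
print(round(np.mean(est_df), 2))          # close to the true value 4
print(round(np.mean(est_n), 2))           # noticeably below 4
```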
Special Case: One Instrument (F = t²)
When there is a single instrument, there is a beautiful shortcut:
F = t²

where t is the t-statistic on the instrument in the first-stage regression. This holds because an F random variable with (1, m) degrees of freedom is exactly the square of a t random variable with m degrees of freedom. You can simply read the t-stat off the Stata table and square it.
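The identity is exact, not approximate, and can be verified numerically. A Python sketch on synthetic data (illustrative names only):

```python
import numpy as np

# Verify F = t^2 for a single tested regressor.
rng = np.random.default_rng(3)
n = 200
Z = rng.normal(size=n)
D = 0.3 * Z + rng.normal(size=n)

X = np.column_stack([np.ones(n), Z])
beta, _, _, _ = np.linalg.lstsq(X, D, rcond=None)
resid = D - X @ beta
s2 = resid @ resid / (n - 2)              # estimated error variance
var_b = s2 * np.linalg.inv(X.T @ X)[1, 1]
t = beta[1] / np.sqrt(var_b)              # t-stat on Z

rss_u = resid @ resid
rss_r = np.sum((D - D.mean()) ** 2)
F = (rss_r - rss_u) / (rss_u / (n - 2))
print(abs(t ** 2 - F) < 1e-8)             # True: identical up to rounding
```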
Section 4: Without Covariates - The Simple Case
sysuse nlsw88, clear
* First stage: does living in the South predict union membership?
reg union south

Focus on two parts of the output:
- Top-right corner: F(1, 1876) = [A] - this is the first-stage F
- Coefficient table: the t-statistic on south
- Verify: F = t²
Computing F "By Hand"
Method 1: From the t-statistic
reg union south
scalar t_south = _b[south] / _se[south]
scalar F_hand = t_south^2
display "F from t-squared: " F_hand
display "F shown in header: " e(F)

Method 2: From the reg components
reg union south
* e(mss) = model sum of squares
* e(rss) = residual sum of squares
* e(df_m) = numerator df (= q = 1)
* e(df_r) = denominator df (= n - 2)
scalar F_anova = (e(mss) / e(df_m)) / (e(rss) / e(df_r))
display "F from ANOVA: " F_anova
display "F from header: " e(F)

Using the test Command
reg union south
test south

Output:
( 1) south = 0
F( 1, 1876) = [same as header]
Prob > F = [p-value]

Section 5: With Covariates - The Critical Difference
* First stage with covariates
reg union south age ttl_exp collgrad

The Trap: Two Different F-Statistics
The Stata output again shows an F-statistic in the header. But now this F tests:
H0: the coefficients on south, age, ttl_exp, and collgrad are ALL jointly zero
That is not what we want. We want: Does south specifically predict union, after controlling for age, experience, and education?
This is the partial F-statistic for the instrument.
Why Does the Partial F Differ?
| Model | Regressors included |
| --- | --- |
| Unrestricted | south, age, ttl_exp, collgrad |
| Restricted (partial F) | age, ttl_exp, collgrad (dropping only south) |
| Restricted (overall F) | (nothing - just the constant) |
Imagine judging whether a new ingredient improves a recipe that already has five good ingredients. The overall F is like asking "Is this dish better than plain bread?" Of course it is. The partial F asks: "Does adding this one new ingredient improve the already-good dish?" The partial question is more relevant for assessing whether the instrument is doing work.
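The two comparisons can be computed side by side. A Python sketch (synthetic data; D, Z, X1, X2 are illustrative stand-ins, not the nlsw88 variables) builds all three models and forms both F-statistics:

```python
import numpy as np

# Partial F drops only the instrument; overall F drops everything.
rng = np.random.default_rng(4)
n = 400
X1, X2 = rng.normal(size=n), rng.normal(size=n)
Z = rng.normal(size=n)
D = 0.4 * Z + 0.5 * X1 - 0.3 * X2 + rng.normal(size=n)

def rss(design, y):
    beta, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    return np.sum((y - design @ beta) ** 2)

ones = np.ones(n)
rss_u = rss(np.column_stack([ones, Z, X1, X2]), D)   # unrestricted
rss_r = rss(np.column_stack([ones, X1, X2]), D)      # drop only Z
rss_0 = rss(ones[:, None], D)                        # constant only

k = 3                                                # slopes in the full model
F_partial = (rss_r - rss_u) / (rss_u / (n - k - 1))        # like `test Z`
F_overall = ((rss_0 - rss_u) / k) / (rss_u / (n - k - 1))  # like the header F
print(round(F_partial, 1), round(F_overall, 1))      # two different questions
```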
Degrees of Freedom Change
With covariates, the unrestricted first stage has 5 parameters (intercept + 4 slopes), so denominator df = n - 5, not n - 2.
quietly reg union south age ttl_exp collgrad
display "Denominator df with covariates: " e(df_r)

The Correct Command: test south
reg union south age ttl_exp collgrad
test south

Output:
( 1) south = 0
F( 1, [n-5]) = [F_partial]
Prob > F = [p]

This is the number to report as your first-stage F-statistic.
Does F = t² Still Hold?
Yes, but now t is the t-statistic on south from the full regression with covariates.
reg union south age ttl_exp collgrad
scalar t_south_full = _b[south] / _se[south]
scalar F_partial = t_south_full^2
display "Partial F from t²: " F_partial
test south
* Compare to the F shown by test south

Section 6: The Rule of Thumb - F > 10
Stock and Yogo (2005) derived critical values for the first-stage F to ensure IV estimates are not severely biased relative to OLS.
| F-statistic | Interpretation |
| --- | --- |
| F > 10 | Instrument is "strong" - safe to proceed |
| 5 < F < 10 | Weak instrument - interpret with caution |
| F < 5 | Very weak - IV estimates may be more biased than OLS |
Non-technical: Think of Z as a spotlight on D. A strong spotlight (high F) illuminates a clear share of D's variation that can be credibly attributed to Z, standing out above the surrounding noise. A dim spotlight (low F) can barely be distinguished from ambient light: Z's contribution to D's variation is swamped by everything else. When the spotlight is too dim, any IV estimate that relies on it becomes unreliable.
In plain language: an F of 20 means the signal from Z is roughly 20 times bigger than the noise; Z clearly explains a meaningful share of D's variation.
A weak instrument causes:
- Large IV standard errors (little variation to work with)
- IV estimates biased toward OLS (defeats the purpose)
- Confidence intervals with poor coverage
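The "biased toward OLS" problem can be seen in a tiny Monte Carlo. The Python sketch below (entirely synthetic; coefficients chosen for illustration) sets the true causal effect to zero, adds a confounder so OLS is biased upward, and compares a weak and a strong first stage:

```python
import numpy as np

# Weak instrument drags the IV estimate back toward the OLS bias.
rng = np.random.default_rng(5)
n, reps = 200, 500
pi_weak, pi_strong = 0.05, 1.0            # first-stage coefficients on Z

def iv_estimate(pi):
    Z = rng.normal(size=n)
    u = rng.normal(size=n)                # confounder driving OLS bias
    D = pi * Z + u + rng.normal(size=n)
    Y = 0.0 * D + u + rng.normal(size=n)  # true causal effect is zero
    return np.cov(Z, Y)[0, 1] / np.cov(Z, D)[0, 1]   # simple Wald/IV ratio

weak = np.median([iv_estimate(pi_weak) for _ in range(reps)])
strong = np.median([iv_estimate(pi_strong) for _ in range(reps)])
print(round(strong, 2))                   # near 0: recovers the truth
print(round(weak, 2))                     # pulled toward the positive OLS bias
```

Medians are used rather than means because the weak-instrument sampling distribution has very heavy tails, which is exactly the "standard errors explode" problem.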
Modern alternatives to F > 10:
- Cragg-Donald Wald F (exact critical values under homoskedasticity)
- Kleibergen-Paap rk Wald F (robust to heteroskedasticity; preferred in practice)
- The Cragg-Donald statistic is reported by estat firststage after ivregress; the Kleibergen-Paap statistic requires the user-written ivreg2 command
Section 7: ivregress and estat firststage
sysuse nlsw88, clear
* Run IV (without covariates)
ivregress 2sls wage (union = south), robust
* Get formal first-stage diagnostics
estat firststage

estat firststage reports the first-stage F, the Cragg-Donald (minimum eigenvalue) statistic, and Stock-Yogo critical values. For the Kleibergen-Paap statistic, use the user-written ivreg2 command.
* With covariates
ivregress 2sls wage (union = south) age ttl_exp collgrad, robust
estat firststage
* Manual (non-robust) verification - compare with estat firststage
reg union south age ttl_exp collgrad
test south

Summary Table
| Situation | Command | What the F tests |
| --- | --- | --- |
| No covariates | reg D Z (header F) | All regressors = instrument only ✓ |
| No covariates | test Z | Same ✓ |
| With covariates | reg D Z X1 X2 (header F) | ALL regressors jointly - NOT what you want ✗ |
| With covariates | test Z after the full reg | Partial F for the instrument only ✓ |
| After ivregress | estat firststage | First-stage F + formal weak-instrument tests ✓ |
Always use test Z (or estat firststage) to get the first-stage F. Never report the header F when covariates are present.