This worksheet walks through the first-stage F-statistic, the central diagnostic for whether an instrumental variable is strong enough to use. We start from scratch: what the F-stat is, what it tests, how to compute it by hand and in Stata, and how the presence of covariates changes which F-statistic you should report.
Dataset: nlsw88 (National Longitudinal Survey of Women, 1988), built into Stata.
Running example throughout:
- Outcome Y: wage (hourly wage)
- Endogenous variable D: union (union membership; workers self-select into unions, so OLS is biased)
- Instrument Z: south (lives in the South; Southern states have historically lower unionization rates due to right-to-work laws)
- Covariates X: age, ttl_exp, collgrad

Note on the instrument: We use south as a teaching instrument. It is probably a bad instrument, but we wanted a dataset that y'all can touch!

Section 1: What Is the F-Statistic?
The F-statistic is a test statistic for a joint hypothesis: it tests whether a group of coefficients is simultaneously equal to zero. It is named after the statistician Ronald Fisher.
Technical definition: Given an unrestricted model and a restricted model (where some coefficients are forced to zero), the F-statistic measures how much the fit deteriorates when we impose the restrictions:

F = [(RSS_r - RSS_u) / q] / [RSS_u / (n - k - 1)]

Where:
- RSS_r = Residual Sum of Squares from the restricted model (without the tested variables)
- RSS_u = Residual Sum of Squares from the unrestricted model (with the tested variables)
- q = number of restrictions (number of variables being tested jointly)
- n = sample size
- k = number of slope parameters in the unrestricted model (not counting the constant)
- n - k - 1 = denominator degrees of freedom
Less-technical definition: The F-statistic is a signal-to-noise ratio. The "signal" is the share of variation in D that Z accounts for; the "noise" is the residual variation in D that Z cannot explain. A large F means Z's signal stands out clearly above the background noise. A small F means Z is essentially indistinguishable from noise.
In plain language, an F of, say, 20 means: the variation in D explained by Z is about 20 times larger than the unexplained variation per degree of freedom, so the instrument's signal is clearly not just noise.
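Stata aside, the arithmetic is easy to check directly. Here is a minimal sketch in Python (not Stata) that computes F from the two residual sums of squares; the data and the names D and Z are synthetic, purely for illustration:

```python
import numpy as np

# Illustration of F = [(RSS_r - RSS_u)/q] / [RSS_u/(n - k - 1)] on fake data.
rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=n)
D = 0.5 * Z + rng.normal(size=n)          # Z genuinely explains part of D

# Unrestricted model: D on constant + Z
X_u = np.column_stack([np.ones(n), Z])
beta_u, _, _, _ = np.linalg.lstsq(X_u, D, rcond=None)
rss_u = np.sum((D - X_u @ beta_u) ** 2)

# Restricted model: D on constant only (coefficient on Z forced to zero)
rss_r = np.sum((D - D.mean()) ** 2)

q = 1                                     # one restriction: coef on Z = 0
k = 1                                     # one slope in the unrestricted model
F = ((rss_r - rss_u) / q) / (rss_u / (n - k - 1))
print(round(F, 1))                        # well above the F > 10 rule of thumb
```

Dropping Z can only increase the RSS, so the numerator is always non-negative; the question the F-stat answers is whether that increase is large relative to the leftover noise.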
Section 2: What Is the F-Stat Testing in IV?
Before running IV, we must run the first stage: regress the endogenous variable D on the instrument Z (and any covariates X). The first-stage F-statistic tests:
- H0: The instrument(s) have no effect on the endogenous variable (coefficient on Z = 0)
- H1: The instrument(s) do affect the endogenous variable
Technical interpretation: A large first-stage F means the Z-driven variation in D is large relative to the residual variation in D that Z cannot explain. This is a signal-to-noise comparison, not a statement about the size of the coefficient on Z. A small coefficient can produce a high F if Z varies widely across observations; a large coefficient can produce a low F if Z barely varies. If we fail to reject H0, Z's signal in D is swamped by noise, and the IV strategy collapses.
Non-technical interpretation: Imagine trying to pick up a radio station while there is a lot of static. The F-statistic asks: how much of the variation we observe in D comes from the clean signal of Z, compared to the random noise that Z cannot explain? A high F means Z's signal is loud and clear: it accounts for a noticeable share of D's variation relative to the leftover noise. A low F means Z's signal is barely distinguishable from static; almost all the variation in D looks like noise from Z's perspective.
Notice this is very different from asking "how big is the coefficient on Z": the F-stat is about explained variation relative to noise, not about the size of a marginal effect.
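This point is easy to demonstrate with a quick simulation. The Python sketch below (synthetic data, illustrative names) uses the same small coefficient on Z in both scenarios and varies only Z's spread:

```python
import numpy as np

# Same slope on Z in both runs; only the spread of Z differs.
rng = np.random.default_rng(1)
n = 1000

def first_stage_F(z_sd):
    Z = rng.normal(scale=z_sd, size=n)
    D = 0.1 * Z + rng.normal(size=n)      # small coefficient in both cases
    X = np.column_stack([np.ones(n), Z])
    beta, _, _, _ = np.linalg.lstsq(X, D, rcond=None)
    rss_u = np.sum((D - X @ beta) ** 2)
    rss_r = np.sum((D - D.mean()) ** 2)
    return (rss_r - rss_u) / (rss_u / (n - 2))

F_low = first_stage_F(z_sd=0.5)           # Z barely varies: low F
F_high = first_stage_F(z_sd=20.0)         # Z varies widely: high F
print(round(F_low, 1), round(F_high, 1))  # same coefficient, very different F
```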
If the instrument is weak (low F), two bad things happen:
- IV estimates are biased: they drift back toward the biased OLS estimates
- Standard errors explode: the estimates become very imprecise
Section 3: The Formula in Detail
| Symbol | Name | Meaning |
| --- | --- | --- |
| RSS_r | Restricted RSS | Misfit when we exclude the instrument |
| RSS_u | Unrestricted RSS | Misfit when we include the instrument |
| RSS_r - RSS_u | Improvement in fit | How much better the model fits when Z is added |
| q | Number of restrictions | How many instruments are being tested (usually 1) |
| n | Sample size | Total observations |
| k | Slope parameters | Regressors in unrestricted model (not counting the constant) |
| n - k - 1 | Denominator df | Observations "left over" after estimating all parameters |
A Brief Detour: What Are Degrees of Freedom?
Every time you estimate a parameter, you "use up" one observation. If you have n observations and estimate 3 parameters (intercept + two slopes), you have n - 3 degrees of freedom remaining.
Technical: Dividing the RSS by its degrees of freedom gives an unbiased estimate of the error variance σ². If we divided by n instead of n - k - 1, we would systematically underestimate σ².
Non-technical: Think of degrees of freedom as "free choices." If you have 100 exam scores and are told the mean is 75, you can freely choose 99 of them; the 100th is determined, because once the other 99 are fixed, the last score must be whatever makes the mean come out to 75. You've used up one degree of freedom to estimate the mean. Each additional parameter costs one more degree of freedom.
In the F-statistic:
- Numerator df = q (how many restrictions we're testing; 1 for a single instrument)
- Denominator df = n - k - 1 (observations minus parameters in the unrestricted first stage)
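The unbiasedness claim above can be checked with a small simulation. This Python sketch (made-up numbers, purely illustrative) estimates the error variance both ways over many replications:

```python
import numpy as np

# Dividing RSS by n - k - 1 recovers the true error variance on average;
# dividing by n is systematically too small.
rng = np.random.default_rng(2)
n, k, sigma2 = 30, 2, 4.0                 # 2 slopes, true error variance 4
est_df, est_n = [], []
for _ in range(2000):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=np.sqrt(sigma2), size=n)
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    est_df.append(rss / (n - k - 1))      # divide by degrees of freedom
    est_n.append(rss / n)                 # divide by raw sample size
print(round(np.mean(est_df), 2))          # close to the true value 4
print(round(np.mean(est_n), 2))           # noticeably below 4
```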
Special Case: One Instrument (F = t²)
When there is a single instrument, there is a beautiful shortcut:
F = t²

where t is the t-statistic on the instrument in the first-stage regression. This holds because an F random variable with (1, m) degrees of freedom is exactly the square of a t random variable with m degrees of freedom. You can simply read the t-stat off the Stata table and square it.
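The identity is exact, not approximate, and can be verified numerically. A Python sketch on synthetic data (illustrative names only):

```python
import numpy as np

# Verify F = t^2 for a single tested regressor.
rng = np.random.default_rng(3)
n = 200
Z = rng.normal(size=n)
D = 0.3 * Z + rng.normal(size=n)

X = np.column_stack([np.ones(n), Z])
beta, _, _, _ = np.linalg.lstsq(X, D, rcond=None)
resid = D - X @ beta
s2 = resid @ resid / (n - 2)              # estimated error variance
var_b = s2 * np.linalg.inv(X.T @ X)[1, 1]
t = beta[1] / np.sqrt(var_b)              # t-stat on Z

rss_u = resid @ resid
rss_r = np.sum((D - D.mean()) ** 2)
F = (rss_r - rss_u) / (rss_u / (n - 2))
print(abs(t ** 2 - F) < 1e-8)             # True: identical up to rounding
```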
Section 4: Without Covariates - The Simple Case
sysuse nlsw88, clear
* First stage: does living in the South predict union membership?
reg union south

Focus on two parts of the output:
- Top-right corner: F(1, 1876) = [A] - this is the first-stage F
- Coefficient table: the t-statistic on south
- Verify: F = t²
Computing F "By Hand"
Method 1: From the t-statistic
reg union south
scalar t_south = _b[south] / _se[south]
scalar F_hand = t_south^2
display "F from t-squared: " F_hand
display "F shown in header: " e(F)

Method 2: From the reg components
reg union south
* e(mss) = model sum of squares
* e(rss) = residual sum of squares
* e(df_m) = numerator df (= q = 1)
* e(df_r) = denominator df (= n - 2)
scalar F_anova = (e(mss) / e(df_m)) / (e(rss) / e(df_r))
display "F from ANOVA: " F_anova
display "F from header: " e(F)

Using the test Command
reg union south
test south

Output:
( 1) south = 0
F( 1, 1876) = [same as header]
Prob > F = [p-value]

Section 5: With Covariates - The Critical Difference
* First stage with covariates
reg union south age ttl_exp collgrad

The Trap: Two Different F-Statistics
The Stata output again shows an F-statistic in the header. But now this F tests:
H0: the coefficients on south, age, ttl_exp, and collgrad are ALL jointly zero
That is not what we want. We want: Does south specifically predict union, after controlling for age, experience, and education?
This is the partial F-statistic for the instrument.
Why Does the Partial F Differ?
| Model | Regressors included |
| --- | --- |
| Unrestricted | south, age, ttl_exp, collgrad |
| Restricted (partial F) | age, ttl_exp, collgrad (dropping only south) |
| Restricted (overall F) | (nothing - just the constant) |
Imagine judging whether a new ingredient improves a recipe that already has five good ingredients. The overall F is like asking "Is this dish better than plain bread?" Of course it is. The partial F asks: "Does adding this one new ingredient improve the already-good dish?" The partial question is more relevant for assessing whether the instrument is doing work.
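The two comparisons can be computed side by side. A Python sketch (synthetic data; D, Z, X1, X2 are illustrative stand-ins, not the nlsw88 variables) builds all three models and forms both F-statistics:

```python
import numpy as np

# Partial F drops only the instrument; overall F drops everything.
rng = np.random.default_rng(4)
n = 400
X1, X2 = rng.normal(size=n), rng.normal(size=n)
Z = rng.normal(size=n)
D = 0.4 * Z + 0.5 * X1 - 0.3 * X2 + rng.normal(size=n)

def rss(design, y):
    beta, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    return np.sum((y - design @ beta) ** 2)

ones = np.ones(n)
rss_u = rss(np.column_stack([ones, Z, X1, X2]), D)   # unrestricted
rss_r = rss(np.column_stack([ones, X1, X2]), D)      # drop only Z
rss_0 = rss(ones[:, None], D)                        # constant only

k = 3                                                # slopes in the full model
F_partial = (rss_r - rss_u) / (rss_u / (n - k - 1))        # like `test Z`
F_overall = ((rss_0 - rss_u) / k) / (rss_u / (n - k - 1))  # like the header F
print(round(F_partial, 1), round(F_overall, 1))      # two different questions
```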
Degrees of Freedom Change
With covariates, the unrestricted first stage has 5 parameters (intercept + 4 slopes), so denominator df = n - 5, not n - 2.
quietly reg union south age ttl_exp collgrad
display "Denominator df with covariates: " e(df_r)

The Correct Command: test south
reg union south age ttl_exp collgrad
test south

Output:
( 1) south = 0
F( 1, [n-5]) = [F_partial]
Prob > F = [p]

This is the number to report as your first-stage F-statistic.
Does F = t² Still Hold?
Yes, but now t is the t-statistic on south from the full regression with covariates.
reg union south age ttl_exp collgrad
scalar t_south_full = _b[south] / _se[south]
scalar F_partial = t_south_full^2
display "Partial F from t²: " F_partial
test south
* Compare to the F shown by test south

Section 6: The Rule of Thumb - F > 10
Stock and Yogo (2005) derived critical values for the first-stage F to ensure IV estimates are not severely biased relative to OLS.
| F-statistic | Interpretation |
| --- | --- |
| F > 10 | Instrument is "strong" - safe to proceed |
| 5 < F < 10 | Weak instrument - interpret with caution |
| F < 5 | Very weak - IV estimates may be more biased than OLS |
Non-technical: Think of Z as a spotlight on D. A strong spotlight (high F) illuminates a clear share of D's variation that can be credibly attributed to Z, standing out above the surrounding noise. A dim spotlight (low F) can barely be distinguished from ambient light: Z's contribution to D's variation is swamped by everything else. When the spotlight is too dim, any IV estimate that relies on it becomes unreliable.
In plain language: an F of 20 means the signal from Z is roughly 20 times bigger than the noise; Z clearly explains a meaningful share of D's variation.
A weak instrument causes:
- Large IV standard errors (little variation to work with)
- IV estimates biased toward OLS (defeats the purpose)
- Confidence intervals with poor coverage
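The "biased toward OLS" problem can be seen in a tiny Monte Carlo. The Python sketch below (entirely synthetic; coefficients chosen for illustration) sets the true causal effect to zero, adds a confounder so OLS is biased upward, and compares a weak and a strong first stage:

```python
import numpy as np

# Weak instrument drags the IV estimate back toward the OLS bias.
rng = np.random.default_rng(5)
n, reps = 200, 500
pi_weak, pi_strong = 0.05, 1.0            # first-stage coefficients on Z

def iv_estimate(pi):
    Z = rng.normal(size=n)
    u = rng.normal(size=n)                # confounder driving OLS bias
    D = pi * Z + u + rng.normal(size=n)
    Y = 0.0 * D + u + rng.normal(size=n)  # true causal effect is zero
    return np.cov(Z, Y)[0, 1] / np.cov(Z, D)[0, 1]   # simple Wald/IV ratio

weak = np.median([iv_estimate(pi_weak) for _ in range(reps)])
strong = np.median([iv_estimate(pi_strong) for _ in range(reps)])
print(round(strong, 2))                   # near 0: recovers the truth
print(round(weak, 2))                     # pulled toward the positive OLS bias
```

Medians are used rather than means because the weak-instrument sampling distribution has very heavy tails, which is exactly the "standard errors explode" problem.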
Modern alternatives to F > 10:
- Cragg-Donald Wald F (exact critical values under homoskedasticity)
- Kleibergen-Paap rk Wald F (robust to heteroskedasticity; preferred in practice)
- The Cragg-Donald statistic is reported by estat firststage after ivregress; the Kleibergen-Paap statistic requires the user-written ivreg2 command
Section 7: ivregress and estat firststage
sysuse nlsw88, clear
* Run IV (without covariates)
ivregress 2sls wage (union = south), robust
* Get formal first-stage diagnostics
estat firststage

estat firststage reports the first-stage F, the Cragg-Donald (minimum eigenvalue) statistic, and Stock-Yogo critical values. For the Kleibergen-Paap statistic, use the user-written ivreg2 command.
* With covariates
ivregress 2sls wage (union = south) age ttl_exp collgrad, robust
estat firststage
* Manual (non-robust) verification - compare with estat firststage
reg union south age ttl_exp collgrad
test south

Summary Table
| Situation | Command | What the F tests |
| --- | --- | --- |
| No covariates | reg D Z (header F) | All regressors = instrument only ✓ |
| No covariates | test Z | Same ✓ |
| With covariates | reg D Z X1 X2 (header F) | ALL regressors jointly - NOT what you want ✗ |
| With covariates | test Z after the full reg | Partial F for the instrument only ✓ |
| After ivregress | estat firststage | First-stage F + formal weak-instrument tests ✓ |
Always use test Z (or estat firststage) to get the first-stage F. Never report the header F when covariates are present.