Sebastian Tello

Diving into F-Statistic in IV Regression

This worksheet walks through the first-stage F-statistic — the central diagnostic for whether an instrumental variable is strong enough to use. We start from scratch: what the F-stat is, what it tests, how to compute it by hand and in Stata, and how the presence of covariates changes everything.

Dataset: nlsw88 (National Longitudinal Survey of Women, 1988), built into Stata.

Running example throughout:

  • Outcome Y: wage (hourly wage)
  • Endogenous variable D: union (union membership — workers self-select into unions, so OLS is biased)
  • Instrument Z: south (lives in the South — Southern states have historically lower unionization rates due to right-to-work laws)
  • Covariates X: age, ttl_exp, collgrad
Note on the instrument: We use south as a teaching instrument. It is probably a bad instrument, but we wanted a dataset that y'all can touch!

Section 1: What Is the F-Statistic?

The F-statistic is a test statistic for a joint hypothesis β€” it tests whether a group of coefficients is simultaneously equal to zero. It is named after the statistician Ronald Fisher.

Technical definition: Given an unrestricted model and a restricted model (where some coefficients are forced to zero), the F-statistic measures how much the fit deteriorates when we impose the restrictions:

F = \frac{(RSS_R - RSS_{UR}) / q}{RSS_{UR} / (n - k - 1)}

Where:

  • RSS_R = Residual Sum of Squares from the restricted model (without the tested variables)
  • RSS_UR = Residual Sum of Squares from the unrestricted model (with the tested variables)
  • q = number of restrictions (number of variables being tested jointly)
  • n = sample size
  • k = number of slope parameters in the unrestricted model (not counting the constant)
  • n - k - 1 = denominator degrees of freedom

Less-technical definition: The F-statistic is a signal-to-noise ratio. The "signal" is the share of variation in D that Z accounts for; the "noise" is the residual variation in D that Z cannot explain. A large F means Z's signal stands out clearly above the background noise. A small F means Z is essentially indistinguishable from noise.

In plain language, an F of, say, 20 means: the variation in D explained by Z is about 20 times larger than the unexplained variation per degree of freedom — so the instrument's signal is clearly not just noise.
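To make the formula concrete, here is a quick numerical sketch in Python (numpy only; the first-stage coefficient of 0.5, the sample size, and the noise level are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)                  # instrument Z
d = 0.5 * z + rng.normal(size=n)        # endogenous variable D, partly driven by Z

# Unrestricted model: D on a constant and Z
X_ur = np.column_stack([np.ones(n), z])
coef = np.linalg.lstsq(X_ur, d, rcond=None)[0]
rss_ur = np.sum((d - X_ur @ coef) ** 2)

# Restricted model: Z forced out, constant only
rss_r = np.sum((d - d.mean()) ** 2)

q, k = 1, 1                             # one restriction, one slope in the unrestricted model
F = ((rss_r - rss_ur) / q) / (rss_ur / (n - k - 1))
```

Adding Z can only shrink the residual sum of squares, so rss_r exceeds rss_ur; the F-stat asks whether that improvement is large relative to the leftover noise per degree of freedom.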

Section 2: What Is the F-Stat Testing in IV?

Before running IV, we must run the first stage: regress the endogenous variable D on the instrument Z (and any covariates X). The first-stage F-statistic tests:

  • H_0: The instrument(s) have no effect on the endogenous variable (coefficient on Z = 0)
  • H_1: The instrument(s) do affect the endogenous variable

Technical interpretation: A large first-stage F means the Z-driven variation in D is large relative to the residual variation in D that Z cannot explain. This is a signal-to-noise comparison — not a statement about the size of the coefficient on Z. A small coefficient can produce a high F if Z varies widely across observations; a large coefficient can produce a low F if Z barely varies. If we fail to reject H_0, Z's signal in D is swamped by noise, and the IV strategy collapses.

Non-technical interpretation: Imagine trying to pick up a radio station while there is a lot of static. The F-statistic asks: how much of the variation we observe in D comes from the clean signal of Z, compared to the random noise that Z cannot explain? A high F means Z's signal is loud and clear — it accounts for a noticeable share of D's variation relative to the leftover noise. A low F means Z's signal is barely distinguishable from static — almost all the variation in D looks like noise from Z's perspective.

Notice this is very different from asking "how big is the coefficient on Z" — the F-stat is about explained variation relative to noise, not about the size of a marginal effect.

If the instrument is weak (low F), two bad things happen:

  1. IV estimates are biased — they drift back toward the biased OLS estimates
  2. Standard errors explode — the estimates become very imprecise
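The "biased toward OLS" claim can be seen in a small Monte Carlo sketch in Python (numpy only; the data-generating process, the confounder U, and all parameter values are invented for illustration):

```python
import numpy as np

def median_estimates(pi, n=100, reps=2000, beta=1.0, seed=42):
    """Median IV and OLS slope estimates when D = pi*Z + U + e and Y = beta*D + U + v."""
    rng = np.random.default_rng(seed)
    iv, ols = [], []
    for _ in range(reps):
        z = rng.normal(size=n)
        u = rng.normal(size=n)                    # unobserved confounder hitting both D and Y
        d = pi * z + u + rng.normal(size=n)
        y = beta * d + u + rng.normal(size=n)
        iv.append(np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1])   # simple IV (Wald) estimate
        ols.append(np.cov(d, y)[0, 1] / np.var(d, ddof=1))   # OLS slope, biased by U
    return np.median(iv), np.median(ols)

iv_strong, ols_strong = median_estimates(pi=1.0)   # strong first stage
iv_weak, ols_weak = median_estimates(pi=0.05)      # very weak first stage
```

With a strong instrument the median IV estimate sits near the true beta = 1 while OLS is biased upward by the confounder; with the weak instrument the median IV estimate drifts toward the biased OLS value. Medians are used because the weak-IV sampling distribution has very heavy tails.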

Section 3: The Formula in Detail

F = \frac{(RSS_R - RSS_{UR}) / q}{RSS_{UR} / (n - k - 1)}

  • RSS_R — Restricted RSS: misfit when we exclude the instrument
  • RSS_UR — Unrestricted RSS: misfit when we include the instrument
  • RSS_R - RSS_UR — Improvement in fit: how much better the model fits when Z is added
  • q — Number of restrictions: how many instruments are being tested (usually 1)
  • n — Sample size: total observations
  • k — Slope parameters: regressors in the unrestricted model (not counting the constant)
  • n - k - 1 — Denominator df: observations "left over" after estimating all parameters

A Brief Detour: What Are Degrees of Freedom?

Every time you estimate a parameter, you "use up" one observation. If you have n = 100 observations and estimate 3 parameters (intercept + two slopes), you have 100 - 3 = 97 degrees of freedom remaining.

Technical: Dividing RSS by its degrees of freedom gives an unbiased estimate of the error variance σ². If we divided by n instead of n - k - 1, we would systematically underestimate σ².
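That underestimation is easy to check by simulation; here is a Python sketch (numpy only; the sample size, slopes, and error variance are invented, with true σ² = 1 by construction):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, reps = 30, 2, 5000                  # small sample, 2 slopes, true sigma^2 = 1
div_by_n, div_by_df = [], []
for _ in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)   # errors have variance 1
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    rss = resid @ resid
    div_by_n.append(rss / n)              # divide by n: too small on average
    div_by_df.append(rss / (n - k - 1))   # divide by df: centered on the true 1.0
```

Averaged over many samples, rss/n comes out around (n - k - 1)/n = 0.9, while rss/(n - k - 1) centers on the true value of 1.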

Non-technical: Think of degrees of freedom as "free choices." If you have 100 exam scores and are told the mean is 75, you can freely choose 99 of them; the 100th is pinned down, because once the other 99 are fixed, the last score must be whatever makes the mean come out to 75. You've used up one degree of freedom to estimate the mean. Each additional parameter costs one more degree of freedom.

In the F-statistic:

  • Numerator df = q (how many restrictions we're testing; 1 for a single instrument)
  • Denominator df = n - k - 1 (observations minus parameters in the unrestricted first stage)

Special Case: One Instrument — F = t²

When there is a single instrument, there is a beautiful shortcut:

F = t²

where t is the t-statistic on the instrument in the first-stage regression. This holds because F(1, m) = [t(m)]². You can simply read the t-stat off the Stata table and square it.
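You can convince yourself of the identity outside Stata with a Python sketch (numpy only; the first-stage relationship is invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 150
z = rng.normal(size=n)
d = 0.4 * z + rng.normal(size=n)          # invented first stage

# Unrestricted model: D on constant + Z
X = np.column_stack([np.ones(n), z])
beta = np.linalg.lstsq(X, d, rcond=None)[0]
resid = d - X @ beta
rss_ur = resid @ resid

# t-statistic on the instrument
sigma2_hat = rss_ur / (n - 2)
se_z = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1])
t_z = beta[1] / se_z

# F from the restricted/unrestricted comparison (constant-only restricted model)
rss_r = np.sum((d - d.mean()) ** 2)
F = (rss_r - rss_ur) / (rss_ur / (n - 2))
```

t_z squared and F agree to floating-point precision; the identity is exact, not approximate.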

Section 4: Without Covariates β€” The Simple Case

sysuse nlsw88, clear

* First stage: does living in the South predict union membership?
reg union south

Focus on three parts of the output:

  1. Top-right corner: F(1, 1876) = [A] — this is the first-stage F
  2. Coefficient table: the t-statistic on south
  3. Verify: F = t²

Computing F "By Hand"

Method 1: From the t-statistic

reg union south

scalar t_south = _b[south] / _se[south]
scalar F_hand  = t_south^2
display "F from t-squared:      " F_hand
display "F shown in header:     " e(F)

Method 2: From the reg components

reg union south

* e(mss)  = model sum of squares
* e(rss)  = residual sum of squares
* e(df_m) = numerator df (= q = 1)
* e(df_r) = denominator df (= n - 2)

scalar F_anova = (e(mss) / e(df_m)) / (e(rss) / e(df_r))
display "F from ANOVA:    " F_anova
display "F from header:   " e(F)

Using the test Command

reg union south
test south

Output:

( 1)  south = 0

      F(  1,  1876) =    [same as header]
           Prob > F =    [p-value]
Question 1: Run reg union south. What is the t-stat on south? What is t²? Does it match F(1,1876) in the header and test south?
Question 2: What does the p-value on the F-statistic mean?

Section 5: With Covariates β€” The Critical Difference

* First stage with covariates
reg union south age ttl_exp collgrad

The Trap: Two Different F-Statistics

The Stata output again shows an F-statistic in the header. But now this F tests:

H_0: south = age = ttl_exp = collgrad = 0 (ALL slopes jointly zero)

That is not what we want. We want: Does south specifically predict union, after controlling for age, experience, and education?

This is the partial F-statistic for the instrument.

Why Does the Partial F Differ?

  • Unrestricted: south, age, ttl_exp, collgrad
  • Restricted (partial F): age, ttl_exp, collgrad (dropping only south)
  • Restricted (overall F): nothing — just the constant

Imagine judging whether a new ingredient improves a recipe that already has five good ingredients. The overall F is like asking "Is this dish better than plain bread?" Of course it is. The partial F asks: "Does adding this one new ingredient improve the already-good dish?" The partial question is more relevant for assessing whether the instrument is doing work.
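The gap between the two F-statistics is easy to reproduce in a Python sketch (numpy only; the strong covariates and the deliberately small instrument effect of 0.1 are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)   # covariates with real predictive power
z = rng.normal(size=n)                            # instrument with only a small effect
d = 0.1 * z + x1 + x2 + rng.normal(size=n)

def rss(regressors):
    """Residual sum of squares from regressing d on a constant plus the given columns."""
    X = np.column_stack([np.ones(n)] + regressors)
    coef = np.linalg.lstsq(X, d, rcond=None)[0]
    return np.sum((d - X @ coef) ** 2)

rss_full = rss([z, x1, x2])
df_den = n - 3 - 1                                # n - k - 1 with k = 3 slopes

# Overall (header) F: tests ALL three slopes against the constant-only model
F_overall = ((rss([]) - rss_full) / 3) / (rss_full / df_den)

# Partial F for the instrument: drop only z
F_partial = ((rss([x1, x2]) - rss_full) / 1) / (rss_full / df_den)
```

The overall F comes out huge because the covariates do the explaining, while the partial F for z alone is far smaller: exactly the trap described above.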

Degrees of Freedom Change

With covariates, the unrestricted first stage has 5 parameters (intercept + 4 slopes), so denominator df = n - 5, not n - 2.

quietly reg union south age ttl_exp collgrad
display "Denominator df with covariates: " e(df_r)

The Correct Command: test south

reg union south age ttl_exp collgrad
test south

Output:

( 1)  south = 0

      F(  1,  [n-5]) =    [F_partial]
           Prob > F   =    [p]

This is the number to report as your first-stage F-statistic.

Does F = t² Still Hold?

Yes — but now t is the t-statistic on south from the full regression with covariates.

reg union south age ttl_exp collgrad

scalar t_south_full = _b[south] / _se[south]
scalar F_partial    = t_south_full^2

display "Partial F from tΒ²:   " F_partial
test south
* Compare to the F shown by test south
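The same check works outside Stata; a Python sketch (numpy only, invented data, one covariate for simplicity) confirming that squaring the t-stat on the instrument in the full regression reproduces the partial F:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
x = rng.normal(size=n)                  # one covariate
z = rng.normal(size=n)                  # instrument
d = 0.3 * z + 0.8 * x + rng.normal(size=n)

# Full (unrestricted) first stage: D on constant, Z, X
X_full = np.column_stack([np.ones(n), z, x])
beta = np.linalg.lstsq(X_full, d, rcond=None)[0]
rss_ur = np.sum((d - X_full @ beta) ** 2)
df_den = n - 2 - 1                      # k = 2 slopes

# t-stat on z from the full regression
se_z = np.sqrt((rss_ur / df_den) * np.linalg.inv(X_full.T @ X_full)[1, 1])
t_z = beta[1] / se_z

# Partial F: the restricted model drops only z, keeping the covariate
X_r = np.column_stack([np.ones(n), x])
coef_r = np.linalg.lstsq(X_r, d, rcond=None)[0]
rss_r = np.sum((d - X_r @ coef_r) ** 2)
F_partial = (rss_r - rss_ur) / (rss_ur / df_den)
```

As in the no-covariate case, t_z² equals F_partial exactly; the key is that both come from the same full regression.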
Question 3: Compare (a) the header F, (b) t² on south, and (c) F from test south. Which ones match?

Section 6: The Rule of Thumb β€” F > 10

Stock and Yogo (2005) derived critical values for the first-stage F to ensure IV estimates are not severely biased relative to OLS.

  • F > 10: instrument is "strong" — safe to proceed
  • 5 < F < 10: weak instrument — interpret with caution
  • F < 5: very weak — IV estimates may be more biased than OLS

Non-technical: Think of Z as a spotlight on D. A strong spotlight (high F) illuminates a clear share of D's variation that can be credibly attributed to Z, standing out above the surrounding noise. A dim spotlight (low F) can barely be distinguished from ambient light — Z's contribution to D's variation is swamped by everything else. When the spotlight is too dim, any IV estimate that relies on it becomes unreliable.

In plain language: an F of 20 means the signal from Z is roughly 20 times bigger than the noise — Z clearly explains a meaningful share of D's variation.

A weak instrument causes:

  1. Large IV standard errors (little variation to work with)
  2. IV estimates biased toward OLS (defeats the purpose)
  3. Confidence intervals with poor coverage

Modern alternatives to F > 10:

  • Cragg-Donald Wald F (exact critical values under homoskedasticity; reported by estat firststage after ivregress)
  • Kleibergen-Paap rk Wald F (robust to heteroskedasticity — preferred in practice; reported by the user-written ivreg2 command)

Section 7: ivregress and estat firststage

sysuse nlsw88, clear

* Run IV (without covariates)
ivregress 2sls wage (union = south), robust

* Get formal first-stage diagnostics
estat firststage

estat firststage reports the first-stage partial F, the partial R², the Cragg-Donald minimum eigenvalue statistic, and the Stock-Yogo critical values. (The Kleibergen-Paap version comes from the user-written ivreg2.)

* With covariates
ivregress 2sls wage (union = south) age ttl_exp collgrad, robust
estat firststage

* Manual verification — should match estat firststage
reg union south age ttl_exp collgrad
test south
Question 4: After estat firststage, is south a strong instrument by the F > 10 rule?

Summary Table

  • No covariates, reg D Z (header F): tests the instrument alone, since it is the only regressor ✓
  • No covariates, test Z: same ✓
  • With covariates, reg D Z X1 X2 (header F): tests ALL regressors jointly — NOT what you want ✗
  • With covariates, test Z after the full reg: partial F for the instrument only ✓
  • After ivregress, estat firststage: first-stage F plus formal weak-instrument tests ✓

Always use test Z (or estat firststage) to get the first-stage F. Never report the header F when covariates are present.
