
STATA: A Beginner’s Guide to Esttab

In this worksheet, we will work through how to create regression tables in Stata using the esttab command.

You can find more information at the following links:

- Creating Publication-Quality Tables in Stata (sscc.wisc.edu)
- Intro to making Publication Style Tables Using Esttab (dariotoman.com)
- estout - Making Regression Tables in Stata (repec.org)

First, install the estout package if you haven’t already. The estout package contains several commands, including estout, esttab, eststo, and estadd.

ssc install estout, replace
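If you share your do-file with others, you may prefer to install the package only when it is missing. Here is a minimal sketch, which assumes you have an internet connection for ssc:

```stata
* Check whether esttab is already installed. capture swallows the error
* that -which- raises when a command cannot be found, and _rc holds the
* resulting return code (0 means esttab was found).
capture which esttab
if _rc {
    ssc install estout, replace
}

* Display where esttab is installed to confirm it is available.
which esttab
```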

Let’s work with the following dataset

dataset_2.13.dta (93.4 KB)

Before we start adding regressions to our table, we’ll want to clear any estimates we’ve already stored. This removes any regressions we saved for a previous esttab table.

estimates clear
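To see what is currently stored before you clear it, you can list the stored results. In the sketch below, eststo clear is the estout package’s own clearing command, shown alongside estimates clear:

```stata
* List the names of all estimation results currently stored in memory.
estimates dir

* Drop all stored estimation results.
estimates clear

* estout's own clearing command; it drops the results stored by eststo.
eststo clear
```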

Now let’s add some regressions! We’ll start with a simple linear regression of hourly wage on total work experience (in years), which shows us the association between work experience and wage. To add this regression to our esttab, put “eststo:” before the regression command; this stores the estimates so that esttab can tabulate them. Then, to produce the table, just run esttab.

eststo: reg wage ttl_exp
esttab
----------------------------
                      (1)   
                     wage   
----------------------------
ttl_exp             0.331***
                  (13.04)   

_cons               3.612***
                  (10.65)   
----------------------------
N                    2246   
----------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
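By default, eststo stores each model under a generic name (est1, est2, and so on). The sketch below shows two small variations you may find handy: giving each stored model a name of your choosing, and adding quietly to hide the full regression output. It assumes Stata’s bundled nlsw88 example dataset, which uses the same variable names (wage, ttl_exp, age, race), as a stand-in for dataset_2.13.dta, so its estimates will not necessarily match the tables in this worksheet.

```stata
* Assumption: nlsw88 stands in for dataset_2.13.dta.
sysuse nlsw88, clear
eststo clear

* "eststo name:" stores the model under the name you give it;
* quietly suppresses the regression output in the Results window.
eststo basic: quietly regress wage ttl_exp
eststo controls: quietly regress wage ttl_exp age race

* Listing stored names after esttab chooses which models appear, and in what order.
esttab controls basic
```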

We will now look at this regression with controls. Let’s add age and race as control variables, because we think that omitting them would bias our estimate of the effect of experience (omitted variable bias). Let’s add this regression to our esttab. Notice that because we haven’t typed “estimates clear” since the first regression, both the old and the new regression appear in the same table when we type esttab.

eststo: reg wage ttl_exp age race

esttab
--------------------------------------------
                      (1)             (2)   
                     wage            wage   
--------------------------------------------
ttl_exp             0.331***        0.346***
                  (13.04)         (13.49)   

age                                -0.138***
                                  (-3.57)   

race                               -1.389***
                                  (-5.21)   

_cons               3.612***        9.187***
                  (10.65)          (6.05)   
--------------------------------------------
N                    2246            2220   
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Next, we want to see how the effect of total work experience on hourly wage differs between workers who are in a union and those who are not. So, we’ll generate an interaction term (experience multiplied by union status), add it and union to the regression, and add this regression to our esttab.

generate interact = ttl_exp*union

eststo: reg wage ttl_exp age race union interact

esttab
------------------------------------------------------------
                      (1)             (2)             (3)   
                     wage            wage            wage   
------------------------------------------------------------
ttl_exp             0.331***        0.346***        0.342***
                  (13.04)         (13.49)         (15.55)   

age                                -0.138***      -0.0609*  
                                  (-3.57)         (-2.08)   

race                               -1.389***       -1.323***
                                  (-5.21)         (-6.66)   

union                                               2.040** 
                                                   (3.25)   

interact                                          -0.0467   
                                                  (-1.03)   

_cons               3.612***        9.187***        5.559***
                  (10.65)          (6.05)          (4.79)   
------------------------------------------------------------
N                    2246            2220            1854   
------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
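Stata’s factor-variable notation is an alternative to generating the interaction by hand: i.union##c.ttl_exp tells regress to include union, ttl_exp, and their product. On the same sample it should reproduce the coefficients from the generate approach, and it lets postestimation commands such as margins know how the terms are related. A sketch, again assuming nlsw88 as a stand-in for dataset_2.13.dta:

```stata
* Assumption: nlsw88 stands in for dataset_2.13.dta.
sysuse nlsw88, clear
eststo clear

* i.union##c.ttl_exp expands to i.union, c.ttl_exp, and i.union#c.ttl_exp,
* so no separate -generate- step is needed.
eststo: quietly regress wage i.union##c.ttl_exp age race

* The interaction appears as 1.union#c.ttl_exp; nobaselevels hides the
* empty rows for the omitted base category (union == 0).
esttab, se nobaselevels
```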

We’re also interested in the effect of tenure in a position on wages, but we think this relationship is quadratic (that is, non-linear: the effect of an additional year of tenure on wages depends on how much tenure a worker already has). So, we’ll generate a squared tenure term, run a regression that includes both tenure and its square, and add this regression to our esttab.

generate tenure_sq = tenure^2
eststo: reg wage tenure tenure_sq
esttab
----------------------------------------------------------------------------
                      (1)             (2)             (3)             (4)   
                     wage            wage            wage            wage   
----------------------------------------------------------------------------
ttl_exp             0.331***        0.346***        0.342***                
                  (13.04)         (13.49)         (15.55)                   

age                                -0.138***      -0.0609*                  
                                  (-3.57)         (-2.08)                   

race                               -1.389***       -1.323***                
                                  (-5.21)         (-6.66)                   

union                                               2.040**                 
                                                   (3.25)                   

interact                                          -0.0467                   
                                                  (-1.03)                   

tenure                                                              0.344***
                                                                   (4.84)   

tenure_sq                                                        -0.00891*  
                                                                  (-2.34)   

_cons               3.612***        9.187***        5.559***        6.327***
                  (10.65)          (6.05)          (4.79)         (27.14)   
----------------------------------------------------------------------------
N                    2246            2220            1854            2231   
----------------------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
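Two related sketches for the quadratic model follow. First, factor-variable notation (c.tenure##c.tenure) adds tenure and its square without creating tenure_sq. Second, because the linear term is positive and the squared term is negative, predicted wages peak where tenure = -b(tenure) / (2 × b(tenure_sq)). With the rounded coefficients in the table above, that is 0.344 / (2 × 0.00891) ≈ 19.3 years; nlcom computes the same quantity from the unrounded estimates, along with a delta-method standard error. Again assuming nlsw88 as a stand-in for dataset_2.13.dta:

```stata
* Assumption: nlsw88 stands in for dataset_2.13.dta.
sysuse nlsw88, clear

* c.tenure##c.tenure expands to tenure and c.tenure#c.tenure (tenure squared).
regress wage c.tenure##c.tenure

* Turning point of the quadratic: -b1 / (2*b2).
* With the generated variable from the worksheet, use _b[tenure_sq] instead.
nlcom -_b[tenure] / (2*_b[c.tenure#c.tenure])
```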

We realize that we aren’t sure whether the relationship between tenure and wages is quadratic or logarithmic. So, we’ll run a new regression that uses the natural log of tenure instead of a quadratic term (a linear-log model). First, we’ll generate ln(tenure), and then we’ll regress wage on ln(tenure).

generate ln_tenure = ln(tenure)
eststo: reg wage ln_tenure
esttab
--------------------------------------------------------------------------------------------
                      (1)             (2)             (3)             (4)             (5)   
                     wage            wage            wage            wage            wage   
--------------------------------------------------------------------------------------------
ttl_exp             0.331***        0.346***        0.342***                                
                  (13.04)         (13.49)         (15.55)                                   

age                                -0.138***      -0.0609*                                  
                                  (-3.57)         (-2.08)                                   

race                               -1.389***       -1.323***                                
                                  (-5.21)         (-6.66)                                   

union                                               2.040**                                 
                                                   (3.25)                                   

interact                                          -0.0467                                   
                                                  (-1.03)                                   

tenure                                                              0.344***                
                                                                   (4.84)                   

tenure_sq                                                        -0.00891*                  
                                                                  (-2.34)                   

ln_tenure                                                                           0.970***
                                                                                   (9.53)   

_cons               3.612***        9.187***        5.559***        6.327***        6.562***
                  (10.65)          (6.05)          (4.79)         (27.14)         (36.87)   
--------------------------------------------------------------------------------------------
N                    2246            2220            1854            2231            2180   
--------------------------------------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
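The linear-log model uses fewer observations than the quadratic model (2180 versus 2231). One likely reason is that ln() returns a missing value when its argument is zero or negative, so workers with zero tenure drop out of the regression. Here is a quick way to check, again assuming nlsw88 as a stand-in for dataset_2.13.dta:

```stata
* Assumption: nlsw88 stands in for dataset_2.13.dta.
sysuse nlsw88, clear
generate ln_tenure = ln(tenure)

* How many observations have zero tenure?
count if tenure == 0

* How many have a tenure value but a missing log? regress drops these.
count if missing(ln_tenure) & !missing(tenure)
```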

Now let’s edit our table so that it displays everything we’re interested in. You customize the table with options; to see which options are available and how to use them, type “help esttab.” Let’s display standard errors in parentheses under the coefficients instead of t-statistics. Notice that the note at the bottom of the table now reads “Standard errors in parentheses.”

esttab, se
--------------------------------------------------------------------------------------------
                      (1)             (2)             (3)             (4)             (5)   
                     wage            wage            wage            wage            wage   
--------------------------------------------------------------------------------------------
ttl_exp             0.331***        0.346***        0.342***                                
                 (0.0254)        (0.0257)        (0.0220)                                   

age                                -0.138***      -0.0609*                                  
                                 (0.0387)        (0.0292)                                   

race                               -1.389***       -1.323***                                
                                  (0.267)         (0.199)                                   

union                                               2.040**                                 
                                                  (0.628)                                   

interact                                          -0.0467                                   
                                                 (0.0452)                                   

tenure                                                              0.344***                
                                                                 (0.0709)                   

tenure_sq                                                        -0.00891*                  
                                                                (0.00381)                   

ln_tenure                                                                           0.970***
                                                                                  (0.102)   

_cons               3.612***        9.187***        5.559***        6.327***        6.562***
                  (0.339)         (1.518)         (1.159)         (0.233)         (0.178)   
--------------------------------------------------------------------------------------------
N                    2246            2220            1854            2231            2180   
--------------------------------------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
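Here are a few other display options you may find useful, sketched with two models and nlsw88 as a stand-in for dataset_2.13.dta. b() and se() set the number of decimals, star() redefines the significance thresholds (here the common 10/5/1 percent convention), and r2 adds the R-squared to the bottom of the table. Type help esttab for the full list.

```stata
* Assumption: nlsw88 stands in for dataset_2.13.dta.
sysuse nlsw88, clear
eststo clear
eststo: quietly regress wage ttl_exp
eststo: quietly regress wage ttl_exp age race

* se(3) shows standard errors (rather than t-statistics) with 3 decimals;
* b(3) does the same for coefficients.
esttab, b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) r2
```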

Let’s also change the column titles to describe what each regression shows. This is a bit challenging, and it isn’t obvious from “help esttab” how to do it. However, a key coding skill is being able to search online until you find sample code that does what you want. Go ahead and try to figure this out yourself! By searching, we mean not just Google but also places like Twitter or YouTube.

esttab, se mtitles("Basic" "Controls" "Interaction" "Quadratic" "Linear-Log")

--------------------------------------------------------------------------------------------
                      (1)             (2)             (3)             (4)             (5)   
                    Basic        Controls     Interaction       Quadratic      Linear-Log   
--------------------------------------------------------------------------------------------
ttl_exp             0.331***        0.346***        0.342***                                
                 (0.0254)        (0.0257)        (0.0220)                                   

age                                -0.138***      -0.0609*                                  
                                 (0.0387)        (0.0292)                                   

race                               -1.389***       -1.323***                                
                                  (0.267)         (0.199)                                   

union                                               2.040**                                 
                                                  (0.628)                                   

interact                                          -0.0467                                   
                                                 (0.0452)                                   

tenure                                                              0.344***                
                                                                 (0.0709)                   

tenure_sq                                                        -0.00891*                  
                                                                (0.00381)                   

ln_tenure                                                                           0.970***
                                                                                  (0.102)   

_cons               3.612***        9.187***        5.559***        6.327***        6.562***
                  (0.339)         (1.518)         (1.159)         (0.233)         (0.178)   
--------------------------------------------------------------------------------------------
N                    2246            2220            1854            2231            2180   
--------------------------------------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
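Because eststo names the stored models est1, est2, and so on, you can also list those names after esttab to put only some of the models in a table; mtitles() then labels the columns in the order you list them. A sketch with nlsw88 as a stand-in for dataset_2.13.dta:

```stata
* Assumption: nlsw88 stands in for dataset_2.13.dta.
sysuse nlsw88, clear
eststo clear
eststo: quietly regress wage ttl_exp
eststo: quietly regress wage ttl_exp age race
generate tenure_sq = tenure^2
eststo: quietly regress wage tenure tenure_sq

* Tabulate only the first (est1) and third (est3) models.
esttab est1 est3, se mtitles("Basic" "Quadratic")
```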

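Now say you want to add the mean of the dependent variable (Y) as a row at the bottom of the table. We do this by storing the mean with each regression using “estadd ysumm,” which adds summary statistics of the dependent variable, including its mean (ymean), to the stored estimates. For this, we’ll start with a fresh table.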

estimates clear 
eststo: reg wage ttl_exp
estadd ysumm 
eststo: reg wage ttl_exp age race
estadd ysumm 

esttab, se stats(N r2 ymean)
--------------------------------------------
                      (1)             (2)   
                     wage            wage   
--------------------------------------------
ttl_exp             0.331***        0.346***
                 (0.0254)        (0.0257)   

age                                -0.138***
                                 (0.0387)   

race                               -1.389***
                                  (0.267)   

_cons               3.612***        9.187***
                  (0.339)         (1.518)   
--------------------------------------------
N                    2246            2220   
r2                 0.0705          0.0855   
ymean               7.767           7.758   
--------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
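If you are curious what estadd ysumm stores, you can compute the mean of the dependent variable yourself and attach it with estadd scalar. This is a sketch assuming nlsw88 as a stand-in for dataset_2.13.dta; ybar is a hypothetical name chosen for this example, and its value should match the ymean row for the same model.

```stata
* Assumption: nlsw88 stands in for dataset_2.13.dta.
sysuse nlsw88, clear
eststo clear

quietly regress wage ttl_exp
* e(sample) flags the observations used in the regression, so this is
* the mean of wage in the estimation sample.
quietly summarize wage if e(sample)
* Attach the mean to the active estimates under the name ybar.
estadd scalar ybar = r(mean)
* eststo without a colon stores the active estimates, including ybar.
eststo

esttab, se stats(N ybar)
```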

* We can then change the row labels for N, r2, and ymean:

esttab, se stats(N r2 ymean, label("n" "R-Squared" "Pre-mean of Y"))
--------------------------------------------
                      (1)             (2)   
                     wage            wage   
--------------------------------------------
ttl_exp             0.331***        0.346***
                 (0.0254)        (0.0257)   

age                                -0.138***
                                 (0.0387)   

race                               -1.389***
                                  (0.267)   

_cons               3.612***        9.187***
                  (0.339)         (1.518)   
--------------------------------------------
n                    2246            2220   
R-Squared          0.0705          0.0855   
Pre-mean o~Y        7.767           7.758   
--------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
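The last row label above is truncated to “Pre-mean o~Y” because, without the label option, esttab’s label column is only 12 characters wide. The varwidth() option widens it, and fmt() inside stats() sets the decimals for each statistic (here 0 for N, 3 for R-squared, and 2 for the mean). A sketch, assuming nlsw88 as a stand-in for dataset_2.13.dta:

```stata
* Assumption: nlsw88 stands in for dataset_2.13.dta.
sysuse nlsw88, clear
eststo clear
eststo: quietly regress wage ttl_exp
estadd ysumm
eststo: quietly regress wage ttl_exp age race
estadd ysumm

* varwidth(16) leaves room for the 13-character label "Pre-mean of Y".
esttab, se varwidth(16) stats(N r2 ymean, labels("N" "R-Squared" "Pre-mean of Y") fmt(0 3 2))
```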

* We can also display variable labels instead of variable names
* by adding the option "label" after the comma.
esttab, se label stats(N r2 ymean, label("N" "R-Squared" "Pre-mean of Y"))
----------------------------------------------------
                              (1)             (2)   
                      Hourly wage     Hourly wage   
----------------------------------------------------
Total work experie~)        0.331***        0.346***
                         (0.0254)        (0.0257)   

Age in current year                        -0.138***
                                         (0.0387)   

Race                                       -1.389***
                                          (0.267)   

Constant                    3.612***        9.187***
                          (0.339)         (1.518)   
----------------------------------------------------
N                            2246            2220   
R-Squared                  0.0705          0.0855   
Pre-mean of Y               7.767           7.758   
----------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001

* If you want a different label, you can either change the variable's label
* in the dataset or change it within esttab.

* For example, say you want the variable age to be labeled "Age" rather than "Age in current year".
* For this, use the coeflabels() option.

esttab, se label coeflabels(age "Age") stats(N r2 ymean, label("N" "R-Squared" "Pre-mean of Y"))

----------------------------------------------------
                              (1)             (2)   
                      Hourly wage     Hourly wage   
----------------------------------------------------
Total work experie~)        0.331***        0.346***
                         (0.0254)        (0.0257)   

Age                                        -0.138***
                                         (0.0387)   

Race                                       -1.389***
                                          (0.267)   

Constant                    3.612***        9.187***
                          (0.339)         (1.518)   
----------------------------------------------------
N                            2246            2220   
R-Squared                  0.0705          0.0855   
Pre-mean of Y               7.767           7.758   
----------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
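The other route is to change the variable’s label in the dataset itself with label variable. Every later table that uses the label option then picks up the new label without needing coeflabels(). A sketch with nlsw88 as a stand-in for dataset_2.13.dta:

```stata
* Assumption: nlsw88 stands in for dataset_2.13.dta.
sysuse nlsw88, clear

* Relabel age in the dataset; esttab's label option will now show "Age".
label variable age "Age"

eststo clear
eststo: quietly regress wage ttl_exp age race
esttab, se label
```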

* Say you only want to show the coefficient on experience
* and not the other coefficients. For this, use the keep() option.

esttab, keep(ttl_exp) se label coeflabels(ttl_exp "Years of Experience") stats(N r2 ymean, label("N" "R-Squared" "Pre-mean of Y"))

----------------------------------------------------
                              (1)             (2)   
                      Hourly wage     Hourly wage   
----------------------------------------------------
Years of Experience         0.331***        0.346***
                         (0.0254)        (0.0257)   
----------------------------------------------------
N                            2246            2220   
R-Squared                  0.0705          0.0855   
Pre-mean of Y               7.767           7.758   
----------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
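The opposite of keep() is drop(), which lists the coefficients to hide. order() controls which coefficients appear first. A sketch with nlsw88 as a stand-in for dataset_2.13.dta:

```stata
* Assumption: nlsw88 stands in for dataset_2.13.dta.
sysuse nlsw88, clear
eststo clear
eststo: quietly regress wage ttl_exp
eststo: quietly regress wage ttl_exp age race

* Hide the constant and list age and race before experience.
esttab, se label drop(_cons) order(age race ttl_exp)
```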

Now that we have everything we need and have confirmed that the table looks the way we want in Stata, let’s export it to a Word document. To do this, we save the table as an RTF file at some location on our computer and then open that file in Word. Make sure you include the “replace” option so that esttab overwrites any existing file with that name. If you use a Mac, open the RTF file with Word rather than the default TextEdit program.

esttab using "filepath", keep(ttl_exp) se label coeflabels(ttl_exp "Years of Experience") stats(N r2 ymean, label("N" "R-Squared" "Pre-mean of Y"))

esttab using "/Users/laptop/Downloads/test.rtf", replace keep(ttl_exp) se label coeflabels(ttl_exp "Years of Experience") stats(N r2 ymean, label("N" "R-Squared" "Pre-mean of Y"))
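The same esttab call can write other formats. This sketch assumes nlsw88 as a stand-in for dataset_2.13.dta, and table.csv and table.tex are hypothetical file names saved to the current working directory. The csv option writes a comma-separated file you can open in Excel, and booktabs writes a LaTeX table that uses the booktabs package’s horizontal rules.

```stata
* Assumption: nlsw88 stands in for dataset_2.13.dta.
sysuse nlsw88, clear
eststo clear
eststo: quietly regress wage ttl_exp
eststo: quietly regress wage ttl_exp age race

* Comma-separated file for Excel (hypothetical file name).
esttab using "table.csv", replace csv se label

* LaTeX table (hypothetical file name); needs \usepackage{booktabs} in your preamble.
esttab using "table.tex", replace booktabs se label
```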

If you want more advanced tables, or want to export your tables to LaTeX, check out this guide:

The Stata-to-LaTeX guide (medium.com): a set of templates for exporting tables from Stata to LaTeX, with the LaTeX code provided in a shared Overleaf document.

What if you want a single title spanning a group of columns?

sysuse auto 
estimates clear
eststo: reg price mpg
eststo: reg price mpg weight
eststo: reg trunk mpg
eststo: reg trunk weight
esttab, se mgroups("Price" "Trunk", pattern(1 0 1 0)) nomtitle

Note that the group titles may not be centered over their columns when you open the table in Word. There is a fix for this if you use LaTeX, but currently no fix for Word (that I know of!).
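For LaTeX output, a commonly used way to center group titles is to wrap each one in a \multicolumn command through the prefix() and suffix() suboptions of mgroups(). This is a sketch rather than a definitive recipe; groups.tex is a hypothetical file name, and the LaTeX document needs \usepackage{booktabs} for \cmidrule.

```stata
sysuse auto, clear
eststo clear
eststo: quietly regress price mpg
eststo: quietly regress price mpg weight
eststo: quietly regress trunk mpg
eststo: quietly regress trunk weight

* prefix() and suffix() wrap each group title in \multicolumn{@span}{c}{...};
* span tells esttab to replace @span with the number of columns in the group,
* and erepeat() adds a \cmidrule under each group title.
esttab using "groups.tex", replace booktabs se nomtitles mgroups("Price" "Trunk", pattern(1 0 1 0) prefix(\multicolumn{@span}{c}{) suffix(}) span erepeat(\cmidrule(lr){@span}))
```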