In this worksheet we will work through multiple ways to create summary statistic tables in Stata. We go from the most basic approach to the most polished, publication-ready table. We use the built-in nlsw88 dataset (National Longitudinal Survey of Women, 1988). Our binary grouping variable is union (1 = union member, 0 = not).
Recall that for the homework you can also build this table in Excel, but if you want to learn a few ways of doing it in Stata, here are several:
Setup
Before we begin, let’s load the data and see what we’re working with.
sysuse nlsw88, clear
describe union wage age tenure ttl_exp hours
Let’s check how our grouping variable looks:
tab union
      Union |
     worker |      Freq.     Percent        Cum.
------------+-----------------------------------
   Nonunion |      1,417       75.45       75.45
      Union |        461       24.55      100.00
------------+-----------------------------------
      Total |      1,878      100.00
About 75% are non-union and 25% are union. Now let’s define a global with the variables we want to summarize:
global sumvars "wage age tenure ttl_exp hours"
Method 1: summarize with if (The Basics)
The simplest approach. Just run summarize twice, once for each group.
summarize $sumvars if union == 1
summarize $sumvars if union == 0
Pros: Very easy. No extra packages needed.
Cons: You get two separate tables. No difference or significance test. You have to eyeball everything. Not something you’d put in a paper.
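A slightly more compact variant, offered as a sketch rather than a replacement: the by prefix runs summarize once per group in a single command. It still prints one table per group and gives no test, so the cons above still apply.

```stata
sysuse nlsw88, clear
global sumvars "wage age tenure ttl_exp hours"

* bysort sorts on union, then repeats summarize for each value of union.
* Note: observations with missing union form their own group in the output.
bysort union: summarize $sumvars
```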
Method 2: tabstat with by() (A Step Up)
tabstat lets you see both groups in one table. You can also request specific statistics.
tabstat $sumvars, by(union) stats(mean sd n) columns(statistics)
You can clean it up with formatting:
tabstat $sumvars, by(union) stats(mean sd min max n) columns(statistics) format(%9.2f)
Pros: Both groups in one view. You can pick your statistics (mean, sd, n, min, max, etc.).
Cons: Still no difference in means or significance test. You’d have to calculate those yourself.
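To make "calculate those yourself" concrete, here is a minimal sketch for one variable. It relies on the r(mean) result that summarize leaves behind; the local names m0, m1, and diff are just illustrative choices.

```stata
sysuse nlsw88, clear

* Mean wage for non-union workers
quietly summarize wage if union == 0
local m0 = r(mean)

* Mean wage for union workers
quietly summarize wage if union == 1
local m1 = r(mean)

* Difference in means (non-union minus union); no significance test here
local diff = `m0' - `m1'
display "Difference in mean wage (non-union minus union): " %6.2f `diff'
```

This gives the raw difference only. A standard error or p-value would need more work, which is what the next method handles.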
Method 3: ttest (Getting the Difference + Significance)
The ttest command gives you group means, the difference, and a p-value for whether the difference is statistically significant. The downside: it only works one variable at a time.
ttest wage, by(union)
To do this for all your variables, you can loop:
foreach var of global sumvars {
    ttest `var', by(union)
}
How to read this output (taking wage as the example): The diff row shows that non-union workers earn $1.47 less per hour on average than union workers. The two-sided p-value is 0.0000, meaning this difference is statistically significant.
Pros: You get the difference and a formal significance test with p-values and confidence intervals.
Cons: Output is verbose - one giant table per variable. Hard to present in a paper. You’d need to manually extract numbers.
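One way to avoid copying numbers by hand is to run each test quietly and print only the stored results. The sketch below assumes the standard results that ttest returns (r(mu_1), r(mu_2), r(p)); the column positions are arbitrary.

```stata
sysuse nlsw88, clear
global sumvars "wage age tenure ttl_exp hours"

* Header row for the compact listing
display "Variable" _col(12) "Non-union" _col(24) "Union" _col(36) "Diff" _col(48) "p-value"

foreach var of global sumvars {
    * Run the test without printing the full output
    quietly ttest `var', by(union)
    * r(mu_1) and r(mu_2) are the two group means; r(p) is the two-sided p-value
    display "`var'" _col(12) %8.2f r(mu_1) _col(24) %8.2f r(mu_2) ///
        _col(36) %8.2f (r(mu_1) - r(mu_2)) _col(48) %8.3f r(p)
}
```

This is still console output rather than a formatted table, but it shows where the numbers live.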
Method 4: estpost tabstat + esttab (Formatted Table)
Now we start using the estout package (which contains esttab). If you haven’t installed it:
ssc install estout, replace
estpost tabstat stores the results from tabstat so that esttab can display them in a clean table.
estpost tabstat $sumvars, by(union) statistics(mean sd count) columns(statistics)
esttab ., cells("mean(fmt(2)) sd(fmt(2)) count(fmt(0))") noobs nonumber label
You can also display the standard deviation in parentheses (a common formatting convention):
estpost tabstat $sumvars, by(union) statistics(mean sd) columns(statistics) listwise
esttab ., cells("mean(fmt(2)) sd(par fmt(2))") noobs nonumber label
Pros: Clean, formatted tables. Uses variable labels. Easy to customize.
Cons: Still no difference column or significance test. But we’re getting there…
Method 5: estpost ttest + esttab (The Balance Table)
This is the most important method in this worksheet. estpost ttest runs t-tests for all your variables at once and stores everything. Then esttab can display group means, differences, standard errors, significance stars, and p-values all in one clean table.
estpost ttest $sumvars, by(union)
This stores a bunch of matrices: mu_1 (group 0 means), mu_2 (group 1 means), b (difference), se (standard error), p (p-value), N_1 and N_2 (sample sizes), and more.
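If you want to see those stored results for yourself, a quick sketch is to list everything in e() and then print one or two of the matrices. This assumes estout is already installed.

```stata
sysuse nlsw88, clear
global sumvars "wage age tenure ttl_exp hours"

* Requires the estout package (ssc install estout)
quietly estpost ttest $sumvars, by(union)

* Show every scalar, macro, and matrix that estpost left in e()
ereturn list

* Each matrix has one column per variable in $sumvars
matrix list e(mu_1)
matrix list e(p)
```

Any name that appears here as a matrix can be used inside cells() in esttab.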
Version A: Means + Difference with Stars
esttab ., cells("mu_1(fmt(2)) mu_2(fmt(2)) b(fmt(2) star)") ///
    wide noobs nonumber star(* 0.10 ** 0.05 *** 0.01) ///
    collabels("Non-Union" "Union" "Difference") label
This is already very useful! You can immediately see that union workers have higher wages, more tenure, more experience, and work more hours. Age is balanced (no significant difference). The stars tell you significance at a glance.
Version B: Add Standard Errors of the Difference
esttab ., cells("mu_1(fmt(2)) mu_2(fmt(2)) b(fmt(2) star) se(fmt(2) par)") ///
    wide noobs nonumber star(* 0.10 ** 0.05 *** 0.01) ///
    collabels("Non-Union" "Union" "Difference" "SE") label
Version C: With p-values Instead of Stars
esttab ., cells("mu_1(fmt(2)) mu_2(fmt(2)) b(fmt(2)) p(fmt(3))") ///
    wide noobs nonumber ///
    collabels("Non-Union" "Union" "Difference" "p-value") label
Method 5b: Adding Sample Sizes to the Balance Table
You might want to show the N for each group directly in the table. Good news: estpost ttest stores N_1 and N_2 as matrices (one value per variable), so you can include them directly as cells in esttab.
estpost ttest $sumvars, by(union)
esttab ., cells("mu_1(fmt(2)) N_1(fmt(0)) mu_2(fmt(2)) N_2(fmt(0)) b(fmt(2) star) se(fmt(2) par)") ///
    wide noobs nonumber star(* 0.10 ** 0.05 *** 0.01) ///
    collabels("Non-Union" "N" "Union" "N" "Difference" "SE") ///
    label
This is the full balance table: group means, sample sizes, difference, standard error, and significance stars all in one table.
Method 6: Export to Word (.rtf)
Once you’re happy with how the table looks, export it to an RTF file you can open in Word.
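The export command itself is not shown at this point in the worksheet, so the following is a sketch of what it could look like, not necessarily the exact command intended. It reuses the Version B layout from Method 5 and adds a using clause, following the same pattern as the CSV and LaTeX lines further down.

```stata
sysuse nlsw88, clear
global sumvars "wage age tenure ttl_exp hours"

* Requires the estout package (ssc install estout)
estpost ttest $sumvars, by(union)

* Same table as Version B, written to an RTF file instead of the screen.
* "/your/filepath/" is a placeholder: replace it with a real folder first.
esttab . using "/your/filepath/balance_table.rtf", replace ///
    cells("mu_1(fmt(2)) mu_2(fmt(2)) b(fmt(2) star) se(fmt(2) par)") ///
    wide noobs nonumber star(* 0.10 ** 0.05 *** 0.01) ///
    collabels("Non-Union" "Union" "Difference" "SE") label
```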
This creates a file called balance_table.rtf. Open it in Microsoft Word and it will look like a nicely formatted table. Make sure to change "/your/filepath/" to wherever you want to save the file.
You can also export to CSV (for Excel) or LaTeX (for academic papers) by changing the file extension:
* For CSV (Excel):
esttab . using "/your/filepath/balance_table.csv", replace ...
* For LaTeX:
esttab . using "/your/filepath/balance_table.tex", replace ...
Method 7: iebaltab (The One-Liner Balance Table)
iebaltab is a command from the World Bank’s ietoolkit package. It was designed specifically for creating balance tables in impact evaluations. One line of code gives you everything: group means, standard errors (or SDs), sample sizes, difference in means, and significance stars.
First, install it:
ssc install ietoolkit, replace
Basic usage
iebaltab wage age tenure ttl_exp hours, grpvar(union)
That single line produces a full balance table. By default it shows: N and Mean/(SE) for each group, the total N, and the mean difference with significance stars. When run, iebaltab opens the table in the data browser.
Using variable labels instead of variable names
Add rowvarlabels to display the full variable labels:
iebaltab wage age tenure ttl_exp hours, grpvar(union) rowvarlabels
Now “wage” becomes “Hourly wage”, “ttl_exp” becomes “Total work experience (years)”, etc.
Showing Standard Deviations instead of Standard Errors
By default, iebaltab shows standard errors in parentheses under each mean. If you want standard deviations instead (which is more common for descriptive/summary stats), use the stats() option:
iebaltab wage age tenure ttl_exp hours, grpvar(union) stats(desc(sd)) rowvarlabels
Notice the parentheses now show SD values (e.g., 4.104 for wage) instead of SE values (0.109).
Exporting to LaTeX or CSV
You can export directly to LaTeX (for academic papers) or CSV (for Excel):
* Export to LaTeX
iebaltab wage age tenure ttl_exp hours, grpvar(union) rowvarlabels ///
    savetex("/your/filepath/balance_table.tex") replace texnotewidth(1)
* Export to CSV (opens in Excel)
iebaltab wage age tenure ttl_exp hours, grpvar(union) rowvarlabels ///
    savecsv("/your/filepath/balance_table.csv") replace
Pros: One command does everything. Built-in N per group, means, SE/SD, difference, stars. Designed for balance tables. Exports directly to LaTeX or CSV.
Cons: Requires installing ietoolkit. Less flexible than the estpost ttest + esttab approach for custom column layouts. The stats() syntax can be finicky.
Method 8: table command (Stata 17+)
Stata 17 introduced a completely revamped table command. It can compute statistics by groups and is deeply integrated with the collect framework for export.
Basic table of means by group
table union, statistic(mean wage age tenure ttl_exp hours) nformat(%9.2f)
Adding standard deviations
table union, statistic(mean wage age tenure ttl_exp hours) ///
    statistic(sd wage age tenure ttl_exp hours) nformat(%9.2f)
This produces a wider table with both means and SDs for each variable. The output looks best in the Stata results window (the console log wraps the wide table).
Adding counts (sample sizes)
table union, statistic(mean wage age tenure ttl_exp hours) ///
    statistic(count wage age tenure ttl_exp hours) ///
    nformat(%9.2f mean) nformat(%9.0f count) totals(union)
The totals(union) option requests the total row for union explicitly (table also reports it by default; nototals suppresses it). The nformat() options let you format different statistics differently (2 decimals for means, 0 for counts).
Exporting the table
The table command stores its output in a collect framework, so you can export:
table union, statistic(mean wage age tenure ttl_exp hours) ///
    statistic(sd wage age tenure ttl_exp hours) nformat(%9.2f)
collect export "/your/filepath/my_table.html", replace
You can export to .html, .docx, .xlsx, .tex, or .pdf.
Pros: Built into Stata (no packages needed). Very flexible with the collect framework. Can export to many formats including Word and Excel. Clean syntax.
Cons: No built-in difference-in-means or significance test. The layout is “wide” (variables as columns, groups as rows), which is the opposite of most balance tables. Not ideal for the specific “balance table” use case.
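The layout issue can be reduced, though not the missing test. As a hedged sketch: table accepts the keyword var as a dimension, so putting (var) in the rows and (union) in the columns gives a shape closer to a balance table. This example uses a single statistic to keep the layout simple.

```stata
sysuse nlsw88, clear

* Rows: one per variable (the "var" dimension). Columns: one per union group.
* Stata 17 or later is required for this table syntax.
table (var) (union), statistic(mean wage age tenure ttl_exp hours) nformat(%9.2f)
```

There is still no difference or significance column here, so for a true balance table the earlier methods remain the better fit.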
Method 9: dtable (Stata 18)
Stata 18 introduced dtable, which is purpose-built for descriptive statistics tables. It’s the easiest built-in way to get group comparisons with a significance test.
Basic dtable by group
dtable wage age tenure ttl_exp hours, by(union) nformat(%9.2f)
That’s very clean! Each cell shows mean (sd) by default. It also includes N and percentages for each group at the top.
Adding a significance test
Just add tests inside the by() option:
dtable wage age tenure ttl_exp hours, by(union, tests) nformat(%9.2f)
Now you get a Test column with p-values. You can immediately see that wage, tenure, experience, and hours are significantly different between groups, while age is not (p = 0.63).
Customizing the statistics shown
You can specify exactly which statistics to display for continuous variables:
dtable wage age tenure ttl_exp hours, by(union, tests) ///
    continuous(wage age tenure ttl_exp hours, stat(mean sd)) ///
    nformat(%9.2f)
Exporting
dtable can export directly:
dtable wage age tenure ttl_exp hours, by(union, tests) nformat(%9.2f) ///
    export("/your/filepath/my_dtable.html", replace)
Supported formats: .html, .docx, .xlsx, .tex, .pdf.
Pros: Built into Stata 18 (no packages). One command gives you means, SDs, N, and p-values. Clean, compact output. Easy export.
Cons: Only available in Stata 18+. Doesn’t show the raw difference in means as a separate column (only the p-value). Less customizable than estpost ttest + esttab. No significance stars (uses p-values instead).