How to generate publication quality tables#
Estimagic can create publication quality tables of parameter estimates in LaTeX or HTML. It works with the results from estimate_ml and estimate_msm, but also supports statsmodels results out of the box.
You can get almost limitless flexibility if you split the table generation into two steps. The first generates a DataFrame which you can customize to your liking; the second renders that DataFrame in LaTeX or HTML. If you are interested in this feature, search for “render_inputs” below.
# Make necessary imports
import estimagic as em
import pandas as pd
import statsmodels.formula.api as sm
from estimagic.config import EXAMPLE_DIR
from IPython.display import HTML
Create tables from statsmodels results#
df = pd.read_csv(EXAMPLE_DIR / "diabetes.csv", index_col=0)
mod1 = sm.ols("target ~ Age + Sex", data=df).fit()
mod2 = sm.ols("target ~ Age + Sex + BMI + ABP", data=df).fit()
models = [mod1, mod2]
HTML(em.estimation_table(models, return_type="html"))
| | target | |
|---|---|---|
| | (1) | (2) |
| Intercept | 152.00$^{*** }$ | 152.00$^{*** }$ |
| | (3.61) | (2.85) |
| Age | 301.00$^{*** }$ | 37.20$^{ }$ |
| | (77.10) | (64.10) |
| Sex | 17.40$^{ }$ | -107.00$^{* }$ |
| | (77.10) | (62.10) |
| BMI | | 787.00$^{*** }$ |
| | | (65.40) |
| ABP | | 417.00$^{*** }$ |
| | | (69.50) |
| Observations | 442 | 442 |
| R$^2$ | 0.04 | 0.40 |
| Adj. R$^2$ | 0.03 | 0.40 |
| Residual Std. Error | 75.90 | 60 |
| F Statistic | 8.06$^{***}$ | 72.90$^{***}$ |
| Note: | ***p<0.01; **p<0.05; *p<0.1 | |
Adding estimagic results#
estimate_ml and estimate_msm can both generate summaries of estimation results. Those summaries are either DataFrames with the columns "value", "standard_error", "p_value" and "stars" or pytrees containing such DataFrames.
For examples, check out our tutorials on estimate_ml and estimate_msm.
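To make the "stars" column concrete, here is a minimal sketch of how significance stars can be derived from p-values using the thresholds shown in the table notes (p<0.01, p<0.05, p<0.1). The helper name `significance_stars` and the p-values are made up for illustration; this is not estimagic's internal implementation.

```python
def significance_stars(p_value, levels=(0.1, 0.05, 0.01)):
    """Return one star for every significance level that p_value falls below."""
    return "*" * sum(p_value < level for level in levels)


# Hypothetical p-values, chosen to hit different thresholds.
p_values = {"Intercept": 1e-8, "Age": 0.03, "Sex": 0.2}
stars = {name: significance_stars(p) for name, p in p_values.items()}
print(stars)  # {'Intercept': '***', 'Age': '**', 'Sex': ''}
```

The comparison is strict, so a p-value of exactly 0.01 gets two stars rather than three, matching the "p<0.01" wording of the note.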
Assume we got the following DataFrame from an estimation summary:
params = pd.DataFrame(
    {
        "value": [142.123, 51.456, -33.789],
        "standard_error": [3.1415, 2.71828, 1.6180],
        "p_value": [1e-8] * 3,
    },
    index=["Intercept", "Age", "Sex"],
)
params
| | value | standard_error | p_value |
|---|---|---|---|
| Intercept | 142.123 | 3.14150 | 1.000000e-08 |
| Age | 51.456 | 2.71828 | 1.000000e-08 |
| Sex | -33.789 | 1.61800 | 1.000000e-08 |
You can either use just the params DataFrame or a dictionary containing “params” and additional information in estimation_table.
mod3 = {"params": params, "name": "target", "info": {"n_obs": 445}}
models = [mod1, mod2, mod3]
HTML(em.estimation_table(models, return_type="html"))
| | target | | |
|---|---|---|---|
| | (1) | (2) | (3) |
| Intercept | 152.00$^{*** }$ | 152.00$^{*** }$ | 142.00$^{*** }$ |
| | (3.61) | (2.85) | (3.14) |
| Age | 301.00$^{*** }$ | 37.20$^{ }$ | 51.50$^{*** }$ |
| | (77.10) | (64.10) | (2.72) |
| Sex | 17.40$^{ }$ | -107.00$^{* }$ | -33.80$^{*** }$ |
| | (77.10) | (62.10) | (1.62) |
| BMI | | 787.00$^{*** }$ | |
| | | (65.40) | |
| ABP | | 417.00$^{*** }$ | |
| | | (69.50) | |
| Observations | 442 | 442 | 445 |
| R$^2$ | 0.04 | 0.40 | |
| Adj. R$^2$ | 0.03 | 0.40 | |
| Residual Std. Error | 75.90 | 60 | |
| F Statistic | 8.06$^{***}$ | 72.90$^{***}$ | |
| Note: | ***p<0.01; **p<0.05; *p<0.1 | | |
Selecting the right return_type#
The following return types are supported:
- "latex": Returns a string that you can save and import into a LaTeX document.
- "html": Returns a string that you can save and import into an HTML document.
- "render_inputs": Returns a dictionary with the following entries:
  - "body": A DataFrame containing the main table.
  - "footer": A DataFrame containing the statistics.
  - Other entries that you should ignore.
- "dataframe": Returns a DataFrame you can look at in a notebook.
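Since the "latex" and "html" return types are plain strings, saving them needs nothing beyond the standard library. The following is a minimal sketch; the helper `save_table`, the file name, and the placeholder string standing in for the output of estimation_table are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def save_table(table: str, path: Path) -> Path:
    """Write a rendered table string to disk and return the path."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(table, encoding="utf-8")
    return path


# Placeholder for the string returned with return_type="latex".
latex_table = "\\begin{tabular}{lS}\n\\toprule\n\\bottomrule\n\\end{tabular}\n"

# A temporary directory keeps the sketch free of side effects.
out_dir = Path(tempfile.mkdtemp())
tex_path = save_table(latex_table, out_dir / "tables" / "regression.tex")
print(tex_path.read_text(encoding="utf-8"))
```

In a real project you would write to a path inside your paper's directory and pull the file into the document with `\input{tables/regression.tex}`.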
Use render_inputs for maximum flexibility#
As an example, let’s assume we want to remove a few rows from the footer.
Let’s first look at the footer we get from estimation_table:
render_inputs = em.estimation_table(models, return_type="render_inputs")
footer = render_inputs["footer"]
footer
| | target | | |
|---|---|---|---|
| | (1) | (2) | (3) |
| Observations | 442 | 442 | 445 |
| R$^2$ | 0.04 | 0.40 | |
| Adj. R$^2$ | 0.03 | 0.40 | |
| Residual Std. Error | 75.90 | 60 | |
| F Statistic | 8.06$^{***}$ | 72.90$^{***}$ | |
Now we can remove the rows we don’t need and render the table to HTML.
render_inputs["footer"] = footer.loc[["R$^2$", "Observations"]]
HTML(em.render_html(**render_inputs))
| | target | | |
|---|---|---|---|
| | (1) | (2) | (3) |
| Intercept | 152.00$^{*** }$ | 152.00$^{*** }$ | 142.00$^{*** }$ |
| | (3.61) | (2.85) | (3.14) |
| Age | 301.00$^{*** }$ | 37.20$^{ }$ | 51.50$^{*** }$ |
| | (77.10) | (64.10) | (2.72) |
| Sex | 17.40$^{ }$ | -107.00$^{* }$ | -33.80$^{*** }$ |
| | (77.10) | (62.10) | (1.62) |
| BMI | | 787.00$^{*** }$ | |
| | | (65.40) | |
| ABP | | 417.00$^{*** }$ | |
| | | (69.50) | |
| R$^2$ | 0.04 | 0.40 | |
| Observations | 442 | 442 | 445 |
| Note: | ***p<0.01; **p<0.05; *p<0.1 | | |
Using this two-step procedure, we can also easily add rows to the footer.
Note that we add the row using .loc[("Statsmodels",)] since the index of render_inputs["footer"] is a MultiIndex.
render_inputs["footer"].loc[("Statsmodels",)] = ["Yes"] * 2 + ["No"]
HTML(em.render_html(**render_inputs))
| | target | | |
|---|---|---|---|
| | (1) | (2) | (3) |
| Intercept | 152.00$^{*** }$ | 152.00$^{*** }$ | 142.00$^{*** }$ |
| | (3.61) | (2.85) | (3.14) |
| Age | 301.00$^{*** }$ | 37.20$^{ }$ | 51.50$^{*** }$ |
| | (77.10) | (64.10) | (2.72) |
| Sex | 17.40$^{ }$ | -107.00$^{* }$ | -33.80$^{*** }$ |
| | (77.10) | (62.10) | (1.62) |
| BMI | | 787.00$^{*** }$ | |
| | | (65.40) | |
| ABP | | 417.00$^{*** }$ | |
| | | (69.50) | |
| R$^2$ | 0.04 | 0.40 | |
| Observations | 442 | 442 | 445 |
| Statsmodels | Yes | Yes | No |
| Note: | ***p<0.01; **p<0.05; *p<0.1 | | |
Advanced options#
Below is an example that demonstrates how to use advanced options to customize your table.
stats_dict = {
    "n_obs": "Observations",
    "rsquared": "R$^2$",
    "rsquared_adj": "Adj. R$^2$",
    "resid_std_err": "Residual Std. Error",
    "fvalue": "F Statistic",
    "show_dof": True,
}
HTML(
    em.estimation_table(
        models=models,
        return_type="html",
        custom_param_names={"Intercept": "Constant", "Sex": "Gender"},
        custom_col_names=["Model 1", "Model 2", "Model 3"],
        custom_col_groups={"target": "Dependent variable: target"},
        render_options={"caption": "Table Title"},
        stats_options=stats_dict,
        number_format="{0:.3f}",
    )
)
| | Dependent variable: target | | |
|---|---|---|---|
| | Model 1 | Model 2 | Model 3 |
| Constant | 152.133$^{*** }$ | 152.133$^{*** }$ | 142.123$^{*** }$ |
| | (3.610) | (2.853) | (3.142) |
| Age | 301.161$^{*** }$ | 37.241$^{ }$ | 51.456$^{*** }$ |
| | (77.060) | (64.117) | (2.718) |
| Gender | 17.392$^{ }$ | -106.578$^{* }$ | -33.789$^{*** }$ |
| | (77.060) | (62.125) | (1.618) |
| BMI | | 787.179$^{*** }$ | |
| | | (65.424) | |
| ABP | | 416.674$^{*** }$ | |
| | | (69.495) | |
| Observations | 442 | 442 | 445 |
| R$^2$ | 0.035 | 0.400 | |
| Adj. R$^2$ | 0.031 | 0.395 | |
| Residual Std. Error | 75.888(df=439) | 59.976(df=437) | |
| F Statistic | 8.059$^{***}$(df=2;439) | 72.913$^{***}$(df=4;437) | |
| Note: | ***p<0.01; **p<0.05; *p<0.1 | | |
Note 1: You can pass a dictionary for custom_col_names to rename specific columns, e.g. custom_col_names={"(1)": "Model 1"}, leaving names of the other columns at default values.
Note 2: In addition to renaming the default column groups by passing a dictionary for custom_col_groups, you can also pass a list to create custom column groups, e.g. custom_col_groups=["target", "target", "not target"] will group the first two columns under the name "target", and the last column under the name "not target".
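The number_format value used in the call above, "{0:.3f}", has the shape of a standard Python format string, so `str.format` gives a quick preview of what a given format does to your numbers. This is only a sketch using values from the params DataFrame; how estimagic applies the format internally is not shown here.

```python
number_format = "{0:.3f}"  # three digits after the decimal point

values = [142.123, 51.456, -33.789]
standard_errors = [2.71828, 1.6180]

formatted_values = [number_format.format(v) for v in values]
# Standard errors are displayed in parentheses below the estimates.
formatted_errors = ["(" + number_format.format(se) + ")" for se in standard_errors]

print(formatted_values)  # ['142.123', '51.456', '-33.789']
print(formatted_errors)  # ['(2.718)', '(1.618)']
```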
LaTeX peculiarities#
By default, tables produced by render_latex are structured to comply with the siunitx package. This is done by setting column formats to S in the default rendering options defined internally.
To get nicely formatted tables, you need to add the following to your LaTeX preamble:
\usepackage{siunitx}
\sisetup{
input-symbols = (),
table-align-text-post = false,
group-digits = false,
}
The first line in \sisetup is necessary if you have parentheses in your table cells (e.g. when displaying standard errors or confidence intervals); otherwise LaTeX will raise an error.
The second line is necessary so that there is no spacing between the significance stars and the numerical values.
The third line prevents digits from being grouped into groups of three, which is the default behaviour. This line is optional, but recommended.
By default, calling render_latex raises a warning about this. To silence the warning, set siunitx_warning=False in the relevant function calls (when calling estimation_table with return_type="latex" or when calling render_latex).
If you don’t want to generate siunitx style tables, you can pass render_options={"column_format":<desired formats>} to your function calls.
You can influence the format of the output table with keyword arguments passed via render_options. For the list of supported keyword arguments, see the documentation of pandas.io.formats.style.Styler.to_latex.
By default, siunitx will center table columns around the decimal point. This means that if a column contains a number with comparatively many symbols after the decimal point (e.g. a number in scientific notation), there will be extra spacing between that column and the preceding one, since as much space is reserved before the decimal point as after it.
You can adjust the spacing between columns by using the format S[table-format=x.y] for the numeric columns, where x and y control the space before and after the decimal point, respectively. Below we show a case with the described problem and its solution. For numbers in scientific notation, use S[table-format=x.yez], where y reserves the space after the decimal point and z reserves the space for the exponent.
Compiling the following LaTeX table will result in extra spacing between columns (2) and (3):
\begin{tabular}{lSSS}
\toprule
& \multicolumn{3}{c}{target} \\
\cmidrule(lr){2-4}
& (1) & (2) & (3) \\
\midrule
Intercept & 152.00$^{*** }$ & 152.00$^{*** }$ & 1.43e08$^{*** }$ \\
& (3.61) & (2.85) & (3.14) \\
Age & 301.00$^{*** }$ & 37.20$^{ }$ & 51.50$^{*** }$ \\
& (77.10) & (64.10) & (2.72) \\
Sex & 17.40$^{ }$ & -107.00$^{* }$ & -33.80$^{*** }$ \\
& (77.10) & (62.10) & (1.62) \\
BMI & & 787.00$^{*** }$ & \\
& & (65.40) & \\
ABP & & 417.00$^{*** }$ & \\
& & (69.50) & \\
\midrule
R$^2$ & 0.04 & 0.40 & \\
Observations & \multicolumn{1}{c}{442} & \multicolumn{1}{c}{442} & \multicolumn{1}{c}{445} \\
\midrule
\textit{Note:} & \multicolumn{3}{r}{$^{***}$p$<$0.01;$^{**}$p$<$0.05;$^{*}$p$<$0.1} \\
\bottomrule
\end{tabular}
We can get a nicer output by setting the format of the last column to, for example, S[table-format=3.2e4], by passing render_options={'column_format':'lSSS[table-format = 3.2e4]'}. The resulting table from render_latex will look like the following:
\begin{tabular}{lSSS[table-format = 3.2e4]}
\toprule
& \multicolumn{3}{c}{target} \\
\cmidrule(lr){2-4}
& (1) & (2) & (3) \\
\midrule
Intercept & 152.00$^{*** }$ & 152.00$^{*** }$ & 1.43e08$^{*** }$ \\
& (3.61) & (2.85) & (3.14) \\
Age & 301.00$^{*** }$ & 37.20$^{ }$ & 51.50$^{*** }$ \\
& (77.10) & (64.10) & (2.72) \\
Sex & 17.40$^{ }$ & -107.00$^{* }$ & -33.80$^{*** }$ \\
& (77.10) & (62.10) & (1.62) \\
BMI & & 787.00$^{*** }$ & \\
& & (65.40) & \\
ABP & & 417.00$^{*** }$ & \\
& & (69.50) & \\
\midrule
R$^2$ & 0.04 & 0.40 & \\
Observations & \multicolumn{1}{c}{442} & \multicolumn{1}{c}{442} & \multicolumn{1}{c}{445} \\
\midrule
\textit{Note:} & \multicolumn{3}{r}{$^{***}$p$<$0.01;$^{**}$p$<$0.05;$^{*}$p$<$0.1} \\
\bottomrule
\end{tabular}
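If you would rather derive the table-format spec from the cell contents than pick it by hand, a small helper can count the digits for you. The sketch below is an illustration under simplifying assumptions: the function name `siunitx_table_format` is hypothetical, signs are ignored, and the inputs are already-formatted number strings without stars or parentheses.

```python
def siunitx_table_format(values):
    """Build an S-column spec that reserves enough digits for all values."""
    n_int = n_dec = n_exp = 0
    for value in values:
        # Split "1.43e08" into mantissa "1.43" and exponent "08".
        mantissa, _, exponent = value.lower().partition("e")
        integer, _, decimal = mantissa.lstrip("+-").partition(".")
        n_int = max(n_int, len(integer))
        n_dec = max(n_dec, len(decimal))
        n_exp = max(n_exp, len(exponent.lstrip("+-")))
    spec = f"{n_int}.{n_dec}"
    if n_exp:
        spec += f"e{n_exp}"
    return f"S[table-format={spec}]"


# Estimates from column (3) of the table above.
last_column = ["1.43e08", "51.50", "-33.80"]
column_format = "lSS" + siunitx_table_format(last_column)
print(column_format)  # lSSS[table-format=2.2e2]
```

The resulting string can be passed as render_options={"column_format": column_format}. It reserves less space than the 3.2e4 used above, which was chosen only as an example.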