estimagic API
Estimation
estimate_ml
- estimagic.estimate_ml(loglike, params, optimize_options, *, bounds=None, constraints=None, logging=None, loglike_kwargs=None, jacobian=None, jacobian_kwargs=None, jacobian_numdiff_options=None, hessian=None, hessian_kwargs=None, hessian_numdiff_options=None, design_info=None, log_options=None, lower_bounds=None, upper_bounds=None, numdiff_options=None)[source]#
Do a maximum likelihood (ml) estimation.
This is a high level interface of our lower level functions for maximization, numerical differentiation and inference. It does the full workflow for maximum likelihood estimation with just one function call.
While we have good defaults, you can still configure each aspect of each step via the optional arguments of this function. If you find it easier to do the maximization separately, you can do so and just provide the optimal parameters as params and set optimize_options=False.
- Parameters
loglike (callable) – Likelihood function that takes params (and potentially other keyword arguments) and returns a pytree containing the likelihood contributions for each observation or a FunctionValue object.
params (pytree) – A pytree containing the estimated or start parameters of the likelihood model. If the supplied parameters are estimated parameters, set optimize_options to False. Pytrees can be a numpy array, a pandas Series, a DataFrame with “value” column, a float and any kind of (nested) dictionary or list containing these elements. See How to specify params for examples.
optimize_options (dict, Algorithm, str or False) – Keyword arguments that govern the numerical optimization. Valid entries are all arguments of minimize() except for those that are passed explicitly to estimate_ml. If you pass False as optimize_options you signal that params are already the optimal parameters and no numerical optimization is needed. If you pass a str as optimize_options it is used as the algorithm option.
bounds – Lower and upper bounds on the parameters. The most general and preferred way to specify bounds is an optimagic.Bounds object that collects lower, upper, soft_lower and soft_upper bounds. The soft bounds are used for sampling based optimizers but are not enforced during optimization. Each bound type mirrors the structure of params. Check our how-to guide on bounds for examples. If params is a flat numpy array, you can also provide bounds via any format that is supported by scipy.optimize.minimize.
constraints (list, dict) – List with constraint dictionaries or single dict. See How to specify constraints.
logging (pathlib.Path, str or False) – Path to sqlite3 file (which typically has the file extension .db). If the file does not exist, it will be created.
log_options (dict) –
Additional keyword arguments to configure the logging.
- “fast_logging” (bool):
A boolean that determines if “unsafe” settings are used to speed up write processes to the database. This should only be used for very short running criterion functions where the main purpose of the log is monitoring and it would not be catastrophic to get a corrupted database in case of a sudden system shutdown. If one evaluation of the criterion function (and gradient if applicable) takes more than 100 ms, the logging overhead is negligible.
- “if_table_exists” (str):
One of “extend”, “replace”, “raise”. What to do if the tables we want to write to already exist. Default “extend”.
- “if_database_exists” (str):
One of “extend”, “replace”, “raise”. What to do if the database we want to write to already exists. Default “extend”.
loglike_kwargs (dict) – Additional keyword arguments for loglike.
jacobian (callable or None) – A function that takes params and potentially other keyword arguments and returns the Jacobian of loglike[“contributions”] with respect to the params. Note that you only need to pass a Jacobian function if you have a closed form Jacobian. If you pass None, a numerical Jacobian will be calculated.
jacobian_kwargs (dict) – Additional keyword arguments for the Jacobian function.
jacobian_numdiff_options (dict) – Keyword arguments for the calculation of numerical derivatives for the calculation of standard errors. See Derivatives for details.
hessian (callable or None or False) – A function that takes params and potentially other keyword arguments and returns the Hessian of loglike[“value”] with respect to the params. If you pass None, a numerical Hessian will be calculated. If you pass False, you signal that no Hessian should be calculated. Thus, no result that requires the Hessian will be calculated.
hessian_kwargs (dict) – Additional keyword arguments for the Hessian function.
hessian_numdiff_options (dict) – Keyword arguments for the calculation of numerical derivatives for the calculation of standard errors.
design_info (pandas.DataFrame) – DataFrame with one row per observation that contains some or all of the variables “psu” (primary sampling unit), “strata” and “fpc” (finite population corrector). See Robust Likelihood inference for details.
- Returns
A LikelihoodResult object.
- Return type
LikelihoodResult
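The one-call workflow described for estimate_ml can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive recipe: the normal model, the helper names normal_loglike and run_estimation, the log_sigma reparametrization (used so that no bounds are needed) and the choice of "scipy_lbfgsb" as algorithm are all illustrative, and the exact structure that loglike has to return may differ between versions.

```python
import math


def normal_loglike(params, data):
    """Return one log-likelihood contribution per observation.

    Assumed model (illustrative only): data ~ Normal(mu, exp(log_sigma) ** 2).
    """
    mu = params["mu"]
    sigma = math.exp(params["log_sigma"])
    const = -0.5 * math.log(2 * math.pi)
    return [const - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2 for x in data]


def run_estimation(data):
    """Hypothetical wrapper around estimate_ml; needs estimagic and numpy."""
    import numpy as np
    import estimagic as em

    def loglike(params, data):
        # One contribution per observation, returned as a 1d array.
        return np.array(normal_loglike(params, data))

    return em.estimate_ml(
        loglike=loglike,
        params={"mu": 0.0, "log_sigma": 0.0},  # start values; a dict is a pytree
        optimize_options="scipy_lbfgsb",  # a str is used as the algorithm option
        loglike_kwargs={"data": data},  # forwarded to loglike
    )
```

Assuming the call succeeds, the returned object would offer the methods documented under LikelihoodResult, for example run_estimation(data).summary().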
estimate_msm
- estimagic.estimate_msm(simulate_moments, empirical_moments, moments_cov, params, optimize_options, *, bounds=None, constraints=None, logging=None, simulate_moments_kwargs=None, weights='diagonal', jacobian=None, jacobian_kwargs=None, jacobian_numdiff_options=None, log_options=None, lower_bounds=None, upper_bounds=None, numdiff_options=None)[source]#
Do a method of simulated moments or indirect inference estimation.
This is a high level interface for our lower level functions for minimization, numerical differentiation, inference and sensitivity analysis. It does the full workflow for MSM or indirect inference estimation with just one function call.
While we have good defaults, you can still configure each aspect of each step via the optional arguments of this function. If you find it easier to do the minimization separately, you can do so and just provide the optimal parameters as params and set optimize_options=False.
- Parameters
simulate_moments (callable) – Function that takes params and potentially other keyword arguments and returns a pytree with simulated moments. If the function returns a dict containing the key "simulated_moments" we only use the value corresponding to that key. Other entries are stored in the log database if you use logging.
empirical_moments (pandas.Series) – A pytree with the same structure as the result of simulate_moments.
moments_cov (pandas.DataFrame) – A block-pytree containing the covariance matrix of the empirical moments. This is typically calculated with our get_moments_cov function.
params (pytree) – A pytree containing the estimated or start parameters of the model. If the supplied parameters are estimated parameters, set optimize_options to False. Pytrees can be a numpy array, a pandas Series, a DataFrame with “value” column, a float and any kind of (nested) dictionary or list containing these elements. See How to specify params for examples.
optimize_options (dict, Algorithm, str or False) – Keyword arguments that govern the numerical optimization. Valid entries are all arguments of minimize() except for those that can be passed explicitly to estimate_msm. If you pass False as optimize_options you signal that params are already the optimal parameters and no numerical optimization is needed. If you pass a str as optimize_options it is used as the algorithm option.
bounds – Lower and upper bounds on the parameters. The most general and preferred way to specify bounds is an optimagic.Bounds object that collects lower, upper, soft_lower and soft_upper bounds. The soft bounds are used for sampling based optimizers but are not enforced during optimization. Each bound type mirrors the structure of params. Check our how-to guide on bounds for examples. If params is a flat numpy array, you can also provide bounds via any format that is supported by scipy.optimize.minimize.
simulate_moments_kwargs (dict) – Additional keyword arguments for simulate_moments.
weights (str) – One of “diagonal” (default), “identity” or “optimal”. Note that “optimal” refers to the asymptotically optimal weighting matrix and is often not a good choice due to large finite sample bias.
constraints (list, dict) – List with constraint dictionaries or single dict. See How to specify constraints.
logging (pathlib.Path, str or False) – Path to sqlite3 file (which typically has the file extension .db). If the file does not exist, it will be created.
log_options (dict) –
Additional keyword arguments to configure the logging.
- “fast_logging” (bool):
A boolean that determines if “unsafe” settings are used to speed up write processes to the database. This should only be used for very short running criterion functions where the main purpose of the log is monitoring and it would not be catastrophic to get a corrupted database in case of a sudden system shutdown. If one evaluation of the criterion function (and gradient if applicable) takes more than 100 ms, the logging overhead is negligible.
- “if_table_exists” (str):
One of “extend”, “replace”, “raise”. What to do if the tables we want to write to already exist. Default “extend”.
- “if_database_exists” (str):
One of “extend”, “replace”, “raise”. What to do if the database we want to write to already exists. Default “extend”.
jacobian (callable) – A function that takes params and potentially other keyword arguments and returns the Jacobian of simulate_moments with respect to the params.
jacobian_kwargs (dict) – Additional keyword arguments for the Jacobian function.
jacobian_numdiff_options (dict) – Keyword arguments for the calculation of numerical derivatives for the calculation of standard errors. See Derivatives for details. Note that by default we increase the step_size by a factor of 2 compared to the rule of thumb for optimal step sizes. This is because many msm criterion functions are slightly noisy.
- Returns
The estimated parameters, standard errors, sensitivity measures and covariance matrix of the parameters.
- Return type
dict
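To make the weights options more concrete, the following sketch builds the three weighting matrices from a covariance matrix of the moments and evaluates a quadratic MSM criterion. It uses the textbook constructions (identity, inverse of the diagonal of moments_cov, inverse of moments_cov); that estimagic builds them exactly this way is an assumption, and the helper names invert, weighting_matrix and msm_criterion are hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix with Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment the matrix with the identity on the right.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def weighting_matrix(moments_cov, method="diagonal"):
    """Build a weighting matrix from the covariance of the empirical moments."""
    n = len(moments_cov)
    if method == "identity":
        return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    if method == "diagonal":
        return [
            [1.0 / moments_cov[i][i] if i == j else 0.0 for j in range(n)]
            for i in range(n)
        ]
    if method == "optimal":
        return invert(moments_cov)
    raise ValueError(f"unknown method: {method}")


def msm_criterion(simulated, empirical, weights):
    """Weighted squared distance between simulated and empirical moments."""
    dev = [s - e for s, e in zip(simulated, empirical)]
    n = len(dev)
    return sum(dev[i] * weights[i][j] * dev[j] for i in range(n) for j in range(n))
```

With a diagonal moments_cov, “diagonal” and “optimal” coincide; they differ as soon as the moments are correlated.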
get_moments_cov
- estimagic.get_moments_cov(data, calculate_moments, *, moment_kwargs=None, bootstrap_kwargs=None)[source]#
Bootstrap the covariance matrix of the moment conditions.
- Parameters
data (pandas.DataFrame) – DataFrame with empirical data.
calculate_moments (callable) – Function that takes data and moment_kwargs as arguments and returns a 1d numpy array or pandas Series with moment conditions.
moment_kwargs (dict) – Additional keyword arguments for calculate_moments.
bootstrap_kwargs (dict) – Additional keyword arguments that govern the bootstrapping. Allowed arguments are “n_draws”, “seed”, “n_cores”, “batch_evaluator”, “cluster_by” and “error_handling”. For details see the bootstrap function.
- Returns
The covariance matrix of the moment conditions for MSM estimation.
- Return type
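The idea behind bootstrapping the covariance matrix of the moments can be illustrated with a small pure-Python sketch. It is not the implementation of get_moments_cov: the data is a plain list instead of a DataFrame, and the names calculate_moments and bootstrap_moments_cov as well as the chosen moments (mean and variance) are assumptions made for illustration.

```python
import random
import statistics


def calculate_moments(rows):
    """Hypothetical moment function: mean and variance of one variable."""
    return [statistics.fmean(rows), statistics.pvariance(rows)]


def bootstrap_moments_cov(rows, calculate_moments, n_draws=1000, seed=0):
    """Estimate the covariance matrix of the moments by resampling rows."""
    rng = random.Random(seed)
    n = len(rows)
    draws = []
    for _ in range(n_draws):
        # Resample observations with replacement and recompute the moments.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        draws.append(calculate_moments(sample))
    k = len(draws[0])
    means = [statistics.fmean([d[i] for d in draws]) for i in range(k)]
    # Sample covariance of the bootstrapped moments.
    return [
        [
            sum((d[i] - means[i]) * (d[j] - means[j]) for d in draws) / (n_draws - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]


cov = bootstrap_moments_cov([1.0, 2.0, 3.0, 4.0, 5.0], calculate_moments)
```

The resulting matrix has one row and one column per moment, which is the shape a moments_cov argument needs for the same set of moments.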
lollipop_plot
- estimagic.lollipop_plot(data, *, sharex=True, plot_bar=True, n_rows=1, scatterplot_kws=None, barplot_kws=None, combine_plots_in_grid=True, template='simple_white', palette=['rgb(102,194,165)', 'rgb(252,141,98)', 'rgb(141,160,203)', 'rgb(231,138,195)', 'rgb(166,216,84)', 'rgb(255,217,47)', 'rgb(229,196,148)', 'rgb(179,179,179)'])[source]#
Make a lollipop plot.
- Parameters
data (pandas.DataFrame) – The datapoints to be plotted. The whole data will be plotted. Thus if you want to plot just some variables or rows you need to restrict the dataset before passing it.
sharex (bool) – Whether the x-axis is shared across variables, default True.
plot_bar (bool) – Whether thin bars are plotted, default True.
n_rows (int) – Number of rows for a grid if plots are combined in a grid, default 1. The number of columns is determined automatically.
scatterplot_kws (dict) – Keyword arguments to plot the dots of the lollipop plot via the scatter function.
barplot_kws (dict) – Keyword arguments to plot the lines of the lollipop plot via the barplot function.
combine_plots_in_grid (bool) – Whether to return one figure containing subplots for each factor pair or a dictionary of individual plots. Default True.
template (str) – The template for the figure. Default is “simple_white”.
palette – The coloring palette for traces. Default is “qualitative.Plotly”.
- Returns
The grid plot or dict of individual plots
- Return type
plotly.Figure
estimation_table
- estimagic.estimation_table(models, *, return_type='dataframe', render_options=None, show_col_names=True, show_col_groups=None, show_index_names=False, show_inference=True, show_stars=True, show_footer=True, custom_param_names=None, custom_col_names=None, custom_col_groups=None, custom_index_names=None, custom_notes=None, confidence_intervals=False, significance_levels=(0.1, 0.05, 0.01), append_notes=True, notes_label='Note:', stats_options=None, number_format=('{0:.3g}', '{0:.5f}', '{0:.4g}'), add_trailing_zeros=True, escape_special_characters=True, siunitx_warning=True)[source]#
Generate HTML or LaTeX tables from (lists of) models.
The function can create publication quality tables in various formats from statsmodels or estimagic results.
It allows for extensive customization via optional arguments and almost limitless flexibility when using a two-stage approach where the return_type is set to "render_inputs", the resulting dictionary representation of the table is modified and that modified version is then passed to render_latex or render_html.
The formatting of the numbers in the table is completely configurable via the number_format argument. By default we round to three significant digits (i.e. the three leftmost non-zero digits are displayed). This is very different from other table packages and motivated by the fact that most estimation tables give a wrong feeling of precision by showing too many decimal points.
- Parameters
models (list) – List of estimation results. The models can come from statsmodels or be constructed from the outputs of estimagic.estimate_ml or estimagic.estimate_msm. With a little bit of work it is also possible to construct them out of R or other results. If a model is not a statsmodels result, it must be a dictionary with the following entries: “params” (a DataFrame with value column), “info” (a dictionary with summary statistics such as “n_obs”, “rsquared”, …) and “name” (a string), or a DataFrame with value column. If a model is a statsmodels result, model.endog_names is used as name and the rest is extracted from corresponding statsmodels attributes. The model names do not have to be unique but if they are not, models with the same name need to be grouped together.
return_type (str) – Can be “dataframe”, “latex”, “html”, “render_inputs” or a file path with the extension .tex or .html. If “render_inputs” is passed, a dictionary with the entries “body”, “footer” and other information is returned. The entries can be modified by the user (e.g. change formatting, rename columns or index, …) and then passed to render_latex or render_html. Default “dataframe”.
render_options (dict) – A dictionary with keyword arguments that are passed to df.style.to_latex or df.style.to_html, depending on the return_type. The default is None.
show_col_names (bool) – If True, the column names are displayed. The default column names are the model names if the model names are unique, otherwise (1), (2), etc. Default True.
show_col_groups (bool) – If True, the column groups are displayed. The default column groups are the model names if the model names are not unique and undefined otherwise. Default None. None means that the column groups are displayed if they are defined.
show_index_names (bool) – If True, the index names are displayed. Default False. This is mostly relevant when working with estimagic style params DataFrames with a MultiIndex.
show_inference (bool) – If True, inference (standard errors or confidence intervals) are displayed below parameter values. Default True.
show_stars (bool) – a boolean variable for displaying significance stars. Default is True.
show_footer (bool) – A boolean variable for displaying statistics, e.g. R2, Obs numbers. Default is True. Which statistics are displayed and how they are labeled can be determined via stats_options.
custom_param_names (dict) – Dictionary that is used to rename parameters. The keys are the old parameter names or index entries. The values are the new names. Default None.
custom_col_names (dict or list) – A list of column names or dict to rename the default column names. The default column names are the model names if the model names are unique, otherwise (1), (2), etc.
custom_col_groups (dict or list) – A list of column groups or dict to rename the default column groups. The default column groups are the model names if the model names are not unique and undefined otherwise.
custom_index_names (dict or list) – Dictionary or list to set the names of the index levels of the parameters. This is mostly relevant when working with estimagic style params DataFrames with a MultiIndex and only used if “index_names” is set to True in the render_options. Default None.
custom_notes (list) – A list of strings for additional notes. Default is None.
confidence_intervals (bool) – If True, display confidence intervals as inference values. Display standard errors otherwise. Default False.
significance_levels (list) – a list of floats for p value’s significance cut-off values. This is used to generate the significance stars. Default is [0.1,0.05,0.01].
append_notes (bool) – A boolean variable for printing p value cutoff explanation and additional notes, if applicable. Default is True.
notes_label (str) – A string to print as the title of the notes section, if applicable. Default is ‘Note:’.
stats_options (dict) – A dictionary that determines which statistics (e.g. R-Squared, No. of Observations) are displayed and how they are labeled. The keys are the names of the statistics inside the model[‘info’] dictionary or attribute names of a statsmodels results object. The values are the new labels to be displayed for those statistics, i.e. the set of the values is used as row names in the table.
number_format (int, str, iterable or callable) – A callable, iterable, integer or string that is used to apply string formatter(s) to floats in the table. Default (“{0:.3g}”, “{0:.5f}”, “{0:.4g}”).
add_trailing_zeros (bool) – If True, format floats such that they have same number of digits after the decimal point. Default True.
siunitx_warning (bool) – If True, print a warning about the LaTeX preamble that has to be added for proper compilation when working with the siunitx package. Default True.
escape_special_characters (bool) – If True, replaces special characters in parameter and model names with LaTeX or HTML safe sequences.
- Returns
Depending on the return type, a data frame with formatted strings, a string for html or latex tables, or a dictionary with statistics and parameters dataframes and strings for footers is returned. If the return type is a path, the function saves the resulting table at the given path.
- Return type
res_table (data frame, str or dictionary)
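The default rounding to three significant digits and the role of significance_levels can be illustrated with plain Python string formatting. This is a sketch only: format_estimate is a hypothetical helper, it uses just the first of the three default format strings, and the convention of one star per significance level that the p-value falls below is an assumption rather than confirmed library behavior.

```python
def format_estimate(
    value,
    p_value,
    number_format="{0:.3g}",
    significance_levels=(0.1, 0.05, 0.01),
):
    """Format a parameter value and append significance stars."""
    # Assumed convention: one star per level that the p-value falls below.
    stars = "*" * sum(p_value < level for level in significance_levels)
    return number_format.format(value) + stars


print(format_estimate(12.3456, 0.03))  # 12.3**
print(format_estimate(0.00123456, 0.5))  # 0.00123
```

The second call shows why significant digits differ from decimal places: small values keep their three leading non-zero digits instead of being rounded to zero.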
render_html
- estimagic.render_html(body, footer, render_options=None, show_footer=True, append_notes=True, notes_label='Note:', custom_notes=None, significance_levels=(0.1, 0.05, 0.01), show_index_names=False, show_col_names=True, show_col_groups=True, escape_special_characters=True, **kwargs)[source]#
Return estimation table in html format as string.
- Parameters
body (pandas.DataFrame) – DataFrame with formatted strings of parameter values, inferences (standard errors or confidence intervals, if applicable) and significance stars (if applicable).
footer (pandas.DataFrame) – DataFrame with formatted strings of summary statistics (such as number of observations, r-squared, etc.)
notes (str) – The HTML string with notes that contain additional information (e.g. the mapping from p-values to significance stars) to append to the footer of the estimation table.
render_options (dict) – A dictionary with custom kwargs to pass to pd.to_latex(), to update the default options. An example is {header: False} that disables displaying column names.
show_footer (bool) – a boolean variable for displaying footer_df. Default True.
append_notes (bool) – A boolean variable for printing p value cutoff explanation and additional notes, if applicable. Default is True.
notes_label (str) – A string to print as the title of the notes section, if applicable. Default is ‘Note:’.
significance_levels (list or tuple) – A list of floats for the significance cutoff values of the p-values. Default is [0.1,0.05,0.01].
show_index_names (bool) – If True, display index names in the table.
show_col_names (bool) – If True, the column names are displayed.
show_col_groups (bool) – If True, the column groups are displayed.
escape_special_characters (bool) – If True, replace the characters &, <, >, ‘, and ” in parameter and model names with HTML-safe sequences.
- Returns
The resulting string with html tabular code.
- Return type
html_str (str)
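What escaping the characters &, <, >, ‘ and ” means in practice can be shown with the standard library. The sketch below uses html.escape, which replaces exactly these five characters; whether render_html relies on this function or produces the same escape sequences is an assumption, and escape_html_names is a hypothetical helper.

```python
import html


def escape_html_names(names):
    """Replace &, <, >, ' and " in each name with HTML-safe sequences."""
    return [html.escape(name, quote=True) for name in names]


print(escape_html_names(["beta<1", "R&D"]))  # ['beta&lt;1', 'R&amp;D']
```

Without escaping, a parameter name such as beta<1 could be interpreted by a browser as the start of a tag and break the table.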
render_latex
- estimagic.render_latex(body, footer, render_options=None, show_footer=True, append_notes=True, notes_label='Note:', significance_levels=(0.1, 0.05, 0.01), custom_notes=None, siunitx_warning=True, show_index_names=False, show_col_names=True, show_col_groups=True, escape_special_characters=True)[source]#
Return estimation table in LaTeX format as string.
- Parameters
body (pandas.DataFrame) – DataFrame with formatted strings of parameter values, inferences (standard errors or confidence intervals, if applicable) and significance stars (if applicable).
footer (pandas.DataFrame) – DataFrame with formatted strings of summary statistics (such as number of observations, r-squared, etc.)
render_options (dict) – A dictionary with custom kwargs to pass to pd.Styler.to_latex(), to update the default options. An example keyword argument is: - siunitx (bool): If True, the table is structured to be compatible with siunitx package. Default is set to True internally. For the list of all possible arguments, see documentation of pandas.io.formats.style.Styler.to_latex.
show_footer (bool) – a boolean variable for displaying footer_df. Default True.
append_notes (bool) – A boolean variable for printing p value cutoff explanation and additional notes, if applicable. Default is True.
notes_label (str) – A string to print as the title of the notes section, if applicable. Default is ‘Note:’.
significance_levels (list or tuple) – A list of floats for the significance cutoff values of the p-values. Default is [0.1,0.05,0.01].
custom_notes (list) – A list of strings for additional notes. Default is None.
siunitx_warning (bool) – If True, print a warning about the LaTeX preamble that has to be added for proper compilation when working with the siunitx package. Default True.
show_index_names (bool) – If True, display index names in the table.
show_col_names (bool) – If True, the column names are displayed.
show_col_groups (bool) – If True, the column groups are displayed.
escape_special_characters (bool) – If True, replaces the characters &, %, $, #, _, {, }, ~, ^, and \ in parameter and model names with LaTeX-safe sequences.
- Returns
The resulting string with LaTeX tabular code.
- Return type
latex_str (str)
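A character-by-character replacement is one way to make names LaTeX-safe. The following is a minimal sketch, not the code used by render_latex: the mapping LATEX_REPLACEMENTS and the helper escape_latex are hypothetical, the concrete replacement sequences are common LaTeX conventions chosen for illustration, and treating the backslash as one of the escaped characters is an assumption.

```python
# Hypothetical mapping from special characters to LaTeX-safe sequences.
LATEX_REPLACEMENTS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape special characters one at a time to avoid double escaping."""
    return "".join(LATEX_REPLACEMENTS.get(char, char) for char in text)


print(escape_latex("beta_1 & 50%"))  # beta\_1 \& 50\%
```

Replacing character by character matters here: chained str.replace calls would escape the backslashes and braces introduced by earlier replacements a second time.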
LikelihoodResult
- class estimagic.LikelihoodResult(_params: typing.Any, _internal_estimates: optimagic.parameters.space_conversion.InternalParams, _free_estimates: estimagic.shared_covs.FreeParams, _converter: optimagic.parameters.conversion.Converter, _has_constraints: bool, _optimize_result: typing.Optional[optimagic.optimization.optimize_result.OptimizeResult] = None, _jacobian: typing.Optional[typing.Any] = None, _no_jacobian_reason: typing.Optional[str] = None, _hessian: typing.Optional[typing.Any] = None, _no_hessian_reason: typing.Optional[str] = None, _internal_jacobian: typing.Optional[numpy.ndarray] = None, _internal_hessian: typing.Optional[numpy.ndarray] = None, _design_info: typing.Optional[pandas.core.frame.DataFrame] = None, _cache: typing.Dict = <factory>)[source]#
Likelihood estimation results object.
- se(method='jacobian', n_samples=10000, bounds_handling='clip', seed=None)[source]#
Calculate standard errors.
- Parameters
method (str) – One of “jacobian”, “hessian”, “robust”, “cluster_robust”, “strata_robust”. Default “jacobian”. “cluster_robust” is only available if design_info contains a column called “psu” that identifies the primary sampling unit. “strata_robust” is only available if the columns “strata”, “fpc” and “psu” are in design_info.
n_samples (int) – Number of samples used to transform the covariance matrix of the internal parameter vector into the covariance matrix of the external parameters. For background information about internal and external params see How constraints are implemented. This is only used if you are using constraints.
bounds_handling (str) – One of “clip”, “raise”, “ignore”. Determines how bounds are handled. If “clip”, confidence intervals are clipped at the bounds. Standard errors are only adjusted if a sampling step is necessary due to additional constraints. If “raise” and any lower or upper bound is binding, we raise an Error. If “ignore”, boundary problems are simply ignored.
seed (int) – Seed for the random number generator. Only used if there are transforming constraints.
- Returns
A pytree with the same structure as params containing standard errors for the parameter estimates.
- Return type
Any
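The role of n_samples and seed can be illustrated with a one-parameter sketch of a sampling-based transformation: draw internal parameters from their estimated normal distribution, map each draw to the external parameter and take the standard deviation of the results. The helper external_se_by_sampling and the exponential transformation are hypothetical; the library works with full parameter vectors and covariance matrices, and its exact procedure is not reproduced here.

```python
import math
import random
import statistics


def external_se_by_sampling(
    internal_estimate, internal_se, transform, n_samples=10_000, seed=0
):
    """Approximate the standard error of transform(internal parameter)."""
    rng = random.Random(seed)  # a fixed seed makes the result reproducible
    draws = [
        transform(rng.gauss(internal_estimate, internal_se))
        for _ in range(n_samples)
    ]
    return statistics.stdev(draws)


# Hypothetical example: the external parameter is exp(internal parameter).
se_external = external_se_by_sampling(0.0, 0.1, math.exp)
```

More samples reduce the simulation noise in the resulting standard error at the cost of additional evaluations of the transformation.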
- cov(method='jacobian', n_samples=10000, bounds_handling='clip', return_type='pytree', seed=None)[source]#
Calculate the variance-covariance (matrix) of the estimated parameters.
- Parameters
method (str) – One of “jacobian”, “hessian”, “robust”, “cluster_robust”, “strata_robust”. Default “jacobian”. “cluster_robust” is only available if design_info contains a column called “psu” that identifies the primary sampling unit. “strata_robust” is only available if the columns “strata”, “fpc” and “psu” are in design_info.
n_samples (int) – Number of samples used to transform the covariance matrix of the internal parameter vector into the covariance matrix of the external parameters. For background information about internal and external params see How constraints are implemented. This is only used if you are using constraints.
bounds_handling (str) – One of “clip”, “raise”, “ignore”. Determines how bounds are handled. If “clip”, confidence intervals are clipped at the bounds. Standard errors are only adjusted if a sampling step is necessary due to additional constraints. If “raise” and any lower or upper bound is binding, we raise an Error. If “ignore”, boundary problems are simply ignored.
return_type (str) – One of “pytree”, “array” or “dataframe”. Default pytree. If “array”, a 2d numpy array with the covariance is returned. If “dataframe”, a pandas DataFrame with parameter names in the index and columns are returned.
seed (int) – Seed for the random number generator. Only used if there are transforming constraints.
- Returns
The covariance matrix of the estimated parameters as block-pytree, numpy.ndarray or pandas.DataFrame.
- Return type
Any
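The difference between the “jacobian”, “hessian” and “robust” methods can be seen in a one-parameter example. The sketch below uses the textbook estimators (outer product of the scores, inverse of the negative Hessian, and the sandwich formula) for the mean of a normal distribution with unit variance. That these are the formulas behind the method names is an assumption, and ml_variances is a hypothetical helper.

```python
def ml_variances(data):
    """Variance estimates for the ML estimate of mu in a Normal(mu, 1) model."""
    n = len(data)
    mu_hat = sum(data) / n  # maximum likelihood estimate of mu
    # Derivative of each likelihood contribution with respect to mu.
    scores = [x - mu_hat for x in data]
    # Second derivative of the summed log-likelihood with respect to mu.
    hessian = -float(n)
    outer_product = sum(s * s for s in scores)
    return {
        "jacobian": 1.0 / outer_product,
        "hessian": -1.0 / hessian,
        "robust": outer_product / hessian**2,  # sandwich estimator
    }


variances = ml_variances([0.0, 1.0, 2.0, 3.0])
# {'jacobian': 0.2, 'hessian': 0.25, 'robust': 0.3125}
```

The three estimates agree asymptotically if the model is correctly specified. Here they differ because the sample variance of the data is not equal to the assumed unit variance.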
- summary(method='jacobian', n_samples=10000, ci_level=0.95, bounds_handling='clip', seed=None)[source]#
Create a summary of estimation results.
- Parameters
method (str) – One of “jacobian”, “hessian”, “robust”, “cluster_robust”, “strata_robust”. Default “jacobian”. “cluster_robust” is only available if design_info contains a column called “psu” that identifies the primary sampling unit. “strata_robust” is only available if the columns “strata”, “fpc” and “psu” are in design_info.
ci_level (float) – Confidence level for the calculation of confidence intervals. The default is 0.95.
n_samples (int) – Number of samples used to transform the covariance matrix of the internal parameter vector into the covariance matrix of the external parameters. For background information about internal and external params see How constraints are implemented. This is only used if you are using constraints.
bounds_handling (str) – One of “clip”, “raise”, “ignore”. Determines how bounds are handled. If “clip”, confidence intervals are clipped at the bounds. Standard errors are only adjusted if a sampling step is necessary due to additional constraints. If “raise” and any lower or upper bound is binding, we raise an Error. If “ignore”, boundary problems are simply ignored.
seed (int) – Seed for the random number generator. Only used if there are transforming constraints.
- Returns
The estimation summary as pytree of DataFrames.
- Return type
Any
- ci(method='jacobian', n_samples=10000, ci_level=0.95, bounds_handling='clip', seed=None)[source]#
Calculate confidence intervals.
- Parameters
method (str) – One of “jacobian”, “hessian”, “robust”, “cluster_robust”, “strata_robust”. Default “jacobian”. “cluster_robust” is only available if design_info contains a column called “psu” that identifies the primary sampling unit. “strata_robust” is only available if the columns “strata”, “fpc” and “psu” are in design_info.
ci_level (float) – Confidence level for the calculation of confidence intervals. The default is 0.95.
n_samples (int) – Number of samples used to transform the covariance matrix of the internal parameter vector into the covariance matrix of the external parameters. For background information about internal and external params see How constraints are implemented. This is only used if you are using constraints.
bounds_handling (str) – One of “clip”, “raise”, “ignore”. Determines how bounds are handled. If “clip”, confidence intervals are clipped at the bounds. Standard errors are only adjusted if a sampling step is necessary due to additional constraints. If “raise” and any lower or upper bound is binding, we raise an Error. If “ignore”, boundary problems are simply ignored.
seed (int) – Seed for the random number generator. Only used if there are transforming constraints.
- Returns
- Pytree with the same structure as params containing lower bounds of confidence intervals.
- Pytree with the same structure as params containing upper bounds of confidence intervals.
- Return type
Any
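How ci_level and the “clip” option of bounds_handling interact can be sketched for a single parameter. The example assumes a normal approximation, which this page does not confirm as the method used by the library, and confidence_interval is a hypothetical helper.

```python
from statistics import NormalDist


def confidence_interval(
    estimate, se, ci_level=0.95, lower_bound=None, upper_bound=None
):
    """Normal-approximation confidence interval, clipped at optional bounds."""
    # Two-sided critical value, roughly 1.96 for ci_level=0.95.
    z = NormalDist().inv_cdf(1 - (1 - ci_level) / 2)
    lower, upper = estimate - z * se, estimate + z * se
    # "clip": the interval never extends beyond the parameter bounds.
    if lower_bound is not None:
        lower = max(lower, lower_bound)
    if upper_bound is not None:
        upper = min(upper, upper_bound)
    return lower, upper


unclipped = confidence_interval(1.0, 0.5)
clipped = confidence_interval(1.0, 0.5, lower_bound=0.5)
```

In the clipped case only the lower end of the interval changes; the upper end is the same as in the unclipped interval.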
- p_values(method='jacobian', n_samples=10000, bounds_handling='clip', seed=None)[source]#
Calculate p-values.
- Parameters
method (str) – One of “jacobian”, “hessian”, “robust”, “cluster_robust”, “strata_robust”. Default “jacobian”. “cluster_robust” is only available if design_info contains a column called “psu” that identifies the primary sampling unit. “strata_robust” is only available if the columns “strata”, “fpc” and “psu” are in design_info.
ci_level (float) – Confidence level for the calculation of confidence intervals. The default is 0.95.
n_samples (int) – Number of samples used to transform the covariance matrix of the internal parameter vector into the covariance matrix of the external parameters. For background information about internal and external params see How constraints are implemented. This is only used if you are using constraints.
bounds_handling (str) – One of “clip”, “raise”, “ignore”. Determines how bounds are handled. If “clip”, confidence intervals are clipped at the bounds. Standard errors are only adjusted if a sampling step is necessary due to additional constraints. If “raise” and any lower or upper bound is binding, we raise an Error. If “ignore”, boundary problems are simply ignored.
seed (int) – Seed for the random number generator. Only used if there are transforming constraints.
- Returns
Pytree with the same structure as params containing p-values.
- Return type
Any
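For orientation, a p-value for a single parameter can be computed from an estimate and its standard error as sketched below. The sketch assumes a two-sided test of the null hypothesis that the parameter is zero and a normal approximation; neither is confirmed on this page as the convention used by p_values, and two_sided_p_value is a hypothetical helper.

```python
from statistics import NormalDist


def two_sided_p_value(estimate, se):
    """Two-sided p-value for the null hypothesis that the parameter is zero."""
    z = abs(estimate / se)
    return 2 * (1 - NormalDist().cdf(z))


p = two_sided_p_value(1.96, 1.0)  # close to 0.05
```

Because the p-value depends on the standard error, it changes with the method argument in the same way the standard errors do.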
- to_pickle(path)[source]#
Save the LikelihoodResult object to pickle.
- Parameters
path (str, pathlib.Path) – A str or pathlib.Path ending in .pkl or .pickle.
MomentsResult
- class estimagic.MomentsResult(_params: typing.Any, _internal_estimates: optimagic.parameters.space_conversion.InternalParams, _free_estimates: estimagic.shared_covs.FreeParams, _weights: typing.Any, _converter: optimagic.parameters.conversion.Converter, _internal_moments_cov: numpy.ndarray, _internal_weights: numpy.ndarray, _internal_jacobian: numpy.ndarray, _empirical_moments: typing.Any, _has_constraints: bool, _optimize_result: typing.Optional[optimagic.optimization.optimize_result.OptimizeResult] = None, _jacobian: typing.Optional[typing.Any] = None, _no_jacobian_reason: typing.Optional[str] = None, _cache: typing.Dict = <factory>)[source]#
Method of moments estimation results object.
- se(method='robust', n_samples=10000, bounds_handling='clip', seed=None)[source]#
Calculate standard errors.
- Parameters
method (str) – One of “robust”, “optimal”. Despite the name, “optimal” is not recommended in finite samples and “optimal” standard errors are only valid if the asymptotically optimal weighting matrix has been used. It is only supported because it is needed to calculate sensitivity measures.
n_samples (int) – Number of samples used to transform the covariance matrix of the internal parameter vector into the covariance matrix of the external parameters. For background information about internal and external params see How constraints are implemented. This is only used if you are using constraints.
bounds_handling (str) – One of “clip”, “raise”, “ignore”. Determines how bounds are handled. If “clip”, confidence intervals are clipped at the bounds. Standard errors are only adjusted if a sampling step is necessary due to additional constraints. If “raise” and any lower or upper bound is binding, we raise an Error. If “ignore”, boundary problems are simply ignored.
seed (int) – Seed for the random number generator. Only used if there are transforming constraints.
- Returns
- A pytree with the same structure as params containing standard errors
for the parameter estimates.
- Return type
Any
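The role of n_samples and seed can be pictured with a small simulation: draw from a normal distribution around the internal estimates, map every draw to the external parameter space, and take the empirical covariance of the mapped draws. This is only a sketch of the idea; the helper name and the exponential reparametrization are hypothetical, and the real transformation is determined by the constraints in use.

```python
import numpy as np


def external_cov_by_sampling(
    internal_estimates, internal_cov, to_external, n_samples=10_000, seed=None
):
    """Approximate the covariance of external params by simulation."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(
        internal_estimates, internal_cov, size=n_samples
    )
    # Map each internal draw to the external parameter space.
    external_draws = np.array([to_external(d) for d in draws])
    return np.cov(external_draws, rowvar=False)


# Hypothetical transformation: the first parameter is kept positive via exp,
# the second one is left unchanged.
def to_external(x):
    return np.array([np.exp(x[0]), x[1]])


internal_estimates = np.array([0.0, 1.0])
internal_cov = np.diag([0.01, 0.04])

external_cov = external_cov_by_sampling(
    internal_estimates, internal_cov, to_external, n_samples=10_000, seed=0
)
```

A larger n_samples reduces simulation noise in the resulting covariance matrix, and fixing seed makes the result reproducible.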
- cov(method='robust', n_samples=10000, bounds_handling='clip', return_type='pytree', seed=None)[source]#
Calculate the variance-covariance matrix of the estimated parameters.
- Parameters
method (str) – One of “robust”, “optimal”. Despite the name, “optimal” is not recommended in finite samples and “optimal” standard errors are only valid if the asymptotically optimal weighting matrix has been used. It is only supported because it is needed to calculate sensitivity measures.
n_samples (int) – Number of samples used to transform the covariance matrix of the internal parameter vector into the covariance matrix of the external parameters. For background information about internal and external params see How constraints are implemented. This is only used if you are using constraints.
bounds_handling (str) – One of “clip”, “raise”, “ignore”. Determines how bounds are handled. If “clip”, confidence intervals are clipped at the bounds. Standard errors are only adjusted if a sampling step is necessary due to additional constraints. If “raise” and any lower or upper bound is binding, we raise an Error. If “ignore”, boundary problems are simply ignored.
return_type (str) – One of “pytree”, “array” or “dataframe”. Default pytree. If “array”, a 2d numpy array with the covariance is returned. If “dataframe”, a pandas DataFrame with parameter names in the index and columns is returned.
seed (int) – Seed for the random number generator. Only used if there are transforming constraints.
- Returns
- The covariance matrix of the estimated parameters as a block-pytree,
numpy array, or pandas DataFrame.
- Return type
Any
- summary(method='robust', n_samples=10000, ci_level=0.95, bounds_handling='clip', seed=None)[source]#
Create a summary of estimation results.
- Parameters
method (str) – One of “robust”, “optimal”. Despite the name, “optimal” is not recommended in finite samples and “optimal” standard errors are only valid if the asymptotically optimal weighting matrix has been used. It is only supported because it is needed to calculate sensitivity measures.
ci_level (float) – Confidence level for the calculation of confidence intervals. The default is 0.95.
n_samples (int) – Number of samples used to transform the covariance matrix of the internal parameter vector into the covariance matrix of the external parameters. For background information about internal and external params see How constraints are implemented. This is only used if you are using constraints.
bounds_handling (str) – One of “clip”, “raise”, “ignore”. Determines how bounds are handled. If “clip”, confidence intervals are clipped at the bounds. Standard errors are only adjusted if a sampling step is necessary due to additional constraints. If “raise” and any lower or upper bound is binding, we raise an Error. If “ignore”, boundary problems are simply ignored.
seed (int) – Seed for the random number generator. Only used if there are transforming constraints.
- Returns
The estimation summary as pytree of DataFrames.
- Return type
Any
- ci(method='robust', n_samples=10000, ci_level=0.95, bounds_handling='clip', seed=None)[source]#
Calculate confidence intervals.
- Parameters
method (str) – One of “robust”, “optimal”. Despite the name, “optimal” is not recommended in finite samples and “optimal” standard errors are only valid if the asymptotically optimal weighting matrix has been used. It is only supported because it is needed to calculate sensitivity measures.
ci_level (float) – Confidence level for the calculation of confidence intervals. The default is 0.95.
n_samples (int) – Number of samples used to transform the covariance matrix of the internal parameter vector into the covariance matrix of the external parameters. For background information about internal and external params see How constraints are implemented. This is only used if you are using constraints.
bounds_handling (str) – One of “clip”, “raise”, “ignore”. Determines how bounds are handled. If “clip”, confidence intervals are clipped at the bounds. Standard errors are only adjusted if a sampling step is necessary due to additional constraints. If “raise” and any lower or upper bound is binding, we raise an Error. If “ignore”, boundary problems are simply ignored.
seed (int) – Seed for the random number generator. Only used if there are transforming constraints.
- Returns
- Pytree with the same structure as params containing lower bounds of
confidence intervals.
- Pytree with the same structure as params containing upper bounds of
confidence intervals.
- Return type
Any
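To make the interplay of ci_level and bounds_handling="clip" concrete, here is a minimal sketch for a single scalar parameter. It assumes a normal approximation, and the function name and bound arguments are hypothetical; it is not necessarily how the library computes intervals internally.

```python
from statistics import NormalDist


def normal_ci(estimate, se, ci_level=0.95, lower_bound=None, upper_bound=None):
    """Symmetric normal confidence interval, clipped at optional bounds."""
    alpha = 1 - ci_level
    # Two-sided critical value, e.g. about 1.96 for ci_level=0.95.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = estimate - z * se
    upper = estimate + z * se
    # Mimic "clip": the interval never extends beyond the parameter bounds.
    if lower_bound is not None:
        lower = max(lower, lower_bound)
    if upper_bound is not None:
        upper = min(upper, upper_bound)
    return lower, upper


unclipped = normal_ci(1.0, 0.5)
clipped = normal_ci(1.0, 0.5, lower_bound=0.5)
```

In the clipped case the lower end of the interval coincides with the bound, while the upper end is unchanged.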
- p_values(method='robust', n_samples=10000, bounds_handling='clip', seed=None)[source]#
Calculate p-values.
- Parameters
method (str) – One of “robust”, “optimal”. Despite the name, “optimal” is not recommended in finite samples and “optimal” standard errors are only valid if the asymptotically optimal weighting matrix has been used. It is only supported because it is needed to calculate sensitivity measures.
n_samples (int) – Number of samples used to transform the covariance matrix of the internal parameter vector into the covariance matrix of the external parameters. For background information about internal and external params see How constraints are implemented. This is only used if you are using constraints.
bounds_handling (str) – One of “clip”, “raise”, “ignore”. Determines how bounds are handled. If “clip”, confidence intervals are clipped at the bounds. Standard errors are only adjusted if a sampling step is necessary due to additional constraints. If “raise” and any lower or upper bound is binding, we raise an Error. If “ignore”, boundary problems are simply ignored.
seed (int) – Seed for the random number generator. Only used if there are transforming constraints.
- Returns
Pytree with the same structure as params containing p-values.
- Return type
Any
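For intuition, a two-sided p-value for the null hypothesis that a parameter equals zero can be sketched from an estimate and its standard error as follows. The normal reference distribution and the function name are assumptions of this example.

```python
from statistics import NormalDist


def two_sided_p_value(estimate, se):
    """Two-sided p-value for H0: parameter == 0 under a normal approximation."""
    z = abs(estimate / se)
    return 2 * (1 - NormalDist().cdf(z))


p_zero = two_sided_p_value(0.0, 1.0)
p_borderline = two_sided_p_value(1.959964, 1.0)
```

An estimate of exactly zero gives a p-value of one, and an estimate roughly 1.96 standard errors away from zero gives a p-value close to 0.05.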
- sensitivity(kind='bias', n_samples=10000, bounds_handling='clip', seed=None, return_type='pytree')[source]#
Calculate sensitivity measures for moments estimates.
The sensitivity measures are based on the following papers:
Andrews, Gentzkow & Shapiro (2017, Quarterly Journal of Economics)
Honore, Jorgensen & de Paula (https://onlinelibrary.wiley.com/doi/full/10.1002/jae.2779)
In the papers, the different kinds of sensitivity measures are just called m1, e2, e3, e4, e5 and e6. We try to give them more informative names, but list the original names for reference.
- Parameters
kind (str) –
The following kinds are supported:
- “bias”:
Originally m1. How strongly would the parameter estimates be biased if the kth moment was misspecified, i.e., not zero in expectation?
- “noise_fundamental”:
Originally e2. How much precision would be lost if the kth moment was subject to a little additional noise, assuming the optimal weighting matrix was used?
- “noise”:
Originally e3. How much precision would be lost if the kth moment was subject to a little additional noise?
- “removal”:
Originally e4. How much precision would be lost if the kth moment was excluded from the estimation?
- “removal_fundamental”:
Originally e5. How much precision would be lost if the kth moment was excluded from the estimation, assuming the asymptotically optimal weighting matrix was used?
- “weighting”:
Originally e6. How would the precision change if the weight of the kth moment is increased a little?
n_samples (int) – Number of samples used to transform the covariance matrix of the internal parameter vector into the covariance matrix of the external parameters. For background information about internal and external params see How constraints are implemented. This is only used if you are using constraints.
bounds_handling (str) – One of “clip”, “raise”, “ignore”. Determines how bounds are handled. If “clip”, confidence intervals are clipped at the bounds. Standard errors are only adjusted if a sampling step is necessary due to additional constraints. If “raise” and any lower or upper bound is binding, we raise an Error. If “ignore”, boundary problems are simply ignored.
seed (int) – Seed for the random number generator. Only used if there are transforming constraints.
return_type (str) – One of “array”, “dataframe” or “pytree”. Default pytree. If your params or moments have a very nested format, return_type “dataframe” might be the better choice.
- Returns
- The sensitivity measure as a pytree, numpy array or DataFrame.
In 2d formats, the sensitivity measures have one row per estimated parameter and one column per moment.
- Return type
Any
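As an illustration of the “bias” kind, Andrews, Gentzkow & Shapiro define the sensitivity matrix as Λ = −(G′WG)⁻¹G′W, where G is the Jacobian of the moments with respect to the parameters and W is the weighting matrix. The sketch below evaluates this formula with numpy; the function name and the toy inputs are assumptions, and sign or scaling conventions in the library may differ.

```python
import numpy as np


def bias_sensitivity(jac, weights):
    """Sensitivity of parameters to moments: -(G'WG)^{-1} G'W.

    jac has shape (n_moments, n_params), weights (n_moments, n_moments).
    The result has one row per parameter and one column per moment.
    """
    gwg = jac.T @ weights @ jac
    # Solving the linear system avoids forming the explicit inverse.
    return -np.linalg.solve(gwg, jac.T @ weights)


# Toy inputs (assumption): three moments, two parameters, identity weights.
jac = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
weights = np.eye(3)
sensitivity = bias_sensitivity(jac, weights)
```

The shape of the result matches the 2d layout described above: one row per estimated parameter and one column per moment.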
- to_pickle(path)[source]#
Save the MomentsResult object to pickle.
- Parameters
path (str, pathlib.Path) – A str or pathlib.Path ending in .pkl or .pickle.
Bootstrap#
bootstrap
- estimagic.bootstrap(outcome, data, *, existing_result=None, outcome_kwargs=None, n_draws=1000, cluster_by=None, seed=None, n_cores=1, error_handling='continue', batch_evaluator=<function joblib_batch_evaluator>)[source]#
Use the bootstrap to calculate inference quantities.
- Parameters
outcome (callable) – A function that computes the statistic of interest.
data (pd.DataFrame) – Dataset.
existing_result (BootstrapResult) – An existing BootstrapResult object from a previous call of bootstrap(). Default is None.
outcome_kwargs (dict) – Additional keyword arguments for outcome.
n_draws (int) – Number of bootstrap samples to draw. If existing_result already contains at least n_draws outcomes, a random subset of those outcomes is used.
cluster_by (str) – Column name of variable to cluster by or None.
seed (Union[None, int, numpy.random.Generator]) – If seed is None or an int, numpy.random.default_rng is used, seeded with seed. If seed is already a Generator instance, that instance is used.
n_cores (int) – Number of jobs for parallelization.
error_handling (str) – One of “continue”, “raise”. Default “continue” which means that bootstrap estimates are only calculated for those samples where no errors occur and a warning is produced if any error occurs.
batch_evaluator (str or Callable) – Name of a pre-implemented batch evaluator (currently ‘joblib’ and ‘pathos_mp’) or Callable with the same interface as the estimagic batch_evaluators. See Batch evaluators.
- Returns
- A BootstrapResult object storing information on summary
statistics, the covariance matrix, and estimated bootstrap outcomes.
- Return type
BootstrapResult
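The resampling idea behind n_draws, cluster_by and seed can be sketched in a few lines of pandas and numpy. The helper names below are hypothetical and the code is a simplified stand-in, not the library's implementation: without clustering, rows are drawn with replacement; with clustering, whole clusters are drawn with replacement.

```python
import numpy as np
import pandas as pd


def draw_bootstrap_sample(data, rng, cluster_by=None):
    """Resample rows, or whole clusters if cluster_by is given."""
    if cluster_by is None:
        idx = rng.integers(0, len(data), size=len(data))
        return data.iloc[idx]
    clusters = data[cluster_by].unique()
    drawn = rng.choice(clusters, size=len(clusters), replace=True)
    groups = {key: group for key, group in data.groupby(cluster_by)}
    # A cluster that is drawn twice appears twice in the sample.
    return pd.concat([groups[c] for c in drawn])


def bootstrap_outcomes(outcome, data, n_draws=1000, cluster_by=None, seed=None):
    """Evaluate outcome on n_draws resampled data sets."""
    rng = np.random.default_rng(seed)
    return [
        outcome(draw_bootstrap_sample(data, rng, cluster_by))
        for _ in range(n_draws)
    ]


data = pd.DataFrame(
    {"y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "cluster": [0, 0, 1, 1, 2, 2]}
)
outcomes = bootstrap_outcomes(
    lambda df: df["y"].mean(), data, n_draws=50, cluster_by="cluster", seed=0
)
```

The standard deviation of the collected outcomes is then a natural estimate of the standard error of the statistic.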
- class estimagic.BootstrapResult(_base_outcome: Any, _internal_outcomes: numpy.ndarray, _internal_cov: numpy.ndarray)[source]#
- property base_outcome#
Returns the base outcome statistic(s).
- Returns
- Pytree of base outcomes, i.e. the outcome statistic(s) evaluated
on the original data set.
- Return type
pytree
- property outcomes#
Returns the estimated bootstrap outcomes.
- Returns
The bootstrap outcomes as a list of pytrees.
- Return type
List[Any]
- se()[source]#
Calculate standard errors.
- Returns
- The standard errors of the estimated parameters as a block-pytree,
numpy.ndarray, or pandas.DataFrame.
- Return type
Any
- cov(return_type='pytree')[source]#
Calculate the variance-covariance matrix of the estimated parameters.
- Parameters
return_type (str) – One of “pytree”, “array” or “dataframe”. If “array”, a 2d numpy array with the covariance is returned. If “dataframe”, a pandas DataFrame with parameter names in the index and columns is returned. The default is “pytree”.
- Returns
- The covariance matrix of the estimated parameters as a block-pytree,
numpy.ndarray, or pandas.DataFrame.
- Return type
Any
- ci(ci_method='percentile', ci_level=0.95)[source]#
Calculate confidence intervals.
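- Parameters
ci_method (str) – Method used to compute the confidence intervals. The default is “percentile”.
ci_level (float) – Confidence level for the calculation of confidence intervals. The default is 0.95.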
- Returns
- Pytree with the same structure as base_outcome containing lower
bounds of confidence intervals.
- Pytree with the same structure as base_outcome containing upper
bounds of confidence intervals.
- Return type
Any
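The default ci_method, “percentile”, can be pictured as taking empirical quantiles of the bootstrap outcomes. The following is a minimal sketch under that assumption; the function name is hypothetical and the library may use a different quantile interpolation.

```python
import numpy as np


def percentile_ci(outcomes, ci_level=0.95):
    """Percentile bootstrap interval from an array of bootstrap outcomes.

    outcomes has shape (n_draws,) or (n_draws, n_statistics).
    """
    alpha = 1 - ci_level
    outcomes = np.asarray(outcomes, dtype=float)
    lower = np.quantile(outcomes, alpha / 2, axis=0)
    upper = np.quantile(outcomes, 1 - alpha / 2, axis=0)
    return lower, upper


# Toy outcomes (assumption): the integers 0, 1, ..., 100.
lower, upper = percentile_ci(np.arange(101), ci_level=0.95)
```

Unlike intervals built from a standard error, percentile intervals need not be symmetric around the base outcome.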
- p_values()[source]#
Calculate p-values.
- Returns
- A pytree with the same structure as base_outcome containing p-values
for the parameter estimates.
- Return type
Any