How to calculate first derivatives#

In this guide, we show you how to compute first derivatives with estimagic - while introducing some core concepts.

import estimagic as em
import numpy as np
import pandas as pd


As in the getting started section, let's look at the sphere function \(f(x) = x^\top x\).

def sphere_scalar(params):
    return params**2

The derivative of \(f\) is given by \(f'(x) = 2 x\). With numerical derivatives, we have to specify the value of \(x\) at which we want to compute the derivative. Let’s first consider two scalar points \(x = 0\) and \(x=1\). We have \(f'(0) = 0\) and \(f'(1) = 2\).
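To see where these numbers come from, here is a hand-rolled central-difference approximation, independent of estimagic (the step size 1e-6 is an arbitrary illustrative choice):

```python
def sphere_scalar(params):
    return params**2


def central_difference(func, x, h=1e-6):
    # f'(x) is approximated by (f(x + h) - f(x - h)) / (2 h)
    return (func(x + h) - func(x - h)) / (2 * h)


print(central_difference(sphere_scalar, 0))  # ≈ 0.0
print(central_difference(sphere_scalar, 1))  # ≈ 2.0
```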

To compute the derivative using estimagic, we simply pass the function sphere_scalar and the params to the function first_derivative:

fd0 = em.first_derivative(func=sphere_scalar, params=0)
fd1 = em.first_derivative(func=sphere_scalar, params=1)

Notice that the output of first_derivative is a dictionary containing the derivative under the key "derivative". We discuss the output in more detail below.

Gradient and Jacobian#

The scalar case from above extends directly to the multivariate case. Let’s consider two cases:


\(f_1: \mathbb{R}^N \to \mathbb{R}\)


\(f_2: \mathbb{R}^N \to \mathbb{R}^M\)

The first derivative of \(f_1\) is usually referred to as the gradient, while the first derivative of \(f_2\) is usually called the Jacobian.


Let’s again use the sphere function, but this time with a vector input. The gradient is a 1-dimensional vector of shape (N,).

def sphere(params):
    return params @ params
fd = em.first_derivative(sphere, params=np.arange(4))
fd["derivative"]
array([0., 2., 4., 6.])
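As a sanity check, the gradient can be approximated coordinate by coordinate with central differences in plain numpy; the loop below is purely illustrative and not estimagic's implementation:

```python
import numpy as np


def sphere(params):
    return params @ params


def numerical_gradient(func, x, h=1e-6):
    # One central difference per coordinate; illustrative only.
    grad = np.empty_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        grad[i] = (func(x + step) - func(x - step)) / (2 * h)
    return grad


print(numerical_gradient(sphere, np.arange(4.0)))  # ≈ [0., 2., 4., 6.]
```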


As an example, let's now use the function \(f(x) = (x^\top x) \begin{pmatrix}0\\1\\2 \end{pmatrix}\), with \(f: \mathbb{R}^N \to \mathbb{R}^3\). The Jacobian is a 2-dimensional object of shape (M, N), where M is the output dimension.

def sphere_multivariate(params):
    return (params @ params) * np.arange(3)
fd = em.first_derivative(sphere_multivariate, params=np.arange(4))
fd["derivative"]
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  2.,  4.,  6.],
       [ 0.,  4.,  8., 12.]])
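The analytic Jacobian of \(f(x) = (x^\top x) v\) is the outer product \(J = 2 v x^\top\), i.e. \(J_{ij} = 2 v_i x_j\), which we can verify in plain numpy:

```python
import numpy as np

x = np.arange(4.0)
v = np.arange(3.0)

# Jacobian of f(x) = (x @ x) * v is J[i, j] = 2 * v[i] * x[j].
jacobian = 2 * np.outer(v, x)
print(jacobian)
```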

The output of first_derivative#

As we have already seen in the introduction, the output of first_derivative is a dictionary. This dictionary always contains an entry "derivative", which is the numerical derivative. Besides this entry, the dictionary may contain several additional entries, depending on the arguments passed to first_derivative.


If the argument return_func_value is True, the output dictionary will contain an additional entry under the key “func_value” denoting the function value evaluated at the params vector.


If the argument return_info is True, the output dictionary will contain one or two additional entries. In this case it always contains the entry "func_evals", a data frame of all internally executed function evaluations. If n_steps is larger than 1, it also contains "derivative_candidates", a data frame of the derivative estimates used in the Richardson extrapolation.

For an explanation of the argument n_steps and the Richardson method, please see the API Reference and the Richardson Extrapolation explanation in the documentation.
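The idea behind Richardson extrapolation is to combine derivative estimates at several step sizes so that leading error terms cancel. A minimal sketch using forward differences (not estimagic's implementation, which is more general):

```python
import math


def forward_diff(func, x, h):
    return (func(x + h) - func(x)) / h


def richardson_step(func, x, h=1e-4):
    # A forward difference has an O(h) error term; combining the estimates
    # at steps h and h/2 cancels it, leaving an O(h^2) error.
    return 2 * forward_diff(func, x, h / 2) - forward_diff(func, x, h)


print(richardson_step(math.exp, 1.0))  # ≈ e ≈ 2.71828...
```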

The objects returned when return_info is True are rarely of any use directly and can be safely ignored. However, they are necessary data when using the plotting function derivative_plot as explained below. For better understanding, we print each of these additional objects once:

fd = em.first_derivative(
    sphere_scalar, params=0, n_steps=2, return_func_value=True, return_info=True
)
assert fd["func_value"] == sphere_scalar(0)
fd["func_evals"]
                                     step          eval
sign step_number dim_x dim_f
 1   0           0     0     1.490116e-09  2.220446e-18
     1           0     0     2.980232e-09  8.881784e-18
-1   0           0     0     1.490116e-09  2.220446e-18
     1           0     0     2.980232e-09  8.881784e-18
fd["derivative_candidates"]
                                       der           err
method   num_term dim_x dim_f
forward  1        0     0     4.470348e-09  8.467417e-08
backward 1        0     0    -4.470348e-09  8.467417e-08
central  1        0     0     0.000000e+00  0.000000e+00

The params argument#

Above we used a numpy.ndarray as the params argument. In estimagic, params can be arbitrary pytrees. Examples are (nested) dictionaries of numbers, arrays, and pandas objects. Let’s look at a few cases.


params = pd.DataFrame(
    [["time_pref", "delta", 0.9], ["time_pref", "beta", 0.6], ["price", "price", 2]],
    columns=["category", "name", "value"],
).set_index(["category", "name"])

                 value
category  name
time_pref delta    0.9
          beta     0.6
price     price    2.0
def sphere_pandas(params):
    return params["value"] @ params["value"]
fd = em.first_derivative(sphere_pandas, params)
fd["derivative"]
category   name 
time_pref  delta    1.8
           beta     1.2
price      price    4.0
dtype: float64

Nested dicts#

params = {"a": 0, "b": 1, "c": pd.Series([2, 3, 4])}

{'a': 0,
 'b': 1,
 'c': 0    2
 1    3
 2    4
 dtype: int64}
def dict_sphere(params):
    return params["a"] ** 2 + params["b"] ** 2 + (params["c"] ** 2).sum()
fd = em.first_derivative(dict_sphere, params)
fd["derivative"]
{'a': array(0.),
 'b': array(2.),
 'c': 0    4.0
 1    6.0
 2    8.0
 dtype: float64}

Description of the output#

The output of first_derivative when using a general pytree is straightforward. The explanation, however, requires some pytree terminology; please refer to the JAX documentation on pytrees.

The output tree of first_derivative has the same structure as the params tree. This is equivalent to the numpy case, where the gradient is a vector of shape (len(params),). If, however, the params tree contains non-scalar entries, like numpy.ndarray's, pandas.Series', or pandas.DataFrame's, the output is not expanded; instead, a block is created. In the above example, the entry params["c"] is a pandas.Series with 3 entries. Thus, the derivative output contains the corresponding 3-entry block of the gradient at the position ["c"], as shown above.
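Conceptually, estimagic flattens the params pytree into one vector, computes the derivative, and maps the result back onto the tree structure. A rough numpy sketch of this round trip for a flat dict (the real registry also handles nested containers and pandas objects):

```python
import numpy as np

params = {"a": 0.0, "b": 1.0, "c": np.array([2.0, 3.0, 4.0])}


def flatten(tree):
    # Concatenate all leaves into one flat vector.
    return np.concatenate(
        [np.atleast_1d(np.asarray(val, dtype=float)) for val in tree.values()]
    )


def unflatten(tree, flat):
    # Map a flat vector back onto the structure of `tree`.
    out, pos = {}, 0
    for key, val in tree.items():
        size = np.size(val)
        block = flat[pos : pos + size]
        out[key] = block[0] if np.ndim(val) == 0 else block
        pos += size
    return out


x = flatten(params)  # array([0., 1., 2., 3., 4.])
grad = unflatten(params, 2 * x)  # gradient of the sphere function
print(grad)
```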


For slow-to-evaluate functions, one may increase computation speed by running the function evaluations in parallel. This can be easily done by setting the n_cores argument. For example, if we wish to evaluate the function on 2 cores, we simply write

fd = em.first_derivative(sphere_scalar, params=0, n_cores=2)
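The evaluations needed for a finite-difference step are independent of each other, which is what makes this parallelization possible. A thread-based sketch of the idea (estimagic's n_cores argument uses separate processes for true parallelism; threads keep this sketch self-contained):

```python
from concurrent.futures import ThreadPoolExecutor


def sphere_scalar(params):
    return params**2


def parallel_central_difference(func, x, h=1e-6, max_workers=2):
    # The two evaluations of a central difference are independent,
    # so they can run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        f_plus, f_minus = pool.map(func, [x + h, x - h])
    return (f_plus - f_minus) / (2 * h)


print(parallel_central_difference(sphere_scalar, 1.0))  # ≈ 2.0
```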