Numerical differentiation#

Using simple examples, This tutorial shows you how to numerical differentiation with estimagic. More details on the topics covered here can be found in the how to guides.

import estimagic as em
import numpy as np
import pandas as pd

Basic usage of first_derivative#

def sphere(params):
    return params @ params
fd = em.first_derivative(
    func=sphere,
    params=np.arange(5),
)

fd["derivative"]
array([0., 2., 4., 6., 8.])

Basic usage of second_derivative#

sd = em.second_derivative(
    func=sphere,
    params=np.arange(5),
)

sd["derivative"].round(3)
array([[2.001, 0.   , 0.   , 0.   , 0.   ],
       [0.   , 2.   , 0.   , 0.   , 0.   ],
       [0.   , 0.   , 2.   , 0.   , 0.   ],
       [0.   , 0.   , 0.   , 2.   , 0.   ],
       [0.   , 0.   , 0.   , 0.   , 2.   ]])

params do not have to be vectors#

In estimagic, params can be arbitrary pytrees. Examples are (nested) dictionaries of numbers, arrays and pandas objects.

def dict_sphere(params):
    return params["a"] ** 2 + params["b"] ** 2 + (params["c"] ** 2).sum()
fd = em.first_derivative(
    func=dict_sphere,
    params={"a": 0, "b": 1, "c": pd.Series([2, 3, 4])},
)

fd["derivative"]
{'a': array(0.),
 'b': array(2.),
 'c': 0    4.0
 1    6.0
 2    8.0
 dtype: float64}

Description of the output#

The output of first_derivative when using a general pytree is straight-forward. Nevertheless, this explanation requires terminolgy of pytrees. Please refer to the JAX documentation of pytrees.

The output tree of first_derivative has the same structure as the params tree. Equivalent to the numpy case, where the gradient is a vector of shape (len(params),). If, however, the params tree contains non-scalar entries like numpy.ndarray’s, pandas.Series’, or pandas.DataFrame’s, the output is not expanded but a block is created instead. In the above example, the entry params["c"] is a pandas.Series with 3 entries. Thus, the first derivative output contains the corresponding 3x1-block of the gradient at the position ["c"]:

fd["derivative"]["c"]
0    4.0
1    6.0
2    8.0
dtype: float64
sd = em.second_derivative(
    func=dict_sphere,
    params={"a": 0, "b": 1, "c": pd.Series([2, 3, 4])},
)

sd["derivative"]
{'a': {'a': array(2.00072215),
  'b': array(0.),
  'c': 0    0.0
  1    0.0
  2    0.0
  dtype: float64},
 'b': {'a': array(0.),
  'b': array(1.9999955),
  'c': 0    0.0
  1    0.0
  2    0.0
  dtype: float64},
 'c': {'a': 0    0.0
  1    0.0
  2    0.0
  dtype: float64,
  'b': 0    0.0
  1    0.0
  2    0.0
  dtype: float64,
  'c':           0         1         2
  0  1.999995  0.000004 -0.000003
  1  0.000004  2.000001  0.000000
  2 -0.000003  0.000000  1.999997}}

Description of the output#

The output of second_derivative when using a general pytrees looks more complex but is easy once we remember that the second derivative is equivalent to applying the first derivative twice. This explanation requires terminolgy of pytrees. Please refer to the JAX documentation of pytrees.

The output tree is a product of the params tree with itself. This is equivalent to the numpy case, where the hessian is a matrix of shape (len(params), len(params)). If, however, the params tree contains non-scalar entries like numpy.ndarray’s, pandas.Series’, or pandas.DataFrame’s, the output is not expanded but a block is created instead. In the above example, the entry params["c"] is a 3-dimensional pandas.Series. Thus, the second derivative output contains the corresponding 3x3-block of the hessian at the position ["c"]["c"]:

sd["derivative"]["c"]["c"].round(3)
0 1 2
0 2.0 0.0 -0.0
1 0.0 2.0 0.0
2 -0.0 0.0 2.0

There are many options#

You can choose which finite difference method to use, whether we should respect parameter bounds, or whether to evaluate the function in parallel. Let’s go through some basic examples.

You can choose the difference method#

fd = em.first_derivative(
    func=sphere, params=np.arange(5), method="backward"  # default: 'central'
)

fd["derivative"]
array([-0.        ,  2.        ,  4.        ,  6.        ,  7.99999994])
sd = em.second_derivative(
    func=sphere, params=np.arange(5), method="forward"  # default: 'central_cross'
)

sd["derivative"].round(3)
array([[2.006, 0.   , 0.   , 0.   , 0.   ],
       [0.   , 2.   , 0.   , 0.   , 0.   ],
       [0.   , 0.   , 2.   , 0.   , 0.   ],
       [0.   , 0.   , 0.   , 2.   , 0.   ],
       [0.   , 0.   , 0.   , 0.   , 2.   ]])

You can add bounds#

params = np.arange(5)

fd = em.first_derivative(
    func=sphere,
    params=params,
    lower_bounds=params,  # forces first_derivative to use forward differences
    upper_bounds=params + 1,
)

fd["derivative"]
array([0.        , 2.        , 4.        , 6.        , 8.00000006])
sd = em.second_derivative(
    func=sphere,
    params=params,
    lower_bounds=params,  # forces first_derivative to use forward differences
    upper_bounds=params + 1,
)

sd["derivative"].round(3)
array([[2.006, 0.   , 0.   , 0.   , 0.   ],
       [0.   , 2.   , 0.   , 0.   , 0.   ],
       [0.   , 0.   , 2.   , 0.   , 0.   ],
       [0.   , 0.   , 0.   , 2.   , 0.   ],
       [0.   , 0.   , 0.   , 0.   , 2.   ]])

Or use parallelized numerical derivatives#

fd = em.first_derivative(
    func=sphere,
    params=np.arange(5),
    n_cores=4,
)

fd["derivative"]
array([0., 2., 4., 6., 8.])
sd = em.second_derivative(
    func=sphere,
    params=params,
    n_cores=4,
)

sd["derivative"].round(3)
array([[2.001, 0.   , 0.   , 0.   , 0.   ],
       [0.   , 2.   , 0.   , 0.   , 0.   ],
       [0.   , 0.   , 2.   , 0.   , 0.   ],
       [0.   , 0.   , 0.   , 2.   , 0.   ],
       [0.   , 0.   , 0.   , 0.   , 2.   ]])